Embeddings
Embeddings are vector representations of text that capture semantic meaning. They're essential for building:
- Semantic search — Find documents based on meaning, not just keyword matching
- RAG (Retrieval-Augmented Generation) — Retrieve relevant context for your AI agents
- Similarity detection — Find similar documents, detect duplicates, or cluster content
- Classification — Use embeddings as features for downstream ML models
Pydantic AI provides a unified interface for generating embeddings across multiple providers.
Quick Start
The Embedder class is the high-level interface for generating embeddings:
from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def main():
    # Embed a search query
    result = await embedder.embed_query('What is machine learning?')
    print(f'Embedding dimensions: {len(result.embeddings[0])}')
    #> Embedding dimensions: 1536

    # Embed multiple documents at once
    docs = [
        'Machine learning is a subset of AI.',
        'Deep learning uses neural networks.',
        'Python is a programming language.',
    ]
    result = await embedder.embed_documents(docs)
    print(f'Embedded {len(result.embeddings)} documents')
    #> Embedded 3 documents
(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)
Queries vs Documents
Some embedding models optimize differently for queries and documents. Use
embed_query() for search queries and
embed_documents() for content you're indexing.
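For example, a small end-to-end semantic search only needs those two calls plus a similarity function. The cosine_similarity helper below is plain Python written for this guide rather than part of Pydantic AI:

```python
import math

from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


async def main():
    docs = [
        'Machine learning is a subset of AI.',
        'Deep learning uses neural networks.',
        'Python is a programming language.',
    ]
    doc_result = await embedder.embed_documents(docs)
    query_result = await embedder.embed_query('What is machine learning?')
    query_embedding = query_result.embeddings[0]

    # Rank documents by similarity to the query, most similar first
    ranked = sorted(
        zip(docs, doc_result.embeddings),
        key=lambda pair: cosine_similarity(query_embedding, pair[1]),
        reverse=True,
    )
    print(ranked[0][0])  # the document closest in meaning to the query
```

You'll need to add asyncio.run(main()) to run main; in a real application the document embeddings would typically be stored in a vector database rather than recomputed per query.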
Embedding Result
All embed methods return an EmbeddingResult containing the embeddings along with useful metadata.
For convenience, you can access embeddings either by index (result[0]) or by the original input text (result['Hello world']).
from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def main():
    result = await embedder.embed_query('Hello world')

    # Access embeddings - each is a sequence of floats
    embedding = result.embeddings[0]  # By index via .embeddings
    embedding = result[0]  # Or directly via __getitem__
    embedding = result['Hello world']  # Or by original input text
    print(f'Dimensions: {len(embedding)}')
    #> Dimensions: 1536

    # Check usage
    print(f'Tokens used: {result.usage.input_tokens}')
    #> Tokens used: 2

    # Calculate cost (requires `genai-prices` to have pricing data for the model)
    cost = result.cost()
    print(f'Cost: ${cost.total_price:.6f}')
    #> Cost: $0.000000
(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)
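Text-based lookup is handy when you embed many documents and want to keep the text-to-vector association explicit, for example when building a small in-memory index. The build_index helper below is purely illustrative:

```python
from collections.abc import Sequence

from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def build_index(docs: list[str]) -> dict[str, Sequence[float]]:
    """Map each document to its embedding, looked up by the original text."""
    result = await embedder.embed_documents(docs)
    return {doc: result[doc] for doc in docs}
```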
Providers
OpenAI
OpenAIEmbeddingModel works with OpenAI's embeddings API and any OpenAI-compatible provider.
Install
To use OpenAI embedding models, you need to either install pydantic-ai, or install pydantic-ai-slim with the openai optional group:
pip install "pydantic-ai-slim[openai]"
uv add "pydantic-ai-slim[openai]"
Configuration
To use OpenAIEmbeddingModel with the OpenAI API, go to platform.openai.com and follow your nose until you find the place to generate an API key. Once you have the API key, you can set it as an environment variable:
export OPENAI_API_KEY='your-api-key'
You can then use the model:
from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def main():
    result = await embedder.embed_query('Hello world')
    print(len(result.embeddings[0]))
    #> 1536
(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)
See OpenAI's embedding models for available models.
Dimension Control
OpenAI's text-embedding-3-* models support dimension reduction via the dimensions setting:
from pydantic_ai import Embedder
from pydantic_ai.embeddings import EmbeddingSettings

embedder = Embedder(
    'openai:text-embedding-3-small',
    settings=EmbeddingSettings(dimensions=256),
)


async def main():
    result = await embedder.embed_query('Hello world')
    print(len(result.embeddings[0]))
    #> 256
(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)
OpenAI-Compatible Providers
Since OpenAIEmbeddingModel uses the same provider system as OpenAIChatModel, you can use it with any OpenAI-compatible provider:
# Using Azure OpenAI
from openai import AsyncAzureOpenAI

from pydantic_ai import Embedder
from pydantic_ai.embeddings.openai import OpenAIEmbeddingModel
from pydantic_ai.providers.openai import OpenAIProvider

azure_client = AsyncAzureOpenAI(
    azure_endpoint='https://your-resource.openai.azure.com',
    api_version='2024-02-01',
    api_key='your-azure-key',
)
model = OpenAIEmbeddingModel(
    'text-embedding-3-small',
    provider=OpenAIProvider(openai_client=azure_client),
)
embedder = Embedder(model)

# Using any OpenAI-compatible API
model = OpenAIEmbeddingModel(
    'your-model-name',
    provider=OpenAIProvider(
        base_url='https://your-provider.com/v1',
        api_key='your-api-key',
    ),
)
embedder = Embedder(model)
For providers with dedicated provider classes (like OllamaProvider or AzureProvider), you can use the shorthand syntax:
from pydantic_ai import Embedder
embedder = Embedder('azure:text-embedding-3-small')
embedder = Embedder('ollama:nomic-embed-text')
See OpenAI-compatible Models for the full list of supported providers.
Cohere
CohereEmbeddingModel provides access to Cohere's embedding models, which offer multilingual support and various model sizes.
Install
To use Cohere embedding models, you need to either install pydantic-ai, or install pydantic-ai-slim with the cohere optional group:
pip install "pydantic-ai-slim[cohere]"
uv add "pydantic-ai-slim[cohere]"
Configuration
To use CohereEmbeddingModel, go to dashboard.cohere.com/api-keys and follow your nose until you find the place to generate an API key. Once you have the API key, you can set it as an environment variable:
export CO_API_KEY='your-api-key'
You can then use the model:
from pydantic_ai import Embedder

embedder = Embedder('cohere:embed-v4.0')


async def main():
    result = await embedder.embed_query('Hello world')
    print(len(result.embeddings[0]))
    #> 1536
(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)
See the Cohere Embed documentation for available models.
Cohere-Specific Settings
Cohere models support additional settings via CohereEmbeddingSettings:
from pydantic_ai import Embedder
from pydantic_ai.embeddings.cohere import CohereEmbeddingSettings

embedder = Embedder(
    'cohere:embed-v4.0',
    settings=CohereEmbeddingSettings(
        dimensions=512,
        cohere_truncate='END',  # Truncate long inputs instead of erroring
        cohere_max_tokens=256,  # Limit tokens per input
    ),
)
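As with the OpenAI dimension control example above, you can confirm the effect of dimensions by checking the vector length. A short sketch, assuming embed-v4.0 honors the requested output dimensions:

```python
from pydantic_ai import Embedder
from pydantic_ai.embeddings.cohere import CohereEmbeddingSettings

embedder = Embedder(
    'cohere:embed-v4.0',
    settings=CohereEmbeddingSettings(dimensions=512),
)


async def main():
    result = await embedder.embed_query('Hello world')
    # Expect 512 here if the model applies the reduced output dimensions
    print(len(result.embeddings[0]))
```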
Sentence Transformers (Local)
SentenceTransformerEmbeddingModel runs embeddings locally using the sentence-transformers library. This is ideal for:
- Privacy — Data never leaves your infrastructure
- Cost — No API charges for high-volume workloads
- Offline use — No internet connection required after model download
Install
To use Sentence Transformers embedding models, you need to install pydantic-ai-slim with the sentence-transformers optional group:
pip install "pydantic-ai-slim[sentence-transformers]"
uv add "pydantic-ai-slim[sentence-transformers]"
Usage
from pydantic_ai import Embedder

# Model is downloaded from Hugging Face on first use
embedder = Embedder('sentence-transformers:all-MiniLM-L6-v2')


async def main():
    result = await embedder.embed_query('Hello world')
    print(len(result.embeddings[0]))
    #> 384
(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)
See the Sentence-Transformers pretrained models documentation for available models.
Device Selection
Control which device to use for inference:
from pydantic_ai import Embedder
from pydantic_ai.embeddings.sentence_transformers import (
    SentenceTransformersEmbeddingSettings,
)

embedder = Embedder(
    'sentence-transformers:all-MiniLM-L6-v2',
    settings=SentenceTransformersEmbeddingSettings(
        sentence_transformers_device='cuda',  # Use GPU
        sentence_transformers_normalize_embeddings=True,  # L2 normalize
    ),
)
Using an Existing Model Instance
If you need more control over model initialization:
from sentence_transformers import SentenceTransformer

from pydantic_ai import Embedder
from pydantic_ai.embeddings.sentence_transformers import (
    SentenceTransformerEmbeddingModel,
)

# Create and configure the model yourself
st_model = SentenceTransformer('all-MiniLM-L6-v2', device='cpu')

# Wrap it for use with Pydantic AI
model = SentenceTransformerEmbeddingModel(st_model)
embedder = Embedder(model)
Settings
EmbeddingSettings provides common configuration options that work across providers.
Settings can be specified at the embedder level (applied to all calls) or per-call:
from pydantic_ai import Embedder
from pydantic_ai.embeddings import EmbeddingSettings

# Default settings for all calls
embedder = Embedder(
    'openai:text-embedding-3-small',
    settings=EmbeddingSettings(dimensions=512),
)


async def main():
    # Override for a specific call
    result = await embedder.embed_query(
        'Hello world',
        settings=EmbeddingSettings(dimensions=256),
    )
    print(len(result.embeddings[0]))
    #> 256
(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)
Token Counting
You can check token counts before embedding to avoid exceeding model limits:
from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def main():
    text = 'Hello world, this is a test.'

    # Count tokens in text
    token_count = await embedder.count_tokens(text)
    print(f'Tokens: {token_count}')
    #> Tokens: 8

    # Check model's maximum input tokens (returns None if unknown)
    max_tokens = await embedder.max_input_tokens()
    print(f'Max tokens: {max_tokens}')
    #> Max tokens: 8191
(This example is complete, it can be run "as is" — you'll need to add asyncio.run(main()) to run main)
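The two checks combine naturally into a pre-flight guard before embedding a large batch. The embed_within_limits helper below is purely illustrative (a real pipeline would more likely chunk over-long texts than skip them):

```python
from pydantic_ai import Embedder

embedder = Embedder('openai:text-embedding-3-small')


async def embed_within_limits(texts: list[str]):
    """Embed only the texts that fit within the model's input limit."""
    limit = await embedder.max_input_tokens()

    fitting = []
    for text in texts:
        # Skip anything that exceeds the model limit (when the limit is known)
        if limit is not None and await embedder.count_tokens(text) > limit:
            print(f'Skipping over-long input ({len(text)} characters)')
            continue
        fitting.append(text)

    return await embedder.embed_documents(fitting)
```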
Testing
Use TestEmbeddingModel for testing without making API calls:
from pydantic_ai import Embedder
from pydantic_ai.embeddings import TestEmbeddingModel


async def test_my_rag_system():
    embedder = Embedder('openai:text-embedding-3-small')
    test_model = TestEmbeddingModel()

    with embedder.override(model=test_model):
        result = await embedder.embed_query('test query')

        # TestEmbeddingModel returns deterministic embeddings
        assert result.embeddings[0] == [1.0] * 8

        # Check what settings were used
        assert test_model.last_settings is not None
Instrumentation
Enable OpenTelemetry instrumentation for debugging and monitoring:
import logfire
from pydantic_ai import Embedder
logfire.configure()
# Instrument a specific embedder
embedder = Embedder('openai:text-embedding-3-small', instrument=True)
# Or instrument all embedders globally
Embedder.instrument_all()
See the Debugging and Monitoring guide for more details on using Logfire with Pydantic AI.
Building Custom Embedding Models
To integrate a custom embedding provider, subclass EmbeddingModel:
from collections.abc import Sequence

from pydantic_ai.embeddings import EmbeddingModel, EmbeddingResult, EmbeddingSettings
from pydantic_ai.embeddings.result import EmbedInputType


class MyCustomEmbeddingModel(EmbeddingModel):
    @property
    def model_name(self) -> str:
        return 'my-custom-model'

    @property
    def system(self) -> str:
        return 'my-provider'

    async def embed(
        self,
        inputs: str | Sequence[str],
        *,
        input_type: EmbedInputType,
        settings: EmbeddingSettings | None = None,
    ) -> EmbeddingResult:
        inputs, settings = self.prepare_embed(inputs, settings)

        # Call your embedding API here
        embeddings = [[0.1, 0.2, 0.3] for _ in inputs]  # Placeholder

        return EmbeddingResult(
            embeddings=embeddings,
            inputs=inputs,
            input_type=input_type,
            model_name=self.model_name,
            provider_name=self.system,
        )
Use WrapperEmbeddingModel if you want to wrap an existing model to add custom behavior like caching or logging.
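For example, a caching wrapper might look roughly like the sketch below. Treat it as illustrative only: it assumes WrapperEmbeddingModel is exported from pydantic_ai.embeddings, accepts the wrapped model in its constructor, and exposes it as .wrapped; check the API reference for the actual interface. The cache key here also deliberately ignores settings to keep the example short.

```python
from collections.abc import Sequence

from pydantic_ai.embeddings import (  # WrapperEmbeddingModel import path assumed
    EmbeddingModel,
    EmbeddingResult,
    EmbeddingSettings,
    WrapperEmbeddingModel,
)
from pydantic_ai.embeddings.result import EmbedInputType


class CachingEmbeddingModel(WrapperEmbeddingModel):
    """Caches whole embed() calls, keyed on the inputs and input type."""

    def __init__(self, wrapped: EmbeddingModel):
        super().__init__(wrapped)  # assumes the wrapper takes the wrapped model here
        self._cache: dict[tuple, EmbeddingResult] = {}

    async def embed(
        self,
        inputs: str | Sequence[str],
        *,
        input_type: EmbedInputType,
        settings: EmbeddingSettings | None = None,
    ) -> EmbeddingResult:
        texts = (inputs,) if isinstance(inputs, str) else tuple(inputs)
        key = (texts, input_type)
        if key not in self._cache:
            # Delegate to the wrapped model on a cache miss
            self._cache[key] = await self.wrapped.embed(
                inputs, input_type=input_type, settings=settings
            )
        return self._cache[key]
```

You could then wrap any of the models above and pass the result to Embedder just like an unwrapped model.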