Version Notice
This documentation is ahead of the last release by 1 commit. You may see documentation for features not yet supported in the latest release, v0.0.29 (2025-02-27).
Image and Audio Input
Some LLMs are now capable of understanding both audio and image content.
Image Input
Info
Some models do not support image input. Please check the model's documentation to confirm whether it supports image input.
If you have a direct URL for the image, you can use ImageUrl:
from pydantic_ai import Agent, ImageUrl

agent = Agent(model='openai:gpt-4o')
result = agent.run_sync(
    [
        'What company is this logo from?',
        ImageUrl(url='https://iili.io/3Hs4FMg.png'),
    ]
)
print(result.data)
#> This is the logo for Pydantic, a data validation and settings management library in Python.
If you have the image locally, you can also use BinaryContent:
import httpx

from pydantic_ai import Agent, BinaryContent

image_response = httpx.get('https://iili.io/3Hs4FMg.png')  # Pydantic logo

agent = Agent(model='openai:gpt-4o')
result = agent.run_sync(
    [
        'What company is this logo from?',
        BinaryContent(data=image_response.content, media_type='image/png'),
    ]
)
print(result.data)
#> This is the logo for Pydantic, a data validation and settings management library in Python.
To keep the example runnable, the image is downloaded from the web, but you can also use Path().read_bytes() to read a local file's contents.
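As a rough illustration of the local-file route, the sketch below reads an image from disk with Path().read_bytes() and passes it to the agent as BinaryContent. The helper names load_image and ask_about_image are hypothetical and not part of pydantic_ai. Guessing the media type with the standard-library mimetypes module is an assumption made here for convenience; you can pass the media type explicitly instead.

```python
import mimetypes
from pathlib import Path


def load_image(path: str) -> tuple[bytes, str]:
    """Hypothetical helper: return a local image's bytes and guessed media type."""
    media_type, _ = mimetypes.guess_type(path)
    if media_type is None or not media_type.startswith('image/'):
        raise ValueError(f'not a recognised image file: {path}')
    return Path(path).read_bytes(), media_type


def ask_about_image(path: str, question: str) -> str:
    """Hypothetical helper: send a local image and a question to the model."""
    # Imported lazily so load_image() can be used without pydantic_ai installed.
    from pydantic_ai import Agent, BinaryContent

    data, media_type = load_image(path)
    agent = Agent(model='openai:gpt-4o')
    result = agent.run_sync(
        [question, BinaryContent(data=data, media_type=media_type)]
    )
    return result.data
```

Calling ask_about_image('logo.png', 'What company is this logo from?') would then behave like the downloaded-image example, provided the file exists and the model supports image input.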
Audio Input
Info
Some models do not support audio input. Please check the model's documentation to confirm whether it supports audio input.
You can provide audio input using either AudioUrl or BinaryContent. The process is analogous to the image examples above.
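A minimal sketch of what the audio case might look like, assuming AudioUrl can be imported from pydantic_ai and constructed with a url argument in the same way as ImageUrl. The helpers audio_media_type and describe_audio, the small media-type table, and the prompt text are all hypothetical. The model name is left as a parameter because the model must be one that accepts audio input.

```python
from pathlib import Path

# Media types for two common audio formats (illustrative, not exhaustive).
AUDIO_MEDIA_TYPES = {'.mp3': 'audio/mpeg', '.wav': 'audio/wav'}


def audio_media_type(filename: str) -> str:
    """Hypothetical helper: map a file extension to an audio media type."""
    suffix = Path(filename).suffix.lower()
    if suffix not in AUDIO_MEDIA_TYPES:
        raise ValueError(f'unsupported audio format: {suffix!r}')
    return AUDIO_MEDIA_TYPES[suffix]


def describe_audio(source: str, model: str) -> str:
    """Hypothetical helper: send audio from a URL or a local file to the model."""
    # Assumption: AudioUrl is exported from pydantic_ai alongside ImageUrl.
    from pydantic_ai import Agent, AudioUrl, BinaryContent

    if source.startswith(('http://', 'https://')):
        audio = AudioUrl(url=source)
    else:
        audio = BinaryContent(
            data=Path(source).read_bytes(),
            media_type=audio_media_type(source),
        )
    agent = Agent(model=model)
    result = agent.run_sync(['What is said in this recording?', audio])
    return result.data
```

The URL branch mirrors the ImageUrl example and the local-file branch mirrors the BinaryContent example. Only the content type and the media type differ.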