Multimodal
gitpavleenbali edited this page Feb 17, 2026 · 2 revisions
Process images, audio, and video with AI agents.
See Multimodal-Module for full documentation.
```python
from pyai.multimodal import ImageContent, AudioContent

# Image analysis
image = ImageContent.from_file("photo.jpg")
description = image.describe()

# Audio transcription
audio = AudioContent.from_file("recording.mp3")
text = audio.transcribe()
```

- Image understanding and analysis
- Audio transcription
- Video frame analysis
- Multi-modal conversations
- Format conversion
| Type | Formats |
|---|---|
| Image | PNG, JPG, GIF, WebP |
| Audio | MP3, WAV, M4A, FLAC |
| Video | MP4, MOV, AVI |
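
The table above can be turned into a small pre-flight check that decides which content type a file belongs to before it is handed to an agent. The sketch below is illustrative only: `SUPPORTED_FORMATS` and `detect_media_type` are hypothetical names, not part of `pyai`, and treating `.jpeg` as an alias for JPG is an assumption.

```python
from pathlib import Path

# Extensions copied from the supported-formats table.
# "jpeg" is an assumed alias for JPG and is not listed in the table.
SUPPORTED_FORMATS = {
    "image": {"png", "jpg", "jpeg", "gif", "webp"},
    "audio": {"mp3", "wav", "m4a", "flac"},
    "video": {"mp4", "mov", "avi"},
}


def detect_media_type(path):
    """Return "image", "audio", or "video" for a supported extension, else None."""
    # Normalize the extension: drop the leading dot and ignore case.
    ext = Path(path).suffix.lower().lstrip(".")
    for media_type, extensions in SUPPORTED_FORMATS.items():
        if ext in extensions:
            return media_type
    return None


print(detect_media_type("photo.jpg"))      # image
print(detect_media_type("recording.mp3"))  # audio
print(detect_media_type("notes.txt"))      # None
```

A check like this only inspects the file name; it does not confirm that the file contents actually match the extension.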
- Multimodal-Module - Full module documentation
- ImageContent - Image processing
- AudioContent - Audio processing
- VideoContent - Video processing