VideoContent
gitpavleenbali edited this page Feb 17, 2026 · 2 revisions
The Video class handles video data for multimodal AI interactions.
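Create a `Video` from a local file, a URL, or raw bytes:

```python
from pyai.multimodal import Video

# From a local file
video = Video.from_file("recording.mp4")

# From a URL
video = Video.from_url("https://example.com/video.mp4")

# From in-memory bytes; the container format is passed explicitly
with open("video.mp4", "rb") as f:
    video = Video.from_bytes(f.read(), format="mp4")
```

`Video` exposes these properties:

| Property | Type | Description |
|---|---|---|
| `duration` | `float` | Duration in seconds |
| `width` | `int` | Frame width in pixels |
| `height` | `int` | Frame height in pixels |
| `fps` | `float` | Frames per second |
| `format` | `str` | Video container format (e.g. `"mp4"`) |
| `size_bytes` | `int` | File size in bytes |
| `frame_count` | `int` | Total number of frames |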
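The timing properties are linked. For a constant-frame-rate video, `frame_count` is approximately `duration * fps`, and the frame shown at time `t` has index `floor(t * fps)`. The sketch below shows that arithmetic in plain Python, assuming a constant frame rate (variable-frame-rate files will not match it exactly). The helper names are hypothetical and not part of pyai.

```python
import math

# Hypothetical helpers, not part of pyai. Both assume a constant frame rate.

def estimated_frame_count(duration: float, fps: float) -> int:
    """Approximate total number of frames: duration (s) times frames per second."""
    return round(duration * fps)


def frame_index_at(time_s: float, fps: float, frame_count: int) -> int:
    """Index of the frame shown at time_s, clamped to the last frame."""
    if time_s < 0 or frame_count <= 0:
        raise ValueError("time must be >= 0 and frame_count must be positive")
    return min(math.floor(time_s * fps), frame_count - 1)


count = estimated_frame_count(12.5, 24.0)
print(count)                              # 300
print(frame_index_at(5.0, 24.0, count))   # 120
print(frame_index_at(12.5, 24.0, count))  # 299, the last frame
```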
Extract frames from the video:

```python
# One frame every 1 second
frames = video.extract_frames(interval=1.0)

# A fixed number of evenly spaced frames (here, 10)
frames = video.extract_frames(count=10)

# Frames at specific timestamps, in seconds
frames = video.extract_frames(timestamps=[0.0, 5.0, 10.0])
```

Extract the audio track:
```python
audio = video.extract_audio()
audio.save("audio.mp3")
```

Trim the video:
```python
# Keep only the segment from 10 s to 30 s
trimmed = video.trim(start=10.0, end=30.0)

# Keep the first 60 seconds
trimmed = video.trim(end=60.0)
```

Resize the video:
```python
resized = video.resize(width=640, height=480)
```

Save to a file:
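```python
video.save("output.mp4")
video.save("output.webm", format="webm")
```

To analyze a video with `ask`, extract key frames and pass them as images:

```python
from pyai import ask
from pyai.multimodal import Video

video = Video.from_file("presentation.mp4")

# Extract key frames for analysis
frames = video.extract_frames(count=5)

response = ask(
    "Describe what's happening in this video",
    images=frames,
)
```

To combine text and video in one input, use `MultimodalContent`:

```python
from pyai.multimodal import MultimodalContent, Video

content = MultimodalContent()
content.add_text("Summarize this video lecture:")
content.add_video(Video.from_file("lecture.mp4"))

# `agent` is an existing pyai agent; its creation is not shown here
response = agent.run(content)
```

To analyze a long recording piece by piece, sample frames at a fixed interval and analyze each one:

```python
from pyai import ask
from pyai.multimodal import Video

video = Video.from_file("surveillance.mp4")

# One frame every 5 seconds, each analyzed separately
for frame in video.extract_frames(interval=5.0):
    analysis = ask("What do you see?", images=[frame])
    print(f"Frame {frame.timestamp}s: {analysis}")
```

Supported formats:

| Format | Read | Write | Notes |
|---|---|---|---|
| MP4 | ✓ | ✓ | Most common |
| MOV | ✓ | ✓ | QuickTime |
| WebM | ✓ | ✓ | Web optimized |
| AVI | ✓ | ✓ | Legacy format |
| MKV | ✓ | ✗ | Read only |
| GIF | ✓ | ✓ | Animated |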
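Before writing a file, the format table can be mirrored in a small lookup that rejects formats marked read only, such as MKV. The sketch below is a hypothetical helper, not part of pyai. It resolves the target format from an explicit `format` argument or, failing that, from the file extension; that fallback is an assumption about how `save` picks a format, not documented behavior.

```python
from pathlib import Path
from typing import Optional

# Hypothetical helper mirroring the supported-formats table; not part of pyai.
READABLE = {"mp4", "mov", "webm", "avi", "mkv", "gif"}
WRITABLE = READABLE - {"mkv"}  # MKV is read only


def output_format(path: str, format: Optional[str] = None) -> str:
    """Resolve the target format (explicit argument first, then the file
    extension) and reject formats that cannot be written."""
    fmt = (format or Path(path).suffix.lstrip(".")).lower()
    if not fmt:
        raise ValueError(f"cannot infer a format from {path!r}")
    if fmt not in WRITABLE:
        raise ValueError(f"{fmt!r} is not a writable format")
    return fmt


print(output_format("output.mp4"))                  # mp4
print(output_format("output.webm", format="webm"))  # webm
```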
Get a thumbnail at a specific time:

```python
# Grab the frame at 5 seconds as a thumbnail image
thumbnail = video.get_thumbnail(time=5.0)
thumbnail.save("thumbnail.jpg")
```

Read the video metadata:

```python
metadata = video.get_metadata()
print(f"Duration: {metadata['duration']}")
print(f"Codec: {metadata['codec']}")
print(f"Bitrate: {metadata['bitrate']}")
```

Convert to another format or codec:

```python
# Convert to a web-friendly format
web_video = video.convert(
    format="mp4",
    codec="h264",
    quality="medium",
)
```

Provider support:

| Provider | Video Input | Notes |
|---|---|---|
| OpenAI GPT-4o | ✓ | Via frame extraction |
| Google Gemini | ✓ | Native video support |
| Anthropic Claude | ✓ | Via frame extraction |
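For providers that take images rather than video, the model only sees the sampled frames, so the choice of timestamps matters. The sketch below computes sample times for the two sampling modes `extract_frames` accepts: a fixed `interval`, and a fixed `count` of evenly spaced frames. It illustrates one reasonable spacing rule (each of the `count` samples sits at the center of an equal-length segment, which avoids both the often-blank first frame and a timestamp exactly at the end of the file). It is not pyai's implementation, and the function names are hypothetical.

```python
# Hypothetical helpers, not part of pyai: compute frame-sampling timestamps.

def interval_timestamps(duration: float, interval: float) -> list[float]:
    """Return 0, interval, 2*interval, ... for every time before the end."""
    if duration < 0 or interval <= 0:
        raise ValueError("duration must be >= 0 and interval > 0")
    stamps = []
    i = 0
    while i * interval < duration:
        stamps.append(i * interval)
        i += 1
    return stamps


def evenly_spaced_timestamps(duration: float, count: int) -> list[float]:
    """Split the video into `count` equal segments and sample each midpoint."""
    if duration <= 0 or count <= 0:
        raise ValueError("duration and count must be positive")
    step = duration / count
    return [(i + 0.5) * step for i in range(count)]


print(interval_timestamps(12.0, 5.0))     # [0.0, 5.0, 10.0]
print(evenly_spaced_timestamps(10.0, 5))  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

Passing such a list to `extract_frames(timestamps=...)` makes the chosen spacing explicit.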
See also:

- Multimodal-Module - Module overview
- ImageContent - Image handling
- AudioContent - Audio handling