VideoContent

gitpavleenbali edited this page Feb 17, 2026 · 2 revisions

The Video class handles video data for multimodal AI interactions.

Import

from pyai.multimodal import Video

Creating Video

From File

video = Video.from_file("recording.mp4")

From URL

video = Video.from_url("https://example.com/video.mp4")

From Bytes

with open("video.mp4", "rb") as f:
    video = Video.from_bytes(f.read(), format="mp4")

Properties

| Property | Type | Description |
|---|---|---|
| duration | float | Duration in seconds |
| width | int | Frame width in pixels |
| height | int | Frame height in pixels |
| fps | float | Frames per second |
| format | str | Video format |
| size_bytes | int | File size in bytes |
| frame_count | int | Total number of frames |
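
For a constant-frame-rate file, these properties are related: `frame_count` is roughly `duration * fps`. The sketch below checks that relation with plain numbers. `expected_frame_count` is a hypothetical helper, not part of pyai, and variable-frame-rate files may not follow it.

```python
def expected_frame_count(duration: float, fps: float) -> int:
    """Estimate the frame count of a constant-frame-rate video.

    Hypothetical helper for illustration; not part of pyai.
    """
    if duration < 0 or fps <= 0:
        raise ValueError("duration must be >= 0 and fps must be > 0")
    return round(duration * fps)


# A 12.5 second clip at 24 fps holds about 300 frames.
print(expected_frame_count(12.5, 24.0))  # 300
```

Comparing this estimate with the reported `frame_count` is a cheap sanity check before sampling frames.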

Methods

extract_frames()

Extract frames at a fixed interval, as a fixed number of evenly spaced frames, or at specific timestamps (in seconds):

# Extract frames at intervals
frames = video.extract_frames(interval=1.0)  # Every 1 second

# Extract specific number of frames
frames = video.extract_frames(count=10)  # 10 evenly spaced frames

# Extract at specific timestamps
frames = video.extract_frames(timestamps=[0.0, 5.0, 10.0])
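
To see which moments each sampling mode would select, the three modes can be modelled as plain timestamp lists. This is a minimal sketch: `sample_timestamps` is a hypothetical helper, and the exact spacing pyai uses (for example, whether `count` samples segment midpoints or includes the first and last frame) is an assumption.

```python
from typing import List, Optional, Sequence


def sample_timestamps(
    duration: float,
    interval: Optional[float] = None,
    count: Optional[int] = None,
    timestamps: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the timestamps (in seconds) a sampling mode would select.

    Hypothetical helper; the spacing rules are assumptions, not pyai's.
    """
    modes = [m for m in (interval, count, timestamps) if m is not None]
    if len(modes) != 1:
        raise ValueError("pass exactly one of interval, count, timestamps")

    if interval is not None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        # One sample every `interval` seconds, starting at 0.
        result = []
        step = 0
        while step * interval < duration:
            result.append(step * interval)
            step += 1
        return result

    if count is not None:
        if count < 1:
            raise ValueError("count must be at least 1")
        # Midpoint of each of `count` equal-length segments.
        return [(i + 0.5) * duration / count for i in range(count)]

    # Explicit timestamps: keep only those inside the video, in order.
    return sorted(t for t in timestamps if 0.0 <= t <= duration)


print(sample_timestamps(10.0, interval=2.5))  # [0.0, 2.5, 5.0, 7.5]
print(sample_timestamps(10.0, count=5))       # [1.0, 3.0, 5.0, 7.0, 9.0]
```

A list produced this way could be passed as `timestamps=` when exact control over the sampled moments matters.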

extract_audio()

Extract the audio track and save it to a file:

audio = video.extract_audio()
audio.save("audio.mp3")

trim()

Trim the video to a time range given in seconds:

# Trim to segment
trimmed = video.trim(start=10.0, end=30.0)

# First 60 seconds
trimmed = video.trim(end=60.0)
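
The second call omits `start`, which suggests it defaults to the beginning of the video. The sketch below shows one way to resolve and validate a trim range before calling `trim()`. `resolve_trim_range` is a hypothetical helper, and clamping out-of-range values is an assumption, since this page does not say how pyai treats them.

```python
from typing import Optional, Tuple


def resolve_trim_range(
    duration: float,
    start: Optional[float] = None,
    end: Optional[float] = None,
) -> Tuple[float, float]:
    """Fill in missing bounds and clamp the range to the video length.

    Hypothetical helper; clamping is an assumption, not pyai behaviour.
    """
    begin = 0.0 if start is None else max(0.0, start)
    finish = duration if end is None else min(duration, end)
    if begin >= finish:
        raise ValueError("trim range is empty")
    return begin, finish


# A 45 second clip trimmed with end=60.0 keeps the whole clip.
print(resolve_trim_range(45.0, end=60.0))               # (0.0, 45.0)
print(resolve_trim_range(120.0, start=10.0, end=30.0))  # (10.0, 30.0)
```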

resize()

Resize the video to the given dimensions in pixels:

resized = video.resize(width=640, height=480)
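
Passing a fixed `width` and `height` may distort a source with a different aspect ratio. The sketch below computes target dimensions that fit inside a bounding box while keeping the aspect ratio. `fit_within` is a hypothetical helper, not part of pyai. Rounding down to even numbers is a precaution, because encoders using 4:2:0 chroma subsampling commonly require even dimensions.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_width: int, max_height: int) -> Tuple[int, int]:
    """Scale (width, height) to fit a box, preserving aspect ratio.

    Hypothetical helper; uses integer math to avoid rounding surprises.
    """
    if min(width, height, max_width, max_height) <= 0:
        raise ValueError("all dimensions must be positive")

    if width * max_height >= height * max_width:
        # The width is the limiting side.
        new_w = max_width
        new_h = height * max_width // width
    else:
        # The height is the limiting side.
        new_h = max_height
        new_w = width * max_height // height

    # Round down to even values, keeping at least 2 pixels per side.
    return max(2, new_w - new_w % 2), max(2, new_h - new_h % 2)


print(fit_within(1920, 1080, 640, 480))  # (640, 360)
# The result could then be used as: video.resize(width=w, height=h)
```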

save()

Save to a file, optionally in a different format:

video.save("output.mp4")
video.save("output.webm", format="webm")

Using with Agents

Video Analysis

from pyai import ask
from pyai.multimodal import Video

video = Video.from_file("presentation.mp4")

# Extract key frames for analysis
frames = video.extract_frames(count=5)

response = ask(
    "Describe what's happening in this video",
    images=frames
)

With MultimodalContent

from pyai.multimodal import MultimodalContent, Video

content = MultimodalContent()
content.add_text("Summarize this video lecture:")
content.add_video(Video.from_file("lecture.mp4"))

# `agent` is an existing agent instance created elsewhere (not shown on this page)
response = agent.run(content)

Frame-by-Frame Analysis

from pyai import ask
from pyai.multimodal import Video

video = Video.from_file("surveillance.mp4")

for frame in video.extract_frames(interval=5.0):
    analysis = ask("What do you see?", images=[frame])
    print(f"Frame {frame.timestamp}s: {analysis}")
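
The loop above makes one model call per frame, which can add up for long recordings. One option is to group frames into small batches and send each batch in a single request. The sketch below shows only the grouping step. `batched` is a hypothetical helper, and whether a provider accepts several images per request is an assumption to verify.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` items.

    Hypothetical helper for illustration; not part of pyai.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Stand-in values; in practice these would be extracted frames.
frame_times = [0.0, 5.0, 10.0, 15.0, 20.0]
for group in batched(frame_times, 2):
    # With real frames, each group could be sent as: ask(prompt, images=group)
    print(group)
```

Five frames in batches of two result in three requests instead of five.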

Format Support

| Format | Read | Write | Notes |
|---|---|---|---|
| MP4 | ✅ | ✅ | Most common |
| MOV | ✅ | ✅ | QuickTime |
| WebM | ✅ | ✅ | Web optimized |
| AVI | ✅ | ✅ | Legacy format |
| MKV | ✅ | ❌ | Read only |
| GIF | ✅ | ✅ | Animated |
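
Because MKV is read only, it can help to check the target extension before calling `save()`. The sketch below encodes the table as a lookup. `FORMAT_SUPPORT` and `can_write` are hypothetical names, and treating unknown extensions as unsupported is an assumption.

```python
from pathlib import Path

# Read/write support copied from the Format Support table.
FORMAT_SUPPORT = {
    "mp4": {"read": True, "write": True},
    "mov": {"read": True, "write": True},
    "webm": {"read": True, "write": True},
    "avi": {"read": True, "write": True},
    "mkv": {"read": True, "write": False},
    "gif": {"read": True, "write": True},
}


def can_write(path: str) -> bool:
    """Return True if the file extension is listed as writable."""
    extension = Path(path).suffix.lower().lstrip(".")
    return FORMAT_SUPPORT.get(extension, {}).get("write", False)


print(can_write("output.webm"))  # True
print(can_write("archive.mkv"))  # False
```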

Video Processing

Get Thumbnail

thumbnail = video.get_thumbnail(time=5.0)
thumbnail.save("thumbnail.jpg")

Get Metadata

metadata = video.get_metadata()
print(f"Duration: {metadata['duration']}")
print(f"Codec: {metadata['codec']}")
print(f"Bitrate: {metadata['bitrate']}")

Convert Format

# Convert to web-friendly format
web_video = video.convert(
    format="mp4",
    codec="h264",
    quality="medium"
)

Provider Support

| Provider | Video Input | Notes |
|---|---|---|
| OpenAI GPT-4o | ✅ | Via frame extraction |
| Google Gemini | ✅ | Native video support |
| Anthropic Claude | ⚠️ | Via frame extraction |
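
The table implies two input paths: pass the video directly where it is supported natively, or extract frames first. The sketch below encodes that choice as a lookup. The provider keys and the `input_strategy` helper are hypothetical, and defaulting unknown providers to frame extraction is an assumption.

```python
# Strategy per provider, following the Provider Support table.
# The keys are illustrative labels, not pyai configuration values.
INPUT_STRATEGY = {
    "openai": "frames",     # GPT-4o: via frame extraction
    "gemini": "native",     # native video support
    "anthropic": "frames",  # Claude: via frame extraction
}


def input_strategy(provider: str) -> str:
    """Return "native" or "frames" for a provider label.

    Unknown providers fall back to "frames" (an assumption).
    """
    return INPUT_STRATEGY.get(provider.lower(), "frames")


print(input_strategy("Gemini"))  # native
print(input_strategy("openai"))  # frames
```

With "frames", the `extract_frames()` plus `ask(..., images=frames)` pattern shown earlier applies. With "native", the video can be added through `MultimodalContent.add_video()`.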
