YouTube Gemini Processor

A CLI tool to process videos and audio using Google's Gemini API. Supports YouTube URLs, local video/audio files, Google Cloud Storage URIs, and the Gemini Files API. Extracts comprehensive content including transcripts, visual descriptions, and structured segment analysis.

Features

Multiple input types: YouTube URLs, local files, GCS URIs, Files API references
Full transcript extraction with timestamps
Visual content descriptions (slides, diagrams, charts, demos)
Segment detection with optional video splitting via ffmpeg
Multiple analysis modes: comprehensive, concise, transcript-only, segments
Video processing controls: custom FPS, clip ranges, media resolution
Batch processing support
Multiple output formats: Markdown, JSON
Custom prompts for specialized analysis
Run anywhere with uvx - no installation required

Quick Start with uvx

No installation needed. Run directly with uvx:

# Set your API key
export GEMINI_API_KEY="your-api-key"

# Process a YouTube video
uvx --from git+ssh://git@github.com/asivura/youtube-gemini-processor.git yt-process "https://www.youtube.com/watch?v=VIDEO_ID"

# Process a local video file
uvx --from git+ssh://git@github.com/asivura/youtube-gemini-processor.git yt-process ./video.mp4

Installation

# Install from GitHub
uv pip install git+ssh://git@github.com/asivura/youtube-gemini-processor.git

# Or clone and install locally
git clone git@github.com:asivura/youtube-gemini-processor.git
cd youtube-gemini-processor
uv pip install -e .

Authentication

The tool supports two authentication methods: Gemini API key and Vertex AI.

Option 1: Gemini API Key

Get a key from Google AI Studio and set it as an environment variable:

export GEMINI_API_KEY="your-api-key"
yt-process "https://www.youtube.com/watch?v=VIDEO_ID"

You can also pass it directly:

yt-process "./video.mp4" --api-key "your-api-key"

The tool also checks the GOOGLE_API_KEY environment variable as a fallback.

Option 2: Vertex AI

Vertex AI uses Google Cloud Application Default Credentials (ADC). This is useful when working within a GCP environment or when you need access to GCS-hosted videos.

Setup:

# 1. Install the gcloud CLI (https://cloud.google.com/sdk/docs/install)

# 2. Authenticate with Application Default Credentials
gcloud auth application-default login

# 3. Set your GCP project (pick one method)
export YT_PROCESS_PROJECT="your-gcp-project"
# or pass it via --project flag

Usage:

# Use --vertex flag
yt-process "./video.mp4" --vertex --project your-gcp-project

# Or set the environment variable to auto-enable Vertex AI
export GOOGLE_GENAI_USE_VERTEXAI=true
export YT_PROCESS_PROJECT="your-gcp-project"
yt-process "./video.mp4"

Vertex AI environment variables:

Variable	Description
`YT_PROCESS_PROJECT`	GCP project ID (preferred, tool-specific)
`GOOGLE_CLOUD_PROJECT`	GCP project ID (fallback)
`YT_PROCESS_LOCATION`	GCP region (preferred, tool-specific)
`GOOGLE_CLOUD_LOCATION`	GCP region (fallback, default: `global`)
`GOOGLE_GENAI_USE_VERTEXAI`	Set to `true` to auto-enable Vertex AI without `--vertex` flag

Note: The Gemini Files API (used for local file uploads, --upload-only, --list-files) is not available with Vertex AI. To process local files with Vertex AI, upload them to GCS first (see GCS URIs below).

Authentication priority

--api-key flag
--vertex flag (Vertex AI with ADC)
GEMINI_API_KEY or GOOGLE_API_KEY environment variables
GOOGLE_GENAI_USE_VERTEXAI=true environment variable

Input Types

The tool accepts four types of input:

YouTube URLs

Pass any YouTube URL directly. Supports standard, short, embed, and Shorts URLs:

yt-process "https://www.youtube.com/watch?v=VIDEO_ID"
yt-process "https://youtu.be/VIDEO_ID"
yt-process "https://www.youtube.com/shorts/VIDEO_ID"

Only public YouTube videos are supported.

Local Video and Audio Files

Process video or audio files from your local filesystem. The file is uploaded via the Gemini Files API automatically:

yt-process ./presentation.mp4
yt-process /path/to/meeting.mov
yt-process ~/Videos/demo.webm
yt-process ./voice-memo.mp3 -m transcript

Supported video formats: .mp4, .mpeg, .mov, .avi, .webm, .wmv, .flv, .mkv, .3gp Supported audio formats: .mp3, .m4a, .wav, .flac, .ogg, .aac, .aiff, .aif

--fps and --media-resolution are video-only options. Audio inputs still support --clip for start/end offsets.

Files API References

Upload a video once and reuse the reference for multiple analyses. Files expire after 48 hours.

# Upload only - prints the file reference
yt-process ./video.mp4 --upload-only

# Reuse the reference (no re-upload)
yt-process files/abc123 -m comprehensive
yt-process files/abc123 -m segments
yt-process files/abc123 -m transcript

# List uploaded files
yt-process --list-files

# Delete a file
yt-process --delete-file files/abc123

GCS URIs

Process videos or audio stored in Google Cloud Storage. Requires Vertex AI authentication:

# Upload to GCS first
gcloud storage cp "./video.mp4" gs://your-bucket/

# Process via Vertex AI
yt-process "gs://your-bucket/video.mp4" --vertex --project your-gcp-project

# Audio transcription via Vertex AI
gcloud storage cp "./voice-memo.mp3" gs://your-bucket/
yt-process "gs://your-bucket/voice-memo.mp3" \
  --mode transcript --vertex --project your-gcp-project

Usage

Analysis Modes

# Comprehensive (default) - full transcript, visuals, summary, glossary
yt-process "./video.mp4" -m comprehensive

# Concise - quick summary with key points and slide content
yt-process "./video.mp4" -m concise

# Transcript only - transcript with visual markers
yt-process "./video.mp4" -m transcript

# Segments - identify logical sections with timestamps
yt-process "./video.mp4" -m segments

Output Formats

# Markdown (default)
yt-process "./video.mp4" -o analysis.md

# JSON
yt-process "./video.mp4" -f json -o analysis.json

Custom Prompts

# Custom analysis prompt (overrides --mode)
yt-process "./video.mp4" --prompt "Focus on technical implementation details and code examples shown"

# Extract specific information
yt-process "https://www.youtube.com/watch?v=VIDEO_ID" --prompt "List all tools, libraries, and frameworks mentioned"

Model Selection

yt-process "./video.mp4" --model gemini-3.1-pro-preview   # Default, best quality
yt-process "./video.mp4" --model gemini-3-pro-preview     # Previous 3 Pro revision
yt-process "./video.mp4" --model gemini-3-flash-preview   # Faster, lower cost
yt-process "./video.mp4" --model gemini-2.5-pro           # Previous gen, high quality
yt-process "./video.mp4" --model gemini-2.5-flash         # Previous gen, fast
yt-process "./video.mp4" --model gemini-2.0-flash         # Oldest supported

Video Processing Options

Frame Rate (`--fps`)

Override Gemini's default 1 FPS frame sampling. Higher values capture more detail but increase token usage.

yt-process ./video.mp4 --fps 2      # More visual detail
yt-process ./video.mp4 --fps 0.5    # Fewer frames, lower cost

Clip (`--clip`)

Process only a portion of the video:

yt-process ./video.mp4 --clip 1:30-5:00        # MM:SS format
yt-process ./video.mp4 --clip 90-300            # Raw seconds
yt-process ./video.mp4 --clip 0:01:30-0:05:00   # HH:MM:SS format

Media Resolution (`--media-resolution`)

Control the resolution at which video frames are processed:

Value	Tokens/Frame	Use Case
`low`	~66	Long videos, cost optimization
`medium`	(intermediate)	Balanced
`high`	~258 (default)	Detailed visual analysis

yt-process ./video.mp4 --media-resolution low

Combining Options

# Analyze a specific clip at low resolution with custom FPS
yt-process files/abc123 --clip 0:00-10:00 --fps 0.5 --media-resolution low

# Detailed analysis of a short segment
yt-process ./video.mp4 --clip 5:00-5:30 --fps 5 --media-resolution high

Segments Mode

Identify logical sections of a video with timestamps, speakers, and summaries. Use --split to also split the video into separate files via ffmpeg.

# Identify segments
yt-process "./video.mp4" -m segments

# Identify and split into files (requires ffmpeg)
yt-process "./video.mp4" -m segments --split

# Split YouTube video by chapters (auto-detected from description)
yt-process "https://www.youtube.com/watch?v=VIDEO_ID" --split

# Output segments as JSON
yt-process "./video.mp4" -m segments -f json -o segments.json

Recommended model for segments: Use gemini-3.1-pro-preview (the default). Smaller models tend to truncate and only cover the first portion of long videos.

Batch Processing

Create a file with inputs (one per line). Supports YouTube URLs, local files, and GCS URIs:

# inputs.txt
https://www.youtube.com/watch?v=VIDEO_ID_1
./presentation.mp4
gs://my-bucket/meeting.mp4
https://www.youtube.com/watch?v=VIDEO_ID_2

yt-process --batch inputs.txt -o ./output_dir/

CLI Reference

Usage: yt-process [OPTIONS] [INPUT]

Options:
  -b, --batch PATH                File containing URLs or file paths (one per line)
  -o, --output PATH               Output file or directory (for batch mode)
  -f, --format [markdown|json]    Output format (default: markdown)
  -m, --mode [comprehensive|concise|transcript|segments]
                                  Analysis mode (default: comprehensive)
  -p, --prompt TEXT               Custom prompt (overrides --mode)
  --model [gemini-3-pro-preview|gemini-3.1-pro-preview|gemini-3-flash-preview|gemini-2.5-pro|gemini-2.5-flash|gemini-2.0-flash]
                                  Gemini model (default: gemini-3.1-pro-preview)
  --api-key TEXT                  Gemini API key (or set GEMINI_API_KEY)
  --vertex                       Use Vertex AI authentication
  --project TEXT                  GCP project for Vertex AI
  --location TEXT                 GCP location for Vertex AI
  --fps FLOAT                    Frame sampling rate (default: 1 FPS)
  --clip TEXT                    Process a clip: START-END (e.g., 1:30-5:00)
  --media-resolution [low|medium|high]
                                  Frame resolution (default: high)
  --split                        Split video into segments (segments mode) or by chapters (YouTube URLs)
  --workers INTEGER               Number of parallel workers for --split with YouTube URLs (default: 4)
  --upload-only                  Upload file and print reference without processing
  --list-files                   List files uploaded to the Files API
  --delete-file TEXT             Delete a Files API reference
  -v, --verbose                  Verbose output
  --version                      Show version
  --help                         Show this message and exit

API Limits

Tier	Daily Limit
Free	8 hours of YouTube video/day
Paid	No limit

Shell Alias (Recommended)

Add to your ~/.bashrc or ~/.zshrc:

alias yt-process='uvx --from git+ssh://git@github.com/asivura/youtube-gemini-processor.git yt-process'

Then use simply:

yt-process "./video.mp4"
yt-process "https://www.youtube.com/watch?v=VIDEO_ID"

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
.github/workflows		.github/workflows
src/youtube_gemini_processor		src/youtube_gemini_processor
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

YouTube Gemini Processor

Features

Quick Start with uvx

Installation

Authentication

Option 1: Gemini API Key

Option 2: Vertex AI

Authentication priority

Input Types

YouTube URLs

Local Video and Audio Files

Files API References

GCS URIs

Usage

Analysis Modes

Output Formats

Custom Prompts

Model Selection

Video Processing Options

Frame Rate (--fps)

Clip (--clip)

Media Resolution (--media-resolution)

Combining Options

Segments Mode

Batch Processing

CLI Reference

API Limits

Shell Alias (Recommended)

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Frame Rate (`--fps`)

Clip (`--clip`)

Media Resolution (`--media-resolution`)

Packages