YouTube HyperFrames Agent Toolkit

A reusable agentic editing toolkit for turning a recorded YouTube talking-head video into a HyperFrames project with smart component reuse.

It handles:

FFmpeg rough cuts using silence detection;
optional transcript-based filler/retake cuts;
optional Kimi K2.6 edit-director suggestions;
HyperFrames HTML output with captions, hook cards, zooms, callouts, progress bars, flashes, grain/grid overlays;
a component system that helps agents reuse local snippets, install official HyperFrames registry blocks/components, and create custom reusable components during editing.

The style presets are inspired by high-retention Indian AI/startup creator editing patterns. They are not meant to impersonate any creator or clone a protected identity.

Requirements

Python 3.11+
Node.js 22+
FFmpeg
HyperFrames CLI via npx hyperframes

Install HyperFrames skills for your coding agent:

npx skills add heygen-com/hyperframes

Install

cd yt-hyperframes-agent
python -m venv .venv
source .venv/bin/activate
pip install -e .

Optional AI/transcription support:

pip install -e '.[ai]'

Quick start

yt-hyperframes build ./my-recording.mp4 \
  --style creator_hybrid \
  --project ./out/my-edit

cd ./out/my-edit
npx hyperframes preview
npx hyperframes render --output output.mp4

With transcript + Kimi K2.6 (multimodal video understanding):

export MOONSHOT_API_KEY="..."

yt-hyperframes build ./my-recording.mp4 \
  --transcript ./transcript.srt \
  --ai kimi \
  --style vaibhav_sisinty \
  --project ./out/ai-growth-edit

The Kimi provider defaults to kimi-k2.6 (Moonshot's native multimodal model released April 2026; supports mp4/mov/webm/mpeg/avi video input). Override with KIMI_MODEL=kimi-k2.5 if you don't have K2.6 access yet. Videos under ~20MB are sent inline as base64; larger files are uploaded via the Moonshot Files API and referenced as ms://<file-id>.
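
The inline-versus-upload decision described above can be sketched in a few lines. This is a minimal illustration, not the toolkit's actual code: the function names are hypothetical, and the cutoff behind "~20MB" is assumed to be 20 MiB. The upload request itself is left out because its client API is not described here.

```python
import base64
from pathlib import Path

# Assumption: "~20MB" is treated as 20 * 1024 * 1024 bytes; the real cutoff may differ.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def choose_video_transport(path: Path, limit: int = INLINE_LIMIT_BYTES) -> str:
    """Return "inline" for files under the limit and "upload" otherwise."""
    return "inline" if path.stat().st_size < limit else "upload"


def encode_inline(path: Path) -> str:
    """Base64-encode a small video so it can be embedded in a request body."""
    return base64.b64encode(path.read_bytes()).decode("ascii")


def file_reference(file_id: str) -> str:
    """Build the ms://<file-id> reference used for an already uploaded video."""
    return f"ms://{file_id}"
```

A caller would branch on choose_video_transport() and either embed encode_inline() or upload the file first and pass file_reference() with the returned id.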

Generate a transcript with the OpenAI Whisper API:

export OPENAI_API_KEY="..."

yt-hyperframes build ./my-recording.mp4 \
  --transcribe openai \
  --ai kimi \
  --style varun_mayya \
  --project ./out/tech-edit

Long-form recordings (15+ minutes)

For long recordings such as a 25–30 minute take, run a three-step agentic workflow instead of a one-shot build:

# 1. Transcribe (cached, reusable)
npx hyperframes transcribe recording.mp4 --model medium.en

# 2. Generate the edit plan only (no FFmpeg render yet)
yt-hyperframes plan recording.mp4 \
  --transcript transcript.json \
  --style varun_mayya \
  --output plan.raw.json

# 3. Have your agent (Claude Code, Codex, Cursor) read plan.raw.json and prune
#    keep_segments down to the segments worth keeping. The agent saves the
#    result as plan.json. Then render only what survived:
yt-hyperframes render-from-plan plan.json --project ./out/edit
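
Step 3 is normally done by the agent, but the same pruning can be scripted. The sketch below assumes plan.raw.json holds a top-level keep_segments list whose entries look like {"start": seconds, "end": seconds}. That shape and the helper names are assumptions, so check them against a real plan file before relying on it.

```python
import json
from pathlib import Path


def kept_duration(segments: list[dict]) -> float:
    """Total seconds covered by segments shaped like {"start": s, "end": e}."""
    return sum(seg["end"] - seg["start"] for seg in segments)


def prune_plan(plan: dict, drop: set[int]) -> dict:
    """Return a copy of the plan without the keep_segments at the given indices."""
    pruned = dict(plan)
    pruned["keep_segments"] = [
        seg for i, seg in enumerate(plan["keep_segments"]) if i not in drop
    ]
    return pruned


def prune_file(src: Path, dst: Path, drop: set[int]) -> float:
    """Read src, drop segments, write dst, and return the remaining duration."""
    plan = json.loads(src.read_text(encoding="utf-8"))
    pruned = prune_plan(plan, drop)
    dst.write_text(json.dumps(pruned, indent=2), encoding="utf-8")
    return kept_duration(pruned["keep_segments"])
```

For example, prune_file(Path("plan.raw.json"), Path("plan.json"), drop={3, 7}) would write plan.json without the fourth and eighth segments and return the remaining duration in seconds, which can be compared against the target length.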

The toolkit does long-form-aware work automatically:

Density scaling: for outputs longer than 90s, zooms_per_minute and max_captions_per_minute are tightened progressively. A 30-minute edit drops from 660 zooms / 1320 captions to 165 / 330, that is, from 22 zooms and 44 captions per minute to 5.5 and 11.
Auto chapter cards: for the varun_mayya and clean_podcast styles, chapter captions are emitted every 120s / 180s with titles pulled from the transcript. Agents can rewrite titles in plan.json before rendering.
Targeted Kimi vision: --ai kimi defaults to KIMI_VISION_MODE=targeted, which sends only windows with a high density of long silences, retake language, complex tool/product transcript content, or low-confidence visual references. Set KIMI_VISION_MODE=full to restore the old full-video behavior.
Chunked Kimi inference: when Kimi needs long-source vision, the proxy is sliced into ~4-minute windows (8s overlap) and inference runs in parallel (up to 3 workers). Per-window outputs are time-rebased and merged. Without chunking, a 30-minute recording would either overflow the model's context window or spend all of it on visual tokens.
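
The chunking described above can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's implementation: windows are taken to be exactly 240 s with an 8 s overlap, the analyze callable is a hypothetical stand-in for the per-window Kimi request, and each returned event is assumed to be a dict with a window-relative "time" key.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Window = tuple[float, float]


def chunk_windows(duration: float, window: float = 240.0, overlap: float = 8.0) -> list[Window]:
    """Split [0, duration] into overlapping windows; the last one may be shorter."""
    if duration <= 0:
        return []
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    step = window - overlap
    windows: list[Window] = []
    start = 0.0
    while True:
        end = min(start + window, duration)
        windows.append((start, end))
        if end >= duration:
            break
        start += step
    return windows


def run_chunked(
    duration: float,
    analyze: Callable[[Window], list[dict]],
    workers: int = 3,
) -> list[dict]:
    """Analyze windows in parallel, then rebase event times onto the source timeline."""
    windows = chunk_windows(duration)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_window = list(pool.map(analyze, windows))  # map preserves window order
    merged = []
    for (start, _end), events in zip(windows, per_window):
        for event in events:
            merged.append({**event, "time": event["time"] + start})
    merged.sort(key=lambda event: event["time"])
    return merged
```

Under these assumptions an 1800 s source yields eight windows, the last running from 1624 s to 1800 s. Events that fall inside an overlap region can be reported by two windows, so a real merge step would also need to deduplicate them.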

Agent prompt for a 30-minute recording

You're editing a 30-minute talking-head recording at recording.mp4 into a
~12-minute vertical YouTube edit in the varun_mayya style.

1. Run: npx hyperframes transcribe recording.mp4 --model medium.en
2. Run: yt-hyperframes plan recording.mp4 --transcript transcript.json
        --style varun_mayya --ai none --output plan.raw.json
3. Read plan.raw.json. The transcript is in the transcript field. For each
   keep segment, decide whether the speech inside is part of the spine of
   the video. Drop tangents, redundant explanations, hesitations, and any
   weak segment. Target output_duration: 10–14 minutes total.
4. Read the captions field. Rewrite chapter card titles to be more specific
   and skim-friendly using the actual transcript content of each chapter.
5. Save the edited plan as plan.json.
6. Run: yt-hyperframes render-from-plan plan.json --project out/edit
7. Run: cd out/edit && npx hyperframes lint
   Fix any errors, then: npx hyperframes preview to review.
8. Render: npx hyperframes render --output final.mp4 --quality high

Component-first editing

Each generated project contains:

hyperframes.json                     registry + paths config read by `npx hyperframes add`
components/manifest.json             selected local + official registry component decisions
components/usage-rules.yaml          agent decision rules
components/local/<name>/             registry-compatible local reusable snippets
components/custom/<name>/            custom components created during later edits
compositions/components/<name>.html  installed local snippets for easy reuse
compositions/<block>.html            local reusable block compositions

List available component templates:

yt-hyperframes components

Create a reusable custom component inside a generated project:

cd ./out/my-edit
yt-hyperframes component create \
  --project . \
  --name branded-stat-card \
  --type component \
  --description "Reusable stat card for my channel" \
  --tags stat,brand,callout

Use official HyperFrames registry items when helpful:

npx hyperframes add yt-lower-third --dir . --json --no-clipboard
npx hyperframes add grain-overlay --dir . --json --no-clipboard
npx hyperframes add whip-pan --dir . --json --no-clipboard

The generator writes these install commands into components/manifest.json based on the chosen style and transcript.

Style presets

yt-hyperframes styles

Built-ins:

creator_hybrid — bold vertical creator edit, yellow/cyan emphasis, fast zooms, official transition suggestions.
varun_mayya — cleaner tech-founder explainer, dark UI, cyan/purple callouts, charts/flowchart suggestions.
vaibhav_sisinty — high-energy AI/growth/news shorts, red/yellow urgency, flash/glitch/whip-pan suggestions.
clean_podcast — landscape long-form edit with lower flashiness and lower-third/outro components.
product_demo — UI-forward SaaS/tutorial walkthrough with split-screen b-roll, cursor/prompt moments, and product reveal components.
cinematic_story — warmer narrative creator edit with slower punch-ins, light leaks, and fewer bigger emphasis beats.
minimal_tutorial — quiet desktop tutorial style with readable lower thirds, restrained zooms, and structure-first diagrams/data components.

Agent workflow

Give Claude Code, Codex, Cursor, or another coding agent this prompt from inside a generated project:

Use /hyperframes and /hyperframes-registry. Read components/manifest.json, components/usage-rules.yaml, index.html, and data/edit-plan.json. Reuse local components first, install official registry items only when useful, and create custom components with yt-hyperframes component create when a visual pattern repeats. Run npx hyperframes lint before rendering.

Pipeline

1. Probe source video duration with FFprobe.
2. Detect long silences with FFmpeg silencedetect.
3. Load optional transcript from SRT or Whisper verbose JSON.
4. Detect filler words and retake phrases when word timestamps exist.
5. Optionally ask Kimi K2.6 for semantic/video edit suggestions.
6. Merge cuts into an edit decision list.
7. Generate component recommendations from style, transcript, effects, and AI suggestions.
8. Render a flattened rough cut with FFmpeg.
9. Transcode to the target canvas.
10. Generate HyperFrames HTML plus a component manifest and local component library.
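
The silence-detection and cut-merging steps above can be illustrated with a small parser. FFmpeg's silencedetect filter writes silence_start and silence_end lines to stderr, for example when run as ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=0.5 -f null - (the threshold values here are only illustrative). The helper names and the padding default below are assumptions, not the toolkit's own code.

```python
import re

SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silences(log: str) -> list[tuple[float, float]]:
    """Extract (start, end) silence ranges from FFmpeg silencedetect stderr text."""
    silences = []
    start = None
    for line in log.splitlines():
        match = SILENCE_START.search(line)
        if match:
            start = float(match.group(1))
            continue
        match = SILENCE_END.search(line)
        if match and start is not None:
            # FFmpeg can report a slightly negative start; clamp it to zero.
            silences.append((max(start, 0.0), float(match.group(1))))
            start = None
    return silences


def keep_segments(
    silences: list[tuple[float, float]],
    duration: float,
    padding: float = 0.1,
) -> list[tuple[float, float]]:
    """Invert silences into keep ranges, leaving `padding` seconds of each silence at both edges."""
    keeps = []
    cursor = 0.0
    for start, end in sorted(silences):
        cut_start = start + padding
        cut_end = end - padding
        if cut_end <= cut_start:
            continue  # silence is too short to cut once padding is applied
        if cut_start > cursor:
            keeps.append((cursor, cut_start))
        cursor = max(cursor, cut_end)
    if cursor < duration:
        keeps.append((cursor, duration))
    return keeps
```

The resulting keep ranges are one possible input to an edit decision list; transcript-based filler and retake cuts would be subtracted from them in the same way.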

Safety and factuality guardrails

Captions and callouts are grounded in the transcript. If a transcript is missing, the toolkit relies more on gap removal and visual timing, and generates fewer text overlays.
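
One strict way to read "grounded in the transcript" is that every caption must appear as a contiguous phrase in the transcript. The check below sketches that interpretation only. The function names are hypothetical, and the toolkit may accept looser matches such as paraphrases.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def is_grounded(caption: str, transcript: str) -> bool:
    """Return True if the caption's words occur contiguously in the transcript."""
    caption_words = normalize(caption)
    transcript_words = normalize(transcript)
    if not caption_words:
        return False
    size = len(caption_words)
    return any(
        transcript_words[i : i + size] == caption_words
        for i in range(len(transcript_words) - size + 1)
    )
```

A check like this could be run over the captions in a plan before rendering to flag overlays that have no support in the spoken text.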

Troubleshooting

yt-hyperframes error: Required binary 'ffmpeg' was not found on PATH.

Install FFmpeg and verify:

ffmpeg -version
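
To check all external binaries in one go, a short preflight script can help. This sketch uses Python's shutil.which, and the list of names is an assumption based on the tools this README mentions.

```python
import shutil

# Assumption: these are the binaries the pipeline shells out to.
REQUIRED_BINARIES = ("ffmpeg", "ffprobe", "npx")


def missing_binaries(names: tuple[str, ...] = REQUIRED_BINARIES) -> list[str]:
    """Return the names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Missing on PATH: " + ", ".join(missing))
    else:
        print("All required binaries found.")
```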

MOONSHOT_API_KEY is not set.

Run without Kimi:

yt-hyperframes build input.mp4 --ai none

or export MOONSHOT_API_KEY with your Kimi API key.

HyperFrames render cannot fetch GSAP from CDN.

Ask your agent to replace the CDN script in index.html with a local assets/gsap.min.js reference.
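
The same replacement can be done with a short script. This is a sketch under assumptions: it treats any script tag whose src is an http(s) URL containing "gsap" as the CDN reference, and it expects assets/gsap.min.js to already exist in the project. The function name is hypothetical.

```python
import re

# Matches a <script> tag whose src is an http(s) URL containing "gsap".
CDN_GSAP = re.compile(
    r'<script\b[^>]*\bsrc=["\']https?://[^"\']*gsap[^"\']*["\'][^>]*>\s*</script>',
    re.IGNORECASE,
)


def localize_gsap(html: str, local_src: str = "assets/gsap.min.js") -> str:
    """Swap CDN-hosted GSAP script tags for a local script reference."""
    tag = f'<script src="{local_src}"></script>'
    # A callable replacement avoids backslash escapes in local_src being interpreted.
    return CDN_GSAP.sub(lambda _match: tag, html)
```

Applying localize_gsap() to the contents of index.html and writing the result back leaves tags that already point at local files untouched.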

License

MIT for this scaffold. HyperFrames itself is separate and published by HeyGen under its own license.
