A reusable agentic editing toolkit for turning a recorded YouTube talking-head video into a HyperFrames project with smart component reuse.
It handles:
- FFmpeg rough cuts using silence detection;
- optional transcript-based filler/retake cuts;
- optional Kimi K2.6 edit-director suggestions;
- HyperFrames HTML output with captions, hook cards, zooms, callouts, progress bars, flashes, grain/grid overlays;
- a component system that helps agents reuse local snippets, install official HyperFrames registry blocks/components, and create custom reusable components during editing.
The style presets are inspired by high-retention Indian AI/startup creator editing patterns. They are not meant to impersonate any creator or clone a protected identity.
Requirements:
- Python 3.11+
- Node.js 22+
- FFmpeg
- HyperFrames CLI via `npx hyperframes`
Install HyperFrames skills for your coding agent:
npx skills add heygen-com/hyperframes
Then set up the toolkit:
cd yt-hyperframes-agent
python -m venv .venv
source .venv/bin/activate
pip install -e .
Optional AI/transcription support:
pip install -e '.[ai]'
Build a project from a recording, then preview and render it:
yt-hyperframes build ./my-recording.mp4 \
--style creator_hybrid \
--project ./out/my-edit
cd ./out/my-edit
npx hyperframes preview
npx hyperframes render --output output.mp4
With a transcript and Kimi K2.6 (multimodal video understanding):
export MOONSHOT_API_KEY="..."
yt-hyperframes build ./my-recording.mp4 \
--transcript ./transcript.srt \
--ai kimi \
--style vaibhav_sisinty \
--project ./out/ai-growth-edit
The Kimi provider defaults to `kimi-k2.6` (Moonshot's native multimodal model, released April 2026; it accepts mp4/mov/webm/mpeg/avi video input). Override with `KIMI_MODEL=kimi-k2.5` if you don't have K2.6 access yet. Videos under ~20 MB are sent inline as base64; larger files are uploaded via the Moonshot Files API and referenced as `ms://<file-id>`.
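A minimal sketch of the inline-vs-upload decision described above. It assumes the ~20 MB cutoff is measured on the raw file size (whether the toolkit measures before or after base64 encoding is an assumption). `choose_video_transport`, `inline_payload`, `file_reference`, and `INLINE_LIMIT_BYTES` are hypothetical names, not the toolkit's API, and the actual Files API upload call is deliberately left out.

```python
import base64
from pathlib import Path

# Hypothetical cutoff: the README says "~20MB"; the exact value is an assumption.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024

# Container formats the README lists as accepted by kimi-k2.6.
SUPPORTED_SUFFIXES = {".mp4", ".mov", ".webm", ".mpeg", ".avi"}


def choose_video_transport(size_bytes: int) -> str:
    """Return 'inline' for small files and 'upload' for large ones."""
    return "inline" if size_bytes < INLINE_LIMIT_BYTES else "upload"


def inline_payload(path: Path) -> str:
    """Base64-encode a small video for an inline request (hypothetical helper)."""
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported video format: {path.suffix}")
    return base64.b64encode(path.read_bytes()).decode("ascii")


def file_reference(file_id: str) -> str:
    """Build the ms://<file-id> reference used after a Files API upload."""
    return f"ms://{file_id}"
```

For example, `choose_video_transport(Path("clip.mp4").stat().st_size)` picks the path; only the `upload` branch needs a network round trip before the model call.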
Generate a transcript with the OpenAI Whisper API:
export OPENAI_API_KEY="..."
yt-hyperframes build ./my-recording.mp4 \
--transcribe openai \
--ai kimi \
--style varun_mayya \
--project ./out/tech-edit
For 25–30 minute recordings, use a three-step agentic workflow instead of a one-shot build:
# 1. Transcribe (cached, reusable)
npx hyperframes transcribe recording.mp4 --model medium.en
# 2. Generate the edit plan only (no FFmpeg render yet)
yt-hyperframes plan recording.mp4 \
--transcript transcript.json \
--style varun_mayya \
--output plan.raw.json
# 3. Have your agent (Claude Code, Codex, Cursor) read plan.raw.json and prune
# keep_segments down to the segments worth keeping. The agent saves the
# result as plan.json. Then render only what survived:
yt-hyperframes render-from-plan plan.json --project ./out/edit
The toolkit applies these long-form adjustments automatically:
- Density scaling: for output longer than 90s, `zooms_per_minute` and `max_captions_per_minute` are tightened progressively. A 30-minute edit drops from 660 zooms / 1320 captions to 165 / 330.
- Auto chapter cards: for the `varun_mayya` and `clean_podcast` styles, chapter captions are emitted every 120s / 180s, with titles pulled from the transcript. Agents can rewrite the titles in `plan.json` before rendering.
- Targeted Kimi vision: `--ai kimi` defaults to `KIMI_VISION_MODE=targeted`, which only sends windows with a high density of long silences, retake language, complex tool/product transcript content, or low-confidence visual references. Set `KIMI_VISION_MODE=full` to restore the old full-video behavior.
- Chunked Kimi inference: when Kimi needs long-source vision, the proxy video is sliced into ~4-minute windows (8s overlap) and inference runs in parallel (up to 3 workers). Per-window outputs are time-rebased and merged. Without chunking, a 30-minute recording would either overflow the model's context or spend all of it on visual tokens.
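The chunking step above can be sketched as follows: cut the timeline into 240s windows that overlap by 8s, run inference on up to 3 windows at once, shift each window's timestamps back onto the source clock, and merge. `infer` is a hypothetical stand-in for one Kimi call, the `{"kind", "t"}` suggestion shape is assumed, and dropping only exact repeats in the overlap is a simplification; the toolkit's real merge policy is not documented here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

WINDOW_S = 240.0  # ~4-minute windows (from the README)
OVERLAP_S = 8.0   # 8s overlap between neighbouring windows
MAX_WORKERS = 3   # up to 3 parallel inference calls


def make_windows(duration_s: float, window_s: float = WINDOW_S,
                 overlap_s: float = OVERLAP_S) -> list[tuple[float, float]]:
    """Split [0, duration_s] into overlapping (start, end) windows."""
    if duration_s <= 0:
        return []
    step = window_s - overlap_s
    windows = []
    start = 0.0
    while True:
        end = min(start + window_s, float(duration_s))
        windows.append((start, end))
        if end >= duration_s:
            break
        start += step
    return windows


def run_chunked(duration_s: float,
                infer: Callable[[float, float], list[dict]]) -> list[dict]:
    """Run `infer` per window in parallel, rebase times, and merge.

    `infer(start, end)` is hypothetical; it returns suggestions whose "t"
    is relative to the window start.
    """
    windows = make_windows(duration_s)
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        results = list(pool.map(lambda w: infer(*w), windows))

    merged: list[dict] = []
    seen: set[tuple[str, float]] = set()
    for (start, _end), suggestions in zip(windows, results):
        for s in suggestions:
            absolute = round(start + s["t"], 1)  # rebase onto the source clock
            key = (s["kind"], absolute)
            if key in seen:  # skip exact repeats from the overlapping region
                continue
            seen.add(key)
            merged.append({**s, "t": absolute})
    return sorted(merged, key=lambda s: s["t"])
```

With these defaults a 30-minute (1800s) source yields 8 windows, so at most 3 requests are in flight at any time.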
Example prompt for your coding agent:
You're editing a 30-minute talking-head recording at recording.mp4 into a
~12-minute vertical YouTube edit in the varun_mayya style.
1. Run: npx hyperframes transcribe recording.mp4 --model medium.en
2. Run: yt-hyperframes plan recording.mp4 --transcript transcript.json
--style varun_mayya --ai none --output plan.raw.json
3. Read plan.raw.json. The transcript is in the transcript field. For each
keep segment, decide whether the speech inside is part of the spine of
the video. Drop tangents, redundant explanations, hesitations, and any
weak segment. Target output_duration: 10–14 minutes total.
4. Read the captions field. Rewrite chapter card titles to be more specific
and skim-friendly using the actual transcript content of each chapter.
5. Save the edited plan as plan.json.
6. Run: yt-hyperframes render-from-plan plan.json --project out/edit
7. Run: cd out/edit && npx hyperframes lint
Fix any errors, then run npx hyperframes preview to review the edit.
8. Render: npx hyperframes render --output final.mp4 --quality high
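Before step 6, it can help to check that the pruned plan actually lands in the 10–14 minute target. This is a minimal sketch that assumes each `keep_segments` entry looks like `{"start": <seconds>, "end": <seconds>}`; that schema, and the names `kept_duration` and `check_plan`, are hypothetical, so adapt them to what `plan.raw.json` really contains.

```python
import json
from pathlib import Path

TARGET_MIN_S = 10 * 60  # lower bound of the 10–14 minute target
TARGET_MAX_S = 14 * 60  # upper bound


def kept_duration(plan: dict) -> float:
    """Sum the length of every keep segment, in seconds.

    Assumes entries shaped like {"start": s, "end": e}; the real schema may differ.
    """
    return sum(seg["end"] - seg["start"] for seg in plan["keep_segments"])


def check_plan(path: Path) -> tuple[float, bool]:
    """Load a plan file and report (total seconds, within target?)."""
    plan = json.loads(path.read_text(encoding="utf-8"))
    total = kept_duration(plan)
    return total, TARGET_MIN_S <= total <= TARGET_MAX_S
```

Usage: `total, ok = check_plan(Path("plan.json"))`; if `ok` is false, send the agent back to step 3 before rendering.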
Each generated project now contains:
hyperframes.json registry + paths config read by `npx hyperframes add`
components/manifest.json selected local + official registry component decisions
components/usage-rules.yaml agent decision rules
components/local/<name>/ registry-compatible local reusable snippets
components/custom/<name>/ custom components created during later edits
compositions/components/<name>.html installed local snippets for easy reuse
compositions/<block>.html local reusable block compositions
List available component templates:
yt-hyperframes components
Create a reusable custom component inside a generated project:
cd ./out/my-edit
yt-hyperframes component create \
--project . \
--name branded-stat-card \
--type component \
--description "Reusable stat card for my channel" \
--tags stat,brand,callout
Use official HyperFrames registry items when helpful:
npx hyperframes add yt-lower-third --dir . --json --no-clipboard
npx hyperframes add grain-overlay --dir . --json --no-clipboard
npx hyperframes add whip-pan --dir . --json --no-clipboard
The generator writes these install commands into components/manifest.json based on the chosen style and transcript.
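If an agent needs to apply several registry items in one go, a small wrapper can build the same `npx hyperframes add` commands shown above. This is a sketch: `add_command` and `install_items` are hypothetical helpers, and the item list is passed in by hand rather than read from `components/manifest.json`, whose internal schema isn't described here. It defaults to a dry run so you can review the commands first.

```python
import shlex
import subprocess


def add_command(item: str, project_dir: str = ".") -> list[str]:
    """Build the `npx hyperframes add` argv used in this README."""
    return ["npx", "hyperframes", "add", item,
            "--dir", project_dir, "--json", "--no-clipboard"]


def install_items(items: list[str], project_dir: str = ".",
                  dry_run: bool = True) -> list[str]:
    """Return the shell-quoted commands; execute them only when dry_run is False."""
    lines = []
    for item in items:
        argv = add_command(item, project_dir)
        lines.append(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failed install
    return lines
```

Passing the argv list to `subprocess.run` (not a shell string) avoids quoting problems with unusual item names or directories.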
List the available styles:
yt-hyperframes styles
Built-in styles:
- `creator_hybrid`: bold vertical creator edit, yellow/cyan emphasis, fast zooms, official transition suggestions.
- `varun_mayya`: cleaner tech-founder explainer, dark UI, cyan/purple callouts, chart/flowchart suggestions.
- `vaibhav_sisinty`: high-energy AI/growth/news shorts, red/yellow urgency, flash/glitch/whip-pan suggestions.
- `clean_podcast`: landscape long-form edit with less flashiness and lower-third/outro components.
- `product_demo`: UI-forward SaaS/tutorial walkthrough with split-screen b-roll, cursor/prompt moments, and product reveal components.
- `cinematic_story`: warmer narrative creator edit with slower punch-ins, light leaks, and fewer, bigger emphasis beats.
- `minimal_tutorial`: quiet desktop tutorial style with readable lower thirds, restrained zooms, and structure-first diagram/data components.
Give Claude Code, Codex, Cursor, or another coding agent this prompt from inside a generated project:
Use `/hyperframes` and `/hyperframes-registry`. Read `components/manifest.json`, `components/usage-rules.yaml`, `index.html`, and `data/edit-plan.json`. Reuse local components first, install official registry items only when useful, and create custom components with `yt-hyperframes component create` when a visual pattern repeats. Run `npx hyperframes lint` before rendering.
The build pipeline:
- Probe source video duration with FFprobe.
- Detect long silences with FFmpeg `silencedetect`.
- Load an optional transcript from SRT or Whisper verbose JSON.
- Detect filler words and retake phrases when word timestamps exist.
- Optionally ask Kimi K2.6 for semantic/video edit suggestions.
- Merge cuts into an edit decision list.
- Generate component recommendations from style, transcript, effects, and AI suggestions.
- Render a flattened rough cut with FFmpeg.
- Transcode to the target canvas.
- Generate HyperFrames HTML plus a component manifest and local component library.
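The silence-detection and cut-merging steps can be sketched with FFmpeg's real `silencedetect` filter, which logs `silence_start:` / `silence_end:` lines to stderr. The `-30dB` noise floor, 0.6s minimum length, and 0.15s padding are illustrative values, not the toolkit's settings, and `detect_silences`, `parse_silences`, and `keep_segments` are hypothetical names. Filler/retake cuts and Kimi suggestions are not merged here.

```python
import re
import subprocess

START_RE = re.compile(r"silence_start:\s*(-?[\d.]+)")
END_RE = re.compile(r"silence_end:\s*(-?[\d.]+)")


def detect_silences(path: str, noise_db: float = -30.0, min_len_s: float = 0.6) -> str:
    """Run FFmpeg silencedetect and return its stderr log (illustrative thresholds)."""
    proc = subprocess.run(
        ["ffmpeg", "-hide_banner", "-i", path,
         "-af", f"silencedetect=noise={noise_db}dB:d={min_len_s}",
         "-f", "null", "-"],
        capture_output=True, text=True, check=True)
    return proc.stderr


def parse_silences(log: str, duration_s: float) -> list[tuple[float, float]]:
    """Pair silence_start/silence_end lines into (start, end) intervals."""
    silences, start = [], None
    for line in log.splitlines():
        if m := START_RE.search(line):
            start = max(0.0, float(m.group(1)))
        elif (m := END_RE.search(line)) and start is not None:
            silences.append((start, float(m.group(1))))
            start = None
    if start is not None:  # silence ran to the end of the file
        silences.append((start, duration_s))
    return silences


def keep_segments(silences: list[tuple[float, float]], duration_s: float,
                  pad_s: float = 0.15) -> list[tuple[float, float]]:
    """Invert silences into keep segments, leaving pad_s of air on each side."""
    keeps, cursor = [], 0.0
    for s, e in sorted(silences):
        cut_start, cut_end = s + pad_s, e - pad_s
        if cut_end <= cut_start:  # too short to cut once padded
            continue
        if cut_start > cursor:
            keeps.append((cursor, cut_start))
        cursor = max(cursor, cut_end)  # overlapping silences merge naturally
    if cursor < duration_s:
        keeps.append((cursor, duration_s))
    return keeps
```

The resulting `(start, end)` pairs are the kind of list a rough-cut step could feed to FFmpeg's `trim`/`concat` filters.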
Captions and callouts are grounded in the transcript. If a transcript is missing, the toolkit relies more on gap removal and visual timing, and generates fewer text overlays.
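For SRT input, grounding captions starts with turning the file into timed cues. A minimal sketch using only the standard library; `Cue` and `parse_srt` are hypothetical names, and this ignores SRT styling tags and does not handle Whisper verbose JSON.

```python
import re
from dataclasses import dataclass

TIME_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(raw: str) -> list[Cue]:
    """Parse SRT text into cues; blank lines separate cue blocks."""
    cues = []
    for block in re.split(r"\n\s*\n", raw.replace("\r\n", "\n").strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = TIME_RE.search(line)
            if m:
                g = m.groups()
                # Everything after the timing line is caption text.
                text = " ".join(t.strip() for t in lines[i + 1:] if t.strip())
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues
```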
Troubleshooting:
yt-hyperframes error: Required binary 'ffmpeg' was not found on PATH.
Install FFmpeg and verify:
ffmpeg -version
MOONSHOT_API_KEY is not set.
Either run without Kimi:
yt-hyperframes build input.mp4 --ai none
or export your Kimi API key as `MOONSHOT_API_KEY`.
HyperFrames render cannot fetch GSAP from CDN.
Ask your agent to replace the CDN script in index.html with a local assets/gsap.min.js reference.
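The GSAP swap can also be done without an agent. A minimal sketch, assuming the CDN tag in `index.html` is a plain `<script src="https://.../gsap.min.js"></script>` with no extra attributes (tags carrying `integrity` or `crossorigin` won't match and need a wider pattern). `localize_gsap` and `localize_file` are hypothetical names, and you still need to place `gsap.min.js` in `assets/` yourself.

```python
import re
from pathlib import Path

# Matches <script src="http(s)://.../gsap.js or gsap.min.js"></script>.
# The URL pattern is an assumption; adjust it to what index.html actually uses.
GSAP_CDN_RE = re.compile(
    r'<script\s+src="https?://[^"]*/gsap(?:\.min)?\.js"\s*>\s*</script>',
    re.IGNORECASE)
LOCAL_TAG = '<script src="assets/gsap.min.js"></script>'


def localize_gsap(html: str) -> tuple[str, int]:
    """Replace CDN GSAP script tags with a local reference; return (html, count)."""
    return GSAP_CDN_RE.subn(LOCAL_TAG, html)


def localize_file(path: Path) -> int:
    """Rewrite the file in place only if a CDN tag was found."""
    html, count = localize_gsap(path.read_text(encoding="utf-8"))
    if count:
        path.write_text(html, encoding="utf-8")
    return count
```

Run `localize_file(Path("index.html"))` from the project directory; a return value of 0 means no CDN tag matched, so check the pattern before rendering again.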
This scaffold is MIT-licensed. HyperFrames itself is a separate project published by HeyGen under its own license.