A Claude Code skill that tightens long-form recordings — removes silences, ums, fillers, and dead air while preserving laughs and comedic pauses.
More accurate than cutting from Whisper alone. Cuts are driven by Montreal Forced Aligner (MFA) word boundaries (~10–20ms precision, with true inter-word silences as explicit intervals) rather than Whisper timestamps (±100–300ms, with pauses embedded inside word durations). Whisper is used only to produce the transcript text; MFA aligns it to the audio. The result: tighter cuts, no clipped word onsets/tails, and reliable silence detection.
Point it at any video file and it will:
- Make a proxy — transcodes HEVC/4K → 1080p H.264 with hardware decode so every subsequent step runs in seconds, not minutes.
- Transcribe with Whisper —
tiny.en, for the transcript text only. - Force-align with MFA — snaps the transcript to the audio at ~10–20ms precision; this is the timing engine for every cut.
- Plan the cuts — applies a tone-aware silence heuristic (aggressive / balanced / sentimental / documentary), strips fillers, drops false starts and retakes. Shows you the plan before touching the video.
- Render — uses ffmpeg
trim+concat(notselect, which misaligns audio) for a clean cut.
Runs entirely on your machine. No cloud APIs. Built for Apple Silicon — fast.
Most silence-cutters (auto-editor, opus, etc.) are either too aggressive (they cut your laughs) or too slow (the wrong ffmpeg flags). This skill was tuned against a 7-minute 4K HEVC interview that took 5+ minutes to even decode in other tools. With the right flags (-hwaccel videotoolbox, -preset fast, hardware-friendly trim/concat) the same source cuts in ~30 seconds.
Specifically, this skill is opinionated about three things most tools get wrong:
- Laughs aren't silences. Before any silence cut, it checks audio amplitude. If a "silent" window has audible content, it stays.
- Comedic pauses are content. A 4-second pause after a punchline isn't dead air — it's the joke. The skill compresses dead air aggressively but preserves comedic timing.
- Audio sync matters. Most
ffmpeg -vf selectcut tutorials produce videos with drifting audio. This skill usestrim+concat, which keeps everything in sync.
- macOS (uses VideoToolbox for hardware-accelerated decode — works on Linux/Windows if you remove
-hwaccel videotoolboxflags) - Claude Code
ffmpegwithlibx264(brew install ffmpeg)whisper(pip install openai-whisper) — for the transcript text- Montreal Forced Aligner — the alignment / timing engine. The skill installs this for you on first run (creates a
mfaconda env and downloads the English model; installs miniforge via Homebrew if conda is missing). No manual setup needed. If Homebrew isn't available, the skill falls back to Whisper word timestamps + ffmpegsilencedetect(and says so in the plan).
git clone https://github.com/louisedesadeleer/cut-video.git ~/.claude/skills/cut-videoRestart Claude Code and /cut-video is available as a slash command.
In Claude Code:
/cut-video
Then paste a video file path when asked. The skill will:
- Make a 1080p H.264 proxy (skip if source is already small/H.264)
- Transcribe the audio with Whisper (transcript text only)
- Force-align the transcript to the audio with MFA — the timing source for every cut
- Print a proposed cut list — timestamps, what gets cut, what stays, total compression
- Wait for your "go"
- Render the cleaned version and open it for review
When asked, pick one:
aggressive(default) — tightest pacing; ~40–60% runtime reduction, mid-sentence cuts, fillers and hedges slicedbalanced— aggressive on dead air, preserves comedic beatssentimental— gentler everywhere; longer pauses preserved for emotional momentsdocumentary— only cuts genuine dead air; keeps thinking pauses intact
- It doesn't add zooms, layouts, b-roll, memes, captions, or motion graphics. Pair it with clipify for short-form output, or your editor of choice for the rest.
Built by @louisedesadeleer. PRs welcome.