vibe edit reframe --track: local subject tracking (MediaPipe / YOLO / SAM-2) #205

@kiyeonjeon21

Why

ROADMAP Phase 4 lists "Real-Time Subject Tracking — local MediaPipe / YOLO / SAM-2 for fast-moving subject reframing, replacing today's Claude Vision keyframe approach (vibe edit reframe --track)" as an open item.

Today's path is slow and expensive (Vision API per keyframe). A local model would unlock per-frame tracking without per-call cost.

State of code (2026-04-29)

  • packages/cli/src/commands/edit-cmd.ts:478-560: the vibe edit reframe command
  • edit-cmd.ts:556: the "Analyzing frames for subject tracking..." step, which extracts keyframes and sends them to Claude Vision for subject location
  • Output: ROI hints used by FFmpeg crop+resize

Limitations of current approach:

  1. Cost grows linearly with keyframe count
  2. Sparse sampling means jittery output on fast-moving subjects
  3. Requires ANTHROPIC_API_KEY for what should be a local-first operation

Scope (sketch — design doc needed)

  • Pick a tracker: MediaPipe (face/pose, fastest), YOLO (general object detection), SAM-2 (segmentation, slowest but most accurate). Could be a --tracker <mediapipe|yolo|sam2> flag.
  • Distribution: native deps are painful for an npm install -g story. Options:
    • WASM build (MediaPipe has one; YOLO via ONNX Runtime Web)
    • Optional native dep with Claude-Vision fallback
    • Separate @vibeframe/tracker-native package, soft-imported
  • Pipeline: dense per-frame ROI → smoothed crop curve → FFmpeg crop+scale
  • Backwards compat: keep the current Claude Vision path as --tracker claude, or as the default if no local model is installed
  • Tests: golden ROI snapshot for a known fixture clip
  • Cost reporting (--describe): local trackers should report cost: 0 and a wall-clock estimate
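
To make the pipeline bullet concrete, here is a minimal TypeScript sketch of the "dense per-frame ROI → smoothed crop curve → FFmpeg crop+scale" idea. Everything in it is an assumption for illustration: the `Roi` and `Crop` types, the function names, and the choice of an exponential moving average as the smoother are hypothetical and not taken from the vibeframe codebase. The design doc may well pick a different smoother.

```typescript
// Hypothetical types and names; none of these exist in the vibeframe codebase.
interface Roi { x: number; y: number; w: number; h: number } // subject box, source pixels
interface Crop { x: number; y: number }                      // top-left of the crop window
interface Point { cx: number; cy: number }

/** Exponential moving average over ROI centers; alpha in (0, 1], lower is smoother. */
function smoothCenters(rois: Roi[], alpha: number): Point[] {
  const out: Point[] = [];
  let prev: Point | undefined;
  for (const r of rois) {
    const c: Point = { cx: r.x + r.w / 2, cy: r.y + r.h / 2 };
    // The first frame seeds the filter; later frames move a fraction alpha toward the new center.
    prev = prev
      ? { cx: prev.cx + alpha * (c.cx - prev.cx), cy: prev.cy + alpha * (c.cy - prev.cy) }
      : c;
    out.push(prev);
  }
  return out;
}

/** Center a cropW x cropH window on each smoothed center, clamped to stay inside the frame. */
function cropCurve(
  rois: Roi[], alpha: number,
  srcW: number, srcH: number, cropW: number, cropH: number,
): Crop[] {
  const clamp = (v: number, lo: number, hi: number) => Math.min(Math.max(v, lo), hi);
  return smoothCenters(rois, alpha).map(({ cx, cy }) => ({
    x: Math.round(clamp(cx - cropW / 2, 0, srcW - cropW)),
    y: Math.round(clamp(cy - cropH / 2, 0, srcH - cropH)),
  }));
}

/** FFmpeg filter string for one static crop position followed by a scale. */
function cropScaleFilter(c: Crop, cropW: number, cropH: number, outW: number, outH: number): string {
  return `crop=${cropW}:${cropH}:${c.x}:${c.y},scale=${outW}:${outH}`;
}
```

For a 1920x1080 source reframed to a 608x1080 window, `cropCurve(rois, 0.5, 1920, 1080, 608, 1080)` yields one crop position per frame, and `cropScaleFilter` formats a single position using FFmpeg's `crop=w:h:x:y` and `scale=w:h` syntax. How the full curve is fed to FFmpeg (per-frame expressions, segmenting, or something else) is left open here, since the issue does not specify it.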

Out of scope

  • Tracking subjects across cuts (scene boundary detection is vibe analyze scene's job)
  • Multi-subject tracking with assignment / re-id

Recommendation

Big enough to need a docs/design/ doc + plan PR series. Probably the highest-effort item on the Phase 4 list. Marked help wanted — this is a great fit for an OSS contributor with ML/vision background.

Reference

  • ROADMAP.md Phase 4 "Open items in Phase 4 (v0.61+ candidates)"
  • Current implementation: packages/cli/src/commands/edit-cmd.ts:478-560

Labels: feature, help wanted
