Skip to content

fix(producer): rebuildExtractedFramesFromPlanDir off-by-one in framePaths key indexing#1730

Merged
jrusso1020 merged 2 commits into
heygen-com:mainfrom
Tzuhany:fix/distributed-framepaths-offbyone
Jun 26, 2026
Merged

fix(producer): rebuildExtractedFramesFromPlanDir off-by-one in framePaths key indexing#1730
jrusso1020 merged 2 commits into
heygen-com:mainfrom
Tzuhany:fix/distributed-framepaths-offbyone

Conversation

@Tzuhany

@Tzuhany Tzuhany commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Closes #1731

Hit a 1-frame black flash at every <video> boundary when rendering through the distributed chunk-lambda path. Local single-process renders were fine, so it didn't show up until things hit lambda.

It's rebuildExtractedFramesFromPlanDir in renderChunk.ts:

// FrameLookupTable indexes frames 1-based.
framePaths.set(i + 1, join(outputDir, frameName));

That comment isn't right — getFrameAtTime in engine/videoFrameExtractor.ts:958 looks up with Math.floor(localTime * fps + 1e-9), which is 0-based. Every video's first-paint frame (localTime === 0frameIndex === 0) looks up framePaths.get(0) and gets undefined. The vid then drops out of activePayloads, inject never fires for it, and the BeginFrame screenshot shows just the body background plus any persistent overlays you have on top — Y≈22 with sparse highlights from text/logo, basically a single-frame "almost black" flash at every vid start.

Every other place in the codebase that builds or reads this Map is 0-based — engine/videoFrameExtractor.ts:317 and engine/extractionCache.ts:204 both set(i, ...), the consumer is 0-based, and renderOrchestrator.test.ts asserts framePaths.get(0) is the first frame. renderChunk is the only 1-based one, which is why local tests don't catch it.

Verified against a 3-vid back-to-back composition and a single-vid render split across 4 workers in chunk-lambda. Before: per-frame signalstats shows YAVG ≈ 16-22 at every first-paint frame. After: blackdetect only picks up legitimate source content (fade-ins / hard cuts in the source mp4s).

FWIW the diagnostic log we added on the framePaths.get miss made it pretty obvious:

vid=v4 wantIdx=0 totalFrames=121 keys=[1..121] hasKey0=false hasKey1=true
vid=v5 wantIdx=0 totalFrames=121 keys=[1..121] hasKey0=false hasKey1=true
vid=v6 wantIdx=0 totalFrames=121 keys=[1..121] hasKey0=false hasKey1=true

@Tzuhany

Tzuhany commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

cc @jrusso1020

@miguel-heygen miguel-heygen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one-line fix is the right direction: packages/producer/src/services/distributed/renderChunk.ts:197 now matches the extractor/consumer's 0-based framePaths contract instead of the old filename-numbering assumption. I also reproduced the failure locally through the real plan -> renderChunk -> assemble distributed-simulated path.

blocker: packages/producer/src/services/distributed/renderChunk.ts:197 changes the production distributed frame lookup contract, but this PR adds no regression coverage. The bug existed because distributed reconstruction drifted from the extractor/engine contract, and the existing suite did not assert that rebuilt framePaths resolves key 0 at a video clip's first paint. Please add a focused test before merge. Good shapes: a unit test for rebuildExtractedFramesFromPlanDir/FrameLookupTable asserting t=start returns the first extracted frame, or a small distributed-simulated fixture with back-to-back videos that fails on the previous i + 1 indexing.

CI note: current PR checks are still pending/failing, so I did not treat CI as merge-ready either.

Verdict: REQUEST CHANGES
Reasoning: The fix is correct, but the exact distributed-only contract that caused the production bug needs to be pinned so this cannot regress silently.

@jrusso1020

Copy link
Copy Markdown
Collaborator

Added the regression test you asked for at 3df8a8f — pins the 0-based framePaths.get(0) contract on rebuildExtractedFramesFromPlanDir via a focused unit test that mkdtemps a fake video-frames/<videoId>/ dir, calls the function directly, and asserts the first frame resolves at key 0 (with a multi-video variant that also asserts every video's first-paint frame resolves — same shape as the original repro).

Verified the test fails against the previous i + 1 indexing — when I revert the fix and re-run, both assertions fail with the exact symptom:

expect(received).toBeDefined()
Received: undefined
    at .../rebuildExtractedFrames.test.ts:91:21

Test lives in its own file (packages/producer/src/services/distributed/rebuildExtractedFrames.test.ts) so it stays Chrome-free and runs in ~10ms — the existing renderChunk.test.ts pays a multi-second Chrome smoke probe in its module-level beforeAll, and the regression check doesn't need a browser. Bundled two small drive-by changes: exported rebuildExtractedFramesFromPlanDir (was module-local) so the test can call it directly, and updated its doc comment that still said "1-based framePaths" → "0-based" with a pointer to the consumer at videoFrameExtractor.ts:getFrameAtTime.

— Jerrai

@Tzuhany

Tzuhany commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

No more changes from my side.

Copy link
Copy Markdown
Collaborator

Merge activity

  • Jun 26, 3:47 AM UTC: Graphite couldn't merge this PR because it failed for an unknown reason (Fast-forward merges are not supported for forked repositories. Please create a branch in the target repository in order to merge).

@jrusso1020

Copy link
Copy Markdown
Collaborator

@Tzuhany would you be able to sign your commit? we can't merge this at the moment without that

otherwise I can recreate the fix myself in a new branch

@Tzuhany

Tzuhany commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Setting up signing now — give me ~5 min and I'll force-push a signed version of my commit.

Tzuhany and others added 2 commits June 26, 2026 11:57
…aths key indexing

In distributed chunk-lambda render mode, every <video>'s first-paint
frame (the moment a vid first becomes visible on the composition
timeline) renders as PRISTINE Y=16 black. For a 3-vid back-to-back
composition (v1: 0-4s, v2: 4-8s, v3: 8-12s at 30fps), frames 0, 121,
242 are all PRISTINE black; the render then either stays black for 1
frame, or shows body bg + persistent overlays only (Y~22 with sparse
highlights from text/logo). The symptom only reproduces in distributed
mode — local single-process renders are unaffected.

Root cause: rebuildExtractedFramesFromPlanDir builds the framePaths Map
with 1-based keys, but the consumer (getFrameAtTime at
engine/videoFrameExtractor.ts:958) computes a 0-based frame index via
Math.floor(localTime * fps + 1e-9). For each vid's first-paint frame
(localTime === 0 → frameIndex === 0), framePaths.get(0) returns
undefined; the vid is silently dropped from activePayloads,
videoFrameInjector doesn't fire, syncVideoFrameVisibility hides
everything, and BeginFrame screenshots an empty composition.

Every other site in the codebase builds/consumes framePaths with
0-based keys:

  - engine/videoFrameExtractor.ts:317     framePaths.set(index, ...)
  - engine/extractionCache.ts:204          framePaths.set(idx, ...)
  - engine/videoFrameExtractor.test.ts:264/1066  framePaths.set(i, ...)
  - engine/videoFrameExtractor.ts:958 (consumer) Math.floor 0-based
  - producer/renderOrchestrator.test.ts:287/315/349  framePaths.get(0)

Only producer/distributed/renderChunk.ts:198 was 1-based, with a
stale comment claiming FrameLookupTable indexes frames 1-based —
which the surrounding evidence contradicts. This is why local tests
pass while distributed-lambda renders always had cold black at each
vid first paint.

Verified locally against a 3-vid composition and a single-vid 4-worker
case in a Lambda render fleet. Before fix: every vid first-paint frame
is PRISTINE Y=16 black. After fix: all frames are valid source content,
blackdetect reports zero black regions outside legitimate source video
content (intentional fade-ins / hard cuts in source mp4).
…ths contract

Regression guard for the off-by-one fix in HF#1730. The pre-fix code
indexed framePaths 1-based while the consumer (getFrameAtTime in
engine/videoFrameExtractor.ts:958) reads 0-based, dropping every
<video>'s first-paint frame in distributed chunk-lambda renders.

Asserts framePaths.get(0) resolves to the first extracted frame, and
framePaths.get(N-1) resolves to the last — pre-fix the keys were
shifted to [1..N], so get(0) returned undefined and get(N) resolved.
Verified to fail against the previous i+1 indexing.

Also exports rebuildExtractedFramesFromPlanDir (was module-local) so the
test can call it directly. Pure logic worth testing in isolation — the
bug only reproduces under distributed mode and the existing
renderChunk.test.ts already pays a multi-second Chrome smoke probe in
its module-level beforeAll, so the regression check lives in its own
file (rebuildExtractedFrames.test.ts) and runs Chrome-free in ~10ms.

The function's doc comment said "1-based framePaths" — updated to
"0-based" with a pointer to the consumer site and the bug context.

Per Miguel's REQUEST_CHANGES on HF#1730.

— Jerrai (https://claude.com/claude-code)
@Tzuhany Tzuhany force-pushed the fix/distributed-framepaths-offbyone branch from 3df8a8f to 15e6d13 Compare June 26, 2026 03:57
@Tzuhany

Tzuhany commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Done — force-pushed both commits re-signed (your authorship on the test commit is preserved, just the signature is mine since I rebased through them). Both show verified on the commit list now.

@jrusso1020 @miguel-heygen — the new workflow runs are sitting in action_required (fork PR safety gate). Could one of you click Approve to let the regression-shards run?

@jrusso1020 jrusso1020 merged commit 7db84fc into heygen-com:main Jun 26, 2026
44 checks passed
@miguel-heygen miguel-heygen mentioned this pull request Jun 26, 2026
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Distributed lambda render: 1-frame black flash at every <video> first-paint (off-by-one in renderChunk framePaths)

3 participants