Skip to content

fix(render): avoid empty WAAPI scans and llvmpipe auto GPU#1775

Merged
miguel-heygen merged 1 commit into
mainfrom
fix/1715-waapi-empty-scan-regression
Jun 28, 2026
Merged

fix(render): avoid empty WAAPI scans and llvmpipe auto GPU#1775
miguel-heygen merged 1 commit into
mainfrom
fix/1715-waapi-empty-scan-regression

Conversation

@miguel-heygen

@miguel-heygen miguel-heygen commented Jun 28, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • fix the historical screenshot-path regression from v0.6.42 -> v0.6.43 by avoiding per-frame document.getAnimations() scans when WAAPI discovery found no animations
  • keep late-created WAAPI animations deterministic by hooking Element.prototype.animate and tracking created animations lazily, with cleanup/revert so transient animations do not permanently disable the empty-WAAPI fast path
  • remove the sibling CSS-adapter per-seek el.getAnimations() scan by caching discovered CSS animations per element
  • fix current browserGpuMode=auto misclassification of software WebGL (llvmpipe/SwiftShader/lavapipe/Microsoft Basic Render Driver/etc.) as hardware by inspecting WEBGL_debug_renderer_info

Fixes #1715.

RCA

The issue report compared a screenshot-fallback environment (Chromium 149 bundled: /usr/bin/chromium) where BeginFrame is unavailable, so render uses renderSeek + Page.captureScreenshot.

I normalized the attached #1715/#1653 artifact into a standalone GSAP fixture (/tmp/hf-1715/run-gsap) and bisected the forced screenshot path:

Version / mode Result
v0.6.42 forced screenshot ~11.9s
v0.6.43 forced screenshot ~21.2s
v0.6.44-v0.6.50 forced screenshot ~17-22s
v0.7.5 forced screenshot captureAvgMs=156

The bad boundary is PR #1069 (fix(core): correct WAAPI rediscovery seek baselines). Pre-#1069, the WAAPI adapter already called document.getAnimations() during each seek. #1069 added baseline bookkeeping around the same global scan, increasing the per-call cost enough that an empty-WAAPI GSAP composition regressed badly when renderSeek invoked the adapter once per frame. Disabling document.getAnimations in v0.6.43/v0.7.5 restored the old ~11-12s wall time, which isolated the root cost to the empty global WAAPI scan plus the new baseline work around it.

A sibling pattern existed in the CSS adapter: it called el.getAnimations() per tracked element on every seek/play/pause. This PR now caches those CSS animations during discover() instead of re-querying Chromium on each seek.

There is also a current default-path regression: browserGpuMode=auto only checked whether WebGL context creation succeeded. On CPU-only Linux this succeeds through llvmpipe, so auto chose hardware flags (--use-angle=gl-egl) even though the renderer was software:

{
  "vendor": "Google Inc. (Mesa/X.org)",
  "renderer": "ANGLE (Mesa/X.org, llvmpipe (LLVM 12.0.0 256 bits), OpenGL ES 3.2)"
}

On 0.7.17 forced screenshot:

Mode Capture / total
default browserGpuMode=auto before fix ~11.0s capture / ~13.2s total
explicit --no-browser-gpu ~7.8s capture / ~10.0s total
this PR default browserGpuMode=auto ~8.0s capture / ~10.1s total

Verification

  • bun --filter @hyperframes/core test -- src/runtime/adapters/waapi.test.ts src/runtime/adapters/css.test.ts
  • bun --filter @hyperframes/engine test -- src/services/browserManager.test.ts
  • bun --filter @hyperframes/core typecheck
  • bun --filter @hyperframes/engine typecheck
  • bun run build
  • pre-commit hook: format, lint, fallow, typecheck all green
  • Final local repro against commit 02c0c542:
    • PRODUCER_FORCE_SCREENSHOT=true, CHS 150, default browserGpuMode=auto, --fps 15 --workers 4
    • log: browserGpuMode auto → software (WebGL renderer vendor="Google Inc. (Mesa/X.org)" renderer="ANGLE (Mesa/X.org, llvmpipe ...)")
    • capture: 7993ms, total: 10087ms

Note

bun --filter @hyperframes/engine test full-suite fails on this workstation in unrelated VFR extractor tests because the local system ffmpeg lacks -fps_mode (Unrecognized option 'fps_mode'). The changed engine test file passes, and CI's ffmpeg image should cover the full suite.

@vanceingalls vanceingalls left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: LGTM with one caveat

HEAD: 02c0c54 · CI: all required green (regression × 8 shards, Player perf, Windows render, preview parity, CodeQL, Fallow, typecheck, lint, build, test). Prior reviewers: none at read-time.

The PR is a careful, two-pronged repair on a real regression. The bisect to #1069 is sound, the WAAPI snapshot-and-skip dodges the getAnimations() global scan tax convincingly, and the GPU auto-probe finally reads WEBGL_debug_renderer_info instead of trusting llvmpipe-backed context creation. CI signal is broad and positive. The note below is a documented gap, not a blocker.


Challenge 1 — "Any other root cause hidden in #1715 beyond these two mechanisms?"

I went hunting and turned up one strong candidate that the PR does not address, plus a second worth flagging.

Found: the CSS adapter walks el.getAnimations() per tracked element, per seek. See packages/core/src/runtime/adapters/css.ts seek() (lines 89–106 at HEAD 02c0c54): for every entry collected at discover() (any element whose computed style has a non-none animationName), each seek calls getAnimationsForElement(entry.el)el.getAnimations(). That is the per-element flavour of the same Chromium API whose global flavour Magi just exorcised. On a fixture with many CSS-animated elements but no WAAPI animations, you would expect a comparable per-frame tax on the screenshot path — just paid at the element level rather than document level. The reporter's fixture is a "no WAAPI animations" GSAP composition (likely no CSS animations either, hence Magi's specific 11.9 → 21.2s bisect), so this is not what surfaced in #1715, but it is exactly the same anti-pattern one layer down and would absolutely show up on a CSS-heavy template. Worth a tracking issue.

The 2.2s residual gap (Magi's 7.8s --no-browser-gpu vs 10.1s this-PR auto): the most likely culprit is the auto probe itself — probeHardwareWebGlInfo now launches a second headless Chromium, opens a page, runs page.evaluate to extract WEBGL_debug_renderer_info, and closes the browser. The cache deduplicates concurrent callers per process (good), but the cold-start probe cost is a real ~200ms-2s tax that the explicit --no-browser-gpu path skips entirely. Magi's own benchmarks suggest the cache amortizes this across 150 frames (8.0s capture vs 7.8s explicit-software is only 200ms), and the remaining 2.1s of the 10.1s total is plausibly fixed setup / shutdown / ffmpeg encode unrelated to the regression. Not a blocker.

Other per-seek scans audited and acquitted: GSAP, AnimeJS, Three, Lottie, Mapbox, D3 adapters all use locally-cached registries built at discover() rather than re-querying a global registry per seek. Only WAAPI (now fixed) and CSS (open) had the pathological pattern.

Challenge 2 — "Does the WAAPI lazy tracking miss a valid authoring pattern?"

Yes — three of them, though the practical impact is small.

The hook is installed exclusively on Element.prototype.animate (waapi.ts:69–86). The following WAAPI-creating paths bypass it entirely:

  1. new Animation(new KeyframeEffect(target, keyframes, timing)) + .play() — the direct-constructor path. Rare in authored code; common in animation libraries built on raw WAAPI primitives.
  2. CSS animation: declarations on elements added/mutated mid-render. These create CSSAnimation instances by the style engine without ever calling Element.prototype.animate. However, this case is mostly covered downstream: the CSS adapter's seek() walks el.getAnimations() per tracked entry and would catch them (see Challenge 1). The gap is narrow: an element added to the DOM after the CSS adapter's discover() whose computed style has a CSS animation — that element is in neither registry until the next discover() cycle (which fires on clip transitions but not mid-clip).
  3. CSS transition: declarations triggered by style mutations. Same shape as (2) — CSSTransition is style-engine-created, not via .animate(). The CSS adapter only tracks animationName, not transitions, so transitions on mid-render style mutations fall through entirely.

For (1), the documented "Best-effort only" comment in installAnimateHook is honest about the limit. For (2)/(3), the trade is acceptable on the headless-render path (compositions are largely pre-authored), but worth a // known gap: line and ideally a tracking issue. Not a blocker — the PR makes things strictly better than the per-seek scan for the common case, and the rare missed-animation case fails as "animation runs at wall-clock speed" rather than "frame is wrong" because the WAAPI animation's own time keeps advancing.

One additional dispatch-chain note: pause() now iterates the cached animations Set rather than document.getAnimations() (line 130). Pre-PR, pause() paused everything the browser had; post-PR it pauses only what the adapter has tracked. For an empty-discover composition that subsequently created animations via routes (1)/(2)/(3), the unpause-at-shutdown semantics change. Probably fine for the screenshot capture path (renderer tears down anyway), but worth a thought for the studio playback adapter that may also drive this.

Challenge 3 — "Should GPU auto-detection classify any other software renderers?"

The current list (isSoftwareWebGlRenderer, browserManager.ts:23–32) covers swiftshader, llvmpipe, softpipe, software rasterizer, plus the empty-string sentinel. Cross-referencing against real-world Chromium ANGLE renderer strings:

  • Microsoft Basic Render Driver (Windows software fallback when the GPU driver is unavailable / RDP / VM) — missing. Appears as ANGLE (Microsoft, Microsoft Basic Render Driver Direct3D11 vs_5_0 ps_5_0). Not caught by any current substring.
  • VMware SVGA 3D / vmware (VMware guest GPU passthrough; performs like software for most workloads but renders as hardware-ish) — debatable whether this should be classified software, but if the goal is "is this a real GPU", VMware SVGA on a non-3D-accelerated host is software in practice.
  • Mesa GLSL / Mesa Offscreen — the bare Mesa software-rendering paths that don't go through llvmpipe naming.
  • Parallels Display Adapter (Parallels Desktop guest) — similar to VMware case.
  • lavapipe (Vulkan software rasterizer via Mesa, when surfaced through ANGLE-Vulkan) — emerging, not yet ubiquitous in Chromium ANGLE strings but on the horizon.
  • virgl (virtio-gpu guest passthrough) — passes WebGL to host GPU; legitimately hardware-ish when host has a GPU. Skip.

Concrete recommendation: add microsoft basic render driver and mesa offscreen to the substring list. The first is a real Windows headless / RDP / disabled-GPU fallback and would silently land in the "hardware" bucket today, defeating the very point of the PR. The others are judgment calls.

Also worth noting: the probe defaults to software on any failure (probeAutoBrowserGpuMode catch-all on line ~289). That's the safer-by-default direction, but it means a transiently slow probe Chromium (timeout 30s default) silently misclassifies a real-GPU host as software. The cache is per-process, so the misclassification persists for the worker's lifetime. Logging captures this — fine.


Tiny nits (non-blocking)

  • Object.defineProperty(proto, "__hfOriginalAnimate", { value, configurable: true }) is descriptor-only and doesn't set writable — defaults to false. Re-running the installer in the same realm (e.g., studio hot-reload) would defineProperty over a non-writable slot, which the configurable: true saves — but the if (proto.__hfOriginalAnimate) return guard already short-circuits. Belt + suspenders fine.
  • The // best-effort only catch at waapi.ts:84 silently swallows install failure. Consider routing through swallow("runtime.adapters.waapi.installHook", err) to match the rest of the file's diagnostic discipline.
  • isSoftwareWebGlRenderer short-circuits on empty vendor and empty renderer; the more defensible rule is "renderer is empty" alone (vendor is often empty even on real hardware behind masking extensions). Minor — current logic is correct for the probe context where both come from the same getParameter source.

Summary table

Magi's challenge Verdict Evidence
Hidden root cause? One sibling pattern found CSS adapter calls el.getAnimations() per entry per seek (adapters/css.ts:99)
WAAPI lazy tracking gaps? Three real gaps; small practical impact new Animation(), CSS animations on mid-render elements, CSS transitions all bypass Element.prototype.animate
Other software renderers missing? Yes — Microsoft Basic Render Driver Real-world Windows headless / RDP / disabled-GPU fallback string

Ship it. File a follow-up for the CSS-adapter per-seek scan (same anti-pattern, deferred) and add microsoft basic render driver to the renderer list.

Review by Via

@miguel-heygen miguel-heygen force-pushed the fix/1715-waapi-empty-scan-regression branch from 02c0c54 to a900d5e Compare June 28, 2026 14:08

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: LGTM, layering on Via — three remaining additions

HEAD: a900d5e4 · Prior reviews: @vanceingalls posted "LGTM with one caveat" against 02c0c54 at 14:06 UTC; the author has since pushed a900d5e4 that addresses Via's Challenge 1 (CSS adapter sibling pattern → cached at discover) and Challenge 3 (added microsoft basic render driver, mesa offscreen, lavapipe to the software-renderer list). Verified at HEAD: css.ts:88 caches getAnimationsForElement(rawEl) per entry, and browserManager.ts:30-32 extends the substring set. CI green.

I did a blind RCA pass first (per James' instructions) and converged with Via on all three of his challenges. This review surfaces what the joint blind pass turned up that his didn't, focused on the gaps not yet addressed at HEAD.


1. Bisect framing in the PR body is misleading — worth a one-line correction

"It made the WAAPI adapter call document.getAnimations() on every seek."

Pre-#1069 seek at 8b776cb^:packages/core/src/runtime/adapters/waapi.ts:11:

seek: (ctx) => {
  if (!document.getAnimations) return;
  const timeMs = Math.max(0, (Number(ctx.time) || 0) * 1000);
  for (const animation of document.getAnimations()) { ... }
}

document.getAnimations() was already called once per seek pre-#1069. What PR #1069 actually added was the WeakMap<Animation, {compositionTimeMs, animationTimeMs}> + ensureBaseline + normalizeInitialAnimationTimeMs machinery wrapped around the same call (packages/core/src/runtime/adapters/waapi.ts:42-57 at #1069 merge). The empty-list global-scan tax existed pre-#1069; what regressed at v0.6.42 → v0.6.43 is the per-call cost growing (WeakMap traffic + ensureBaseline branching), not the call frequency. Bisect target is correct, RCA wording is off. Worth a one-line correction in the PR description so the next reader doesn't form the wrong mental model of where the per-frame tax actually lived.

2. animations: Set<Animation> never shrinks → soft leak + reactivation latch is one-shot

waapi.ts:8 declares const animations = new Set<Animation>(); and the only writer is trackAnimations (waapi.ts:59-64). No deletion path. Two compounding effects:

Soft leak. Strong-ref Set holds finished/cancelled animations forever; sibling baselines: WeakMap (waapi.ts:9-15) correctly allows GC but the Set defeats that. Headless render workers exit per-job so this doesn't matter for production rendering; studio / preview / long-lived sessions accumulate.

One-shot latch. Skip-scan condition is if (!didDiscover || animations.size > 0) (waapi.ts:106). Once any animation enters, animations.size > 0 forever (per the leak), so the per-frame snapshotAnimations() global-scan resumes for the rest of the render. A composition with one transient el.animate({...}, 100) followed by 2000 empty frames pays the full pre-PR per-frame document.getAnimations() tax on those 2000 frames. The fix is PREEMPT for "no WAAPI animations ever" compositions and NARROW for "transient WAAPI animations" compositions — the latter regresses back to the pre-PR cost shape mid-render.

Suggested fix at trackAnimations:

const trackAnimations = (items: Animation[], compositionTimeMs: number) => {
  for (const animation of items) {
    if (!animations.has(animation)) {
      try {
        animation.addEventListener("finish", () => animations.delete(animation), { once: true });
        animation.addEventListener("cancel", () => animations.delete(animation), { once: true });
      } catch { /* best-effort */ }
    }
    animations.add(animation);
    ensureBaseline(animation, compositionTimeMs);
  }
};

Animation extends EventTarget; both finish and cancel events are spec. With this, the latch toggles back when the Set drains and the skip-scan re-engages for the trailing empty frames.

Not a blocker (the #1715 fixture is the PREEMPT case and the fix nails it), but the perf claim should be scoped to "compositions with no WAAPI activity" rather than "all screenshot renders."

3. No revert() on the WAAPI adapter — Element.prototype.animate patch never unwinds

init.ts:2817-2825 iterates state.deterministicAdapters at runtime teardown and calls adapter.revert?.(). CSS, Three, Lottie, anime.js, typegpu all implement it. The WAAPI adapter (this PR) does not, despite installing a global Element.prototype.animate wrapper at waapi.ts:81-86.

In production render workers this doesn't matter — process exits. In studio / preview / hot-reloaded compositions, the prototype patch stays on Element.prototype.animate forever and the closure it captures keeps the entire prior adapter instance (Set, WeakMap, lastSeekTimeMs) alive across runtime tear-downs. The if (proto.__hfOriginalAnimate) return guard at waapi.ts:74 prevents re-wrapping, but the first adapter instance is now permanently retained.

Suggested shape:

revert: () => {
  if (animateHookInstalled && typeof Element !== "undefined") {
    const proto = Element.prototype as Element & {
      animate?: Element["animate"];
      __hfOriginalAnimate?: Element["animate"];
    };
    if (proto.__hfOriginalAnimate) {
      proto.animate = proto.__hfOriginalAnimate;
      delete proto.__hfOriginalAnimate;
    }
  }
  animations.clear();
  // baselines is WeakMap; GCs naturally once Animation strong refs drop
  animateHookInstalled = false;
  didDiscover = false;
},

Other notes

  • PR #1718 (Miga, also closes #1715) is open with a different RCA — GSAP proxy queue rAF drain + sequential init polls. Original reporter's hypothesis at #1653 actually aligns with #1718, not this PR. The two regressions look complementary (init-time + per-frame), not duplicative; worth coordinating merge order with @miga-heygen so neither closes #1715 prematurely. If both land, captureAvgMs should land closer to the 0.6.42 baseline than either fix alone.
  • No perf regression test for the empty-scan skip. The added unit test does not rescan document animations on every seek when discover found none (waapi.test.ts:159-167, plus the matching CSS test at css.test.ts:129-152 in a900d5e4) asserts the call-count contract once, which is good. But there's no Player perf shard that benchmarks captureAvgMs against an empty-WAAPI fixture — a future refactor of seek could silently regress this the same way #1069 did. Worth filing a follow-up to add a perf-shard guard.
  • CSS adapter cache-at-discover behavior change (a900d5e). Good fix for the per-element-per-seek scan. Subtle: if a CSS animation gets a NEW CSSAnimation instance later (e.g., style mutation that changes keyframes / timing), the cached Animation[] becomes stale. For deterministic seek of static-style compositions this is fine; for mid-render style-mutation the adapter was already broken (entries are only built at discover), so no regression — just worth flagging that "no mid-render style mutations" is now a slightly firmer assumption.
  • isSoftwareWebGlRenderer empty-string early-return is unreachable. At browserManager.ts:26 if (!info.vendor.trim() && !renderer) return true; is dead — the only caller (resolveWebGlProbeMode, browserManager.ts:256-259) routes !hasWebGL directly to "software" without calling isSoftwareWebGlRenderer. The probe always sets vendor/renderer non-empty when hasWebGL === true (lines 86-90). Cosmetic, not a bug — keep as defense in depth, but a comment would help.
  • GPU log line is observability gold. console.error("[hyperframes] browserGpuMode auto → ${resolved} (${reason})") at browserManager.ts:337 — exactly the signal a future on-call needs to bisect a similar regression. Consider promoting to a producer telemetry event so we can dashboard "% of auto-mode renders that resolved to software" across the fleet. Per Via's now-resolved Microsoft-Basic-Render-Driver gap, this metric would have caught the very class of bug this PR fixes if it had existed pre-#1069.

What I didn't verify

  • I did not run the bench-39 suite locally — relying on Magi's reported 13.2s → 10.1s number. CI Player-perf shard is green, but neither it nor the local bench has an empty-WAAPI-fixture explicit assertion, so a future regression has nothing to bisect against beyond wall-clock.
  • I did not exhaustively grep for new Animation(new KeyframeEffect(...)) callers in hyperframes-launches / customer compositions. Via's "rare in authored code" matches my read of the templates I've seen; the long tail of customer compositions is opaque.
  • Windows-specific GPU probe behavior — Windows render CI shard passed, so the runner classifies correctly even pre-a900d5e4. Now-extended substring list at HEAD should catch the Microsoft Basic Render Driver Windows-VM / RDP / disabled-GPU case as well.

Summary

Ship it. Verdict matches Via's. Two strictly-additive asks before merge if you want to close them now (else as follow-ups):

  1. (5-min) Add addEventListener("finish"/"cancel", ..., {once: true}) in trackAnimations so the Set drains and the skip-scan latch can re-engage on transient-WAAPI compositions.
  2. (5-min) Add a revert() that restores Element.prototype.animate and clears the Set.

The bisect-wording correction in the PR description is author-discretion. Nice work landing two real regressions on crisp before/after numbers, and on the iteration that just shipped to address Via's findings.

Review by Rames D Jusso

@miguel-heygen miguel-heygen force-pushed the fix/1715-waapi-empty-scan-regression branch from a900d5e to a92c7fc Compare June 28, 2026 14:19
@miguel-heygen miguel-heygen merged commit fc0f8c3 into main Jun 28, 2026
50 checks passed
@miguel-heygen miguel-heygen deleted the fix/1715-waapi-empty-scan-regression branch June 28, 2026 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

https://github.com/heygen-com/hyperframes/issues/1653 - still reproducible in 0.75.0

3 participants