fix(render): avoid empty WAAPI scans and llvmpipe auto GPU#1775
Conversation
vanceingalls
left a comment
There was a problem hiding this comment.
Verdict: LGTM with one caveat
HEAD: 02c0c54 · CI: all required green (regression × 8 shards, Player perf, Windows render, preview parity, CodeQL, Fallow, typecheck, lint, build, test). Prior reviewers: none at read-time.
The PR is a careful, two-pronged repair on a real regression. The bisect to #1069 is sound, the WAAPI snapshot-and-skip dodges the getAnimations() global scan tax convincingly, and the GPU auto-probe finally reads WEBGL_debug_renderer_info instead of trusting llvmpipe-backed context creation. CI signal is broad and positive. The note below is a documented gap, not a blocker.
Challenge 1 — "Any other root cause hidden in #1715 beyond these two mechanisms?"
I went hunting and turned up one strong candidate that the PR does not address, plus a second worth flagging.
Found: the CSS adapter walks el.getAnimations() per tracked element, per seek. See packages/core/src/runtime/adapters/css.ts seek() (lines 89–106 at HEAD 02c0c54): for every entry collected at discover() (any element whose computed style has a non-none animationName), each seek calls getAnimationsForElement(entry.el) → el.getAnimations(). That is the per-element flavour of the same Chromium API whose global flavour Magi just exorcised. On a fixture with many CSS-animated elements but no WAAPI animations, you would expect a comparable per-frame tax on the screenshot path — just paid at the element level rather than document level. The reporter's fixture is a "no WAAPI animations" GSAP composition (likely no CSS animations either, hence Magi's specific 11.9 → 21.2s bisect), so this is not what surfaced in #1715, but it is exactly the same anti-pattern one layer down and would absolutely show up on a CSS-heavy template. Worth a tracking issue.
The 2.2s residual gap (Magi's 7.8s --no-browser-gpu vs 10.1s this-PR auto): the most likely culprit is the auto probe itself — probeHardwareWebGlInfo now launches a second headless Chromium, opens a page, runs page.evaluate to extract WEBGL_debug_renderer_info, and closes the browser. The cache deduplicates concurrent callers per process (good), but the cold-start probe cost is a real ~200ms-2s tax that the explicit --no-browser-gpu path skips entirely. Magi's own benchmarks suggest the cache amortizes this across 150 frames (8.0s capture vs 7.8s explicit-software is only 200ms), and the remaining 2.1s of the 10.1s total is plausibly fixed setup / shutdown / ffmpeg encode unrelated to the regression. Not a blocker.
Other per-seek scans audited and acquitted: GSAP, AnimeJS, Three, Lottie, Mapbox, D3 adapters all use locally-cached registries built at discover() rather than re-querying a global registry per seek. Only WAAPI (now fixed) and CSS (open) had the pathological pattern.
Challenge 2 — "Does the WAAPI lazy tracking miss a valid authoring pattern?"
Yes — three of them, though the practical impact is small.
The hook is installed exclusively on Element.prototype.animate (waapi.ts:69–86). The following WAAPI-creating paths bypass it entirely:
new Animation(new KeyframeEffect(target, keyframes, timing))+.play()— the direct-constructor path. Rare in authored code; common in animation libraries built on raw WAAPI primitives.- CSS
animation:declarations on elements added/mutated mid-render. These createCSSAnimationinstances by the style engine without ever callingElement.prototype.animate. However, this case is mostly covered downstream: the CSS adapter'sseek()walksel.getAnimations()per tracked entry and would catch them (see Challenge 1). The gap is narrow: an element added to the DOM after the CSS adapter'sdiscover()whose computed style has a CSS animation — that element is in neither registry until the nextdiscover()cycle (which fires on clip transitions but not mid-clip). - CSS
transition:declarations triggered by style mutations. Same shape as (2) —CSSTransitionis style-engine-created, not via.animate(). The CSS adapter only tracksanimationName, not transitions, so transitions on mid-render style mutations fall through entirely.
For (1), the documented "Best-effort only" comment in installAnimateHook is honest about the limit. For (2)/(3), the trade is acceptable on the headless-render path (compositions are largely pre-authored), but worth a // known gap: line and ideally a tracking issue. Not a blocker — the PR makes things strictly better than the per-seek scan for the common case, and the rare missed-animation case fails as "animation runs at wall-clock speed" rather than "frame is wrong" because the WAAPI animation's own time keeps advancing.
One additional dispatch-chain note: pause() now iterates the cached animations Set rather than document.getAnimations() (line 130). Pre-PR, pause() paused everything the browser had; post-PR it pauses only what the adapter has tracked. For an empty-discover composition that subsequently created animations via routes (1)/(2)/(3), the unpause-at-shutdown semantics change. Probably fine for the screenshot capture path (renderer tears down anyway), but worth a thought for the studio playback adapter that may also drive this.
Challenge 3 — "Should GPU auto-detection classify any other software renderers?"
The current list (isSoftwareWebGlRenderer, browserManager.ts:23–32) covers swiftshader, llvmpipe, softpipe, software rasterizer, plus the empty-string sentinel. Cross-referencing against real-world Chromium ANGLE renderer strings:
Microsoft Basic Render Driver(Windows software fallback when the GPU driver is unavailable / RDP / VM) — missing. Appears asANGLE (Microsoft, Microsoft Basic Render Driver Direct3D11 vs_5_0 ps_5_0). Not caught by any current substring.VMware SVGA 3D/vmware(VMware guest GPU passthrough; performs like software for most workloads but renders as hardware-ish) — debatable whether this should be classified software, but if the goal is "is this a real GPU", VMware SVGA on a non-3D-accelerated host is software in practice.Mesa GLSL/Mesa Offscreen— the bare Mesa software-rendering paths that don't go through llvmpipe naming.Parallels Display Adapter(Parallels Desktop guest) — similar to VMware case.lavapipe(Vulkan software rasterizer via Mesa, when surfaced through ANGLE-Vulkan) — emerging, not yet ubiquitous in Chromium ANGLE strings but on the horizon.virgl(virtio-gpu guest passthrough) — passes WebGL to host GPU; legitimately hardware-ish when host has a GPU. Skip.
Concrete recommendation: add microsoft basic render driver and mesa offscreen to the substring list. The first is a real Windows headless / RDP / disabled-GPU fallback and would silently land in the "hardware" bucket today, defeating the very point of the PR. The others are judgment calls.
Also worth noting: the probe defaults to software on any failure (probeAutoBrowserGpuMode catch-all on line ~289). That's the safer-by-default direction, but it means a transiently slow probe Chromium (timeout 30s default) silently misclassifies a real-GPU host as software. The cache is per-process, so the misclassification persists for the worker's lifetime. Logging captures this — fine.
Tiny nits (non-blocking)
Object.defineProperty(proto, "__hfOriginalAnimate", { value, configurable: true })is descriptor-only and doesn't setwritable— defaults tofalse. Re-running the installer in the same realm (e.g., studio hot-reload) woulddefinePropertyover a non-writable slot, which theconfigurable: truesaves — but theif (proto.__hfOriginalAnimate) returnguard already short-circuits. Belt + suspenders fine.- The
// best-effort onlycatch atwaapi.ts:84silently swallows install failure. Consider routing throughswallow("runtime.adapters.waapi.installHook", err)to match the rest of the file's diagnostic discipline. isSoftwareWebGlRenderershort-circuits on empty vendor and empty renderer; the more defensible rule is "renderer is empty" alone (vendor is often empty even on real hardware behind masking extensions). Minor — current logic is correct for the probe context where both come from the samegetParametersource.
Summary table
| Magi's challenge | Verdict | Evidence |
|---|---|---|
| Hidden root cause? | One sibling pattern found | CSS adapter calls el.getAnimations() per entry per seek (adapters/css.ts:99) |
| WAAPI lazy tracking gaps? | Three real gaps; small practical impact | new Animation(), CSS animations on mid-render elements, CSS transitions all bypass Element.prototype.animate |
| Other software renderers missing? | Yes — Microsoft Basic Render Driver | Real-world Windows headless / RDP / disabled-GPU fallback string |
Ship it. File a follow-up for the CSS-adapter per-seek scan (same anti-pattern, deferred) and add microsoft basic render driver to the renderer list.
Review by Via
02c0c54 to
a900d5e
Compare
james-russo-rames-d-jusso
left a comment
There was a problem hiding this comment.
Verdict: LGTM, layering on Via — three remaining additions
HEAD: a900d5e4 · Prior reviews: @vanceingalls posted "LGTM with one caveat" against 02c0c54 at 14:06 UTC; the author has since pushed a900d5e4 that addresses Via's Challenge 1 (CSS adapter sibling pattern → cached at discover) and Challenge 3 (added microsoft basic render driver, mesa offscreen, lavapipe to the software-renderer list). Verified at HEAD: css.ts:88 caches getAnimationsForElement(rawEl) per entry, and browserManager.ts:30-32 extends the substring set. CI green.
I did a blind RCA pass first (per James' instructions) and converged with Via on all three of his challenges. This review surfaces what the joint blind pass turned up that his didn't, focused on the gaps not yet addressed at HEAD.
1. Bisect framing in the PR body is misleading — worth a one-line correction
"It made the WAAPI adapter call
document.getAnimations()on every seek."
Pre-#1069 seek at 8b776cb^:packages/core/src/runtime/adapters/waapi.ts:11:
seek: (ctx) => {
if (!document.getAnimations) return;
const timeMs = Math.max(0, (Number(ctx.time) || 0) * 1000);
for (const animation of document.getAnimations()) { ... }
}
document.getAnimations() was already called once per seek pre-#1069. What PR #1069 actually added was the WeakMap<Animation, {compositionTimeMs, animationTimeMs}> + ensureBaseline + normalizeInitialAnimationTimeMs machinery wrapped around the same call (packages/core/src/runtime/adapters/waapi.ts:42-57 at #1069 merge). The empty-list global-scan tax existed pre-#1069; what regressed at v0.6.42 → v0.6.43 is the per-call cost growing (WeakMap traffic + ensureBaseline branching), not the call frequency. Bisect target is correct, RCA wording is off. Worth a one-line correction in the PR description so the next reader doesn't form the wrong mental model of where the per-frame tax actually lived.
2. animations: Set<Animation> never shrinks → soft leak + reactivation latch is one-shot
waapi.ts:8 declares const animations = new Set<Animation>(); and the only writer is trackAnimations (waapi.ts:59-64). No deletion path. Two compounding effects:
Soft leak. Strong-ref Set holds finished/cancelled animations forever; sibling baselines: WeakMap (waapi.ts:9-15) correctly allows GC but the Set defeats that. Headless render workers exit per-job so this doesn't matter for production rendering; studio / preview / long-lived sessions accumulate.
One-shot latch. Skip-scan condition is if (!didDiscover || animations.size > 0) (waapi.ts:106). Once any animation enters, animations.size > 0 forever (per the leak), so the per-frame snapshotAnimations() global-scan resumes for the rest of the render. A composition with one transient el.animate({...}, 100) followed by 2000 empty frames pays the full pre-PR per-frame document.getAnimations() tax on those 2000 frames. The fix is PREEMPT for "no WAAPI animations ever" compositions and NARROW for "transient WAAPI animations" compositions — the latter regresses back to the pre-PR cost shape mid-render.
Suggested fix at trackAnimations:
const trackAnimations = (items: Animation[], compositionTimeMs: number) => {
for (const animation of items) {
if (!animations.has(animation)) {
try {
animation.addEventListener("finish", () => animations.delete(animation), { once: true });
animation.addEventListener("cancel", () => animations.delete(animation), { once: true });
} catch { /* best-effort */ }
}
animations.add(animation);
ensureBaseline(animation, compositionTimeMs);
}
};Animation extends EventTarget; both finish and cancel events are spec. With this, the latch toggles back when the Set drains and the skip-scan re-engages for the trailing empty frames.
Not a blocker (the #1715 fixture is the PREEMPT case and the fix nails it), but the perf claim should be scoped to "compositions with no WAAPI activity" rather than "all screenshot renders."
3. No revert() on the WAAPI adapter — Element.prototype.animate patch never unwinds
init.ts:2817-2825 iterates state.deterministicAdapters at runtime teardown and calls adapter.revert?.(). CSS, Three, Lottie, anime.js, typegpu all implement it. The WAAPI adapter (this PR) does not, despite installing a global Element.prototype.animate wrapper at waapi.ts:81-86.
In production render workers this doesn't matter — process exits. In studio / preview / hot-reloaded compositions, the prototype patch stays on Element.prototype.animate forever and the closure it captures keeps the entire prior adapter instance (Set, WeakMap, lastSeekTimeMs) alive across runtime tear-downs. The if (proto.__hfOriginalAnimate) return guard at waapi.ts:74 prevents re-wrapping, but the first adapter instance is now permanently retained.
Suggested shape:
revert: () => {
if (animateHookInstalled && typeof Element !== "undefined") {
const proto = Element.prototype as Element & {
animate?: Element["animate"];
__hfOriginalAnimate?: Element["animate"];
};
if (proto.__hfOriginalAnimate) {
proto.animate = proto.__hfOriginalAnimate;
delete proto.__hfOriginalAnimate;
}
}
animations.clear();
// baselines is WeakMap; GCs naturally once Animation strong refs drop
animateHookInstalled = false;
didDiscover = false;
},Other notes
- PR #1718 (Miga, also closes #1715) is open with a different RCA — GSAP proxy queue rAF drain + sequential init polls. Original reporter's hypothesis at #1653 actually aligns with #1718, not this PR. The two regressions look complementary (init-time + per-frame), not duplicative; worth coordinating merge order with @miga-heygen so neither closes #1715 prematurely. If both land, captureAvgMs should land closer to the 0.6.42 baseline than either fix alone.
- No perf regression test for the empty-scan skip. The added unit test
does not rescan document animations on every seek when discover found none(waapi.test.ts:159-167, plus the matching CSS test atcss.test.ts:129-152ina900d5e4) asserts the call-count contract once, which is good. But there's noPlayer perfshard that benchmarks captureAvgMs against an empty-WAAPI fixture — a future refactor ofseekcould silently regress this the same way #1069 did. Worth filing a follow-up to add a perf-shard guard. - CSS adapter cache-at-discover behavior change (a900d5e). Good fix for the per-element-per-seek scan. Subtle: if a CSS animation gets a NEW
CSSAnimationinstance later (e.g., style mutation that changes keyframes / timing), the cachedAnimation[]becomes stale. For deterministic seek of static-style compositions this is fine; for mid-render style-mutation the adapter was already broken (entries are only built at discover), so no regression — just worth flagging that "no mid-render style mutations" is now a slightly firmer assumption. isSoftwareWebGlRendererempty-string early-return is unreachable. AtbrowserManager.ts:26if (!info.vendor.trim() && !renderer) return true;is dead — the only caller (resolveWebGlProbeMode,browserManager.ts:256-259) routes!hasWebGLdirectly to"software"without callingisSoftwareWebGlRenderer. The probe always setsvendor/renderernon-empty whenhasWebGL === true(lines 86-90). Cosmetic, not a bug — keep as defense in depth, but a comment would help.- GPU log line is observability gold.
console.error("[hyperframes] browserGpuMode auto → ${resolved} (${reason})")atbrowserManager.ts:337— exactly the signal a future on-call needs to bisect a similar regression. Consider promoting to a producer telemetry event so we can dashboard "% of auto-mode renders that resolved to software" across the fleet. Per Via's now-resolved Microsoft-Basic-Render-Driver gap, this metric would have caught the very class of bug this PR fixes if it had existed pre-#1069.
What I didn't verify
- I did not run the bench-39 suite locally — relying on Magi's reported 13.2s → 10.1s number. CI Player-perf shard is green, but neither it nor the local bench has an empty-WAAPI-fixture explicit assertion, so a future regression has nothing to bisect against beyond wall-clock.
- I did not exhaustively grep for
new Animation(new KeyframeEffect(...))callers inhyperframes-launches/ customer compositions. Via's "rare in authored code" matches my read of the templates I've seen; the long tail of customer compositions is opaque. - Windows-specific GPU probe behavior — Windows render CI shard passed, so the runner classifies correctly even pre-
a900d5e4. Now-extended substring list at HEAD should catch theMicrosoft Basic Render DriverWindows-VM / RDP / disabled-GPU case as well.
Summary
Ship it. Verdict matches Via's. Two strictly-additive asks before merge if you want to close them now (else as follow-ups):
- (5-min) Add
addEventListener("finish"/"cancel", ..., {once: true})intrackAnimationsso the Set drains and the skip-scan latch can re-engage on transient-WAAPI compositions. - (5-min) Add a
revert()that restoresElement.prototype.animateand clears the Set.
The bisect-wording correction in the PR description is author-discretion. Nice work landing two real regressions on crisp before/after numbers, and on the iteration that just shipped to address Via's findings.
Review by Rames D Jusso
a900d5e to
a92c7fc
Compare
Summary
v0.6.42 -> v0.6.43by avoiding per-framedocument.getAnimations()scans when WAAPI discovery found no animationsElement.prototype.animateand tracking created animations lazily, with cleanup/revert so transient animations do not permanently disable the empty-WAAPI fast pathel.getAnimations()scan by caching discovered CSS animations per elementbrowserGpuMode=automisclassification of software WebGL (llvmpipe/SwiftShader/lavapipe/Microsoft Basic Render Driver/etc.) as hardware by inspectingWEBGL_debug_renderer_infoFixes #1715.
RCA
The issue report compared a screenshot-fallback environment (
Chromium 149 bundled: /usr/bin/chromium) where BeginFrame is unavailable, so render usesrenderSeek + Page.captureScreenshot.I normalized the attached #1715/#1653 artifact into a standalone GSAP fixture (
/tmp/hf-1715/run-gsap) and bisected the forced screenshot path:v0.6.42forced screenshotv0.6.43forced screenshotv0.6.44-v0.6.50forced screenshotv0.7.5forced screenshotcaptureAvgMs=156The bad boundary is PR #1069 (
fix(core): correct WAAPI rediscovery seek baselines). Pre-#1069, the WAAPI adapter already calleddocument.getAnimations()during each seek. #1069 added baseline bookkeeping around the same global scan, increasing the per-call cost enough that an empty-WAAPI GSAP composition regressed badly whenrenderSeekinvoked the adapter once per frame. Disablingdocument.getAnimationsinv0.6.43/v0.7.5restored the old ~11-12s wall time, which isolated the root cost to the empty global WAAPI scan plus the new baseline work around it.A sibling pattern existed in the CSS adapter: it called
el.getAnimations()per tracked element on every seek/play/pause. This PR now caches those CSS animations duringdiscover()instead of re-querying Chromium on each seek.There is also a current default-path regression:
browserGpuMode=autoonly checked whether WebGL context creation succeeded. On CPU-only Linux this succeeds through llvmpipe, so auto chose hardware flags (--use-angle=gl-egl) even though the renderer was software:{ "vendor": "Google Inc. (Mesa/X.org)", "renderer": "ANGLE (Mesa/X.org, llvmpipe (LLVM 12.0.0 256 bits), OpenGL ES 3.2)" }On
0.7.17forced screenshot:browserGpuMode=autobefore fix--no-browser-gpubrowserGpuMode=autoVerification
bun --filter @hyperframes/core test -- src/runtime/adapters/waapi.test.ts src/runtime/adapters/css.test.tsbun --filter @hyperframes/engine test -- src/services/browserManager.test.tsbun --filter @hyperframes/core typecheckbun --filter @hyperframes/engine typecheckbun run build02c0c542:PRODUCER_FORCE_SCREENSHOT=true, CHS 150, defaultbrowserGpuMode=auto,--fps 15 --workers 4browserGpuMode auto → software (WebGL renderer vendor="Google Inc. (Mesa/X.org)" renderer="ANGLE (Mesa/X.org, llvmpipe ...)")7993ms, total:10087msNote
bun --filter @hyperframes/engine testfull-suite fails on this workstation in unrelated VFR extractor tests because the local system ffmpeg lacks-fps_mode(Unrecognized option 'fps_mode'). The changed engine test file passes, and CI's ffmpeg image should cover the full suite.