feat(core): duck() programmatic audio ducking (re-submit of #1337)#1747
feat(core): duck() programmatic audio ducking (re-submit of #1337)#1747mvanhorn wants to merge 2 commits into
Conversation
…yframes Adds a duck(musicTrack, voiceTracks, options) helper that lowers a music track whenever any voice track is audible, emitting deterministic compile-time volume keyframes into the same envelope used by hand-authored fades. Replaces the previous data-duck/data-duck-fade/data-role HTML attribute approach (heygen-com#1337) with a JS API per maintainer guidance — no new HTML schema attributes. amount accepts linear gain or dB (default -12dB); fade ramp defaults to 0.3s; an optional GSAP-compatible timeline receives the ramps.
miguel-heygen
left a comment
There was a problem hiding this comment.
I agree with the overall direction here: keep ducking framework-native by writing ordinary media .volume ramps onto the captured timeline, not by adding another public data-duck schema path. duck() is the right ergonomics layer over the existing volume-automation primitive.
blocker: preview and render disagree for ducked/probed volume automation when the music track starts at a non-zero data-start. The new helper computes/writes keyframes in composition time: element timing is resolved from data-start (packages/core/src/runtime/audioDucking.ts:118), the duck points are added at absolute overlap/ramp times (audioDucking.ts:208-214), and timeline writes use those same absolute first.time / previous.time values (audioDucking.ts:253-265). The existing probe path also samples absolute composition time (packages/core/src/runtime/mediaVolumeEnvelope.ts:90-119). Render handles that correctly by normalizing absolute keyframes with the track start before baking (packages/engine/src/services/audioVolumeEnvelope.ts:83-90). Preview does not: init.ts attaches the raw probed keyframes directly to the runtime clip (packages/core/src/runtime/init.ts:1652-1657), then syncRuntimeMedia computes a track-relative relTime (packages/core/src/runtime/media.ts:169) and interpolates the raw keyframes against that relative value (media.ts:192-197).
Concrete failure: if music has data-start="5" and ducking produces keyframes around 5.75/6/7/7.25, render subtracts 5 and ducks during track-relative 0.75-2.25. Preview queries those raw keyframes at relTime 0.75-2.25, so it stays on the first envelope segment and misses the duck. The tests only cover start=0 (packages/core/src/runtime/audioDucking.test.ts:124-133 normalizes with trackStart 0; the DOM media case uses music.dataset.start = "0" at audioDucking.test.ts:136-155), so this parity break is unpinned.
Please fix the preview path to use the same time basis as render and add a regression with non-zero music data-start (either normalize keyframes before attaching them to runtime clips, or interpolate with composition time, but keep preview/render semantics identical).
Non-blocking API note: the no-new-attribute choice is correct. I would just clarify in docs/API that plain timing objects are for computing keyframes; DOM media elements + the captured timeline are the render-affecting path. A future split into computeDuckKeyframes(...) and duck(...timeline...) would make that even clearer, but it should not block once the time-basis bug is fixed.
Verdict: REQUEST CHANGES
Reasoning: The helper rides the right native framework primitive, but it exposes a real preview/render parity bug for non-zero-start audio tracks and needs a targeted fix/test before merge.
— Magi
Probed volume keyframes carry absolute composition-time stamps, and the renderer normalises them by the track start before baking, so its envelope lookup is shift-invariant in composition time. The preview path in syncRuntimeMedia interpolated those absolute keyframes at the track-relative relTime (which also folds in mediaStart/playbackRate), so for any track with a non-zero data-start it sampled the wrong window and missed the duck entirely. Look the envelope up at absolute composition time instead, matching the renderer. Adds syncRuntimeMedia regression tests pinning preview/render parity for a non-zero data-start track, and clarifies in the duck() docs which inputs are render-affecting. Addresses review feedback on heygen-com#1747.
|
Thanks for the precise trace — that's exactly it. Fixed in The probe stamps keyframes in absolute composition time, and the renderer normalises them by the track start before baking, so its envelope lookup is shift-invariant in composition time. The preview path was interpolating those absolute keyframes at the track-relative Added On the non-blocking note: added a docs line clarifying the DOM elements + captured timeline are the render-affecting path, while plain timing objects / the returned array just compute keyframes. The |
Re-submit of #1337 along the lines you suggested: instead of new
data-duck/data-duck-fade/data-roleHTML attributes, this adds a programmaticduck(musicTrack, voiceTracks, options)helper that writes deterministic compile-time volume keyframes into the same envelope used by hand-authored fades. No new HTML schema attributes;htmlCompiler/audioElementParser/the schema docs are untouched.amountaccepts linear gain or dB (default-12dB),faderamp defaults to 0.3s, optional GSAP-compatibletimelinereceives the ramps.packages/core/src/runtime/audioDucking.ts+ tests; exported from@hyperframes/core.Happy to attach a rendered music-under-voiceover demo if useful.