
fix(avatar): preserve audio wrappers across avatar hot-swaps #5863

Open

longcw wants to merge 13 commits into main from longc/preserve-audio-wrappers-on-avatar-swap



longcw commented May 27, 2026


Summary

Avatar plugins set session.output.audio = DataStreamAudioOutput(...) on every start. On the first start this works because AgentSession.start() afterwards wraps the sink in the TranscriptSynchronizer / RecorderAudioOutput chain. On a mid-session rebind (an avatar switch), however, the raw assignment replaces the entire chain, silently breaking transcription sync and recording.
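To make the failure mode concrete, here is a minimal sketch using hypothetical stand-in classes (`Sink`, `Wrapper`, `Output`), not the real livekit-agents types. The wrappers are applied once, at start, so a later raw assignment to `output.audio` replaces the whole chain and the wrappers stop seeing audio.

```python
class Sink:
    """Hypothetical leaf sink, standing in for e.g. DataStreamAudioOutput."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.frames: list[bytes] = []

    def capture_frame(self, frame: bytes) -> None:
        self.frames.append(frame)


class Wrapper:
    """Hypothetical wrapper, standing in for TranscriptSynchronizer / RecorderAudioOutput."""

    def __init__(self, name: str, downstream) -> None:
        self.name = name
        self.downstream = downstream
        self.seen = 0

    def capture_frame(self, frame: bytes) -> None:
        self.seen += 1  # e.g. recording or transcript-sync bookkeeping
        self.downstream.capture_frame(frame)


class Output:
    def __init__(self) -> None:
        self.audio = None


output = Output()
# First start: the avatar sets its sink, then session start wraps it once.
output.audio = Sink("avatar-1")
output.audio = Wrapper("recorder", Wrapper("sync", output.audio))
recorder = output.audio
output.audio.capture_frame(b"a")

# Mid-session avatar switch: a raw assignment replaces the whole chain.
output.audio = Sink("avatar-2")
output.audio.capture_frame(b"b")
assert recorder.seen == 1  # the recorder never saw b"b"
```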

Fix it by auto-inserting an _AudioSinkProxy at the bottom of any wrapper chain. Wrappers cache the proxy and the proxy holds the swappable leaf, so a hot-swap leaves the wrappers above it intact. The new AgentOutput.swap_audio_endpoint(sink) walks the chain to the proxy and swaps its downstream in place, leaving the wrappers attached; full replacement stays as output.audio = sink.
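A minimal sketch of the proxy idea, assuming simplified stand-ins (`AudioSink`, `Leaf`, `Wrapper`, `AudioSinkProxy`, `AgentOutput`) rather than the actual `AudioOutput`, `_AudioSinkProxy`, and `AgentOutput` classes. Only frame capture is modeled; flush, playback events, and attach/detach are omitted. Falling back to full replacement when no proxy is found is an assumption, mirroring the tested `preserve_wrappers` fallback listed in the commits below.

```python
from __future__ import annotations


class AudioSink:
    """Stand-in for AudioOutput: next_in_chain is None for a leaf sink."""

    def __init__(self, next_in_chain: AudioSink | None = None) -> None:
        self.next_in_chain = next_in_chain

    def capture_frame(self, frame: bytes) -> None:
        # Non-leaf nodes forward frames down the chain.
        assert self.next_in_chain is not None
        self.next_in_chain.capture_frame(frame)


class Leaf(AudioSink):
    """Stand-in for an avatar's DataStreamAudioOutput."""

    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name
        self.frames: list[bytes] = []

    def capture_frame(self, frame: bytes) -> None:
        self.frames.append(frame)


class AudioSinkProxy(AudioSink):
    """Swappable slot at the bottom of the chain; wrappers hold this, not the leaf."""

    def set_inner(self, sink: AudioSink) -> None:
        self.next_in_chain = sink


class Wrapper(AudioSink):
    """Stand-in for the TranscriptSynchronizer / RecorderAudioOutput wrappers."""

    def __init__(self, downstream: AudioSink) -> None:
        # Auto-insert a proxy only when wrapping a bare leaf.
        is_leaf = downstream.next_in_chain is None
        if is_leaf and not isinstance(downstream, AudioSinkProxy):
            proxy = AudioSinkProxy()
            proxy.set_inner(downstream)
            downstream = proxy
        super().__init__(downstream)


class AgentOutput:
    def __init__(self) -> None:
        self.audio: AudioSink | None = None

    def swap_audio_endpoint(self, sink: AudioSink) -> None:
        # Walk down from the head to the proxy and swap what it points at.
        node = self.audio
        while node is not None and not isinstance(node, AudioSinkProxy):
            node = node.next_in_chain
        if node is None:
            self.audio = sink  # assumption: no proxy, so fall back to full replacement
        else:
            node.set_inner(sink)


output = AgentOutput()
first = Leaf("avatar-1")
output.audio = Wrapper(Wrapper(first))  # e.g. recorder over synchronizer
head = output.audio

second = Leaf("avatar-2")
output.swap_audio_endpoint(second)
output.audio.capture_frame(b"frame")
assert output.audio is head  # the wrappers survived the swap
assert second.frames == [b"frame"] and first.frames == []
```

Because the wrappers reference the proxy rather than the leaf, swapping the proxy's downstream never invalidates anything above it.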

Plugin migration

All 13 avatar plugins migrated to swap_audio_endpoint(...): anam, avatario, avatartalk, bey, bithuman, did, keyframe, lemonslice, liveavatar, runway, simli, tavus, trugen.

Example

examples/avatar_agents/audio_wave now demonstrates hot-swapping through a swap_avatar RPC method: it tears down the current avatar (removing it from the room) and launches a fresh one under the same identity, while the audio wrappers and the listeners attached to session.output.audio survive the swap.

Related

Clean avatar-worker shutdown on swap depends on livekit/python-sdks#699.

Closes #4198

longcw added 6 commits May 27, 2026 14:56
AvatarSession.start() rebinds session.output.audio to a fresh
DataStreamAudioOutput. On the first call the wrapper chain (Recorder,
TranscriptSynchronizer) wraps it correctly, but a re-bind during a
mid-session avatar switch overwrites the synchronizer-wrapped output
with a raw sink, breaking audio/transcription sync and recording.

Introduce _AudioSinkProxy, a transparent proxy auto-inserted at the
bottom of any wrapper chain. Wrappers cache the proxy (not the leaf),
so the leaf can be hot-swapped via the proxy without invalidating
upstream references. When the proxy has no inner sink, flush()
synthesizes a playback_finished so upstream wrappers don't hang.

Add AgentOutput.set_audio_sink(sink, *, preserve_wrappers=False).
With preserve_wrappers=True, walks the chain to find the proxy and
swaps its downstream; otherwise behaves as the existing audio setter.
Avatar plugins migrate to this API; AvatarSession.aclose() detaches
the sink so the chain stays intact across aclose -> restart.

Drops the "may be replaced by the avatar" warning in AvatarSession.start
since the proxy makes mid-session rebinding correct by construction.
…ers=True)

Route every avatar plugin's audio sink binding through the new
AgentOutput.set_audio_sink API so mid-session hot-swaps (e.g. avatar
switches) preserve the TranscriptSynchronizer / RecorderAudioOutput
wrapper chain.

Plugins migrated: anam, avatario, avatartalk, bey, bithuman, did,
keyframe, liveavatar, runway, simli, tavus, trugen.
Covers:
- auto-wrap inserts the proxy between a wrapper and a bare leaf
- auto-wrap skipped when the downstream is already a proxy or a non-leaf
- set_audio_sink default replaces the chain
- set_audio_sink with preserve_wrappers swaps the proxy's inner in place
- preserve_wrappers fallback when no proxy exists in the chain
- proxy rejects a wrapper chain as inner (set_next_in_chain assert)
- detached proxy synthesizes playback_finished on flush
- swap routes new-leaf playback events to upstream listeners
- swap disconnects the old leaf from the chain
- on_attached/on_detached propagate to current inner and across swaps
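A pytest-style sketch of the last bullet, assuming hypothetical `Leaf` and `Proxy` classes rather than the PR's `_AudioSinkProxy` and test helpers. It assumes that swapping the inner of an attached proxy detaches the old sink and attaches the new one, and that detaching the proxy reaches whichever sink it currently holds.

```python
from __future__ import annotations


class Leaf:
    """Hypothetical leaf sink that records attach/detach notifications."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: list[str] = []

    def on_attached(self) -> None:
        self.events.append("attached")

    def on_detached(self) -> None:
        self.events.append("detached")


class Proxy:
    """Hypothetical proxy: forwards attach state to whichever inner it holds."""

    def __init__(self, inner: Leaf) -> None:
        self.inner = inner
        self.attached = False

    def on_attached(self) -> None:
        self.attached = True
        self.inner.on_attached()

    def on_detached(self) -> None:
        self.attached = False
        self.inner.on_detached()

    def swap(self, new_inner: Leaf) -> None:
        old, self.inner = self.inner, new_inner
        if self.attached:
            # The old leaf leaves the live chain; the new one joins it.
            old.on_detached()
            new_inner.on_attached()


def test_attach_state_follows_swaps() -> None:
    a, b = Leaf("a"), Leaf("b")
    proxy = Proxy(a)
    proxy.on_attached()
    proxy.swap(b)
    proxy.on_detached()
    assert a.events == ["attached", "detached"]
    assert b.events == ["attached", "detached"]


test_attach_state_follows_swaps()
```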
Drop the leaf-only assertion in _AudioSinkProxy.set_next_in_chain — the
base AudioOutput machinery cascades capture/flush and bubbles playback
events through any chain, so the proxy can hold either a leaf or a
wrapper chain without breaking the contract upstream.
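This commit relies on capture flowing down and playback events bubbling up through any chain, so the proxy's downstream can itself be a wrapper chain. A simplified sketch, assuming a listener-list event model (`Node`, `Proxy`, `on_playback_finished(segment)`) that is hypothetical and not the real `AudioOutput` event API:

```python
from __future__ import annotations

from typing import Callable


class Node:
    """Hypothetical AudioOutput stand-in: frames flow down, events bubble up."""

    def __init__(self, name: str, downstream: Node | None = None) -> None:
        self.name = name
        self.downstream = downstream
        self.listeners: list[Callable[[str], None]] = []
        self.captured: list[bytes] = []
        if downstream is not None:
            downstream.listeners.append(self.on_playback_finished)

    def capture_frame(self, frame: bytes) -> None:
        if self.downstream is None:
            self.captured.append(frame)  # leaf: "plays" the frame
        else:
            self.downstream.capture_frame(frame)

    def on_playback_finished(self, segment: str) -> None:
        # Re-emit to whoever sits above this node.
        for listener in self.listeners:
            listener(segment)


class Proxy(Node):
    """Hypothetical proxy whose inner may be a leaf or a whole wrapper chain."""

    def set_inner(self, sink: Node) -> None:
        if self.downstream is not None:
            self.downstream.listeners.remove(self.on_playback_finished)
        self.downstream = sink
        sink.listeners.append(self.on_playback_finished)


leaf = Node("avatar-leaf")
inner_chain = Node("avatar-side-wrapper", leaf)  # a chain, not a bare leaf
proxy = Proxy("proxy")
proxy.set_inner(inner_chain)
top = Node("synchronizer", proxy)

finished: list[str] = []
top.listeners.append(finished.append)
top.capture_frame(b"x")
leaf.on_playback_finished("seg-1")  # the leaf reports that playout finished
assert leaf.captured == [b"x"] and finished == ["seg-1"]
```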
The base class doesn't track which sink the avatar set, so nulling
session.output.audio unconditionally could clobber a sink owned by
someone else. The wrapper chain stays intact across hot-swaps anyway
because the proxy preserves the wrappers regardless of what's in its
downstream slot, so leaving the sink in place until it's replaced or
the session tears down is fine.
chenghao-mou requested a review from a team May 27, 2026 07:27
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

Comment thread livekit-agents/livekit/agents/voice/io.py Outdated
longcw requested a review from theomonnom June 3, 2026 00:54
longcw added 3 commits June 3, 2026 21:36
Drop the preserve_wrappers flag: the wrapper-preserving leaf swap is now its
own method, and full replacement stays as output.audio = sink.
devin-ai-integration[bot]

This comment was marked as resolved.

The detached no-op mode of _AudioSinkProxy synthesized a synchronous
playback_finished during flush(), which re-entered
_SyncedAudioOutput.on_playback_finished and caused a double
rotate_segment while skipping end_audio_input. Nothing passes None in
practice, so require a real sink instead of fixing the re-entrancy.
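A stripped-down sketch of the ordering hazard, with hypothetical `Upstream` and `DetachedProxy` classes. It does not reproduce `_SyncedAudioOutput` (the double `rotate_segment` is not modeled); it only shows that emitting `playback_finished` synchronously from `flush()` runs the upstream handler before the upstream `flush()` has finished its own work.

```python
from __future__ import annotations


class Upstream:
    """Hypothetical wrapper whose flush() has bookkeeping after downstream.flush()."""

    def __init__(self) -> None:
        self.log: list[str] = []
        self.downstream: DetachedProxy | None = None

    def flush(self) -> None:
        assert self.downstream is not None
        self.downstream.flush()
        self.log.append("end_audio_input")  # expected to run before playback ends

    def on_playback_finished(self) -> None:
        self.log.append("playback_finished")


class DetachedProxy:
    """Hypothetical no-inner proxy that fakes playback_finished inside flush()."""

    def __init__(self, parent: Upstream) -> None:
        self.parent = parent

    def flush(self) -> None:
        # Synchronous emission: the parent's handler runs while its own
        # flush() is still on the call stack.
        self.parent.on_playback_finished()


up = Upstream()
up.downstream = DetachedProxy(up)
up.flush()
print(up.log)  # ['playback_finished', 'end_audio_input']
```

The upstream sees the segment end before it has closed the segment's input, which is the re-entrancy the commit avoids by requiring a real sink.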

devin-ai-integration[bot] left a comment


Devin Review found 2 new potential issues.


Comment thread livekit-agents/livekit/agents/voice/io.py
Comment on lines +553 to +557
```python
    @property
    def sample_rate(self) -> int | None:
        if self._sample_rate is not None:
            return self._sample_rate
        return self.next_in_chain.sample_rate if self.next_in_chain else None
```


🚩 sample_rate is now dynamic instead of fixed at construction time

Both _SyncedAudioOutput (synchronizer.py:553-557) and RecorderAudioOutput (recorder_io.py:366-370) no longer fix sample_rate at construction time by passing it to super().__init__; instead, a property computes it on access and, when no explicit rate is set, delegates to self.next_in_chain.sample_rate. This is intentional for the hot-swap use case: after the leaf sink is swapped, the sample rate should reflect the new sink's requirements. However, in generation.py:418-428 the resampler is created lazily on the first frame and never recreated, so if the sample rate changes after the first frame (e.g. because of a hot-swap), the resampler is not updated. This limitation predates this PR, but it becomes more relevant now that hot-swapping is supported.
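A sketch of the stale-resampler case, assuming hypothetical `Leaf`, `Wrapper`, `Forwarder`, and `Resampler` classes. `Wrapper.sample_rate` mirrors the property in the diff; `Forwarder` stands in for the lazy, first-frame resampler creation the review cites in generation.py. The `rebuild_on_change` branch is a hypothetical mitigation, not something this PR implements.

```python
from __future__ import annotations


class Leaf:
    def __init__(self, sample_rate: int) -> None:
        self.sample_rate = sample_rate


class Wrapper:
    """Delegates sample_rate to its downstream, like the property in the diff."""

    def __init__(self, downstream, sample_rate: int | None = None) -> None:
        self.next_in_chain = downstream
        self._sample_rate = sample_rate

    @property
    def sample_rate(self) -> int | None:
        if self._sample_rate is not None:
            return self._sample_rate
        return self.next_in_chain.sample_rate if self.next_in_chain else None


class Resampler:
    """Hypothetical placeholder; only records the rates it was built for."""

    def __init__(self, input_rate: int, output_rate: int | None) -> None:
        self.input_rate = input_rate
        self.output_rate = output_rate


class Forwarder:
    """Lazily builds a resampler on the first frame (the pattern the review flags)."""

    def __init__(self, output: Wrapper, rebuild_on_change: bool) -> None:
        self.output = output
        self.rebuild_on_change = rebuild_on_change
        self.resampler: Resampler | None = None

    def push_frame(self, input_rate: int) -> Resampler:
        target = self.output.sample_rate
        if self.resampler is None or (
            self.rebuild_on_change and self.resampler.output_rate != target
        ):
            self.resampler = Resampler(input_rate, target)
        return self.resampler


wrapper = Wrapper(Leaf(24000))
stale = Forwarder(wrapper, rebuild_on_change=False)
fresh = Forwarder(wrapper, rebuild_on_change=True)
stale.push_frame(48000)
fresh.push_frame(48000)

wrapper.next_in_chain = Leaf(16000)  # hot-swap to a sink with a different rate
print(stale.push_frame(48000).output_rate)  # 24000: still the old rate
print(fresh.push_frame(48000).output_rate)  # 16000
```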


longcw force-pushed the longc/preserve-audio-wrappers-on-avatar-swap branch from 2900ae1 to b62cd5f June 11, 2026 12:12

devin-ai-integration[bot] left a comment


Devin Review found 1 new potential issue.


A swap while a flushed segment was still playing out removed the old
sink's listeners before its playback_finished arrived, leaving the
playback accounting unbalanced and wait_for_playout() hanging. Now the
proxy clears the old sink's buffer and reports the orphaned segment as
interrupted; a segment still being captured continues on the new sink,
which reports it on its own.
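A sketch of the accounting this commit restores, assuming hypothetical `Leaf`, `Proxy`, and `Upstream` classes. Upstream counts flushed segments against `playback_finished` reports, the way a `wait_for_playout()` would, and the `interrupted` flag is modeled as a plain bool.

```python
from __future__ import annotations

from typing import Callable


class Leaf:
    """Hypothetical sink: a flushed segment stays pending until it finishes playing."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.flushed_pending = False
        self.cleared = False

    def flush(self) -> None:
        self.flushed_pending = True

    def clear_buffer(self) -> None:
        self.cleared = True
        self.flushed_pending = False


class Proxy:
    """Hypothetical proxy that keeps upstream playback accounting balanced on swap."""

    def __init__(self, inner: Leaf, on_finished: Callable[[bool], None]) -> None:
        self.inner = inner
        self.on_finished = on_finished  # upstream playback_finished(interrupted)

    def flush(self) -> None:
        self.inner.flush()

    def swap(self, new_inner: Leaf) -> None:
        old, self.inner = self.inner, new_inner
        if old.flushed_pending:
            # The old sink's playback_finished would never reach upstream, so
            # drop its buffer and report the orphaned segment as interrupted.
            old.clear_buffer()
            self.on_finished(True)
        # A segment still being captured (not yet flushed) continues on
        # new_inner, which reports its own playback_finished later.


class Upstream:
    """Counts flushed vs. finished segments, as a wait_for_playout() would."""

    def __init__(self) -> None:
        self.flushed = 0
        self.finished: list[bool] = []
        self.proxy = Proxy(Leaf("avatar-1"), self.on_playback_finished)

    def flush(self) -> None:
        self.flushed += 1
        self.proxy.flush()

    def on_playback_finished(self, interrupted: bool) -> None:
        self.finished.append(interrupted)

    def playout_done(self) -> bool:
        return self.flushed == len(self.finished)


up = Upstream()
up.flush()                       # segment flushed, still playing on avatar-1
old = up.proxy.inner
up.proxy.swap(Leaf("avatar-2"))  # hot-swap before avatar-1 reports playback_finished
assert old.cleared and up.finished == [True] and up.playout_done()
```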

devin-ai-integration[bot] left a comment


Devin Review found 1 new potential issue.


Comment thread livekit-agents/livekit/agents/voice/io.py
```python
        else:
            self._audio_sink.on_detached()

    def swap_audio_endpoint(self, sink: AudioOutput) -> None:
```

Member

Suggested change
```diff
-    def swap_audio_endpoint(self, sink: AudioOutput) -> None:
+    def replace_audio_leadl(self, sink: AudioOutput) -> None:
```

not a fan of the name

Member

should it be something like replace_audio_sink?

Contributor Author

"lead" reads as the head of the chain, but the method replaces the tail, how about

  • replace_audio_endpoint
  • replace_audio_destination
  • redirect_audio
  • replace_audio_tail



Development

Successfully merging this pull request may close these issues.

VideoAvatar handling does not allow per-agent session customization
