fix: port simulation participant sync race #1775
rosetta-livekit-bot[bot] wants to merge 1 commit into
```ts
if (this._textOnly) {
  this.logger.info('text simulation: disabling STT/TTS/VAD and audio I/O');
  inputOptions = { ...inputOptions, audioEnabled: false };
  outputOptions = { ...outputOptions, audioEnabled: false };
}
```
🔴 _textOnly simulation check always returns false in the auto-connect flow because room isn't connected yet
The _textOnly getter (agent_session.ts:537-540) calls ctx.simulationContext(), which scans room.remoteParticipants to find a simulator participant. Both call sites — start() at line 699 and _startImpl() at line 566 — execute before the room is connected. In the common auto-connect pattern (used in most examples, e.g. session.start({ agent, room: ctx.room }) then ctx.connect()), the room connects later via tasks.push(ctx.connect()) at agent_session.ts:616, but by that point the _textOnly checks have already passed with false. This means text simulation mode never activates: audio I/O is not disabled (lines 568–569) and audio recording is not disabled (line 700).
Trace of the auto-connect flow
1. `start()` calls `this._textOnly` at line 699 → room not connected → `simulationContext()` finds no participants → returns `undefined` → `_textOnly = false`.
2. `start()` calls `_startImpl()`, which checks `this._textOnly` at line 566 → same result.
3. `_startImpl()` creates RoomIO with unmodified `inputOptions`/`outputOptions` at lines 592–597.
4. Auto-connect pushes `ctx.connect()` at line 616, but the audio I/O options were already applied.
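The ordering problem in the trace can be reproduced in isolation. This is a minimal sketch, not agents-js code: `FakeRoom`, `isTextOnly`, and both `start*` functions are hypothetical stand-ins, and connecting is modeled synchronously for simplicity. It only shows that a check derived from the participant list gives different answers depending on whether it runs before or after connect.

```typescript
// Hypothetical stand-in for a room: remote participants become visible only after connect().
class FakeRoom {
  readonly remoteParticipants = new Map<string, { kind: string }>();
  isConnected = false;

  connect(): void {
    this.isConnected = true;
    // Model the participant list syncing as part of connecting.
    this.remoteParticipants.set('sim-1', { kind: 'simulator' });
  }
}

// Like the reviewed getter, this reads whatever the participant list holds right now.
function isTextOnly(room: FakeRoom): boolean {
  for (const p of room.remoteParticipants.values()) {
    if (p.kind === 'simulator') return true;
  }
  return false;
}

// Racy order (auto-connect flow): audio options are fixed before the room connects.
function startCheckingBeforeConnect(room: FakeRoom): { audioEnabled: boolean } {
  const options = { audioEnabled: !isTextOnly(room) };
  room.connect();
  return options;
}

// Fixed order: connect first, then decide on audio I/O.
function startCheckingAfterConnect(room: FakeRoom): { audioEnabled: boolean } {
  room.connect();
  return { audioEnabled: !isTextOnly(room) };
}

console.log(startCheckingBeforeConnect(new FakeRoom())); // { audioEnabled: true }
console.log(startCheckingAfterConnect(new FakeRoom())); // { audioEnabled: false }
```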
Prompt for agents
The _textOnly getter relies on simulationContext() which reads room.remoteParticipants, but the two call sites in start() (line 699) and _startImpl() (line 566) both execute before the room is connected in the common auto-connect flow. The auto-connect happens via tasks.push(ctx.connect()) at _startImpl:616, which runs after the _textOnly checks and RoomIO setup.
Possible approaches:
1. Move the _textOnly check in _startImpl to after `await ThrowsPromise.allSettled(tasks)` (line 636), then retroactively reconfigure RoomIO's audio settings. This is complex since RoomIO is already constructed.
2. In start(), if the room is not connected, defer the _textOnly/recording check until after _startImpl completes (which connects the room). However, this requires restructuring the start() flow.
3. Have simulationContext() accept a Room parameter or pre-resolve the simulation context during ctx.connect() so it's available immediately. The connect() method at job.ts:351 already registers the onSimulatorDisconnected handler, so it could also resolve the simulation context there and cache it.
4. The simplest fix: in _startImpl, move the _textOnly check to after the auto-connect task completes. Split the tasks into a connect phase and a post-connect phase, checking _textOnly between them.
Files: agents/src/voice/agent_session.ts (start method around line 699, _startImpl around line 566), agents/src/job.ts (simulationContext around line 221, connect around line 330).
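Approach 4 can be outlined as a two-phase start. This is an illustrative sketch under stated assumptions, not the agents-js `_startImpl`: `RoomLike`, `hasSimulator()` (standing in for `simulationContext() !== undefined`), and `startImplSketch` are hypothetical, plain `Promise.allSettled` stands in for `ThrowsPromise.allSettled`, and it assumes the simulator participant is already visible once `connect()` resolves, which the participant-list sync race may not guarantee.

```typescript
type IOOptions = { audioEnabled: boolean };

// Hypothetical minimal room surface for this sketch.
interface RoomLike {
  isConnected: boolean;
  connect(): Promise<void>;
  hasSimulator(): boolean; // stands in for simulationContext() !== undefined
}

async function startImplSketch(
  room: RoomLike,
  input: IOOptions,
  output: IOOptions,
  otherTasks: Promise<unknown>[],
): Promise<{ input: IOOptions; output: IOOptions }> {
  // Phase 1: connect (the auto-connect case) before anything reads the participant list.
  if (!room.isConnected) {
    await room.connect();
  }

  // Phase 2: with the room connected, the text-only check can see the simulator.
  if (room.hasSimulator()) {
    input = { ...input, audioEnabled: false };
    output = { ...output, audioEnabled: false };
  }

  // RoomIO construction and other post-connect setup would consume the adjusted options here.
  await Promise.allSettled(otherTasks);
  return { input, output };
}
```

Compared with approach 1, nothing has to be reconfigured after the fact: the options are decided once, after connect, and only then handed to whatever builds the audio I/O.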
```ts
get simulationMode(): SimulationMode {
  const mode = this.#dispatch.mode;
  if (mode === SimulationMode.SIMULATION_MODE_AUDIO || mode === 'SIMULATION_MODE_AUDIO') {
    return SimulationMode.SIMULATION_MODE_AUDIO;
  }
  if (mode === SimulationMode.SIMULATION_MODE_TEXT || mode === 'SIMULATION_MODE_TEXT') {
    return SimulationMode.SIMULATION_MODE_TEXT;
  }
  // Simulations predating the mode field were text-only.
  return SimulationMode.SIMULATION_MODE_TEXT;
}
```
🚩 SimulationMode defaults to TEXT for unspecified modes
At job.ts:128-129, when the mode field doesn't match SIMULATION_MODE_AUDIO or SIMULATION_MODE_TEXT (including when mode is undefined, 0/SIMULATION_MODE_UNSPECIFIED, or any unrecognized value), the getter defaults to SIMULATION_MODE_TEXT. This means SIMULATION_MODE_UNSPECIFIED (value 0) maps to TEXT mode rather than being treated as truly unspecified. The comment says "Simulations predating the mode field were text-only" which explains backward compatibility, but callers should be aware that SIMULATION_MODE_UNSPECIFIED triggers text-only behavior (audio I/O disabled) rather than being a no-op.
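The fallback behavior the comment describes can be checked with the getter's logic lifted into a pure function. This is a sketch: the local `SimulationMode` enum is a stand-in where only `SIMULATION_MODE_UNSPECIFIED = 0` comes from the review, and the numeric values for TEXT and AUDIO are assumptions. `resolveSimulationMode` is a hypothetical name.

```typescript
// Local stand-in for the SimulationMode enum. Only UNSPECIFIED = 0 comes from the review;
// the TEXT and AUDIO numeric values are assumptions for this sketch.
enum SimulationMode {
  SIMULATION_MODE_UNSPECIFIED = 0,
  SIMULATION_MODE_TEXT = 1,
  SIMULATION_MODE_AUDIO = 2,
}

// Same decision logic as the quoted getter, without the surrounding class.
function resolveSimulationMode(mode: SimulationMode | string | undefined): SimulationMode {
  if (mode === SimulationMode.SIMULATION_MODE_AUDIO || mode === 'SIMULATION_MODE_AUDIO') {
    return SimulationMode.SIMULATION_MODE_AUDIO;
  }
  if (mode === SimulationMode.SIMULATION_MODE_TEXT || mode === 'SIMULATION_MODE_TEXT') {
    return SimulationMode.SIMULATION_MODE_TEXT;
  }
  // UNSPECIFIED (0), undefined, and unrecognized values all fall through to TEXT.
  return SimulationMode.SIMULATION_MODE_TEXT;
}

const samples: Array<SimulationMode | string | undefined> = [
  SimulationMode.SIMULATION_MODE_UNSPECIFIED,
  undefined,
  'SIMULATION_MODE_SOMETHING_NEW',
  'SIMULATION_MODE_AUDIO',
];
for (const m of samples) {
  // Numeric enums have reverse mappings, so SimulationMode[value] yields the member name.
  console.log(String(m), '->', SimulationMode[resolveSimulationMode(m)]);
}
```

Only an explicit AUDIO value, numeric or string, keeps audio I/O enabled; every other input, including UNSPECIFIED, takes the text-only path.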
Ports livekit/agents#6069 to agents-js.

Summary:
- Resolve simulation dispatch from simulator participant attributes without job metadata fallback.
- Register simulator disconnect handling unconditionally after connect to avoid participant-list sync races.
- Disable audio recording for text simulation sessions.

Verification:
- `pnpm build:agents`
- `pnpm test agents/src/job.test.ts agents/src/voice/agent_session.test.ts`

No tests added, matching the source PR.
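The second summary item can be illustrated with a small, self-contained sketch. It is not the agents-js implementation, which this page does not show: `RoomSketch`, the `'participantDisconnected'` event name, and the `simulator-` identity prefix are all hypothetical. It contrasts subscribing only when the simulator is already listed with subscribing unconditionally after connect.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical room: emits 'participantDisconnected' with the participant identity.
class RoomSketch extends EventEmitter {
  readonly remoteParticipants = new Set<string>();
}

// Hypothetical convention for recognizing the simulator participant in this sketch.
const isSimulator = (identity: string): boolean => identity.startsWith('simulator-');

// Racy: subscribes only if the simulator is already in the (possibly unsynced) participant list.
function registerIfPresent(room: RoomSketch, onSimulatorGone: () => void): boolean {
  const present = Array.from(room.remoteParticipants).some(isSimulator);
  if (!present) return false;
  room.on('participantDisconnected', (identity: string) => {
    if (isSimulator(identity)) onSimulatorGone();
  });
  return true;
}

// Unconditional: always subscribe and filter inside the handler, so late arrivals are covered.
function registerUnconditionally(room: RoomSketch, onSimulatorGone: () => void): void {
  room.on('participantDisconnected', (identity: string) => {
    if (isSimulator(identity)) onSimulatorGone();
  });
}
```

If the simulator appears in `remoteParticipants` only after registration runs, `registerIfPresent` never subscribes and the disconnect is missed, while `registerUnconditionally` still fires.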
Ported from livekit/agents#6069
Original PR description
No description.