
fix: port simulation participant sync race #1775

Open
rosetta-livekit-bot[bot] wants to merge 1 commit into 1.5.0 from pipit-newborn-steel

Conversation


rosetta-livekit-bot Bot commented Jun 12, 2026


Ports livekit/agents#6069 to agents-js.

Summary:
- Resolve simulation dispatch from simulator participant attributes without job metadata fallback.
- Register simulator disconnect handling unconditionally after connect to avoid participant-list sync races.
- Disable audio recording for text simulation sessions.

Verification:
- pnpm build:agents
- pnpm test agents/src/job.test.ts agents/src/voice/agent_session.test.ts

No tests added, matching the source PR.


Ported from livekit/agents#6069

Original PR description

No description.


changeset-bot Bot commented Jun 12, 2026


⚠️ No Changeset found

Latest commit: 525539d

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@rosetta-livekit-bot rosetta-livekit-bot Bot requested a review from theomonnom June 12, 2026 03:31

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 2 potential issues.


Comment on lines +566 to +570
if (this._textOnly) {
  this.logger.info('text simulation: disabling STT/TTS/VAD and audio I/O');
  inputOptions = { ...inputOptions, audioEnabled: false };
  outputOptions = { ...outputOptions, audioEnabled: false };
}


🔴 _textOnly simulation check always returns false in the auto-connect flow because room isn't connected yet

The _textOnly getter (agent_session.ts:537-540) calls ctx.simulationContext(), which scans room.remoteParticipants to find a simulator participant. Both call sites — start() at line 699 and _startImpl() at line 566 — execute before the room is connected. In the common auto-connect pattern (used in most examples, e.g. session.start({ agent, room: ctx.room }) then ctx.connect()), the room connects later via tasks.push(ctx.connect()) at agent_session.ts:616, but by that point the _textOnly checks have already passed with false. This means text simulation mode never activates: audio I/O is not disabled (lines 568–569) and audio recording is not disabled (line 700).

Trace of the auto-connect flow
  1. start() calls this._textOnly at line 699 → room not connected → simulationContext() finds no participants → returns undefined → _textOnly = false
  2. start() calls _startImpl(), which checks this._textOnly at line 566 → same result
  3. _startImpl() creates RoomIO with unmodified inputOptions/outputOptions at lines 592–597
  4. Auto-connect pushes ctx.connect() at line 616, but the audio IO options were already applied
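
The ordering problem in this trace can be reproduced in isolation. The following is a minimal sketch, not the agents-js code: `FakeRoom`, `isTextOnly`, and the `sim.mode` attribute key are hypothetical stand-ins, the participant list is modeled as an array, and `connect()` is synchronous here purely to keep the example short (the real connect is asynchronous).

```typescript
// Hypothetical stand-ins for the real Room / participant types; not the agents-js API.
type Participant = { identity: string; attributes: Record<string, string> };

class FakeRoom {
  // Empty until connect() runs, like remoteParticipants on an unconnected room.
  remoteParticipants: Participant[] = [];

  // Modeled synchronously for brevity; the real connect is asynchronous.
  connect(): void {
    this.remoteParticipants.push({
      identity: 'simulator',
      attributes: { 'sim.mode': 'text' }, // hypothetical attribute key
    });
  }
}

// Same shape as the _textOnly getter described above: it can only see
// participants that are already present in the room.
function isTextOnly(room: FakeRoom): boolean {
  return room.remoteParticipants.some((p) => p.attributes['sim.mode'] === 'text');
}

const room = new FakeRoom();
const beforeConnect = isTextOnly(room); // where start() and _startImpl() check today
room.connect();
const afterConnect = isTextOnly(room); // what the check would see after connecting

console.log({ beforeConnect, afterConnect });
```

The check evaluated before `connect()` cannot observe the simulator participant, so any options derived from it are fixed with the wrong value.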
Prompt for agents
The _textOnly getter relies on simulationContext() which reads room.remoteParticipants, but the two call sites in start() (line 699) and _startImpl() (line 566) both execute before the room is connected in the common auto-connect flow. The auto-connect happens via tasks.push(ctx.connect()) at _startImpl:616, which runs after the _textOnly checks and RoomIO setup.

Possible approaches:
1. Move the _textOnly check in _startImpl to after `await ThrowsPromise.allSettled(tasks)` (line 636), then retroactively reconfigure RoomIO's audio settings. This is complex since RoomIO is already constructed.
2. In start(), if the room is not connected, defer the _textOnly/recording check until after _startImpl completes (which connects the room). However, this requires restructuring the start() flow.
3. Have simulationContext() accept a Room parameter or pre-resolve the simulation context during ctx.connect() so it's available immediately. The connect() method at job.ts:351 already registers the onSimulatorDisconnected handler, so it could also resolve the simulation context there and cache it.
4. The simplest fix: in _startImpl, move the _textOnly check to after the auto-connect task completes. Split the tasks into a connect phase and a post-connect phase, checking _textOnly between them.

Files: agents/src/voice/agent_session.ts (start method around line 699, _startImpl around line 566), agents/src/job.ts (simulationContext around line 221, connect around line 330).
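
Approach 4 could look roughly like the sketch below. It assumes that RoomIO can be constructed after the room is connected, which this thread does not establish. `SessionDeps`, `startImpl`, and `createRoomIO` are hypothetical names used for illustration only, not the actual agents-js signatures.

```typescript
// Hypothetical names throughout; this is a sketch of the phase split, not the real code.
interface AudioOptions {
  audioEnabled: boolean;
}

interface SessionDeps {
  connect: () => Promise<void>; // stands in for ctx.connect()
  isTextOnly: () => boolean; // stands in for the _textOnly getter
  createRoomIO: (input: AudioOptions, output: AudioOptions) => void;
}

async function startImpl(
  deps: SessionDeps,
  input: AudioOptions,
  output: AudioOptions,
): Promise<void> {
  // Phase 1: connect first so the simulator participant becomes visible.
  await deps.connect();

  // Between phases: the text-only check now sees the populated participant list.
  if (deps.isTextOnly()) {
    input = { ...input, audioEnabled: false };
    output = { ...output, audioEnabled: false };
  }

  // Phase 2: build room I/O with the final options.
  deps.createRoomIO(input, output);
}

// Usage: the simulator only becomes visible once connect() has resolved.
let simulatorPresent = false;
const created: AudioOptions[] = [];
const done = startImpl(
  {
    connect: async () => {
      simulatorPresent = true;
    },
    isTextOnly: () => simulatorPresent,
    createRoomIO: (i, o) => {
      created.push(i, o);
    },
  },
  { audioEnabled: true },
  { audioEnabled: true },
);
```

The trade-off is that anything which must be registered before connecting (for example, event listeners) would have to stay in the first phase.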

Comment thread agents/src/job.ts
Comment on lines +120 to +130
get simulationMode(): SimulationMode {
  const mode = this.#dispatch.mode;
  if (mode === SimulationMode.SIMULATION_MODE_AUDIO || mode === 'SIMULATION_MODE_AUDIO') {
    return SimulationMode.SIMULATION_MODE_AUDIO;
  }
  if (mode === SimulationMode.SIMULATION_MODE_TEXT || mode === 'SIMULATION_MODE_TEXT') {
    return SimulationMode.SIMULATION_MODE_TEXT;
  }
  // Simulations predating the mode field were text-only.
  return SimulationMode.SIMULATION_MODE_TEXT;
}


🚩 SimulationMode defaults to TEXT for unspecified modes

At job.ts:128-129, when the mode field doesn't match SIMULATION_MODE_AUDIO or SIMULATION_MODE_TEXT (including when mode is undefined, 0/SIMULATION_MODE_UNSPECIFIED, or any unrecognized value), the getter defaults to SIMULATION_MODE_TEXT. This means SIMULATION_MODE_UNSPECIFIED (value 0) maps to TEXT mode rather than being treated as truly unspecified. The comment says "Simulations predating the mode field were text-only" which explains backward compatibility, but callers should be aware that SIMULATION_MODE_UNSPECIFIED triggers text-only behavior (audio I/O disabled) rather than being a no-op.
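
To make the fallback concrete, here is a standalone sketch of the same resolution logic. The local `SimulationMode` enum is a stand-in for the generated one: only `SIMULATION_MODE_UNSPECIFIED = 0` is stated above, and the numeric values for AUDIO and TEXT are assumptions. `resolveMode` and `RawMode` are hypothetical names.

```typescript
// Local stand-in for the generated enum. Only UNSPECIFIED = 0 comes from the
// review comment; the other numeric values are assumed for illustration.
enum SimulationMode {
  SIMULATION_MODE_UNSPECIFIED = 0,
  SIMULATION_MODE_AUDIO = 1, // assumed value
  SIMULATION_MODE_TEXT = 2, // assumed value
}

type RawMode = SimulationMode | string | undefined;

// Mirrors the getter: accepts the enum or its string name, falls back to TEXT.
function resolveMode(mode: RawMode): SimulationMode {
  if (mode === SimulationMode.SIMULATION_MODE_AUDIO || mode === 'SIMULATION_MODE_AUDIO') {
    return SimulationMode.SIMULATION_MODE_AUDIO;
  }
  if (mode === SimulationMode.SIMULATION_MODE_TEXT || mode === 'SIMULATION_MODE_TEXT') {
    return SimulationMode.SIMULATION_MODE_TEXT;
  }
  // Everything else, including undefined and UNSPECIFIED, resolves to TEXT.
  return SimulationMode.SIMULATION_MODE_TEXT;
}

const samples: RawMode[] = [
  undefined,
  SimulationMode.SIMULATION_MODE_UNSPECIFIED,
  'SIMULATION_MODE_AUDIO',
  SimulationMode.SIMULATION_MODE_AUDIO,
  'not-a-mode',
];

// Print the resolved enum member name for each sample input.
console.log(samples.map((m) => SimulationMode[resolveMode(m)]));
```

A caller that needs to distinguish "unspecified" from "text" cannot do so through this getter and would have to inspect the raw dispatch value instead.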

