Skip to content

fix(sdk): prevent duplicate usage_id error when registering LLMs#3378

Closed
tofarr wants to merge 4 commits into
mainfrom
fix/duplicate-usage-id-registry-error
Closed

fix(sdk): prevent duplicate usage_id error when registering LLMs#3378
tofarr wants to merge 4 commits into
mainfrom
fix/duplicate-usage-id-registry-error

Conversation

@tofarr
Copy link
Copy Markdown
Collaborator

@tofarr tofarr commented May 24, 2026

Summary

Fixes a bug where conversations would fail to start with:

ValueError: Usage ID 'default' already exists in registry. Use a different usage_id on the LLM or call get() to retrieve the existing LLM.

Root Cause

The error occurred in LocalConversation._ensure_agent_ready() when:

  1. An agent has multiple LLM objects (e.g., one for the main agent, one for a condenser)
  2. Both LLM objects have the same usage_id (the default is "default")
  3. When registering LLMs, the code only checked against the initially registered IDs, not tracking IDs added during the loop

Example Triggering Scenario

# Two separate LLM objects with the same default usage_id
agent_llm = LLM(model='gpt-4o')      # usage_id='default'
condenser_llm = LLM(model='gpt-4o')  # usage_id='default' (different object)

condenser = LLMSummarizingCondenser(llm=condenser_llm, max_size=100)
agent = Agent(llm=agent_llm, condenser=condenser)

# This would fail when starting a conversation
convo = Conversation(agent=agent, workspace='/tmp')
convo.send_message('Hello')  # ValueError!

Fix

A one-line fix in openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py:

 # Register LLMs in the registry (still holding lock)
 self.llm_registry.subscribe(self._state.stats.register_llm)
 registered = set(self.llm_registry.list_usage_ids())
 for llm in list(self.agent.get_all_llms()):
     if llm.usage_id not in registered:
         self.llm_registry.add(llm)
+        registered.add(llm.usage_id)  # Track newly added IDs

This ensures that when multiple LLM objects have the same usage_id, only the first one is registered and subsequent duplicates are skipped (rather than causing an error).

Test Plan

  • Added a regression test test_conversation_handles_duplicate_usage_id_llms that verifies the fix
  • All existing LLM registry tests pass

This pull request was created by an AI agent (OpenHands) on behalf of the user.

@tofarr can click here to continue refining the PR


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:497d439-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-497d439-python \
  ghcr.io/openhands/agent-server:497d439-python

All tags pushed for this build

ghcr.io/openhands/agent-server:497d439-golang-amd64
ghcr.io/openhands/agent-server:497d439b3fc65ba443253f8224e6c841f0903129-golang-amd64
ghcr.io/openhands/agent-server:fix-duplicate-usage-id-registry-error-golang-amd64
ghcr.io/openhands/agent-server:497d439-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:497d439-golang-arm64
ghcr.io/openhands/agent-server:497d439b3fc65ba443253f8224e6c841f0903129-golang-arm64
ghcr.io/openhands/agent-server:fix-duplicate-usage-id-registry-error-golang-arm64
ghcr.io/openhands/agent-server:497d439-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:497d439-java-amd64
ghcr.io/openhands/agent-server:497d439b3fc65ba443253f8224e6c841f0903129-java-amd64
ghcr.io/openhands/agent-server:fix-duplicate-usage-id-registry-error-java-amd64
ghcr.io/openhands/agent-server:497d439-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:497d439-java-arm64
ghcr.io/openhands/agent-server:497d439b3fc65ba443253f8224e6c841f0903129-java-arm64
ghcr.io/openhands/agent-server:fix-duplicate-usage-id-registry-error-java-arm64
ghcr.io/openhands/agent-server:497d439-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:497d439-python-amd64
ghcr.io/openhands/agent-server:497d439b3fc65ba443253f8224e6c841f0903129-python-amd64
ghcr.io/openhands/agent-server:fix-duplicate-usage-id-registry-error-python-amd64
ghcr.io/openhands/agent-server:497d439-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:497d439-python-arm64
ghcr.io/openhands/agent-server:497d439b3fc65ba443253f8224e6c841f0903129-python-arm64
ghcr.io/openhands/agent-server:fix-duplicate-usage-id-registry-error-python-arm64
ghcr.io/openhands/agent-server:497d439-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:497d439-golang
ghcr.io/openhands/agent-server:497d439b3fc65ba443253f8224e6c841f0903129-golang
ghcr.io/openhands/agent-server:fix-duplicate-usage-id-registry-error-golang
ghcr.io/openhands/agent-server:497d439-golang_tag_1.21-bookworm
ghcr.io/openhands/agent-server:497d439-java
ghcr.io/openhands/agent-server:497d439b3fc65ba443253f8224e6c841f0903129-java
ghcr.io/openhands/agent-server:fix-duplicate-usage-id-registry-error-java
ghcr.io/openhands/agent-server:497d439-eclipse-temurin_tag_17-jdk
ghcr.io/openhands/agent-server:497d439-python
ghcr.io/openhands/agent-server:497d439b3fc65ba443253f8224e6c841f0903129-python
ghcr.io/openhands/agent-server:fix-duplicate-usage-id-registry-error-python
ghcr.io/openhands/agent-server:497d439-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim

About Multi-Architecture Support

  • Each variant tag (e.g., 497d439-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 497d439-python-amd64) are also available if needed

When an agent has multiple LLM objects with the same usage_id (e.g., both
the agent and its condenser using separate LLM instances with the default
usage_id='default'), _ensure_agent_ready() would fail with:

  ValueError: Usage ID 'default' already exists in registry

The fix tracks newly-added usage_ids during the LLM registration loop,
so subsequent LLMs with duplicate IDs are skipped rather than causing
an error.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 24, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 24, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 24, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/agent
   base.py3153788%60, 64, 73, 76, 79, 84, 234, 265, 291, 295, 299–300, 362, 458, 526–528, 577–579, 606, 616, 624–625, 667, 669–670, 742–744, 746–748, 784–785, 795–796
openhands-sdk/openhands/sdk/conversation/impl
   local_conversation.py5334491%308, 313, 457, 503, 540, 556, 621, 851–852, 855, 968, 979–982, 989–990, 993, 999–1000, 1003, 1009, 1024, 1027, 1031–1032, 1036–1038, 1045, 1131, 1136, 1246, 1248, 1252–1253, 1264–1265, 1290, 1485, 1489, 1559, 1566–1567
TOTAL27833825670% 

openhands-agent and others added 3 commits May 24, 2026 17:05
Instead of silently skipping duplicate LLMs during registry registration
(which could cause confusion when LLMs have different settings), we now
validate at Agent construction time that all reachable LLMs have unique
usage_id values.

This provides a clear error message guiding users to set distinct usage_id
values, e.g.:
  LLM(model='gpt-4o', usage_id='agent')
  LLM(model='gpt-4o-mini', usage_id='condenser')

The duplicate usage_id issue commonly occurs when an agent is deserialized
from JSON, as Pydantic creates separate LLM objects for the agent and
condenser even if they have the same configuration.

Changes:
- Add _validate_unique_llm_usage_ids validator to AgentBase
- Add warning log in _ensure_agent_ready as defense-in-depth
- Update tests to verify error at construction time

Co-authored-by: openhands <openhands@all-hands.dev>
The original validation was too strict - it rejected any agent with multiple
LLMs sharing a usage_id, even when they had identical configurations.

This broke the common pattern of sharing an LLM object between the agent and
its condenser, because when the agent is serialized and restored from JSON,
Pydantic creates separate LLM objects (even though they're identical).

The new validation only raises an error when LLMs have the same usage_id but
*different* configurations, which is the actual problematic case that would
lead to confusing behavior.

Cases that now work:
1. Sharing the same LLM object (deduped by object identity)
2. Deserialized agents with identical LLM configs (configs match)

Cases that still error (with helpful message):
3. Different LLMs with same usage_id but different models/settings

Co-authored-by: openhands <openhands@all-hands.dev>
@tofarr tofarr marked this pull request as ready for review May 25, 2026 13:26
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable, but one important metrics-accounting issue needs attention before this is safe to merge.

Risk Assessment: 🟡 MEDIUM — touches LLM registration and conversation stats for multi-LLM agents.

Automated review by OpenHands on behalf of the requester.


Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26402871778

f"'{llm.usage_id}' (model={llm.model}). Consider using "
f"distinct usage_id values for each LLM."
)
continue
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: Skipping the duplicate here leaves that LLM instance disconnected from ConversationStats. The new validator explicitly allows the deserialized/identical-config case where two separate LLM objects share a usage_id; only the first object gets registered, so if the condenser calls the second object its cost/token metrics won't be reflected in state.stats. Please either share/restore the existing metrics onto the duplicate, or notify register_llm for the duplicate before continuing, and add a regression assertion that duplicate identical LLM objects contribute to the tracked metrics.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

The duplicate usage_id='default' conversation-start crash was reproduced on main and no longer occurs on PR head when exercising the SDK as a user would.

Does this PR achieve its stated goal?

Yes. The stated goal was to prevent conversations from failing when an agent has multiple LLM objects with the same usage_id; on origin/main, Conversation.send_message() failed with ValueError: Usage ID 'default' already exists in registry, while on PR commit 497d439b the same scenario succeeded and registered ['default']. I also verified related behavior: distinct usage_ids still register separately, and conflicting same-usage_id configs now fail early with a clear validation message.

Phase Result
Environment Setup make build completed successfully; dependencies installed via uv sync --dev.
CI Status 🟡 Snapshot showed 33 successful checks, 1 in-progress qa-changes check, and skipped cleanup/review-artifact checks; merge state BLOCKED only because a check was still pending.
Functional Verification ✅ Reproduced the base failure and verified the PR fixes it through real SDK Agent/Conversation calls.
Functional Verification

Test 1: Duplicate default usage_id no longer crashes conversation startup

Step 1 — Reproduce baseline without the fix:
Ran a short SDK script that creates two separate LLM(model="gpt-4o", usage_id="default") objects, uses one on the agent and one on an LLMSummarizingCondenser, then creates a Conversation and calls send_message("Hello from QA").

Command:

git switch --detach origin/main
OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_duplicate_usage_id.py 2>&1   | grep -E 'duplicate-default|distinct-usage-ids|ValueError|Traceback|Usage ID'

Observed output:

=== BASELINE: origin/main ===
2aa5256e
Traceback (most recent call last):
    raise ValueError(message)
ValueError: Usage ID 'default' already exists in registry. Use a different usage_id on the LLM or call get() to retrieve the existing LLM.
BASE_EXIT=1

This confirms the reported bug exists on the base branch: send_message() fails before the conversation can start because duplicate default LLM registrations collide.

Step 2 — Apply the PR's changes:
Checked out fix/duplicate-usage-id-registry-error at PR commit 497d439b.

Step 3 — Re-run with the fix in place:
Ran the same script on PR head.

Observed output:

=== PR HEAD: fix/duplicate-usage-id-registry-error ===
497d439b
duplicate-default: send_message succeeded
duplicate-default: registry=['default']
distinct-usage-ids: send_message succeeded
distinct-usage-ids: registry=['agent', 'condenser']
PR_EXIT=0

This shows the duplicate-default case no longer crashes, and a related normal case with distinct usage_ids still registers both LLMs as expected.

Test 2: Same usage_id with conflicting LLM configs produces a clear failure

Step 1 — Establish baseline without the fix:
Ran a script that creates LLM(model="gpt-4o") for the agent and LLM(model="gpt-4o-mini") for the condenser, both using the default usage_id.

Observed output on origin/main:

=== BASELINE CONFLICTING CONFIG: origin/main ===
2aa5256e
conflicting-config: Agent construction succeeded
Traceback (most recent call last):
    raise ValueError(message)
ValueError: Usage ID 'default' already exists in registry. Use a different usage_id on the LLM or call get() to retrieve the existing LLM.
BASE_EXIT=1

The base branch allowed construction and only failed later with the generic registry collision.

Step 2 — Apply the PR's changes:
Checked out fix/duplicate-usage-id-registry-error at PR commit 497d439b.

Step 3 — Re-run with the fix in place:
Observed output on PR head:

=== PR CONFLICTING CONFIG: fix/duplicate-usage-id-registry-error ===
497d439b
Traceback (most recent call last):
  Value error, Multiple LLMs share usage_id='default' but have different configurations. This would cause one LLM's settings to be silently ignored. Set distinct usage_id values for LLMs with different configs, e.g.: LLM(model='gpt-4o', usage_id='agent'), LLM(model='gpt-4o-mini', usage_id='condenser') [type=value_error, input_value={'llm': LLM(model='gpt-4o...curity_analyzer': True}}, input_type=dict]
PR_EXIT=1

This is the expected behavior for conflicting configs: the invalid setup fails earlier with actionable guidance instead of surfacing as the duplicate registry error during conversation startup.

Issues Found

None.

This review was generated by an AI agent (OpenHands) on behalf of the user.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-motivated fix. The two-layer defense — rejecting mismatched configs at agent-construction time (_validate_unique_llm_usage_ids) and silently skipping same-config duplicates at registration time — is solid defense-in-depth. The root cause is clearly documented and the tests cover all four meaningful cases (conflicting configs, shared object, identical configs from deserialization, and distinct usage_ids).

One suggestion on the warning message; otherwise looks good to merge.


This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.

# was validated properly, but can occur with deserialized
# agents that have the same usage_id on multiple LLMs
logger.warning(
f"Skipping duplicate LLM registration for usage_id "
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: This warning fires for every skipped duplicate, including same-config ones that AgentBase._validate_unique_llm_usage_ids explicitly permits (e.g., agents deserialized from JSON where a previously shared LLM reference becomes two separate objects with identical configs). For those callers, the message is noise and the advice "Consider using distinct usage_id values" is misleading — the configs are identical and skipping one is the correct outcome.

Since the validator ensures that any duplicate reaching _ensure_agent_ready must be a same-config one (different-config duplicates are rejected earlier), consider softening the advisory:

Suggested change
f"Skipping duplicate LLM registration for usage_id "
logger.warning(
f"Skipping duplicate LLM registration for usage_id "
f"'{llm.usage_id}' (model={llm.model}). This is expected "
f"for deserialized agents; if you see unexpected behavior, "
f"verify that LLMs with different configs use distinct usage_id values."
)


# Compare LLM configs (excluding usage_id itself)
# Use model_dump with redaction to compare functional equivalence
first_config = llms[0].model_dump(exclude={"usage_id"})
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: model_dump() in default Python mode returns SecretStr objects for API-key fields rather than their underlying string values. Equality still works (SecretStr.__eq__ compares get_secret_value()), so this is correct for common configurations. However, if any LLM field holds a mutable or non-deterministically-ordered structure (e.g., a custom extra_headers dict that was built in different ways), the comparison could produce a false positive (two functionally identical LLMs treated as different). This is a low-probability edge case, but worth a brief comment so reviewers understand the intended semantics:

Suggested change
first_config = llms[0].model_dump(exclude={"usage_id"})
# Compare serialised configs for functional equivalence. Note:
# - SecretStr fields compare by value (Pydantic __eq__)
# - dict fields must have consistent insertion order (Python 3.7+)
first_config = llms[0].model_dump(exclude={"usage_id"})

@tofarr
Copy link
Copy Markdown
Collaborator Author

tofarr commented May 25, 2026

This was fixed by #3368

@tofarr tofarr closed this May 25, 2026
@tofarr tofarr deleted the fix/duplicate-usage-id-registry-error branch May 25, 2026 13:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants