fix(sdk): prevent duplicate usage_id error when registering LLMs#3378
fix(sdk): prevent duplicate usage_id error when registering LLMs#3378tofarr wants to merge 4 commits into
Conversation
When an agent has multiple LLM objects with the same usage_id (e.g., both the agent and its condenser using separate LLM instances with the default usage_id='default'), _ensure_agent_ready() would fail with: ValueError: Usage ID 'default' already exists in registry The fix tracks newly-added usage_ids during the LLM registration loop, so subsequent LLMs with duplicate IDs are skipped rather than causing an error. Co-authored-by: openhands <openhands@all-hands.dev>
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
Coverage Report •
|
||||||||||||||||||||||||||||||
Instead of silently skipping duplicate LLMs during registry registration (which could cause confusion when LLMs have different settings), we now validate at Agent construction time that all reachable LLMs have unique usage_id values. This provides a clear error message guiding users to set distinct usage_id values, e.g.: LLM(model='gpt-4o', usage_id='agent') LLM(model='gpt-4o-mini', usage_id='condenser') The duplicate usage_id issue commonly occurs when an agent is deserialized from JSON, as Pydantic creates separate LLM objects for the agent and condenser even if they have the same configuration. Changes: - Add _validate_unique_llm_usage_ids validator to AgentBase - Add warning log in _ensure_agent_ready as defense-in-depth - Update tests to verify error at construction time Co-authored-by: openhands <openhands@all-hands.dev>
The original validation was too strict - it rejected any agent with multiple LLMs sharing a usage_id, even when they had identical configurations. This broke the common pattern of sharing an LLM object between the agent and its condenser, because when the agent is serialized and restored from JSON, Pydantic creates separate LLM objects (even though they're identical). The new validation only raises an error when LLMs have the same usage_id but *different* configurations, which is the actual problematic case that would lead to confusing behavior. Cases that now work: 1. Sharing the same LLM object (deduped by object identity) 2. Deserialized agents with identical LLM configs (configs match) Cases that still error (with helpful message): 3. Different LLMs with same usage_id but different models/settings Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable, but one important metrics-accounting issue needs attention before this is safe to merge.
Risk Assessment: 🟡 MEDIUM — touches LLM registration and conversation stats for multi-LLM agents.
Automated review by OpenHands on behalf of the requester.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26402871778
| f"'{llm.usage_id}' (model={llm.model}). Consider using " | ||
| f"distinct usage_id values for each LLM." | ||
| ) | ||
| continue |
There was a problem hiding this comment.
🟠 Important: Skipping the duplicate here leaves that LLM instance disconnected from ConversationStats. The new validator explicitly allows the deserialized/identical-config case where two separate LLM objects share a usage_id; only the first object gets registered, so if the condenser calls the second object its cost/token metrics won't be reflected in state.stats. Please either share/restore the existing metrics onto the duplicate, or notify register_llm for the duplicate before continuing, and add a regression assertion that duplicate identical LLM objects contribute to the tracked metrics.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
The duplicate usage_id='default' conversation-start crash was reproduced on main and no longer occurs on PR head when exercising the SDK as a user would.
Does this PR achieve its stated goal?
Yes. The stated goal was to prevent conversations from failing when an agent has multiple LLM objects with the same usage_id; on origin/main, Conversation.send_message() failed with ValueError: Usage ID 'default' already exists in registry, while on PR commit 497d439b the same scenario succeeded and registered ['default']. I also verified related behavior: distinct usage_ids still register separately, and conflicting same-usage_id configs now fail early with a clear validation message.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed successfully; dependencies installed via uv sync --dev. |
| CI Status | 🟡 Snapshot showed 33 successful checks, 1 in-progress qa-changes check, and skipped cleanup/review-artifact checks; merge state BLOCKED only because a check was still pending. |
| Functional Verification | ✅ Reproduced the base failure and verified the PR fixes it through real SDK Agent/Conversation calls. |
Functional Verification
Test 1: Duplicate default usage_id no longer crashes conversation startup
Step 1 — Reproduce baseline without the fix:
Ran a short SDK script that creates two separate LLM(model="gpt-4o", usage_id="default") objects, uses one on the agent and one on an LLMSummarizingCondenser, then creates a Conversation and calls send_message("Hello from QA").
Command:
git switch --detach origin/main
OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_duplicate_usage_id.py 2>&1 | grep -E 'duplicate-default|distinct-usage-ids|ValueError|Traceback|Usage ID'Observed output:
=== BASELINE: origin/main ===
2aa5256e
Traceback (most recent call last):
raise ValueError(message)
ValueError: Usage ID 'default' already exists in registry. Use a different usage_id on the LLM or call get() to retrieve the existing LLM.
BASE_EXIT=1
This confirms the reported bug exists on the base branch: send_message() fails before the conversation can start because duplicate default LLM registrations collide.
Step 2 — Apply the PR's changes:
Checked out fix/duplicate-usage-id-registry-error at PR commit 497d439b.
Step 3 — Re-run with the fix in place:
Ran the same script on PR head.
Observed output:
=== PR HEAD: fix/duplicate-usage-id-registry-error ===
497d439b
duplicate-default: send_message succeeded
duplicate-default: registry=['default']
distinct-usage-ids: send_message succeeded
distinct-usage-ids: registry=['agent', 'condenser']
PR_EXIT=0
This shows the duplicate-default case no longer crashes, and a related normal case with distinct usage_ids still registers both LLMs as expected.
Test 2: Same usage_id with conflicting LLM configs produces a clear failure
Step 1 — Establish baseline without the fix:
Ran a script that creates LLM(model="gpt-4o") for the agent and LLM(model="gpt-4o-mini") for the condenser, both using the default usage_id.
Observed output on origin/main:
=== BASELINE CONFLICTING CONFIG: origin/main ===
2aa5256e
conflicting-config: Agent construction succeeded
Traceback (most recent call last):
raise ValueError(message)
ValueError: Usage ID 'default' already exists in registry. Use a different usage_id on the LLM or call get() to retrieve the existing LLM.
BASE_EXIT=1
The base branch allowed construction and only failed later with the generic registry collision.
Step 2 — Apply the PR's changes:
Checked out fix/duplicate-usage-id-registry-error at PR commit 497d439b.
Step 3 — Re-run with the fix in place:
Observed output on PR head:
=== PR CONFLICTING CONFIG: fix/duplicate-usage-id-registry-error ===
497d439b
Traceback (most recent call last):
Value error, Multiple LLMs share usage_id='default' but have different configurations. This would cause one LLM's settings to be silently ignored. Set distinct usage_id values for LLMs with different configs, e.g.: LLM(model='gpt-4o', usage_id='agent'), LLM(model='gpt-4o-mini', usage_id='condenser') [type=value_error, input_value={'llm': LLM(model='gpt-4o...curity_analyzer': True}}, input_type=dict]
PR_EXIT=1
This is the expected behavior for conflicting configs: the invalid setup fails earlier with actionable guidance instead of surfacing as the duplicate registry error during conversation startup.
Issues Found
None.
This review was generated by an AI agent (OpenHands) on behalf of the user.
all-hands-bot
left a comment
There was a problem hiding this comment.
Clean, well-motivated fix. The two-layer defense — rejecting mismatched configs at agent-construction time (_validate_unique_llm_usage_ids) and silently skipping same-config duplicates at registration time — is solid defense-in-depth. The root cause is clearly documented and the tests cover all four meaningful cases (conflicting configs, shared object, identical configs from deserialization, and distinct usage_ids).
One suggestion on the warning message; otherwise looks good to merge.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.
| # was validated properly, but can occur with deserialized | ||
| # agents that have the same usage_id on multiple LLMs | ||
| logger.warning( | ||
| f"Skipping duplicate LLM registration for usage_id " |
There was a problem hiding this comment.
🟡 Suggestion: This warning fires for every skipped duplicate, including same-config ones that AgentBase._validate_unique_llm_usage_ids explicitly permits (e.g., agents deserialized from JSON where a previously shared LLM reference becomes two separate objects with identical configs). For those callers, the message is noise and the advice "Consider using distinct usage_id values" is misleading — the configs are identical and skipping one is the correct outcome.
Since the validator ensures that any duplicate reaching _ensure_agent_ready must be a same-config one (different-config duplicates are rejected earlier), consider softening the advisory:
| f"Skipping duplicate LLM registration for usage_id " | |
| logger.warning( | |
| f"Skipping duplicate LLM registration for usage_id " | |
| f"'{llm.usage_id}' (model={llm.model}). This is expected " | |
| f"for deserialized agents; if you see unexpected behavior, " | |
| f"verify that LLMs with different configs use distinct usage_id values." | |
| ) |
|
|
||
| # Compare LLM configs (excluding usage_id itself) | ||
| # Use model_dump with redaction to compare functional equivalence | ||
| first_config = llms[0].model_dump(exclude={"usage_id"}) |
There was a problem hiding this comment.
🟡 Suggestion: model_dump() in default Python mode returns SecretStr objects for API-key fields rather than their underlying string values. Equality still works (SecretStr.__eq__ compares get_secret_value()), so this is correct for common configurations. However, if any LLM field holds a mutable or non-deterministically-ordered structure (e.g., a custom extra_headers dict that was built in different ways), the comparison could produce a false positive (two functionally identical LLMs treated as different). This is a low-probability edge case, but worth a brief comment so reviewers understand the intended semantics:
| first_config = llms[0].model_dump(exclude={"usage_id"}) | |
| # Compare serialised configs for functional equivalence. Note: | |
| # - SecretStr fields compare by value (Pydantic __eq__) | |
| # - dict fields must have consistent insertion order (Python 3.7+) | |
| first_config = llms[0].model_dump(exclude={"usage_id"}) |
|
This was fixed by #3368 |
Summary
Fixes a bug where conversations would fail to start with:
Root Cause
The error occurred in
LocalConversation._ensure_agent_ready()when:usage_id(the default is"default")Example Triggering Scenario
Fix
A one-line fix in
openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py:# Register LLMs in the registry (still holding lock) self.llm_registry.subscribe(self._state.stats.register_llm) registered = set(self.llm_registry.list_usage_ids()) for llm in list(self.agent.get_all_llms()): if llm.usage_id not in registered: self.llm_registry.add(llm) + registered.add(llm.usage_id) # Track newly added IDsThis ensures that when multiple LLM objects have the same
usage_id, only the first one is registered and subsequent duplicates are skipped (rather than causing an error).Test Plan
test_conversation_handles_duplicate_usage_id_llmsthat verifies the fixThis pull request was created by an AI agent (OpenHands) on behalf of the user.
@tofarr can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:497d439-pythonRun
All tags pushed for this build
About Multi-Architecture Support
497d439-python) is a multi-arch manifest supporting both amd64 and arm64497d439-python-amd64) are also available if needed