fix(conversation): prevent fork(agent=...) from clobbering source prompt_cache_key#2923
Conversation
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Clean fix that eliminates a footgun.
Key strengths:
- Unifies two code paths into one safe approach
- Always deep-copies to prevent cache key aliasing bug
- Solid test coverage for both default and aliased agent scenarios
- Clear comments explaining the rationale
The behavior change (fork always creates a deep copy) is the right trade-off for correctness. Nice work!
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
The PR successfully fixes the cache key clobbering bug. Verified with before/after reproduction demonstrating the fix works for all aliasing scenarios.
Does this PR achieve its stated goal?
Yes. The PR fixes issue #2917 where fork(agent=...) could clobber the source conversation's prompt_cache_key. I verified the bug exists on main (scenarios 2 & 3 fail) and is fixed on this branch (all scenarios pass). The fix correctly handles all three aliasing scenarios: default fork, fork(agent=source.agent), and fork(agent=Agent(llm=source.agent.llm)).
| Phase | Result |
|---|---|
| Environment Setup | ✅ Dependencies installed, SDK imports successfully |
| CI & Tests | ✅ 21+ checks passed, sdk-tests: 17/17 pass (2 new) |
| Functional Verification | ✅ Bug reproduced on main, fix verified on PR branch |
Functional Verification
Test 1: Reproduce bug on main branch (commit 1ea8d64)
Step 1 — Establish baseline (without the fix):
Created reproduction script testing three scenarios:
- Default
fork()without agent parameter fork(agent=source.agent)- aliased agent (the reported bug)fork(agent=Agent(llm=source.agent.llm))- shared LLM
Ran on main branch:
$ git checkout 1ea8d64a # main branch before fix
$ OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/reproduce_2917.pyResults WITHOUT the fix:
SCENARIO 1: Default fork() - no agent parameter
Source LLM cache key BEFORE fork: 0547203f-a277-43f3-982f-c5aa9b91a9ef
Source LLM cache key AFTER fork: 0547203f-a277-43f3-982f-c5aa9b91a9ef
✅ PASS: Source cache key was NOT clobbered
SCENARIO 2: fork(agent=source.agent) - aliased agent
Source LLM cache key BEFORE fork: 1ea9a8e1-17e3-40e7-a546-c4802bf77621
Source LLM cache key AFTER fork: cfe65867-1b31-4261-9324-4a70dcc2526d
❌ FAIL: Source cache key was CLOBBERED!
Expected: 1ea9a8e1-17e3-40e7-a546-c4802bf77621
Got: cfe65867-1b31-4261-9324-4a70dcc2526d
SCENARIO 3: fork(agent=Agent(llm=source.agent.llm)) - shared LLM
Source LLM cache key BEFORE fork: 6a9a774a-aa22-448a-9dd1-de37257b59eb
Source LLM cache key AFTER fork: 8ae0e55b-97cc-4bed-b261-b7b1b4a25e6e
❌ FAIL: Source cache key was CLOBBERED!
Expected: 6a9a774a-aa22-448a-9dd1-de37257b59eb
Got: 8ae0e55b-97cc-4bed-b261-b7b1b4a25e6e
This confirms the bug exists: scenarios 2 and 3 fail because the fork's __init__ repins the shared LLM's cache key to the fork's ID, overwriting the source's cache key.
Test 2: Verify fix on PR branch (commit 8030c54)
Step 2 — Apply the PR's changes:
$ git checkout 2917-localconversationforkagent-can-clobber-the-source-prompt_cache_keyStep 3 — Re-run with the fix in place:
$ OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/reproduce_2917.pyResults WITH the fix:
SCENARIO 1: Default fork() - no agent parameter
Source LLM cache key BEFORE fork: 5d6940a1-d97b-4a9e-bb4e-7a45746f9314
Source LLM cache key AFTER fork: 5d6940a1-d97b-4a9e-bb4e-7a45746f9314
✅ PASS: Source cache key was NOT clobbered
✅ PASS: Fork has its own cache key
SCENARIO 2: fork(agent=source.agent) - aliased agent
Source LLM cache key BEFORE fork: 2bd8cb85-12ec-450e-b8dd-b8b18b6a17e9
Source LLM cache key AFTER fork: 2bd8cb85-12ec-450e-b8dd-b8b18b6a17e9
✅ PASS: Source cache key was NOT clobbered
✅ PASS: Fork has its own cache key
✅ PASS: Fork has its own LLM object (not aliased)
SCENARIO 3: fork(agent=Agent(llm=source.agent.llm)) - shared LLM
Source LLM cache key BEFORE fork: d24ab2e9-0f2e-4433-bd80-4b530642f977
Source LLM cache key AFTER fork: d24ab2e9-0f2e-4433-bd80-4b530642f977
✅ PASS: Source cache key was NOT clobbered
✅ PASS: Fork has its own cache key
This shows the fix works: all scenarios pass. The source cache key remains unchanged after forking, and each fork gets its own cache key.
Test 3: Verify all fork tests pass
$ uv run pytest tests/sdk/conversation/local/test_fork.py -vResult: ✅ 17 passed (including 2 new tests for this fix)
The two new tests explicitly verify:
test_fork_default_does_not_clobber_source_cache_key- default fork scenariotest_fork_with_aliased_agent_does_not_clobber_source_cache_key- aliased agent scenario
All existing tests continue to pass, confirming no regression.
Test 4: Verify documented use case still works
Reviewed examples/01_standalone_sdk/48_conversation_fork.py which shows the documented A/B-test pattern:
alt_agent = Agent(llm=alt_llm, tools=[...])
fork_alt = source.fork(agent=alt_agent, title="Tool-change experiment")This pattern creates a brand-new agent with a brand-new LLM, which is the optimization the original code path supported. The fix preserves this functionality while also handling the aliased-agent case that was broken.
Issues Found
None.
…the-source-prompt_cache_key
…the-source-prompt_cache_key
Summary
Fixes #2917.
LocalConversation.fork(agent=...)could silently overwrite the source conversation'sprompt_cache_keywhenever the caller passed an agent that aliased the source agent or shared the source'sLLMinstance. That defeats the one-conversation / one-cache-shard invariant introduced in #2907 and causes the source and the fork to send requests with the same cache key, the exact condition #2907 was designed to prevent.Root cause
LocalConversation.__init__pins the cache key by mutating the agent's LLM in place:fork()had two branches for selecting the fork's agent: 1. Noagent=supplied → round-trip the source agent throughmodel_dump/model_validateto produce an independent copy. Safe.2.
agent=supplied → use the caller's object by reference. Unsafe: if the object (or itsllm) is the source's, the fork's__init__repins the source's LLM.Reporter's verification, reproduced locally:
41176d21-79e0-412a-b974-02a87b41dd47fork(agent=conv.agent)→ fork key:398100de-cf1f-4029-b758-ab5755de9a8a398100de-cf1f-4029-b758-ab5755de9a8a← clobberedWhy collapse the two paths (rather than fix only path 2)?
The two paths existed for a real reason, so it's worth explaining the decision.
Path 1 (default) deep-copies the source agent via a JSON round-trip instead of
copy.deepcopy/model_copy(deep=True), because agent/LLM private attrs holdthreading.Lockobjects that aren't picklable (the original PR #2841 calls this out explicitly).Path 2 (
agent=...) was an optimization for the A/B-test / tool-swap use case (seeexamples/01_standalone_sdk/48_conversation_fork.py): the caller constructs a brand-newAgentwith a brand-newLLMand hands it in, so copying would be wasteful.The footgun is that path 2 assumes the supplied agent is fresh and non-aliased — but that assumption is never checked. It breaks silently when a caller:
source.agentdirectly (the issue's exact scenario), orAgent(llm=source.agent.llm, tools=...)reusing the existing LLM — a natural pattern if you only want to swap tools but keep model config.Behavior change worth flagging
Before:
fork(agent=caller_agent).agent is caller_agentwasTrue. After: it'sFalse— the fork receives a reconstituted clone. Callers who held a reference tocaller_agentand mutated it post-fork expecting the fork to see the change will now silently diverge. This is not documented as guaranteed anywhere, the shipped example doesn't rely on it, and it matches the mental model that a fork is an independent conversation — but calling it out for reviewers.Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:234be48-pythonRun
All tags pushed for this build
About Multi-Architecture Support
234be48-python) is a multi-arch manifest supporting both amd64 and arm64234be48-python-amd64) are also available if needed