Skip to content

fix(conversation): prevent fork(agent=...) from clobbering source prompt_cache_key#2923

Merged
xingyaoww merged 3 commits into
mainfrom
2917-localconversationforkagent-can-clobber-the-source-prompt_cache_key
Apr 28, 2026
Merged

fix(conversation): prevent fork(agent=...) from clobbering source prompt_cache_key#2923
xingyaoww merged 3 commits into
mainfrom
2917-localconversationforkagent-can-clobber-the-source-prompt_cache_key

Conversation

@VascoSch92
Copy link
Copy Markdown
Member

@VascoSch92 VascoSch92 commented Apr 22, 2026

Summary

Fixes #2917.

LocalConversation.fork(agent=...) could silently overwrite the source conversation's prompt_cache_key whenever the caller passed an agent that aliased the source agent or shared the source's LLM instance. That defeats the one-conversation / one-cache-shard invariant introduced in #2907 and causes the source and the fork to send requests with the same cache key, the exact condition #2907 was designed to prevent.

Root cause

LocalConversation.__init__ pins the cache key by mutating the agent's LLM in place:

  self.agent.llm._prompt_cache_key = str(self._state.id)

fork() had two branches for selecting the fork's agent: 1. No agent= supplied → round-trip the source agent through model_dump / model_validate to produce an independent copy. Safe.
2. agent= supplied → use the caller's object by reference. Unsafe: if the object (or its llm) is the source's, the fork's __init__ repins the source's LLM.

Reporter's verification, reproduced locally:

  • source key before fork: 41176d21-79e0-412a-b974-02a87b41dd47
  • fork(agent=conv.agent) → fork key: 398100de-cf1f-4029-b758-ab5755de9a8a
  • source key after fork: 398100de-cf1f-4029-b758-ab5755de9a8a ← clobbered

Why collapse the two paths (rather than fix only path 2)?

The two paths existed for a real reason, so it's worth explaining the decision.

Path 1 (default) deep-copies the source agent via a JSON round-trip instead of copy.deepcopy / model_copy(deep=True), because agent/LLM private attrs hold threading.Lock objects that aren't picklable (the original PR #2841 calls this out explicitly).

Path 2 (agent=...) was an optimization for the A/B-test / tool-swap use case (see examples/01_standalone_sdk/48_conversation_fork.py): the caller constructs a brand-new Agent with a brand-new LLM and hands it in, so copying would be wasteful.

The footgun is that path 2 assumes the supplied agent is fresh and non-aliased — but that assumption is never checked. It breaks silently when a caller:

  • passes source.agent directly (the issue's exact scenario), or
  • builds a new Agent(llm=source.agent.llm, tools=...) reusing the existing LLM — a natural pattern if you only want to swap tools but keep model config.

Behavior change worth flagging

Before: fork(agent=caller_agent).agent is caller_agent was True. After: it's False — the fork receives a reconstituted clone. Callers who held a reference to caller_agent and mutated it post-fork expecting the fork to see the change will now silently diverge. This is not documented as guaranteed anywhere, the shipped example doesn't rely on it, and it matches the mental model that a fork is an independent conversation — but calling it out for reviewers.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:234be48-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-234be48-python \
  ghcr.io/openhands/agent-server:234be48-python

All tags pushed for this build

ghcr.io/openhands/agent-server:234be48-golang-amd64
ghcr.io/openhands/agent-server:234be48-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:234be48-golang-arm64
ghcr.io/openhands/agent-server:234be48-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:234be48-java-amd64
ghcr.io/openhands/agent-server:234be48-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:234be48-java-arm64
ghcr.io/openhands/agent-server:234be48-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:234be48-python-amd64
ghcr.io/openhands/agent-server:234be48-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:234be48-python-arm64
ghcr.io/openhands/agent-server:234be48-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:234be48-golang
ghcr.io/openhands/agent-server:234be48-java
ghcr.io/openhands/agent-server:234be48-python

About Multi-Architecture Support

  • Each variant tag (e.g., 234be48-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 234be48-python-amd64) are also available if needed

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 22, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 22, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

@VascoSch92 VascoSch92 changed the title fix(conversation): prevent fork(agent=...) from clobbering source prompt_cache_key (#2917) fix(conversation): prevent fork(agent=...) from clobbering source prompt_cache_key Apr 22, 2026
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Good taste - Clean fix that eliminates a footgun.

Key strengths:

  • Unifies two code paths into one safe approach
  • Always deep-copies to prevent cache key aliasing bug
  • Solid test coverage for both default and aliased agent scenarios
  • Clear comments explaining the rationale

The behavior change (fork always creates a deep copy) is the right trade-off for correctness. Nice work!

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 22, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/conversation/impl
   local_conversation.py4402594%297, 302, 433, 479, 516, 532, 597, 798–799, 802, 962, 970, 972, 976–977, 988, 990–992, 1017, 1212, 1216, 1286, 1293–1294
TOTAL23673683971% 

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

The PR successfully fixes the cache key clobbering bug. Verified with before/after reproduction demonstrating the fix works for all aliasing scenarios.

Does this PR achieve its stated goal?

Yes. The PR fixes issue #2917 where fork(agent=...) could clobber the source conversation's prompt_cache_key. I verified the bug exists on main (scenarios 2 & 3 fail) and is fixed on this branch (all scenarios pass). The fix correctly handles all three aliasing scenarios: default fork, fork(agent=source.agent), and fork(agent=Agent(llm=source.agent.llm)).

Phase Result
Environment Setup ✅ Dependencies installed, SDK imports successfully
CI & Tests ✅ 21+ checks passed, sdk-tests: 17/17 pass (2 new)
Functional Verification ✅ Bug reproduced on main, fix verified on PR branch
Functional Verification

Test 1: Reproduce bug on main branch (commit 1ea8d64)

Step 1 — Establish baseline (without the fix):

Created reproduction script testing three scenarios:

  1. Default fork() without agent parameter
  2. fork(agent=source.agent) - aliased agent (the reported bug)
  3. fork(agent=Agent(llm=source.agent.llm)) - shared LLM

Ran on main branch:

$ git checkout 1ea8d64a  # main branch before fix
$ OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/reproduce_2917.py

Results WITHOUT the fix:

SCENARIO 1: Default fork() - no agent parameter
Source LLM cache key BEFORE fork: 0547203f-a277-43f3-982f-c5aa9b91a9ef
Source LLM cache key AFTER fork: 0547203f-a277-43f3-982f-c5aa9b91a9ef
✅ PASS: Source cache key was NOT clobbered

SCENARIO 2: fork(agent=source.agent) - aliased agent
Source LLM cache key BEFORE fork: 1ea9a8e1-17e3-40e7-a546-c4802bf77621
Source LLM cache key AFTER fork: cfe65867-1b31-4261-9324-4a70dcc2526d
❌ FAIL: Source cache key was CLOBBERED!
   Expected: 1ea9a8e1-17e3-40e7-a546-c4802bf77621
   Got: cfe65867-1b31-4261-9324-4a70dcc2526d

SCENARIO 3: fork(agent=Agent(llm=source.agent.llm)) - shared LLM
Source LLM cache key BEFORE fork: 6a9a774a-aa22-448a-9dd1-de37257b59eb
Source LLM cache key AFTER fork: 8ae0e55b-97cc-4bed-b261-b7b1b4a25e6e
❌ FAIL: Source cache key was CLOBBERED!
   Expected: 6a9a774a-aa22-448a-9dd1-de37257b59eb
   Got: 8ae0e55b-97cc-4bed-b261-b7b1b4a25e6e

This confirms the bug exists: scenarios 2 and 3 fail because the fork's __init__ repins the shared LLM's cache key to the fork's ID, overwriting the source's cache key.


Test 2: Verify fix on PR branch (commit 8030c54)

Step 2 — Apply the PR's changes:

$ git checkout 2917-localconversationforkagent-can-clobber-the-source-prompt_cache_key

Step 3 — Re-run with the fix in place:

$ OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/reproduce_2917.py

Results WITH the fix:

SCENARIO 1: Default fork() - no agent parameter
Source LLM cache key BEFORE fork: 5d6940a1-d97b-4a9e-bb4e-7a45746f9314
Source LLM cache key AFTER fork: 5d6940a1-d97b-4a9e-bb4e-7a45746f9314
✅ PASS: Source cache key was NOT clobbered
✅ PASS: Fork has its own cache key

SCENARIO 2: fork(agent=source.agent) - aliased agent
Source LLM cache key BEFORE fork: 2bd8cb85-12ec-450e-b8dd-b8b18b6a17e9
Source LLM cache key AFTER fork: 2bd8cb85-12ec-450e-b8dd-b8b18b6a17e9
✅ PASS: Source cache key was NOT clobbered
✅ PASS: Fork has its own cache key
✅ PASS: Fork has its own LLM object (not aliased)

SCENARIO 3: fork(agent=Agent(llm=source.agent.llm)) - shared LLM
Source LLM cache key BEFORE fork: d24ab2e9-0f2e-4433-bd80-4b530642f977
Source LLM cache key AFTER fork: d24ab2e9-0f2e-4433-bd80-4b530642f977
✅ PASS: Source cache key was NOT clobbered
✅ PASS: Fork has its own cache key

This shows the fix works: all scenarios pass. The source cache key remains unchanged after forking, and each fork gets its own cache key.


Test 3: Verify all fork tests pass

$ uv run pytest tests/sdk/conversation/local/test_fork.py -v

Result: ✅ 17 passed (including 2 new tests for this fix)

The two new tests explicitly verify:

  1. test_fork_default_does_not_clobber_source_cache_key - default fork scenario
  2. test_fork_with_aliased_agent_does_not_clobber_source_cache_key - aliased agent scenario

All existing tests continue to pass, confirming no regression.


Test 4: Verify documented use case still works

Reviewed examples/01_standalone_sdk/48_conversation_fork.py which shows the documented A/B-test pattern:

alt_agent = Agent(llm=alt_llm, tools=[...])
fork_alt = source.fork(agent=alt_agent, title="Tool-change experiment")

This pattern creates a brand-new agent with a brand-new LLM, which is the optimization the original code path supported. The fix preserves this functionality while also handling the aliased-agent case that was broken.

Issues Found

None.

Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
Copy link
Copy Markdown
Member

@xingyaoww xingyaoww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@xingyaoww xingyaoww enabled auto-merge (squash) April 24, 2026 14:13
@xingyaoww xingyaoww merged commit 61a7f0d into main Apr 28, 2026
25 checks passed
@xingyaoww xingyaoww deleted the 2917-localconversationforkagent-can-clobber-the-source-prompt_cache_key branch April 28, 2026 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LocalConversation.fork(agent=...) can clobber the source prompt_cache_key

4 participants