feat: add Conversation.fork() as a first-class SDK primitive #2841

Merged: xingyaoww merged 27 commits into main from openhands/conversation-fork on Apr 19, 2026

@xingyaoww (Member) commented Apr 16, 2026

  • A human has tested these changes.

Why

Downstream apps that surface system-generated conversations (CI runs, scheduled pipelines, agent-on-agent tasks) need a way to let humans branch off a run for follow-up exploration without contaminating the original audit trail. The only workaround today is to start a fresh conversation and replay the original transcript, which loses event-level fidelity (tool_call_id pairing, LLM token/cost metrics, condenser state, thinking blocks, hook execution records).

Summary

  • Add fork() as an abstract method on BaseConversation and implement it on LocalConversation and RemoteConversation
  • Deep-copies events, agent config (via serialization round-trip to avoid thread-lock pickling issues), workspace metadata, and runtime state (agent_state, activated_knowledge_skills) into a new conversation with its own ID and persistence directory
  • By default metrics start fresh on the fork (reset_metrics=True); set reset_metrics=False to carry them over
  • Expose through the agent-server REST API as POST /api/conversations/{id}/fork with an optional ForkConversationRequest body (id, title, tags, reset_metrics)
  • Add 12 unit tests covering all fork behaviors
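
The behaviour listed above can be sketched with a toy model. This is a minimal illustration using only the standard library, not the SDK's implementation: `ToyConversation`, its fields, and the `"idle"` status string are hypothetical stand-ins for `LocalConversation` and its state.

```python
import copy
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyConversation:
    """Hypothetical stand-in for a conversation; not the SDK class."""

    id: uuid.UUID = field(default_factory=uuid.uuid4)
    events: list = field(default_factory=list)
    agent_state: dict = field(default_factory=dict)
    accumulated_cost: float = 0.0
    status: str = "idle"

    def fork(self, *, conversation_id=None, reset_metrics=True):
        # New identity, deep-copied history and state, idle status.
        return ToyConversation(
            id=conversation_id or uuid.uuid4(),
            events=copy.deepcopy(self.events),
            agent_state=copy.deepcopy(self.agent_state),
            # Metrics start fresh unless the caller opts out.
            accumulated_cost=0.0 if reset_metrics else self.accumulated_cost,
            status="idle",
        )


src = ToyConversation(
    events=[{"id": "msg-1"}],
    agent_state={"step": "analysis"},
    accumulated_cost=0.05,
)
fork = src.fork()
fork.events.append({"id": "fork-msg-1"})    # only the fork grows
fork.agent_state["new_key"] = "new_value"   # source state is untouched
```

Because the fork receives copies rather than references, the source keeps its single event and original state, which is the audit-trail property the PR is after.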

Issue Number

Closes #2840

Docs PR: OpenHands/docs#459

How to Test

uv run pytest tests/sdk/conversation/local/test_fork.py -v

All 12 tests pass:

tests/sdk/conversation/local/test_fork.py::test_fork_creates_new_id PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_with_explicit_id PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_copies_events PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_source_unmodified PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_execution_status_is_idle PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_resets_metrics_by_default PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_preserves_metrics_when_requested PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_copies_agent_state PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_accepts_replacement_agent PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_with_tags PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_with_title_sets_tag PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_shares_workspace PASSED

Type

  • Bug fix
  • Feature
  • Refactor
  • Breaking change
  • Docs / chore

Notes

  • Agent deep-copy uses model_validate(model_dump(...)) serialization round-trip instead of model_copy(deep=True) to avoid TypeError: cannot pickle '_thread.lock' object from Pydantic's deepcopy of private attributes containing thread locks.
  • The server-side fork persists the forked conversation to disk and then resumes it via _start_event_service, keeping the approach consistent with how conversations are normally created and resumed.
  • RemoteConversation.fork() delegates to the server's POST .../fork endpoint and returns a new RemoteConversation pointing at the fork.
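
The first note can be reproduced without Pydantic. The sketch below is an assumption-laden stand-in: `ToyAgent`, `dump()` and `validate()` are hypothetical names that mimic the `model_dump` / `model_validate` round-trip, and the private `_lock` plays the role of the thread lock that breaks deep copying.

```python
import copy
import threading


class ToyAgent:
    """Hypothetical agent holding a lock in a private attribute."""

    def __init__(self, model: str):
        self.model = model
        self._lock = threading.Lock()  # locks cannot be deep-copied or pickled

    def dump(self) -> dict:
        # Serialize only the public configuration.
        return {"model": self.model}

    @classmethod
    def validate(cls, data: dict) -> "ToyAgent":
        # Rebuild a fresh instance, which creates its own new lock.
        return cls(**data)


agent = ToyAgent("gpt-4o-mini")

try:
    copy.deepcopy(agent)
    deepcopy_error = None
except TypeError as exc:
    # deepcopy reaches the lock and fails the same way pickling would.
    deepcopy_error = str(exc)

# The round-trip never touches the lock, so it succeeds.
clone = ToyAgent.validate(agent.dump())
```

The round-trip trades generality for safety: only fields that survive serialization are carried over, and anything process-local (locks, open handles) is recreated by the constructor.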

This PR was created by an AI assistant (OpenHands) on behalf of the user.


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
| --- | --- | --- | --- |
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.13-nodejs22-slim | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:80bebdc-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-80bebdc-python \
  ghcr.io/openhands/agent-server:80bebdc-python

All tags pushed for this build

ghcr.io/openhands/agent-server:80bebdc-golang-amd64
ghcr.io/openhands/agent-server:80bebdc-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:80bebdc-golang-arm64
ghcr.io/openhands/agent-server:80bebdc-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:80bebdc-java-amd64
ghcr.io/openhands/agent-server:80bebdc-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:80bebdc-java-arm64
ghcr.io/openhands/agent-server:80bebdc-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:80bebdc-python-amd64
ghcr.io/openhands/agent-server:80bebdc-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:80bebdc-python-arm64
ghcr.io/openhands/agent-server:80bebdc-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:80bebdc-golang
ghcr.io/openhands/agent-server:80bebdc-java
ghcr.io/openhands/agent-server:80bebdc-python

About Multi-Architecture Support

  • Each variant tag (e.g., 80bebdc-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 80bebdc-python-amd64) are also available if needed

Add fork() to BaseConversation (abstract), LocalConversation, and
RemoteConversation. The method deep-copies the event log, agent config,
workspace metadata, and runtime state into a new conversation with its
own ID and persistence directory. By default metrics start fresh on the
fork; set reset_metrics=False to carry them over.

Expose the primitive through the agent-server REST API as
POST /api/conversations/{id}/fork with an optional
ForkConversationRequest body (id, title, tags, reset_metrics).

Closes #2840

Co-authored-by: openhands <openhands@all-hands.dev>
github-actions (bot, Contributor) commented Apr 16, 2026

Python API breakage checks: ✅ PASSED

Result: PASSED

github-actions (bot, Contributor) commented Apr 16, 2026

REST API breakage checks (OpenAPI): ✅ PASSED

Result: PASSED

github-actions (bot, Contributor) commented Apr 16, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| openhands-agent-server/openhands/agent_server | | | | |
| conversation_router.py | 144 | 10 | 93% | 287, 377–380, 392–395, 429 |
| conversation_service.py | 491 | 118 | 75% | 141–142, 169, 172, 174, 181–187, 215, 222, 243, 342, 348, 353, 359, 367–368, 377–380, 389, 403–405, 412, 445–446, 485, 488, 505–509, 511–512, 515–516, 519–524, 621, 628–632, 635–636, 640–644, 647–648, 652–656, 659–660, 682–683, 687–688, 690–692, 694, 697, 705–709, 712, 719–724, 726–727, 733–738, 745–746, 750, 752–753, 758–759, 765–766, 773–774, 778–780, 798, 822, 1100, 1103 |
| openhands-sdk/openhands/sdk/conversation/impl | | | | |
| local_conversation.py | 429 | 25 | 94% | 294, 299, 428, 471, 489, 505, 570, 762–763, 766, 926, 934, 936, 940–941, 952, 954–956, 981, 1176, 1180, 1250, 1257–1258 |
| remote_conversation.py | 640 | 101 | 84% | 78, 80, 147, 174, 187, 189–192, 202, 224–225, 230–233, 316, 326–328, 334, 375, 522–525, 527, 553–557, 562–565, 568, 584, 745–746, 750–751, 765, 789–790, 809, 820–821, 841–844, 846–847, 871–873, 876–880, 882–883, 887, 889–897, 899, 936, 1067, 1139–1140, 1144, 1149–1153, 1159–1165, 1178, 1183, 1218, 1274, 1281, 1287–1288, 1439–1440 |
| TOTAL | 23572 | 5707 | 75% | |

Co-authored-by: openhands <openhands@all-hands.dev>
@all-hands-bot (Collaborator) left a comment

🟡 Acceptable - Solves a real problem with a pragmatic approach, but has correctness and API consistency issues that should be addressed.

[RISK ASSESSMENT]
⚠️ Risk Assessment: 🟡 MEDIUM

New feature with limited blast radius, but potential for data corruption via shared mutable objects and API surface inconsistency between local/remote implementations that could confuse users.

Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py Outdated
Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py Outdated
Comment thread openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py Outdated
- Deep-copy events via model_copy(deep=True) so source stays immutable
- Deep-copy agent_state via copy.deepcopy for mutable values
- RemoteConversation.fork() now raises NotImplementedError when
  agent is passed (server doesn't support agent replacement yet)

Co-authored-by: openhands <openhands@all-hands.dev>
Member Author

Addressed all review feedback in 71b9751. Ready for another look.

@xingyaoww xingyaoww requested a review from all-hands-bot April 16, 2026 02:05
@all-hands-bot (Collaborator) left a comment

✅ QA Report: PASS

Successfully verified Conversation.fork() creates independent conversation copies with full event history preservation and configurable metric handling.

Does this PR achieve its stated goal?

Yes. The PR set out to add fork() as a first-class SDK primitive for branching off conversations while preserving full event history, and it delivers on all claims:

  1. Abstract method added: fork() is now part of the BaseConversation contract
  2. LocalConversation implementation: Deep-copies events, agent config (via serialization to avoid thread-lock issues), workspace metadata, and runtime state (agent_state, activated_knowledge_skills)
  3. RemoteConversation implementation: Delegates to server API and returns new RemoteConversation instance
  4. REST API endpoint: POST /api/conversations/{id}/fork with ForkConversationRequest body
  5. Comprehensive testing: 12 unit tests covering all fork behaviors
  6. Functional verification: End-to-end testing confirms all claimed features work correctly

The implementation enables downstream apps to branch off system-generated conversations (CI runs, scheduled pipelines) for human follow-up exploration without contaminating the original audit trail — exactly the use case described in the PR.

| Phase | Result |
| --- | --- |
| Environment Setup | uv sync completed, 12 fork tests available |
| CI & Tests | ✅ sdk-tests: SUCCESS, tools-tests: SUCCESS, pre-commit: SUCCESS, 12/12 fork tests pass |
| Functional Verification | ✅ All claimed fork behaviors verified end-to-end |

Functional Verification

Test 1: Baseline — fork() does not exist on main branch

Before (main branch):

$ git show origin/main:openhands-sdk/openhands/sdk/conversation/base.py | grep "def fork"
(exit code 1 — method does not exist)

This confirms fork() is a new feature, not a modification of existing behavior.

After (PR branch):

$ git show HEAD:openhands-sdk/openhands/sdk/conversation/base.py | grep -A 3 "def fork"
def fork(
        self,
        *,
        conversation_id: ConversationID | None = None,

fork() is now an abstract method on BaseConversation.


Test 2: Unit tests for LocalConversation.fork()

Command:

$ uv run pytest tests/sdk/conversation/local/test_fork.py -v

Output:

tests/sdk/conversation/local/test_fork.py::test_fork_creates_new_id PASSED                    [  8%]
tests/sdk/conversation/local/test_fork.py::test_fork_with_explicit_id PASSED                  [ 16%]
tests/sdk/conversation/local/test_fork.py::test_fork_copies_events PASSED                     [ 25%]
tests/sdk/conversation/local/test_fork.py::test_fork_source_unmodified PASSED                 [ 33%]
tests/sdk/conversation/local/test_fork.py::test_fork_execution_status_is_idle PASSED          [ 41%]
tests/sdk/conversation/local/test_fork.py::test_fork_resets_metrics_by_default PASSED         [ 50%]
tests/sdk/conversation/local/test_fork.py::test_fork_preserves_metrics_when_requested PASSED  [ 58%]
tests/sdk/conversation/local/test_fork.py::test_fork_copies_agent_state PASSED                [ 66%]
tests/sdk/conversation/local/test_fork.py::test_fork_accepts_replacement_agent PASSED         [ 75%]
tests/sdk/conversation/local/test_fork.py::test_fork_with_tags PASSED                         [ 83%]
tests/sdk/conversation/local/test_fork.py::test_fork_with_title_sets_tag PASSED               [ 91%]
tests/sdk/conversation/local/test_fork.py::test_fork_shares_workspace PASSED                  [100%]

12 passed in 0.09s

✅ All 12 unit tests pass, covering:

  • Auto-generated and explicit fork IDs
  • Event copying
  • Source immutability
  • Execution status reset to idle
  • Metric reset and preservation
  • Agent state copying
  • Agent replacement
  • Tags and title
  • Workspace sharing

Test 3: End-to-end functional verification

Created a realistic scenario with a conversation containing:

  • 2 message events (user + assistant)
  • Agent state: {"task_progress": 0.5, "current_step": "analysis"}
  • Metrics: $0.05 accumulated cost, 100 prompt tokens, 50 completion tokens

Test 3a: Basic fork with auto-generated ID

✓ Forked conversation: 3e1485d2-5a1e-4516-b7b4-c2f379461371
  Source ID: ce82ec46-f25e-4af0-a9e1-38e68195c89b
  Fork ID:   3e1485d2-5a1e-4516-b7b4-c2f379461371
  ✓ IDs are different
  ✓ Events copied: 2
  ✓ Agent state copied: {'task_progress': 0.5, 'current_step': 'analysis'}
  ✓ Metrics reset: $0.0
  ✓ Execution status: ConversationExecutionStatus.IDLE

✅ Fork created with unique ID, events and agent state copied, metrics reset by default, execution status is idle.

Test 3b: Fork with metrics preservation

✓ Fork with preserved metrics: 53041ca9-d814-4998-92c5-5071cf27b5e6
  Source cost: $0.05
  Fork cost:   $0.05
  ✓ Metrics preserved

✅ When reset_metrics=False, cost metrics are preserved in the fork.

Test 3c: Fork with custom ID, title, and tags

✓ Fork with custom params: b3f38780-2509-4311-8c2b-9f1384ac510d
  ✓ Custom ID: b3f38780-2509-4311-8c2b-9f1384ac510d
  ✓ Title tag: Forked Investigation
  ✓ Custom tags: forked=true, priority=high

✅ Custom ID, title, and tags are correctly applied to the fork.

Test 3d: Fork with replacement agent

✓ Fork with replacement agent: abfbed38-63f5-4941-8089-16ba59e44bb8
  Source model: gpt-4o-mini
  Fork model:   gpt-4o
  ✓ Agent replaced

✅ Fork can use a different agent while preserving events from the source.

Test 3e: Isolation — fork modifications don't affect source

✓ Modified fork: c8f63514-ff95-46f6-be67-01f2c48ee753
  Source events: 2
  Fork events:   3
  ✓ Source events unchanged
  Source agent_state: {'task_progress': 0.5, 'current_step': 'analysis'}
  Fork agent_state:   {'task_progress': 0.5, 'current_step': 'analysis', 'new_key': 'new_value'}
  ✓ Source agent state unchanged

✅ Adding events to the fork does not modify the source. Agent state modifications in the fork do not affect the source. This confirms true isolation.

Test 3f: Workspace sharing

✓ Fork workspace check: af00852c-e278-4d90-b629-240c389292e1
  Source workspace: /tmp/tmp8b9ocsje
  Fork workspace:   /tmp/tmp8b9ocsje
  ✓ Workspace shared

✅ Fork shares the same workspace directory as the source (as designed).

Test 3g: Independent persistence directories

✓ Persistence paths:
  Source: /tmp/tmp8b9ocsje/ce82ec46f25e4af0a9e138e68195c89b
  Fork:   /tmp/tmp8b9ocsje/7c5eb130b5274ad28519ecec61addfae/7c5eb130b5274ad28519ecec61addfae
  ✓ Fork has independent persistence directory
  ✓ Fork persistence is in same workspace

✅ Fork has its own persistence directory (for independent state storage) while remaining in the same workspace.


Test 4: Implementation details verification

Agent deep-copy uses serialization round-trip:

# From local_conversation.py lines 343-348
agent_cls = type(self.agent)
fork_agent = agent_cls.model_validate(
    self.agent.model_dump(context={"expose_secrets": True}),
)

✅ Implementation matches PR claim: uses model_validate(model_dump(...)) instead of model_copy(deep=True) to avoid TypeError: cannot pickle '_thread.lock' object.


Test 5: CI validation

SDK tests:

sdk-tests: COMPLETED - SUCCESS

✅ All SDK tests pass, including the 12 new fork tests.

Other relevant checks:

tools-tests: COMPLETED - SUCCESS
pre-commit: COMPLETED - SUCCESS
Python API: COMPLETED - SUCCESS
REST API (OpenAPI): COMPLETED - SUCCESS
build-binary-and-test (ubuntu-latest): COMPLETED - SUCCESS

✅ No regressions detected in related test suites.

Issues Found

None.


Conclusion: This PR successfully adds Conversation.fork() as a first-class SDK primitive with all claimed functionality working correctly. The implementation is clean, well-tested, and ready for merge.

@all-hands-bot (Collaborator) left a comment

🟡 Acceptable - Solid implementation of a useful feature, but has a correctness bug in the remote fork path and missing test coverage.

[RISK ASSESSMENT]
⚠️ Risk Assessment: 🟡 MEDIUM

New feature with limited blast radius, but has a data inconsistency bug in RemoteConversation.fork() and lacks test coverage for remote fork and server endpoint. Not in the eval-risk category since it's a pure addition that doesn't affect existing conversation behavior.

Comment thread tests/sdk/conversation/local/test_fork.py
@all-hands-bot (Collaborator) left a comment

✅ QA Report: PASS

Successfully verified conversation fork functionality works end-to-end. The PR delivers on its promise to enable branching conversations for human follow-up without contaminating the original audit trail.

Does this PR achieve its stated goal?

Yes. The PR successfully adds Conversation.fork() as a first-class SDK primitive that enables downstream apps to branch off system-generated conversations (CI runs, scheduled pipelines, agent-on-agent tasks) for human exploration while preserving the original audit trail intact. Testing confirms events, agent state, and runtime state are deep-copied, the fork operates independently, and metrics can be reset or preserved as needed.

| Phase | Result |
| --- | --- |
| Environment Setup | ✅ Dependencies installed, project builds successfully |
| CI & Tests | ✅ 19/19 core checks pass, 12/12 fork unit tests pass |
| Functional Verification | ✅ Fork creates independent conversations, deep-copy works, REST API functional |

Functional Verification

Test 1: LocalConversation.fork() — Core Behavior

Setup — Create source conversation with history:

src = Conversation(agent=agent, persistence_dir=tmpdir, workspace=tmpdir)
src.state.events.append(MessageEvent(id="msg-1", ...))
src.state.events.append(MessageEvent(id="msg-2", ...))
src._state.agent_state = {"deployment_id": "deploy-123", "status": "success"}
src._state.stats.usage_to_metrics["test"] = Metrics(accumulated_cost=2.5)

Source conversation created with:

  • 2 events
  • Agent state: {deployment_id: deploy-123, status: success}
  • Cost: $2.50

Action — Fork with reset_metrics=True (default):

fork1 = src.fork(
    title="Human follow-up on deploy-123",
    tags={"type": "manual", "parent": "ci-run"},
)

Result:

Fork conversation ID: 84d8da92-d079-401a-b666-54afae00592a (different from source)
Events: 2 (copied from source)
Agent state: {deployment_id: deploy-123, status: success} (deep-copied)
Cost: $0.0 (reset as expected)
Tags: {type: manual, parent: ci-run, title: Human follow-up on deploy-123}

✅ Fork created with new ID, events copied, metrics reset, title and tags applied.
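
The tag result above suggests the title is stored as a `title` entry merged into the caller's tags. A minimal sketch of that merge follows; `merge_fork_tags` is a hypothetical helper, and the choice that an explicit title overrides an existing `title` tag is an assumption the report does not confirm.

```python
def merge_fork_tags(tags=None, title=None):
    """Return the tag dict for a fork (hypothetical helper, not SDK code)."""
    merged = dict(tags or {})  # copy so the caller's dict is not mutated
    if title is not None:
        # Assumption: an explicit title wins over any existing "title" tag.
        merged["title"] = title
    return merged


fork_tags = merge_fork_tags(
    {"type": "manual", "parent": "ci-run"},
    title="Human follow-up on deploy-123",
)
```

With the inputs from Test 1 this yields the same three keys shown in the reported result.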


Test 2: Fork Independence — Mutation Isolation

Action — Modify fork without affecting source:

fork1.state.events.append(MessageEvent(id="fork-msg-1", ...))
fork1._state.agent_state["investigation"] = "active"

Result:

Source events: 2 (unchanged)
Fork events: 3 (added 1)
Source agent state: {deployment_id: deploy-123, status: success}
Fork agent state: {deployment_id: deploy-123, status: success, investigation: active}

✅ Fork modifications do NOT affect source — deep-copy works correctly.


Test 3: Metrics Preservation with reset_metrics=False

Action — Fork with metrics preserved:

fork2 = src.fork(title="Another fork with metrics", reset_metrics=False)

Result:

Fork2 cost: $2.5 (preserved from source)

✅ Metrics are preserved when reset_metrics=False.


Test 4: Event Deep-Copy Isolation

Action — Verify events are different objects:

src_event = src.state.events[0]
fork1_event = fork1.state.events[0]
assert src_event.id == fork1_event.id  # Same data
assert src_event is not fork1_event     # Different objects

Result:

Source event: id=msg-1, obj=139942274651536
Fork event:   id=msg-1, obj=139942274656016

✅ Events are deep-copied (different object IDs, same data).
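
Why the object-identity check matters can be shown with plain data. The sketch below uses dictionaries instead of the SDK's Pydantic events (an assumption made to keep it dependency-free) and contrasts a shallow list copy with `copy.deepcopy`.

```python
import copy

source_events = [{"id": "msg-1", "content": ["hello"]}]

shallow = list(source_events)        # new list, but the same event objects
deep = copy.deepcopy(source_events)  # new list and new event objects

# Mutating through the deep copy stays local to the copy.
deep[0]["content"].append("only in the deep copy")

# Mutating through the shallow copy leaks back into the source.
shallow[0]["content"].append("leaks into the source")
```

Only the deep copy gives the fork an event log that can be edited without touching the source, which is what the `is not` assertion in Test 4 checks.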


Test 5: Additional Behaviors

Workspace sharing:

Both conversations share workspace: /tmp/tmpokgwgw93

✅ Fork shares workspace with source.

Execution status:

Fork execution status: idle

✅ Fork starts in idle status.

Custom conversation ID:

custom_id = uuid.uuid4()
fork3 = src.fork(conversation_id=custom_id)
assert fork3.id == custom_id

✅ Custom ID is honored.


Test 6: Unit Tests Coverage

Ran the full fork test suite:

uv run pytest tests/sdk/conversation/local/test_fork.py -v

Result:

12 passed in 0.10s

test_fork_creates_new_id PASSED
test_fork_with_explicit_id PASSED
test_fork_copies_events PASSED
test_fork_source_unmodified PASSED
test_fork_execution_status_is_idle PASSED
test_fork_resets_metrics_by_default PASSED
test_fork_preserves_metrics_when_requested PASSED
test_fork_copies_agent_state PASSED
test_fork_accepts_replacement_agent PASSED
test_fork_with_tags PASSED
test_fork_with_title_sets_tag PASSED
test_fork_shares_workspace PASSED

✅ All 12 unit tests pass.


Test 7: REST API Implementation

Verified the REST API endpoint is properly implemented:

Endpoint: POST /api/conversations/{id}/fork

Implementation verified:

  • ForkConversationRequest model added with proper validation
  • conversation_router.fork_conversation() endpoint implemented
  • conversation_service.fork_conversation() delegates to LocalConversation.fork()
  • ✅ Server persists fork and returns new ConversationInfo
  • ✅ OpenAPI schema validation passed (CI check: Agent Server/Check OpenAPI Schema ✅)
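
A client call against this endpoint might be built as follows. This is a sketch under stated assumptions: `build_fork_request` is a hypothetical helper, the base URL `http://localhost:8000` is taken from the `docker run -p 8000:8000` example, unset fields are omitted from the body, and no authentication header is added. Only the path and the four body field names come from the PR description.

```python
import json
import urllib.request


def build_fork_request(base_url, conversation_id, *, id=None, title=None,
                       tags=None, reset_metrics=None):
    """Build (but do not send) a POST .../fork request (hypothetical helper)."""
    fields = {"id": id, "title": title, "tags": tags,
              "reset_metrics": reset_metrics}
    # Assumption: unset fields are left out so server defaults apply.
    body = {k: v for k, v in fields.items() if v is not None}
    return urllib.request.Request(
        url=f"{base_url}/api/conversations/{conversation_id}/fork",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_fork_request(
    "http://localhost:8000",
    "ce82ec46-f25e-4af0-a9e1-38e68195c89b",
    title="Forked Investigation",
    reset_metrics=False,
)
# Sending it would be: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the example runnable without a server and makes the body easy to inspect.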

RemoteConversation.fork() implementation:

# Sends POST request to server /api/conversations/{id}/fork
resp = _send_request(self._client, "POST", 
                     f"{base_path}/{self._id}/fork", json=body)
fork_info = resp.json()
return RemoteConversation(agent=fork_agent, 
                          conversation_id=fork_uuid, ...)

✅ RemoteConversation properly delegates to server API.

Issues Found

None.

- Fix tags inconsistency: RemoteConversation.fork() now uses tags from
  the server response (which includes merged title) instead of the raw
  input kwargs
- Add RemoteConversation.fork() tests (POST request, server tags, agent
  param rejection, body fields)
- Add fork endpoint tests (201 success, 404 not found)
- Add event deep-copy isolation test for LocalConversation.fork()

Co-authored-by: openhands <openhands@all-hands.dev>
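
The two client-side rules from this commit (reject an `agent` argument, trust the server's tags) can be sketched as below. Everything here is hypothetical: `remote_fork`, the injected `post` callable, and the `id` / `tags` response keys are illustrative names, not the SDK's API.

```python
def remote_fork(post, conversation_id, *, agent=None, title=None, tags=None):
    """Hypothetical client-side fork.

    `post` is an injected callable taking (path, json_body) and returning
    the decoded JSON response.
    """
    if agent is not None:
        # The server cannot swap agents yet, so fail loudly rather than ignore it.
        raise NotImplementedError("agent replacement is not supported remotely")
    fields = {"title": title, "tags": tags}
    body = {k: v for k, v in fields.items() if v is not None}
    info = post(f"/api/conversations/{conversation_id}/fork", body)
    # Use the server's tags: they already include the merged title.
    return {"id": info["id"], "tags": info["tags"]}


def fake_post(path, body):
    # Stand-in server that merges the title into the tags.
    merged = dict(body.get("tags", {}), title=body["title"])
    return {"id": "fork-1", "tags": merged}


result = remote_fork(fake_post, "src-1", title="Debug", tags={"type": "manual"})
```

Returning the server's view instead of echoing the input kwargs avoids the inconsistency the review flagged, where the client would have missed the `title` tag added server-side.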
Member Author

Addressed both remaining review comments in 06919d4: fixed tags inconsistency bug in RemoteConversation.fork() and added comprehensive tests for remote fork, server endpoint, and deep-copy event isolation. Ready for another look.

@xingyaoww xingyaoww requested a review from all-hands-bot April 16, 2026 02:18
@all-hands-bot (Collaborator) left a comment

🟢 Good taste - Clean implementation of a useful feature with proper deep-copying and good test coverage.

[RISK ASSESSMENT]
⚠️ Risk Assessment: 🟡 MEDIUM

New feature with persistence implications (creates new conversation directories, copies state). Well-tested with 18 tests covering local/remote implementations and endpoint behavior. Limited blast radius since it only affects code that explicitly calls fork(). Not in eval-risk category - pure API addition that doesn't modify agent behavior or planning logic.

@all-hands-bot (Collaborator) left a comment

✅ QA Report: PASS

Verified: Conversation.fork() successfully creates independent conversation copies that preserve event history without contaminating the source audit trail.

Does this PR achieve its stated goal?

Yes. The PR goal was to enable downstream apps to let humans branch off system-generated conversations (CI runs, scheduled pipelines) for follow-up exploration without contaminating the original audit trail. The implementation delivers on this by deep-copying all events, creating an independent fork with its own ID and persistence directory, preserving runtime state (agent_state, activated_knowledge_skills), and optionally resetting metrics for fresh cost tracking. Testing confirmed that modifications to a fork do not affect the source conversation.

| Phase | Result |
| --- | --- |
| Environment Setup | ✅ Build successful, dependencies installed |
| CI & Tests | ✅ 19/19 fork tests pass (13 local + 4 remote + 2 router), pre-commit/API checks pass |
| Functional Verification | ✅ End-to-end fork workflow verified with realistic CI → debug session scenario |

Functional Verification

Test 1: LocalConversation.fork() end-to-end

Step 1 — Establish baseline (create source conversation):

Created a conversation simulating a CI run with:

  • 2 events (test trigger + results)
  • Agent state: {"last_test_run": "2024-04-16T02:00:00Z", "failed_tests": [...]}
  • Accumulated cost: $0.05
  • Tags: {"type": "ci-run", "build": "1234"}

Step 2 — Fork for human follow-up:

Ran fork_conv = source_conv.fork(title="Debug test failures from CI run #1234", tags={"type": "debug-session", "user": "developer"}, reset_metrics=True)

Result:

Forked conversation 61e6915e-00a9-48e7-8862-ba232747dd39 → 67ee52d7-6159-4172-a3c7-8e5020d7fb56
(2 events copied, reset_metrics=True)

Fork received:

  • Unique ID: 67ee52d7-6159-4172-a3c7-8e5020d7fb56
  • All 2 events from source
  • Deep copy of agent_state
  • Reset cost: $0.00
  • Merged tags: {"type": "debug-session", "user": "developer", "title": "Debug test failures..."}

Step 3 — Verify independence:

Added new event to fork and modified fork's agent_state to include "debug_session_started": True

Result:

  • Source events: 2 (unchanged)
  • Fork events: 3 (new event added)
  • Source agent_state: {"last_test_run": ..., "failed_tests": [...]} (unchanged)
  • Fork agent_state: {..., "debug_session_started": True} (modified)

✓ Confirmed: Fork is fully independent; source audit trail remains uncontaminated.

Test 2: Metrics preservation option

Step 1 — Fork with reset_metrics=False:

Ran fork_with_metrics = source_conv.fork(title="Debug session with cost tracking", reset_metrics=False)

Result:

Forked conversation 61e6915e-00a9-48e7-8862-ba232747dd39 → 47146d51-2b95-4204-9404-98f007791a24
(2 events copied, reset_metrics=False)
  • Source cost: $0.05
  • Fork cost: $0.05 (preserved)

✓ Confirmed: Metrics are preserved when reset_metrics=False.

Test 3: Unit test coverage

All 19 fork tests passed:

tests/sdk/conversation/local/test_fork.py::test_fork_creates_new_id PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_with_explicit_id PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_copies_events PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_source_unmodified PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_execution_status_is_idle PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_resets_metrics_by_default PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_preserves_metrics_when_requested PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_copies_agent_state PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_accepts_replacement_agent PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_with_tags PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_with_title_sets_tag PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_shares_workspace PASSED
tests/sdk/conversation/local/test_fork.py::test_fork_event_deep_copy_isolation PASSED

tests/sdk/conversation/remote/test_remote_fork.py::test_remote_fork_sends_post_request PASSED
tests/sdk/conversation/remote/test_remote_fork.py::test_remote_fork_uses_server_returned_tags PASSED
tests/sdk/conversation/remote/test_remote_fork.py::test_remote_fork_raises_on_agent_param PASSED
tests/sdk/conversation/remote/test_remote_fork.py::test_remote_fork_passes_body_fields PASSED

tests/agent_server/test_conversation_router.py::test_fork_conversation_success PASSED
tests/agent_server/test_conversation_router.py::test_fork_conversation_not_found PASSED

Tests cover:

  • ID generation (auto and explicit)
  • Event copying and deep-copy isolation
  • Source immutability
  • Execution status (starts idle)
  • Metrics (reset/preserve)
  • Agent state copying and replacement
  • Tags and title handling
  • Workspace sharing
  • REST API endpoint (success and 404)
  • RemoteConversation delegation

Issues Found

None.

Demonstrates the key fork() use cases:
  1. Basic fork — branch off after some interaction
  2. Independent continuation — fork runs without affecting source
  3. Agent replacement — fork with a different agent (tool-change)
  4. Tags and title — metadata on the fork

Co-authored-by: openhands <openhands@all-hands.dev>
Rewrote example to demonstrate fork() end-to-end without needing LLM
credentials:
  1. Basic fork — same events, different ID, title tag
  2. Source isolation — fork changes don't affect source
  3. Deep-copy isolation — event lists are independent
  4. Agent replacement — fork with a different agent (A/B testing)
  5. Metrics reset — reset_metrics flag behavior

Evidence of successful run included in .pr/ (auto-removed on merge).

Co-authored-by: openhands <openhands@all-hands.dev>
github-actions (bot, Contributor) commented Apr 16, 2026

PR Artifacts Cleaned Up

The .pr/ directory has been automatically removed.

xingyaoww pushed a commit to OpenHands/docs that referenced this pull request Apr 16, 2026
Add SDK guide page for the new Conversation.fork() primitive that lets
users branch off an existing conversation for follow-up exploration
without contaminating the original audit trail.

Covers:
  - Basic usage (fork, source isolation, deep-copy semantics)
  - Fork with a different agent (A/B testing, tool-change)
  - Tags, metadata, and metrics reset
  - Agent-server REST endpoint (POST /api/conversations/{id}/fork)
  - Full ready-to-run example (no LLM calls needed)

Added to Conversation Features nav group in docs.json.

Related SDK PR: OpenHands/software-agent-sdk#2841

Co-authored-by: openhands <openhands@all-hands.dev>
Rewrote example to use actual LLM calls (via LLM_API_KEY / LLM_BASE_URL
env vars) instead of a fake demo key. The example now:
  1. Runs the source conversation with a real agent (echo hello-from-source)
  2. Forks and continues with a new message (echo hello-from-fork)
  3. Verifies source isolation (source events unchanged)
  4. Forks with an alternate agent (A/B testing scenario)

Evidence of successful run against openhands/claude-haiku included in .pr/.

Co-authored-by: openhands <openhands@all-hands.dev>
The fork was created with visualizer=None, so fork.run() produced
no console output. Fixed to inherit the source's visualizer type
so forked conversations show the same agent action/observation
output as the source.

Updated example evidence shows all three runs producing full
visualizer output.

Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww and others added 5 commits April 17, 2026 15:29
The RemoteEventsList default callback was including FULL_STATE_KEY
ConversationStateUpdateEvents delivered over WebSocket that are NOT
stored in the server-side EventLog. This caused the client-side
event count to diverge from the server, breaking fork event-count
parity in RemoteConversation.fork().
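
The fix described above amounts to filtering transient events in the client-side callback. Here is a minimal sketch using hypothetical `Toy*` classes; the string value of `FULL_STATE_KEY` and the `agent_status` key are assumptions, and the real `RemoteEventsList` callback may be structured differently.

```python
from dataclasses import dataclass

FULL_STATE_KEY = "full_state"  # assumed value; the PR only names the constant


@dataclass
class ToyStateUpdate:
    """Hypothetical stand-in for ConversationStateUpdateEvent."""

    key: str
    value: object = None


@dataclass
class ToyMessage:
    text: str


class ToyEventsList:
    """Hypothetical client-side event list fed by a WebSocket callback."""

    def __init__(self):
        self._events = []

    def add_from_websocket(self, event):
        # Full-state snapshots are WebSocket-only and never stored in the
        # server-side log, so recording them would inflate the client count.
        if isinstance(event, ToyStateUpdate) and event.key == FULL_STATE_KEY:
            return
        self._events.append(event)

    def __len__(self):
        return len(self._events)


client_events = ToyEventsList()
incoming_events = [
    ToyMessage("hi"),
    ToyStateUpdate(FULL_STATE_KEY, {"snapshot": True}),  # dropped
    ToyStateUpdate("agent_status", "running"),           # kept
    ToyMessage("done"),
]
for incoming in incoming_events:
    client_events.add_from_websocket(incoming)
```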

Co-authored-by: openhands <openhands@all-hands.dev>
The source conversation's client-side event list includes transient
WebSocket-only events (e.g. full-state snapshots) that are not
persisted to the EventLog.  The fork copies only persisted events,
so exact count parity is not expected.  Replace the strict equality
assertion with a >0 check and verify the fork grows after its own
run() instead.

Co-authored-by: openhands <openhands@all-hands.dev>
Member

@VascoSch92 VascoSch92 left a comment

Cool :-)

Comment thread openhands-agent-server/openhands/agent_server/models.py
Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
Comment thread openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
@xingyaoww xingyaoww merged commit 18d0051 into main Apr 19, 2026
40 checks passed
@xingyaoww xingyaoww deleted the openhands/conversation-fork branch April 19, 2026 15:44
@xingyaoww
Member Author

@OpenHands review the latest data of this PR and then help me update this documentation PR.

OpenHands/docs#459

@openhands-ai

openhands-ai Bot commented Apr 19, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

@openhands-ai

openhands-ai Bot commented Apr 19, 2026

Summary of Work

I reviewed the final merged state of PR #2841 (Conversation.fork()) and updated the documentation PR at OpenHands/docs#459 (commit dcc2019).

Checklist — Was the request fully addressed?

  • Reviewed the latest data of PR #2841 (feat: add Conversation.fork() as a first-class SDK primitive): examined the merged code for fork() across base.py, local_conversation.py, remote_conversation.py, conversation_router.py, conversation_service.py, models.py, and both example files.
  • Updated the documentation PR — Pushed changes to the openhands/conversation-fork branch on the docs repo.

What was changed (commit dcc2019)

1. Synced the embedded remote example with the actual examples/02_remote_agent_server/11_conversation_fork.py:

  • Remote fork event count assertions were relaxed in the bug fix (WebSocket-only events aren't persisted server-side), but the docs still had the old strict assertions. Now matches the merged code exactly.

2. Fixed the "What Gets Copied" table to match the actual implementation:

  • Removed "Confirmation policy" and "Security analyzer" rows — these are not copied by fork(); the fork gets defaults.
  • Added "Agent state" (deep-copied), "Activated knowledge skills" (list-copied), and "Tags" (fresh from kwargs, not inherited).
  • Updated Stats/Metrics description to mention the reset_metrics=False option.
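
The three copy behaviours named in the corrected table (deep copy, list copy, fresh from kwargs) differ in how much they share with the source. The following is an illustrative sketch only: `copy_for_fork` is a hypothetical helper, and the shapes of the agent state and tags are assumed to be plain dictionaries.

```python
import copy


def copy_for_fork(agent_state, activated_knowledge_skills, tags=None):
    """Hypothetical helper mirroring the copy semantics listed above."""
    return {
        # Deep copy: nested structures are duplicated all the way down.
        "agent_state": copy.deepcopy(agent_state),
        # List copy: a new list holding the same (immutable) skill names.
        "activated_knowledge_skills": list(activated_knowledge_skills),
        # Tags come only from the caller; nothing is inherited from the source.
        "tags": dict(tags) if tags else {},
    }


source_state = {"scratch": {"step": 3}}
source_skills = ["git", "docker"]

forked = copy_for_fork(source_state, source_skills, tags={"purpose": "debug"})
forked["agent_state"]["scratch"]["step"] = 99
forked["activated_knowledge_skills"].append("kubernetes")
```

A list copy is enough when the elements are immutable strings; mutable elements would need the deep copy used for the agent state.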

Conciseness

The changes are minimal and targeted — only the two sections that were inaccurate were touched. No extraneous changes were made. The standalone SDK example was already in sync and was left unchanged.

xingyaoww added a commit to OpenHands/docs that referenced this pull request Apr 20, 2026
* docs: add Conversation.fork() guide

Add SDK guide page for the new Conversation.fork() primitive that lets
users branch off an existing conversation for follow-up exploration
without contaminating the original audit trail.

Covers:
  - Basic usage (fork, source isolation, deep-copy semantics)
  - Fork with a different agent (A/B testing, tool-change)
  - Tags, metadata, and metrics reset
  - Agent-server REST endpoint (POST /api/conversations/{id}/fork)
  - Full ready-to-run example (no LLM calls needed)

Added to Conversation Features nav group in docs.json.

Related SDK PR: OpenHands/software-agent-sdk#2841

Co-authored-by: openhands <openhands@all-hands.dev>

* docs: sync fork guide with real-LLM example code

Update the ready-to-run example to match the real-LLM version from
the SDK repo, and add the RunExampleCode shared snippet.

Co-authored-by: openhands <openhands@all-hands.dev>

* docs(sdk): add remote fork example

Co-authored-by: openhands <openhands@all-hands.dev>

* docs: sync fork guide with final merged code

- Update embedded remote example (02_remote_agent_server/11_conversation_fork.py)
  to match bug-fixed version: relaxed event count assertions for remote
  forks since WebSocket-only events aren't persisted server-side
- Fix 'What Gets Copied' table: removed incorrect entries for
  confirmation_policy and security_analyzer (not copied in fork),
  added accurate entries for agent_state, activated_knowledge_skills,
  and tags

Co-authored-by: openhands <openhands@all-hands.dev>

* fix: adjust focus range to highlight fork call (lines 4-8)

Co-authored-by: openhands <openhands@all-hands.dev>

* chore: teach code-review skill to use APPROVE event for clean reviews

Add explicit guidance on GitHub review event values (APPROVE,
REQUEST_CHANGES, COMMENT) to the repo-specific code-review skill.
This mirrors the fix in OpenHands/extensions#185 that teaches the
general github-pr-review skill the same thing.

Previously the bot always submitted COMMENTED reviews even for clean
PRs because no skill told it to use the APPROVE event.

Co-authored-by: openhands <openhands@all-hands.dev>

---------

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: enyst <engel.nyst@gmail.com>
Development

Successfully merging this pull request may close these issues.

Add Conversation.fork() as a first-class SDK primitive

6 participants