Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
5282351
feat: add Conversation.fork() as a first-class SDK primitive
openhands-agent Apr 16, 2026
c0aae2a
fix: add fork() stub to MockConversation for pyright compliance
openhands-agent Apr 16, 2026
71b9751
fix: address PR review — deep-copy events/state, clarify agent param
openhands-agent Apr 16, 2026
06919d4
fix: use server-returned tags in RemoteConversation.fork(), add tests
openhands-agent Apr 16, 2026
2017bb5
docs: add conversation fork example
openhands-agent Apr 16, 2026
55aceef
docs: add conversation fork example with run evidence
openhands-agent Apr 16, 2026
3af425f
docs: use real LLM in conversation fork example
openhands-agent Apr 16, 2026
66a1707
fix: inherit visualizer in Conversation.fork()
openhands-agent Apr 16, 2026
10a751b
fix: address fork() bugs found in code review
openhands-agent Apr 17, 2026
08ac67b
docs: add SDK architecture conventions to code review skill
openhands-agent Apr 17, 2026
10f5627
Merge branch 'main' into openhands/conversation-fork
xingyaoww Apr 17, 2026
b44034e
docs(.pr): update example run logs after bug fixes
openhands-agent Apr 17, 2026
2cd0c3a
feat: add agent-server fork example (02_remote_agent_server/11)
openhands-agent Apr 17, 2026
c7adb2c
docs(.pr): add note on agent-server fork example testing limitations
openhands-agent Apr 17, 2026
e73d404
chore: Remove PR-only artifacts [automated]
Apr 17, 2026
858cbfe
Fix RemoteEventsList to filter out full-state snapshot events
xingyaoww Apr 17, 2026
78c4e44
Revert "Fix RemoteEventsList to filter out full-state snapshot events"
xingyaoww Apr 17, 2026
c1fb8de
fix(example): relax fork event-count assertions
xingyaoww Apr 17, 2026
9da75b3
docs(.pr): add example 11 stdout output artifact
xingyaoww Apr 17, 2026
5013b46
docs(.pr): add example 48 (standalone fork) stdout artifact
xingyaoww Apr 17, 2026
ada4921
chore: Remove PR-only artifacts [automated]
Apr 17, 2026
b9b09af
Apply suggestion from @xingyaoww
xingyaoww Apr 19, 2026
deafeea
Apply suggestion from @xingyaoww
xingyaoww Apr 19, 2026
c71e581
Apply suggestion from @xingyaoww
xingyaoww Apr 19, 2026
0517104
Revert "Apply suggestion from @xingyaoww"
xingyaoww Apr 19, 2026
0523331
Revert "Apply suggestion from @xingyaoww"
xingyaoww Apr 19, 2026
80bebdc
Revert "Apply suggestion from @xingyaoww"
xingyaoww Apr 19, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 49 additions & 1 deletion .agents/skills/custom-codereview-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -109,12 +109,16 @@ If the updated package was uploaded **within the last 7 days**, treat it as a re
## What to Check

- **Complexity**: Over-engineered solutions, unnecessary abstractions, complex logic that could be refactored
- **Testing**: Duplicate test coverage, tests for library features, missing edge case coverage
- **Testing**: Duplicate test coverage, tests for library features, missing edge case coverage. For code that writes to disk, verify that tests cover the **persistence round-trip** (write → close → reopen → verify), not just in-memory state
- **Type Safety**: `# type: ignore` usage, missing type annotations, `getattr`/`hasattr` guards, mocking non-existent arguments
- **Breaking Changes**: API changes affecting users, removed public fields/methods, changed defaults
- **Code Quality**: Code duplication, missing comments for non-obvious decisions, inline imports (unless necessary for circular deps)
- **Repository Conventions**: Use `pyright` not `mypy`, put fixtures in `conftest.py`, avoid `sys.path.insert` hacks
- **Event Type Deprecation**: Changes to event types (Pydantic models used in serialization) must handle deprecated fields properly
- **Thread Safety**: New methods in `LocalConversation` that read or write `self._state` must use `with self._state:` — see the [Concurrency](#concurrency---localconversation-state-lock) section below
- **Persistence Paths**: Code that computes persistence directories must not double-append the conversation hex — see the [Persistence Paths](#persistence-path-construction) section below
- **Server-Side Cleanup**: Endpoints that create persistent state (directories, files) must have rollback logic for partial failures — see the [Server Error Handling](#server-side-error-handling) section below
- **Cross-File Data Flow**: When new code calls existing APIs (constructors, factory methods), trace 1–2 levels into those APIs to verify the caller uses them correctly. Bugs often hide at layer boundaries where the caller's assumptions don't match the callee's behavior

## Event Type Deprecation - Critical Review Checkpoint

Expand Down Expand Up @@ -162,6 +166,50 @@ pydantic_core.ValidationError: Extra inputs are not permitted

**This is a production-breaking change.** Do not approve PRs that modify event types without proper backward compatibility handling and tests.

## SDK Architecture Conventions

These conventions codify patterns that are easy to violate when adding new features. Each was learned from a real bug.

### Concurrency - LocalConversation State Lock

`LocalConversation` protects mutable state with a FIFOLock accessed via `with self._state:`. **Every** method that reads or writes `self._state.events`, `self._state.stats`, `self._state.agent_state`, `self._state.activated_knowledge_skills`, or any other mutable field on `ConversationState` must hold this lock. There are currently ~13 call sites using this pattern.

When reviewing a PR that adds a new method to `LocalConversation`:
1. Check whether it accesses any `self._state.*` field.
2. If yes, verify the access is inside a `with self._state:` block.
3. If not, flag it — the method is unsafe for concurrent use with `run()`.

### Persistence Path Construction

`BaseConversation.get_persistence_dir(base, conversation_id)` returns `str(Path(base) / conversation_id.hex)`. The `LocalConversation.__init__` constructor calls this automatically when `persistence_dir` is provided.

**Rule:** Callers that pass `persistence_dir` to `LocalConversation()` must pass only the **base directory** (e.g., `/data/conversations/`). The constructor appends the conversation hex. Passing a pre-constructed full path (e.g., `/data/conversations/abc123`) causes double-appending: `/data/conversations/abc123/abc123`.

When reviewing code that creates a new `LocalConversation` (fork, resume, migration):
1. Check what value is passed as `persistence_dir`.
2. Verify it does **not** already include the conversation ID hex.

### Server-Side Error Handling

Server endpoints in `conversation_service.py` that create persistent state (writing directories, files, or calling `fork()` which writes to disk) and then perform follow-up operations (like `_start_event_service`) must handle partial failure.

**Pattern:** If the follow-up operation fails, clean up the already-written persistent state so it doesn't become an orphaned directory that confuses future startups.

```python
# Good: rollback on failure
fork_dir = self.conversations_dir / fork_conv_id.hex
try:
fork_event_service = await self._start_event_service(fork_stored)
except Exception:
safe_rmtree(fork_dir)
raise
```

When reviewing server endpoints that create conversations or persistent artifacts:
1. Identify the "point of no return" where state is written to disk.
2. Check that subsequent operations are wrapped in try/except with cleanup.
3. For client-supplied IDs, verify there's a duplicate check before creating state (return 409 Conflict if taken).

## What NOT to Comment On

Do not leave comments for:
Expand Down
105 changes: 105 additions & 0 deletions examples/01_standalone_sdk/48_conversation_fork.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
"""Fork a conversation to branch off for follow-up exploration.

``Conversation.fork()`` deep-copies a conversation — events, agent config,
workspace metadata — into a new conversation with its own ID. The fork
starts in ``idle`` status and retains full event memory of the source, so
calling ``run()`` picks up right where the original left off.

Use cases:
- CI agents that produced a wrong patch — engineer forks to debug
without losing the original run's audit trail
- A/B-testing prompts — fork at a given turn, change one variable,
compare downstream
- Swapping tools mid-conversation (fork-on-tool-change)
"""

import os

from openhands.sdk import LLM, Agent, Conversation, Tool
from openhands.tools.terminal import TerminalTool


# -----------------------------------------------------------------
# Setup
# -----------------------------------------------------------------
llm = LLM(
model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
api_key=os.getenv("LLM_API_KEY"),
base_url=os.getenv("LLM_BASE_URL", None),
)

agent = Agent(llm=llm, tools=[Tool(name=TerminalTool.name)])
cwd = os.getcwd()

# =================================================================
# 1. Run the source conversation
# =================================================================
source = Conversation(agent=agent, workspace=cwd)
source.send_message("Run `echo hello-from-source` in the terminal.")
source.run()

print("=" * 64)
print(" Conversation.fork() — SDK Example")
print("=" * 64)
print(f"\nSource conversation ID : {source.id}")
print(f"Source events count : {len(source.state.events)}")

# =================================================================
# 2. Fork and continue independently
# =================================================================
fork = source.fork(title="Follow-up fork")
source_event_count = len(source.state.events)

print("\n--- Fork created ---")
print(f"Fork ID : {fork.id}")
print(f"Fork events (copied) : {len(fork.state.events)}")
print(f"Fork title : {fork.state.tags.get('title')}")

assert fork.id != source.id
assert len(fork.state.events) == source_event_count

fork.send_message("Now run `echo hello-from-fork` in the terminal.")
fork.run()

# Source is untouched
assert len(source.state.events) == source_event_count
print("\n--- After running fork ---")
print(f"Source events (unchanged): {source_event_count}")
print(f"Fork events (grew) : {len(fork.state.events)}")

# =================================================================
# 3. Fork with a different agent (tool-change / A/B testing)
# =================================================================
alt_llm = LLM(
model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
api_key=os.getenv("LLM_API_KEY"),
base_url=os.getenv("LLM_BASE_URL", None),
usage_id="alt",
)
alt_agent = Agent(llm=alt_llm, tools=[Tool(name=TerminalTool.name)])

fork_alt = source.fork(
agent=alt_agent,
title="Tool-change experiment",
tags={"purpose": "a/b-test"},
)

print("\n--- Fork with alternate agent ---")
print(f"Fork ID : {fork_alt.id}")
print(f"Fork tags : {dict(fork_alt.state.tags)}")

fork_alt.send_message("What command did you run earlier? Just tell me, no tools.")
fork_alt.run()

print(f"Fork events : {len(fork_alt.state.events)}")

# =================================================================
# Summary
# =================================================================
print(f"\n{'=' * 64}")
print("All done — fork() works end-to-end.")
print("=" * 64)

# Report cost
cost = llm.metrics.accumulated_cost + alt_llm.metrics.accumulated_cost
print(f"EXAMPLE_COST: {cost}")
201 changes: 201 additions & 0 deletions examples/02_remote_agent_server/11_conversation_fork.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,201 @@
"""Fork a conversation through the agent server REST API.

Demonstrates ``RemoteConversation.fork()`` which delegates to the server's
``POST /api/conversations/{id}/fork`` endpoint. The fork deep-copies
events and state on the server side, then returns a new
``RemoteConversation`` pointing at the copy.

Scenarios covered:
1. Run a source conversation on the server
2. Fork it — verify independent event histories
3. Fork with a title and custom tags
"""

import os
import subprocess
import sys
import tempfile
import threading
import time

from pydantic import SecretStr

from openhands.sdk import LLM, Agent, Conversation, RemoteConversation, Tool, Workspace
from openhands.tools.terminal import TerminalTool


# -----------------------------------------------------------------
# Managed server helper (reused from example 01)
# -----------------------------------------------------------------
def _stream_output(stream, prefix, target_stream):
try:
for line in iter(stream.readline, ""):
if line:
target_stream.write(f"[{prefix}] {line}")
target_stream.flush()
except Exception as e:
print(f"Error streaming {prefix}: {e}", file=sys.stderr)
finally:
stream.close()


class ManagedAPIServer:
"""Context manager that starts and stops a local agent-server."""

def __init__(self, port: int = 8000, host: str = "127.0.0.1"):
self.port = port
self.host = host
self.process: subprocess.Popen[str] | None = None
self.base_url = f"http://{host}:{port}"

def __enter__(self):
print(f"Starting agent-server on {self.base_url} ...")
self.process = subprocess.Popen(
[
"python",
"-m",
"openhands.agent_server",
"--port",
str(self.port),
"--host",
self.host,
],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
env={"LOG_JSON": "true", **os.environ},
)
assert self.process.stdout is not None
assert self.process.stderr is not None
threading.Thread(
target=_stream_output,
args=(self.process.stdout, "SERVER", sys.stdout),
daemon=True,
).start()
threading.Thread(
target=_stream_output,
args=(self.process.stderr, "SERVER", sys.stderr),
daemon=True,
).start()

import httpx

for _ in range(30):
try:
if httpx.get(f"{self.base_url}/health", timeout=1.0).status_code == 200:
print(f"Agent-server ready at {self.base_url}")
return self
except Exception:
pass
assert self.process.poll() is None, "Server exited unexpectedly"
time.sleep(1)
raise RuntimeError("Server failed to start in 30 s")

def __exit__(self, *args):
if self.process:
self.process.terminate()
try:
self.process.wait(timeout=5)
except subprocess.TimeoutExpired:
self.process.kill()
self.process.wait()
time.sleep(0.5)
print("Agent-server stopped.")


# -----------------------------------------------------------------
# Config
# -----------------------------------------------------------------
api_key = os.getenv("LLM_API_KEY")
assert api_key, "LLM_API_KEY must be set"

llm = LLM(
model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
api_key=SecretStr(api_key),
base_url=os.getenv("LLM_BASE_URL"),
)
agent = Agent(llm=llm, tools=[Tool(name=TerminalTool.name)])

# -----------------------------------------------------------------
# Run
# -----------------------------------------------------------------
with ManagedAPIServer(port=8002) as server:
workspace_dir = tempfile.mkdtemp(prefix="fork_demo_")
workspace = Workspace(host=server.base_url, working_dir=workspace_dir)

# =============================================================
# 1. Source conversation
# =============================================================
source = Conversation(agent=agent, workspace=workspace)
assert isinstance(source, RemoteConversation)

source.send_message("Run `echo hello-from-source` in the terminal.")
source.run()

print("=" * 64)
print(" RemoteConversation.fork() — Agent-Server Example")
print("=" * 64)
print(f"\nSource conversation ID : {source.id}")
source_event_count = len(source.state.events)
print(f"Source events count : {source_event_count}")

# =============================================================
# 2. Fork and continue independently
# =============================================================
fork = source.fork(title="Follow-up fork")
assert isinstance(fork, RemoteConversation)

print("\n--- Fork created ---")
print(f"Fork ID : {fork.id}")
fork_event_count = len(fork.state.events)
print(f"Fork events (copied) : {fork_event_count}")

assert fork.id != source.id
# The fork copies all persisted events from the server-side EventLog.
# The source's client-side list may additionally contain transient
# WebSocket-only events (e.g. full-state snapshots) that are never
# persisted, so we only assert the fork has a non-trivial number of
# events rather than exact parity.
assert fork_event_count > 0

fork.send_message("Now run `echo hello-from-fork` in the terminal.")
fork.run()

print("\n--- After running fork ---")
print(f"Source events : {len(source.state.events)}")
print(f"Fork events (grew) : {len(fork.state.events)}")
assert len(fork.state.events) > fork_event_count

# =============================================================
# 3. Fork with tags
# =============================================================
fork_tagged = source.fork(
title="Tagged experiment",
tags={"purpose": "a/b-test"},
)
assert isinstance(fork_tagged, RemoteConversation)

print("\n--- Fork with tags ---")
print(f"Fork ID : {fork_tagged.id}")

fork_tagged.send_message(
"What command did you run earlier? Just tell me, no tools."
)
fork_tagged.run()

print(f"Fork events : {len(fork_tagged.state.events)}")

# =============================================================
# Summary
# =============================================================
print(f"\n{'=' * 64}")
print("All done — RemoteConversation.fork() works end-to-end.")
print("=" * 64)

# Cleanup
fork.close()
fork_tagged.close()
source.close()

cost = llm.metrics.accumulated_cost
print(f"EXAMPLE_COST: {cost}")
Loading
Loading