OpenHands · xingyaoww · Apr 19, 2026 · Apr 16, 2026 · Apr 16, 2026 · Apr 16, 2026
diff --git a/.agents/skills/custom-codereview-guide.md b/.agents/skills/custom-codereview-guide.md
@@ -109,12 +109,16 @@ If the updated package was uploaded **within the last 7 days**, treat it as a re
 ## What to Check
 
 - **Complexity**: Over-engineered solutions, unnecessary abstractions, complex logic that could be refactored
-- **Testing**: Duplicate test coverage, tests for library features, missing edge case coverage
+- **Testing**: Duplicate test coverage, tests for library features, missing edge case coverage. For code that writes to disk, verify that tests cover the **persistence round-trip** (write → close → reopen → verify), not just in-memory state
 - **Type Safety**: `# type: ignore` usage, missing type annotations, `getattr`/`hasattr` guards, mocking non-existent arguments
 - **Breaking Changes**: API changes affecting users, removed public fields/methods, changed defaults
 - **Code Quality**: Code duplication, missing comments for non-obvious decisions, inline imports (unless necessary for circular deps)
 - **Repository Conventions**: Use `pyright` not `mypy`, put fixtures in `conftest.py`, avoid `sys.path.insert` hacks
 - **Event Type Deprecation**: Changes to event types (Pydantic models used in serialization) must handle deprecated fields properly
+- **Thread Safety**: New methods in `LocalConversation` that read or write `self._state` must use `with self._state:` — see the [Concurrency](#concurrency---localconversation-state-lock) section below
+- **Persistence Paths**: Code that computes persistence directories must not double-append the conversation hex — see the [Persistence Paths](#persistence-path-construction) section below
+- **Server-Side Cleanup**: Endpoints that create persistent state (directories, files) must have rollback logic for partial failures — see the [Server Error Handling](#server-side-error-handling) section below
+- **Cross-File Data Flow**: When new code calls existing APIs (constructors, factory methods), trace 1–2 levels into those APIs to verify the caller uses them correctly. Bugs often hide at layer boundaries where the caller's assumptions don't match the callee's behavior
 
 ## Event Type Deprecation - Critical Review Checkpoint
 
@@ -162,6 +166,50 @@ pydantic_core.ValidationError: Extra inputs are not permitted
 
 **This is a production-breaking change.** Do not approve PRs that modify event types without proper backward compatibility handling and tests.
 
+## SDK Architecture Conventions
+
+These conventions codify patterns that are easy to violate when adding new features. Each was learned from a real bug.
+
+### Concurrency - LocalConversation State Lock
+
+`LocalConversation` protects mutable state with a FIFOLock accessed via `with self._state:`. **Every** method that reads or writes `self._state.events`, `self._state.stats`, `self._state.agent_state`, `self._state.activated_knowledge_skills`, or any other mutable field on `ConversationState` must hold this lock. There are currently ~13 call sites using this pattern.
+
+When reviewing a PR that adds a new method to `LocalConversation`:
+1. Check whether it accesses any `self._state.*` field.
+2. If yes, verify the access is inside a `with self._state:` block.
+3. If not, flag it — the method is unsafe for concurrent use with `run()`.
+
+### Persistence Path Construction
+
+`BaseConversation.get_persistence_dir(base, conversation_id)` returns `str(Path(base) / conversation_id.hex)`. The `LocalConversation.__init__` constructor calls this automatically when `persistence_dir` is provided.
+
+**Rule:** Callers that pass `persistence_dir` to `LocalConversation()` must pass only the **base directory** (e.g., `/data/conversations/`). The constructor appends the conversation hex. Passing a pre-constructed full path (e.g., `/data/conversations/abc123`) causes double-appending: `/data/conversations/abc123/abc123`.
+
+When reviewing code that creates a new `LocalConversation` (fork, resume, migration):
+1. Check what value is passed as `persistence_dir`.
+2. Verify it does **not** already include the conversation ID hex.
+
+### Server-Side Error Handling
+
+Server endpoints in `conversation_service.py` that create persistent state (writing directories, files, or calling `fork()` which writes to disk) and then perform follow-up operations (like `_start_event_service`) must handle partial failure.
+
+**Pattern:** If the follow-up operation fails, clean up the already-written persistent state so it doesn't become an orphaned directory that confuses future startups.
+
+```python
+# Good: rollback on failure
+fork_dir = self.conversations_dir / fork_conv_id.hex
+try:
+    fork_event_service = await self._start_event_service(fork_stored)
+except Exception:
+    safe_rmtree(fork_dir)
+    raise
+```
+
+When reviewing server endpoints that create conversations or persistent artifacts:
+1. Identify the "point of no return" where state is written to disk.
+2. Check that subsequent operations are wrapped in try/except with cleanup.
+3. For client-supplied IDs, verify there's a duplicate check before creating state (return 409 Conflict if taken).
+
 ## What NOT to Comment On
 
 Do not leave comments for:

diff --git a/examples/01_standalone_sdk/48_conversation_fork.py b/examples/01_standalone_sdk/48_conversation_fork.py
@@ -0,0 +1,105 @@
+"""Fork a conversation to branch off for follow-up exploration.
+
+``Conversation.fork()`` deep-copies a conversation — events, agent config,
+workspace metadata — into a new conversation with its own ID.  The fork
+starts in ``idle`` status and retains full event memory of the source, so
+calling ``run()`` picks up right where the original left off.
+
+Use cases:
+  - CI agents that produced a wrong patch — engineer forks to debug
+    without losing the original run's audit trail
+  - A/B-testing prompts — fork at a given turn, change one variable,
+    compare downstream
+  - Swapping tools mid-conversation (fork-on-tool-change)
+"""
+
+import os
+
+from openhands.sdk import LLM, Agent, Conversation, Tool
+from openhands.tools.terminal import TerminalTool
+
+
+# -----------------------------------------------------------------
+# Setup
+# -----------------------------------------------------------------
+llm = LLM(
+    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
+    api_key=os.getenv("LLM_API_KEY"),
+    base_url=os.getenv("LLM_BASE_URL", None),
+)
+
+agent = Agent(llm=llm, tools=[Tool(name=TerminalTool.name)])
+cwd = os.getcwd()
+
+# =================================================================
+# 1. Run the source conversation
+# =================================================================
+source = Conversation(agent=agent, workspace=cwd)
+source.send_message("Run `echo hello-from-source` in the terminal.")
+source.run()
+
+print("=" * 64)
+print("  Conversation.fork() — SDK Example")
+print("=" * 64)
+print(f"\nSource conversation ID : {source.id}")
+print(f"Source events count    : {len(source.state.events)}")
+
+# =================================================================
+# 2. Fork and continue independently
+# =================================================================
+fork = source.fork(title="Follow-up fork")
+source_event_count = len(source.state.events)
+
+print("\n--- Fork created ---")
+print(f"Fork ID                : {fork.id}")
+print(f"Fork events (copied)   : {len(fork.state.events)}")
+print(f"Fork title             : {fork.state.tags.get('title')}")
+
+assert fork.id != source.id
+assert len(fork.state.events) == source_event_count
+
+fork.send_message("Now run `echo hello-from-fork` in the terminal.")
+fork.run()
+
+# Source is untouched
+assert len(source.state.events) == source_event_count
+print("\n--- After running fork ---")
+print(f"Source events (unchanged): {source_event_count}")
+print(f"Fork events (grew)       : {len(fork.state.events)}")
+
+# =================================================================
+# 3. Fork with a different agent (tool-change / A/B testing)
+# =================================================================
+alt_llm = LLM(
+    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
+    api_key=os.getenv("LLM_API_KEY"),
+    base_url=os.getenv("LLM_BASE_URL", None),
+    usage_id="alt",
+)
+alt_agent = Agent(llm=alt_llm, tools=[Tool(name=TerminalTool.name)])
+
+fork_alt = source.fork(
+    agent=alt_agent,
+    title="Tool-change experiment",
+    tags={"purpose": "a/b-test"},
+)
+
+print("\n--- Fork with alternate agent ---")
+print(f"Fork ID     : {fork_alt.id}")
+print(f"Fork tags   : {dict(fork_alt.state.tags)}")
+
+fork_alt.send_message("What command did you run earlier? Just tell me, no tools.")
+fork_alt.run()
+
+print(f"Fork events : {len(fork_alt.state.events)}")
+
+# =================================================================
+# Summary
+# =================================================================
+print(f"\n{'=' * 64}")
+print("All done — fork() works end-to-end.")
+print("=" * 64)
+
+# Report cost
+cost = llm.metrics.accumulated_cost + alt_llm.metrics.accumulated_cost
+print(f"EXAMPLE_COST: {cost}")
diff --git a/examples/02_remote_agent_server/11_conversation_fork.py b/examples/02_remote_agent_server/11_conversation_fork.py
@@ -0,0 +1,201 @@
+"""Fork a conversation through the agent server REST API.
+
+Demonstrates ``RemoteConversation.fork()`` which delegates to the server's
+``POST /api/conversations/{id}/fork`` endpoint.  The fork deep-copies
+events and state on the server side, then returns a new
+``RemoteConversation`` pointing at the copy.
+
+Scenarios covered:
+  1. Run a source conversation on the server
+  2. Fork it — verify independent event histories
+  3. Fork with a title and custom tags
+"""
+
+import os
+import subprocess
+import sys
+import tempfile
+import threading
+import time
+
+from pydantic import SecretStr
+
+from openhands.sdk import LLM, Agent, Conversation, RemoteConversation, Tool, Workspace
+from openhands.tools.terminal import TerminalTool
+
+
+# -----------------------------------------------------------------
+# Managed server helper (reused from example 01)
+# -----------------------------------------------------------------
+def _stream_output(stream, prefix, target_stream):
+    try:
+        for line in iter(stream.readline, ""):
+            if line:
+                target_stream.write(f"[{prefix}] {line}")
+                target_stream.flush()
+    except Exception as e:
+        print(f"Error streaming {prefix}: {e}", file=sys.stderr)
+    finally:
+        stream.close()
+
+
+class ManagedAPIServer:
+    """Context manager that starts and stops a local agent-server."""
+
+    def __init__(self, port: int = 8000, host: str = "127.0.0.1"):
+        self.port = port
+        self.host = host
+        self.process: subprocess.Popen[str] | None = None
+        self.base_url = f"http://{host}:{port}"
+
+    def __enter__(self):
+        print(f"Starting agent-server on {self.base_url} ...")
+        self.process = subprocess.Popen(
+            [
+                "python",
+                "-m",
+                "openhands.agent_server",
+                "--port",
+                str(self.port),
+                "--host",
+                self.host,
+            ],
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+            text=True,
+            env={"LOG_JSON": "true", **os.environ},
+        )
+        assert self.process.stdout is not None
+        assert self.process.stderr is not None
+        threading.Thread(
+            target=_stream_output,
+            args=(self.process.stdout, "SERVER", sys.stdout),
+            daemon=True,
+        ).start()
+        threading.Thread(
+            target=_stream_output,
+            args=(self.process.stderr, "SERVER", sys.stderr),
+            daemon=True,
+        ).start()
+
+        import httpx
+
+        for _ in range(30):
+            try:
+                if httpx.get(f"{self.base_url}/health", timeout=1.0).status_code == 200:
+                    print(f"Agent-server ready at {self.base_url}")
+                    return self
+            except Exception:
+                pass
+            assert self.process.poll() is None, "Server exited unexpectedly"
+            time.sleep(1)
+        raise RuntimeError("Server failed to start in 30 s")
+
+    def __exit__(self, *args):
+        if self.process:
+            self.process.terminate()
+            try:
+                self.process.wait(timeout=5)
+            except subprocess.TimeoutExpired:
+                self.process.kill()
+                self.process.wait()
+            time.sleep(0.5)
+            print("Agent-server stopped.")
+
+
+# -----------------------------------------------------------------
+# Config
+# -----------------------------------------------------------------
+api_key = os.getenv("LLM_API_KEY")
+assert api_key, "LLM_API_KEY must be set"
+
+llm = LLM(
+    model=os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
+    api_key=SecretStr(api_key),
+    base_url=os.getenv("LLM_BASE_URL"),
+)
+agent = Agent(llm=llm, tools=[Tool(name=TerminalTool.name)])
+
+# -----------------------------------------------------------------
+# Run
+# -----------------------------------------------------------------
+with ManagedAPIServer(port=8002) as server:
+    workspace_dir = tempfile.mkdtemp(prefix="fork_demo_")
+    workspace = Workspace(host=server.base_url, working_dir=workspace_dir)
+
+    # =============================================================
+    # 1. Source conversation
+    # =============================================================
+    source = Conversation(agent=agent, workspace=workspace)
+    assert isinstance(source, RemoteConversation)
+
+    source.send_message("Run `echo hello-from-source` in the terminal.")
+    source.run()
+
+    print("=" * 64)
+    print("  RemoteConversation.fork() — Agent-Server Example")
+    print("=" * 64)
+    print(f"\nSource conversation ID : {source.id}")
+    source_event_count = len(source.state.events)
+    print(f"Source events count    : {source_event_count}")
+
+    # =============================================================
+    # 2. Fork and continue independently
+    # =============================================================
+    fork = source.fork(title="Follow-up fork")
+    assert isinstance(fork, RemoteConversation)
+
+    print("\n--- Fork created ---")
+    print(f"Fork ID                : {fork.id}")
+    fork_event_count = len(fork.state.events)
+    print(f"Fork events (copied)   : {fork_event_count}")
+
+    assert fork.id != source.id
+    # The fork copies all persisted events from the server-side EventLog.
+    # The source's client-side list may additionally contain transient
+    # WebSocket-only events (e.g. full-state snapshots) that are never
+    # persisted, so we only assert the fork has a non-trivial number of
+    # events rather than exact parity.
+    assert fork_event_count > 0
+
+    fork.send_message("Now run `echo hello-from-fork` in the terminal.")
+    fork.run()
+
+    print("\n--- After running fork ---")
+    print(f"Source events          : {len(source.state.events)}")
+    print(f"Fork events (grew)     : {len(fork.state.events)}")
+    assert len(fork.state.events) > fork_event_count
+
+    # =============================================================
+    # 3. Fork with tags
+    # =============================================================
+    fork_tagged = source.fork(
+        title="Tagged experiment",
+        tags={"purpose": "a/b-test"},
+    )
+    assert isinstance(fork_tagged, RemoteConversation)
+
+    print("\n--- Fork with tags ---")
+    print(f"Fork ID     : {fork_tagged.id}")
+
+    fork_tagged.send_message(
+        "What command did you run earlier? Just tell me, no tools."
+    )
+    fork_tagged.run()
+
+    print(f"Fork events : {len(fork_tagged.state.events)}")
+
+    # =============================================================
+    # Summary
+    # =============================================================
+    print(f"\n{'=' * 64}")
+    print("All done — RemoteConversation.fork() works end-to-end.")
+    print("=" * 64)
+
+    # Cleanup
+    fork.close()
+    fork_tagged.close()
+    source.close()
+
+cost = llm.metrics.accumulated_cost
+print(f"EXAMPLE_COST: {cost}")