deeplethe · WaylandYang · Jun 7, 2026 · Jun 6, 2026 · Jun 6, 2026
diff --git a/recipes/README.md b/recipes/README.md
@@ -1,89 +1,82 @@
 # Recipes
 
-Two flavors:
+forkd is a copy-on-write `fork(2)` primitive for KVM microVMs. The
+"for AI agents" framing on the front page is one prominent use case,
+not the ceiling: anything that wants **N isolated children spawned
+from a warmed parent in milliseconds** fits.
 
-1. **Integration recipes** (host-side, no rootfs build) — Python scripts
-   that drive forkd from an agent framework, demonstrating the
-   BRANCH + fanout pattern with that framework's idioms. Start here if
-   you want to plug forkd into CrewAI / AutoGen / Swarm / Claude
-   Desktop in five minutes.
-2. **Rootfs recipes** (parent images) — `build.sh` scripts that turn
-   public OCI images into forkd parent snapshots. Start here if you
-   want a custom warmed image to fork from.
+## Pick your starting point
 
-## Integration recipes (host-side)
+### By problem you're solving
 
-Read the script, copy the helper, drop it into your project. Each is
-~150-250 lines of Python with a `--dry-run` mode so you can verify
-the forkd plumbing without an LLM key.
-
-| Recipe | Framework | Forkd-specific value |
+| Problem | Recipes | What forkd buys you |
 |---|---|---|
-| [`mcp-agent/`](./mcp-agent/) | Claude Desktop / Cursor / Cline (via MCP) | End-to-end MCP protocol verification — same JSON-RPC framing a real LLM client uses |
-| [`crewai-fanout/`](./crewai-fanout/) | CrewAI | N agents on N microVMs from one warmed parent — per-agent isolation without Docker cold-start |
-| [`autogen-branch/`](./autogen-branch/) | AutoGen | Forkd-backed `CodeExecutor` + mid-conversation BRANCH that fans out N alternatives from the same warmed state |
-| [`openai-swarm/`](./openai-swarm/) | OpenAI Swarm / Agents SDK | Handoff = BRANCH: agent B inherits agent A's full VM state (filesystem, imports, env) on handoff |
+| **AI agent fan-out** — try N approaches, branch a thinking agent | [`langgraph-react/`](./langgraph-react/) · [`crewai-fanout/`](./crewai-fanout/) · [`autogen-branch/`](./autogen-branch/) · [`openai-swarm/`](./openai-swarm/) · [`mcp-agent/`](./mcp-agent/) · [`speculative-agent/`](./speculative-agent/) · [`coding-agent-fork/`](./coding-agent-fork/) | Per-child KVM isolation + warmed runtime inheritance. The "fork mid-thought" story. |
+| **CI test parallelism** — run 100 pytest workers from a warmed parent | [`postgres-fixture/`](./postgres-fixture/) (DB-per-test) · [`ci-parallel-pytest/`](./ci-parallel-pytest/) (worker fan-out) | Skip per-worker container cold-start + dependency install. ~50 ms / worker instead of ~3 s. |
+| **Database test fixtures** — fresh, isolated postgres per test | [`postgres-fixture/`](./postgres-fixture/) | `initdb` runs **once** at parent build; every fork inherits the post-init state. ~200× faster than per-test container. |
+| **Browser automation farms** — Playwright / Puppeteer fan-out at scale | [`playwright-browser/`](./playwright-browser/) | Fork warmed headless Chromium at ~10 ms instead of ~2 s cold-boot. |
+| **Notebook / code interpreter** — Jupyter kernel per session | [`jupyter-kernel/`](./jupyter-kernel/) · [`e2b-codeinterpreter/`](./e2b-codeinterpreter/) | Full SciPy stack pre-imported. ~1 ms per fresh kernel. |
+| **General-purpose compute fan-out** — anything that needs N warmed sandboxes | [`python-numpy/`](./python-numpy/) · [`coding-agent/`](./coding-agent/) · [`nodejs/`](./nodejs/) · [`agent-workbench/`](./agent-workbench/) | Pre-baked language runtime + canonical fan-out benchmark. |
+
+### By integration framework (host-side Python scripts)
 
-## Rootfs recipes
+If you're plugging forkd into an existing agent framework, these
+are ~150-250 lines of Python with a `--dry-run` mode so you can
+verify the forkd plumbing without an LLM key.
 
-Ready-made parent-rootfs recipes for common workbench images.
-Each recipe takes a public Docker / OCI image and turns it into a
-forkd parent snapshot, so you can fork N warmed children from it
-in milliseconds.
+| Framework | Recipe |
+|---|---|
+| Claude Desktop / Cursor / Cline (via MCP) | [`mcp-agent/`](./mcp-agent/) |
+| CrewAI multi-agent crew | [`crewai-fanout/`](./crewai-fanout/) |
+| AutoGen ConversableAgent / GroupChat | [`autogen-branch/`](./autogen-branch/) |
+| OpenAI Swarm / Agents SDK | [`openai-swarm/`](./openai-swarm/) |
+| LangGraph ReAct (the front-page demo) | [`langgraph-react/`](./langgraph-react/) |
 
-The pattern is the same across recipes:
+## How rootfs recipes work
+
+Rootfs recipes turn a public Docker / OCI image into a forkd parent
+snapshot. Same shape across all of them:
 
 ```bash
 # 1. Build a parent rootfs from an upstream image
 sudo bash recipes/<name>/build.sh
 
 # 2. Snapshot the warmed parent (one-time per image version)
 sudo forkd snapshot --tag <name> \
-    --kernel ./vmlinux-6.1.141 \
+    --kernel /var/lib/forkd/kernels/vmlinux \
     --rootfs recipes/<name>/parent.ext4 \
     --tap forkd-tap0
 
 # 3. Fork N children, fan-out workload
 sudo -E forkd fork --tag <name> -n 100 --per-child-netns
 ```
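+
+If you drive the three steps from a script rather than by hand, a
+minimal Python sketch like the one below can assemble the same
+commands. `recipe_commands` and its defaults are hypothetical helpers,
+not part of forkd. The kernel path and tap name are assumptions copied
+from the example above, and only the flags shown there are used.
+
+```python
+import shlex
+
+# Hypothetical defaults mirroring the shell example above; adjust for your host.
+DEFAULT_KERNEL = "/var/lib/forkd/kernels/vmlinux"
+DEFAULT_TAP = "forkd-tap0"
+
+
+def recipe_commands(name, n=100, kernel=DEFAULT_KERNEL, tap=DEFAULT_TAP):
+    """Return the argv lists for build, snapshot and fork of one recipe."""
+    build = ["sudo", "bash", f"recipes/{name}/build.sh"]
+    snapshot = [
+        "sudo", "forkd", "snapshot", "--tag", name,
+        "--kernel", kernel,
+        "--rootfs", f"recipes/{name}/parent.ext4",
+        "--tap", tap,
+    ]
+    fork = ["sudo", "-E", "forkd", "fork", "--tag", name,
+            "-n", str(n), "--per-child-netns"]
+    return [build, snapshot, fork]
+
+
+if __name__ == "__main__":
+    # Print the commands instead of running them; swap in
+    # subprocess.run(cmd, check=True) once they look right for your host.
+    for cmd in recipe_commands("python-numpy", n=8):
+        print(shlex.join(cmd))
+```
+
+Building argv lists (rather than one shell string) keeps recipe names
+and paths from being re-split or interpreted by a shell.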
 
+The first-time `build.sh` of each recipe takes a few minutes
+(pulling the Docker image and converting it to ext4). The snapshot
+step takes ~10 s. After that, each fork costs what the published
+benchmark measures.
+
 ### Available rootfs recipes
 
-| Recipe | Parent image | Size | Audience |
+| Recipe | Parent image | Size | Best for |
 |---|---|---|---|
-| [`python-numpy/`](./python-numpy/) | `python:3.12-slim` + `python3-numpy` | ~1.5 GB | The canonical fan-out demo; what the chart on the front README measures |
-| [`e2b-codeinterpreter/`](./e2b-codeinterpreter/) | `e2bdev/code-interpreter` | ~600 MB | AI code-execution agents (Anthropic / OpenAI tutorials use this image). Lightest "agent ready" option |
-| [`jupyter-kernel/`](./jupyter-kernel/) | `quay.io/jupyter/scipy-notebook` | ~3 GB | Code-interpreter / notebook-style agents — full SciPy stack pre-imported, ~1 ms per fresh kernel instead of ~2 s |
-| [`coding-agent/`](./coding-agent/) | `python:3.12` + git + ruff + black + pytest | ~1.8 GB | SWE-style coding agents that need a real dev toolchain inside the sandbox |
-| [`nodejs/`](./nodejs/) | `node:22-slim` | ~250 MB | JavaScript / TypeScript workloads (Jest, Playwright fan-out) |
-| [`playwright-browser/`](./playwright-browser/) | `mcr.microsoft.com/playwright` (Node + Chromium pre-warmed) | ~2.5 GB | Browser-driving agents (computer-use, web research, UI test gen). Fork warmed headless Chromium at ~10 ms instead of ~2 s. **Alpha** |
-| [`agent-workbench/`](./agent-workbench/) | `agent-infra/sandbox` (browser + VSCode + Jupyter + MCP + shell) | ~5 GB | Kitchen-sink agent workbench when you want every tool already mounted; trades a bigger memory.bin for batteries-included |
-| [`postgres-fixture/`](./postgres-fixture/) | `postgres:16` (initdb done, postmaster pre-launched) | ~500 MB | Fork-per-test isolated databases; each child gets a ready-to-query postgres in ~10 ms instead of ~2 s for fresh initdb |
-
-## Choosing a recipe
-
-**By framework / driver (integration recipes):**
-- **Claude Desktop / Cursor / Cline** → `mcp-agent/`
-- **CrewAI multi-agent crew** → `crewai-fanout/`
-- **AutoGen ConversableAgent / GroupChat** → `autogen-branch/`
-- **OpenAI Swarm / Agents SDK** → `openai-swarm/`
-
-**By workload (rootfs recipes):**
-- **You're benchmarking** → `python-numpy/`
-- **You're running an AI code interpreter** → `e2b-codeinterpreter/`
-- **You need the full SciPy / notebook stack** → `jupyter-kernel/`
-- **You're running a coding agent (SWE-bench style)** → `coding-agent/`
-- **JS / TS only** → `nodejs/`
-- **Browser-driving agent (computer-use, scraping, UI testing)** → `playwright-browser/`
-- **You want browser + IDE + everything in one box** → `agent-workbench/`
-- **You're running a test suite that needs an isolated DB per test** → `postgres-fixture/`
+| [`python-numpy/`](./python-numpy/) | `python:3.12-slim` + `python3-numpy` | ~1.5 GB | **The canonical fan-out benchmark** — what the chart on the front README measures |
+| [`postgres-fixture/`](./postgres-fixture/) | `postgres:16` (initdb done, postmaster pre-launched) | ~500 MB | **Fork-per-test isolated databases** — each child gets a ready-to-query postgres in ~10 ms vs ~2 s for fresh initdb |
+| [`ci-parallel-pytest/`](./ci-parallel-pytest/) | `python:3.12-slim` + numpy/pandas/sklearn + your test suite | ~2 GB | **CI test fan-out** — parallel pytest workers without per-worker container cold-start |
+| [`playwright-browser/`](./playwright-browser/) | `mcr.microsoft.com/playwright` (Node + Chromium pre-warmed) | ~2.5 GB | **Browser automation farms** — warmed headless Chromium at ~10 ms instead of ~2 s. **Alpha** |
+| [`jupyter-kernel/`](./jupyter-kernel/) | `quay.io/jupyter/scipy-notebook` | ~3 GB | **Code-interpreter / notebook agents** — full SciPy stack pre-imported, ~1 ms per fresh kernel |
+| [`e2b-codeinterpreter/`](./e2b-codeinterpreter/) | `e2bdev/code-interpreter` | ~600 MB | **AI code-execution agents** (Anthropic / OpenAI tutorials use this image). Lightest "agent ready" option |
+| [`coding-agent/`](./coding-agent/) | `python:3.12` + git + ruff + black + pytest | ~1.8 GB | **SWE-style coding agents** that need a real dev toolchain inside the sandbox |
+| [`nodejs/`](./nodejs/) | `node:22-slim` | ~250 MB | **JavaScript / TypeScript workloads** (Jest, Playwright fan-out, scraping) |
+| [`agent-workbench/`](./agent-workbench/) | `agent-infra/sandbox` (browser + VSCode + Jupyter + MCP + shell) | ~5 GB | **Kitchen-sink workbench** when you want every tool already mounted |
 
 ## Notes
 
 - Recipes are tested on Ubuntu 24.04 / Linux 6.14 / x86_64. Other distros
   may need adjustments to `scripts/build-rootfs.sh`.
-- The first-time `build.sh` of each recipe takes a few minutes (pulling
-  the Docker image + converting to ext4). The snapshot step is ~10 s.
-  After that, forking children is the published benchmark cost.
 - Each recipe is self-contained — pick one, run it; you don't need to
   understand the others.
+- The "AI agent" framing on the project front page is the dominant use
+  case **today** but not the only one — the technology is `fork(2)` for
+  KVM microVMs. If your workload needs N hardware-isolated children
+  spawned from a warmed parent in milliseconds, forkd is the primitive.
diff --git a/recipes/ci-parallel-pytest/README.md b/recipes/ci-parallel-pytest/README.md
@@ -0,0 +1,174 @@
+# `ci-parallel-pytest`
+
+**Run your pytest suite across N forkd microVMs in parallel,
+without paying per-worker container cold-start or dependency
+import cost.**
+
+A typical Python CI job re-imports numpy / pandas / scikit-learn on
+every fresh worker container — ~1-2 s of pure overhead before the
+first test runs. With forkd, those imports live in the warmed
+parent's snapshot; every fork inherits them via `mmap MAP_PRIVATE`
+copy-on-write. Per-worker fixed cost drops to ~50-100 ms.
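+
+The `MAP_PRIVATE` copy-on-write behaviour described above can be
+reproduced on any Linux host with Python's `mmap` module, where
+`mmap.ACCESS_COPY` requests a private mapping. This illustrates the
+kernel mechanism only: the temporary file stands in for the parent's
+memory image, and forkd's own restore path is not shown.
+
+```python
+import mmap
+import tempfile
+
+
+def cow_demo(initial=b"warmed parent state"):
+    """Map one backing file privately twice and write through one view."""
+    with tempfile.TemporaryFile() as image:  # stand-in for the parent image
+        image.write(initial)
+        image.flush()
+        child_a = mmap.mmap(image.fileno(), 0, access=mmap.ACCESS_COPY)
+        child_b = mmap.mmap(image.fileno(), 0, access=mmap.ACCESS_COPY)
+        child_a[:6] = b"CHILD!"  # first write copies the page for child_a only
+        image.seek(0)
+        on_disk = image.read()
+        views = (child_a[:], child_b[:], on_disk)
+        child_a.close()
+        child_b.close()
+    return views
+
+
+if __name__ == "__main__":
+    a, b, disk = cow_demo()
+    print(a)     # b'CHILD! parent state'  (private copy was modified)
+    print(b)     # b'warmed parent state'  (sibling still sees the original)
+    print(disk)  # b'warmed parent state'  (backing file untouched)
+```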
+
+## Architecture
+
+```
+                 ┌──────────────────────────────────────┐
+                 │  parent snapshot `ci-pytest`         │
+                 │  python:3.12-slim                    │
+                 │  + pytest 8.3 numpy 2.0 pandas 2.2   │
+                 │  + scikit-learn 1.5                  │
+                 │  + your /opt/test_project            │
+                 │  (heavy imports already paid)        │
+                 └────────────────┬─────────────────────┘
+                                  │  mmap MAP_PRIVATE (CoW)
+            ┌─────────────────────┼─────────────────────┐
+            │                     │                     │
+       ┌────▼───────┐       ┌─────▼──────┐       ┌──────▼─────┐
+       │ worker 1   │       │ worker 2   │       │ worker N   │
+       │ pytest     │       │ pytest     │       │ pytest     │
+       │ slice 1/N  │       │ slice 2/N  │  ...  │ slice N/N  │
+       └────────────┘       └────────────┘       └────────────┘
+                            run in parallel
+```
+
+## What ships in this recipe
+
+| File | What it does |
+|---|---|
+| [`build.sh`](./build.sh) | Builds a forkd parent rootfs: `python:3.12-slim` + pinned pytest/numpy/pandas/sklearn, the demo test project copied to `/opt/test_project`, and a pre-warm step that imports the heavy deps so they're in the snapshot's page cache |
+| [`test_project/`](./test_project/) | A representative pytest project — ~30 tests across 5 files (arithmetic, numpy, pandas, sklearn, text). Replace with your own |
+| [`demo.py`](./demo.py) | Fan-out driver: slices test files across N forkd workers, runs each slice in a child sandbox, reports per-worker spawn/exec timing + total wall-clock + sequential-baseline comparison |
+
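+`demo.py`'s exact slicing rule isn't spelled out here, but a simple
+round-robin over the sorted file list happens to reproduce the plan
+printed in the sample output further down. The sketch is a
+hypothetical stand-in; `round_robin` is not taken from `demo.py`.
+
+```python
+def round_robin(files, workers):
+    """Deal files like cards: the i-th sorted file goes to worker i % workers."""
+    slices = [[] for _ in range(workers)]
+    for i, name in enumerate(sorted(files)):
+        slices[i % workers].append(name)
+    return slices
+
+
+if __name__ == "__main__":
+    files = [
+        "test_arithmetic.py", "test_numpy_ops.py", "test_pandas_etl.py",
+        "test_sklearn_models.py", "test_text_processing.py",
+    ]
+    for w, chunk in enumerate(round_robin(files, 4)):
+        print(f"worker {w}: {len(chunk)} file(s): {', '.join(chunk)}")
+```
+
+Round-robin balances file *counts*, not durations, which is why the
+sklearn slice dominates the sample run. A duration-aware split (for
+example, giving the longest files to the least-loaded worker first)
+would need per-file timings from a previous run.
+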
+## When to use this
+
+- **CI pipelines with 100s of pytest tests** that re-import heavy
+  ML libs every run. The savings compound: every PR run, every
+  retry, every nightly.
+- **PR-preview environments** where each PR needs its own clean
+  pytest run with fresh side-effects (DB, filesystem, env). forkd's
+  per-child KVM isolation means workers truly don't see each other.
+- **Sharded fuzz / property testing**: split a 10 000-iteration
+  Hypothesis run across N microVMs without setup tax.
+
+## When NOT to use this
+
+- Your test suite is < 30 tests and finishes in < 2 s sequentially —
+  parallelism overhead exceeds the gain.
+- You don't actually need per-worker isolation (e.g. pure-function
+  unit tests with no shared state) — `pytest -n <N>` (pytest-xdist)
+  in a single container is simpler.
+- You can't run forkd on your CI host (managed CI like default
+  GitHub Actions, no KVM). For self-hosted runners with bare-metal
+  Linux + KVM this works great.
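+
+A quick pre-flight check for the last point: KVM is exposed to
+userspace through the `/dev/kvm` device node, so a runner without it
+cannot host forkd. The helper is a hypothetical sketch. An `"ok"`
+result is necessary but not necessarily sufficient for forkd, and
+because this recipe runs forkd under `sudo`, `"no-access"` for an
+unprivileged user is not by itself a blocker.
+
+```python
+import os
+
+
+def kvm_status(dev="/dev/kvm"):
+    """Classify KVM availability for the current user."""
+    if not os.path.exists(dev):
+        return "missing"    # no KVM device node: this host cannot run forkd
+    if not os.access(dev, os.R_OK | os.W_OK):
+        return "no-access"  # node exists; run as root or fix its group membership
+    return "ok"
+
+
+if __name__ == "__main__":
+    print("KVM:", kvm_status())
+```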
+
+## Quickstart
+
+```bash
+# 1. Build the parent (one-time, ~5 min — pip install pandas + sklearn
+#    dominates the time)
+sudo bash recipes/ci-parallel-pytest/build.sh
+
+# 2. Snapshot the warmed parent (one-time, ~10 s)
+sudo forkd snapshot --tag ci-pytest \
+    --kernel /var/lib/forkd/kernels/vmlinux \
+    --rootfs recipes/ci-parallel-pytest/parent.ext4 \
+    --tap forkd-tap0
+
+# 3. Fan out — 4 workers in parallel
+FORKD_TOKEN=$(sudo cat /tmp/bench-pause/token) \
+    python3 recipes/ci-parallel-pytest/demo.py --workers 4 \
+                                               --sequential-baseline
+```
+
+Output from the dev box (Intel i7-12700, ext4, 2026-06-06):
+
+```
+Plan: 4 worker(s) × pytest slice off `ci-pytest`.
+  worker 0: 2 file(s) — test_arithmetic.py, test_text_processing.py
+  worker 1: 1 file(s) — test_numpy_ops.py
+  worker 2: 1 file(s) — test_pandas_etl.py
+  worker 3: 1 file(s) — test_sklearn_models.py
+
+=== fan-out: 4 workers in parallel ===
+  batch spawn (4 children): 81 ms
+  [0] PASS  exec= 232 ms  files=test_arithmetic.py,test_text_processing.py
+  [1] PASS  exec= 304 ms  files=test_numpy_ops.py
+  [2] PASS  exec= 546 ms  files=test_pandas_etl.py
+  [3] PASS  exec=1458 ms  files=test_sklearn_models.py
+
+fan-out wall-clock:  1601 ms   (batch spawn=81 ms = ~20 ms/worker,
+                                slowest worker exec=1458 ms)
+
+=== sequential baseline: one child runs the whole suite ===
+  [0] PASS  spawn=61 ms  exec=1507 ms
+sequential wall-clock: 1625 ms   (fan-out speedup: 1.01×)
+```
+
+The 1.01× fan-out-vs-sequential figure is honest: this demo suite
+only has ~30 tests and is dominated by one sklearn slice (1458 ms).
+Fan-out shines when **your suite has many slow slices of comparable
+size** — e.g. 8 sklearn-heavy slices each taking ~1.5 s would fan
+out to ~1.5 s wall, vs ~12 s sequentially.
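+
+A back-of-the-envelope model makes this concrete: fan-out wall-clock
+is roughly the batch spawn plus the *slowest* slice, while a
+sequential run pays for every slice in turn. The model ignores
+per-slice pytest startup and CPU contention, and the hypothetical
+8-slice case reuses the 4-worker spawn figure as a rough stand-in.
+
+```python
+def fanout_wall_ms(spawn_ms, slice_ms):
+    """Parallel run: one batch spawn, then wait for the slowest worker."""
+    return spawn_ms + max(slice_ms)
+
+
+def sequential_wall_ms(spawn_ms, slice_ms):
+    """One child runs every slice back to back."""
+    return spawn_ms + sum(slice_ms)
+
+
+if __name__ == "__main__":
+    # Per-worker exec times from the sample run above.
+    measured = [232, 304, 546, 1458]
+    print(fanout_wall_ms(81, measured))  # 1539; the log shows 1601 incl. driver overhead
+
+    # Hypothetical suite: 8 sklearn-heavy slices of ~1.5 s each.
+    even = [1500] * 8
+    print(fanout_wall_ms(81, even))      # 1581, i.e. ~1.5 s
+    print(sequential_wall_ms(61, even))  # 12061, i.e. ~12 s
+```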
+
+**The number that carries over to other suite shapes is the batch
+spawn cost: 81 ms for 4 children, ~20 ms per worker.** That is the
+figure to compare against the ~2-3 s it takes to start a fresh
+container per worker.
+
+## GitHub Actions integration
+
+Drop this into your workflow on a self-hosted runner that has forkd
+and a pre-built `ci-pytest` snapshot:
+
+```yaml
+jobs:
+  test:
+    runs-on: [self-hosted, linux, x64, forkd]
+    steps:
+      - uses: actions/checkout@v4
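+      # Rebuild the parent only when your tests or deps change. build.sh
+      # bakes the test project into parent.ext4, so copying files on the
+      # runner host does not reach the guest. Replace
+      # recipes/ci-parallel-pytest/test_project/ with your tests, then:
+      # - name: Rebuild the parent snapshot
+      #   run: |
+      #     sudo bash recipes/ci-parallel-pytest/build.sh
+      #     sudo forkd snapshot --tag ci-pytest \
+      #         --kernel /var/lib/forkd/kernels/vmlinux \
+      #         --rootfs recipes/ci-parallel-pytest/parent.ext4 \
+      #         --tap forkd-tap0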
+      - name: Fan out
+        env:
+          FORKD_TOKEN: ${{ secrets.FORKD_TOKEN }}
+        run: |
+          python3 recipes/ci-parallel-pytest/demo.py \
+              --workers 8 \
+              --snapshot-tag ci-pytest
+```
+
+For a hosted-runner setup, the equivalent is one forkd daemon on
+your CI infrastructure, exposed over a port the runner can reach.
+
+## How it compares
+
+| Approach | Per-worker fixed cost | Notes |
+|---|---|---|
+| `pytest` sequential, fresh container | ~2 s container cold + ~1.5 s `import numpy/pandas/sklearn` | Each PR run / retry / nightly re-pays both |
+| `pytest-xdist -n 4` in one container | ~2 s container cold + ~1.5 s imports (re-paid by each xdist worker process, in parallel) | Workers share one kernel and filesystem; a crash or leaked state in one can affect the others |
+| `docker run` × 4 fresh containers | ~2 s cold-start + ~1.5 s imports per container, 4 in parallel | Per-container isolation, but slow to spawn |
+| **forkd fan-out (this recipe)** | **~20 ms/worker spawn (81 ms for a batch of 4) + 0 ms imports** | Per-child KVM isolation; warmed Python deps inherited via mmap CoW |
+
+The rough break-even: if each test slice runs much longer than the
+~3 s container tax (cold-start plus imports), that tax is amortized
+and container parallelism is fine. If your slices are **comparable
+to or shorter than** the tax, forkd wins clearly. ML / data-science
+suites that re-pay sklearn / torch imports on every worker fall
+squarely in the forkd-wins zone.
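+
+To see where your suite sits, compare the share of each worker's
+wall-clock spent on fixed startup. The tax values below are the rough
+figures from the table (about 2 s + 1.5 s for a container, about
+20 ms for a forkd worker); treat them as placeholders for your own
+measurements.
+
+```python
+def startup_share(tax_s, slice_s):
+    """Fraction of one worker's wall-clock spent on fixed startup cost."""
+    return tax_s / (tax_s + slice_s)
+
+
+CONTAINER_TAX_S = 3.5  # ~2 s cold-start + ~1.5 s imports (table above)
+FORKD_TAX_S = 0.02     # ~20 ms per worker (batch-spawn measurement)
+
+if __name__ == "__main__":
+    for slice_s in (0.5, 3.0, 30.0):
+        container = startup_share(CONTAINER_TAX_S, slice_s)
+        forkd = startup_share(FORKD_TAX_S, slice_s)
+        print(f"slice {slice_s:5.1f} s: container {container:6.1%}  forkd {forkd:6.1%}")
+```
+
+With these placeholder numbers the container tax is most of the
+wall-clock for a 0.5 s slice and still about a tenth of it for a 30 s
+slice, while the forkd share stays well under 1% once slices pass a
+few seconds.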
+
+## Caveats
+
+- **`pip install` inside snapshots requires v0.5.1+**. The guest-kernel
+  rebuild in #226 fixed #218, where CRNG starvation blocked OpenSSL and
+  hung pip. Check which kernel a snapshot was built with:
+  `forkd snapshot-info ci-pytest`
+- **Per-worker netns is on by default** — children get their own
+  `lo`, no cross-talk. If your tests need to hit a shared DB, use
+  `--per-child-netns=false` or put the DB on the host tap.
+- **Worker count vs vCPU**: forkd's per-vCPU policy is "share the
+  host's cores". On a 20-core host, 8 workers is comfortable; 50
+  is over-subscribed.
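+
+If you want a starting point for the worker count, one option is to
+scale the 8-workers-on-20-cores example proportionally. The
+2.5-cores-per-worker ratio below is extrapolated from that single data
+point and is not a forkd policy; `pick_workers` is a hypothetical
+helper, and `os.cpu_count()` reports logical CPUs.
+
+```python
+import os
+
+
+def pick_workers(n_slices, cores=None, cores_per_worker=2.5):
+    """Hypothetical heuristic: ~one worker per 2.5 host cores, capped by slice count."""
+    cores = cores or os.cpu_count() or 1
+    return max(1, min(n_slices, int(cores // cores_per_worker)))
+
+
+if __name__ == "__main__":
+    print(pick_workers(100, cores=20))  # 8, matching the example above
+    print(pick_workers(3, cores=20))    # 3: never more workers than slices
+    print(pick_workers(10))             # depends on this host's CPU count
+```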