105 changes: 49 additions & 56 deletions recipes/README.md
@@ -1,89 +1,82 @@
# Recipes

Two flavors:
forkd is a fork-on-write microVM primitive — the "for AI agents"
framing on the front page is one prominent use case, not the
ceiling. Anything that wants **N isolated children spawned from a
warmed parent in milliseconds** fits.

1. **Integration recipes** (host-side, no rootfs build) — Python scripts
that drive forkd from an agent framework, demonstrating the
BRANCH + fanout pattern with that framework's idioms. Start here if
you want to plug forkd into CrewAI / AutoGen / Swarm / Claude
Desktop in five minutes.
2. **Rootfs recipes** (parent images) — `build.sh` scripts that turn
public OCI images into forkd parent snapshots. Start here if you
want a custom warmed image to fork from.
## Pick your starting point

## Integration recipes (host-side)
### By problem you're solving

Read the script, copy the helper, drop it into your project. Each is
~150-250 lines of Python with a `--dry-run` mode so you can verify
the forkd plumbing without an LLM key.

| Recipe | Framework | Forkd-specific value |
| Problem | Recipes | What forkd buys you |
|---|---|---|
| [`mcp-agent/`](./mcp-agent/) | Claude Desktop / Cursor / Cline (via MCP) | End-to-end MCP protocol verification — same JSON-RPC framing a real LLM client uses |
| [`crewai-fanout/`](./crewai-fanout/) | CrewAI | N agents on N microVMs from one warmed parent — per-agent isolation without Docker cold-start |
| [`autogen-branch/`](./autogen-branch/) | AutoGen | Forkd-backed `CodeExecutor` + mid-conversation BRANCH that fans out N alternatives from the same warmed state |
| [`openai-swarm/`](./openai-swarm/) | OpenAI Swarm / Agents SDK | Handoff = BRANCH: agent B inherits agent A's full VM state (filesystem, imports, env) on handoff |
| **AI agent fan-out** — try N approaches, branch a thinking agent | [`langgraph-react/`](./langgraph-react/) · [`crewai-fanout/`](./crewai-fanout/) · [`autogen-branch/`](./autogen-branch/) · [`openai-swarm/`](./openai-swarm/) · [`mcp-agent/`](./mcp-agent/) · [`speculative-agent/`](./speculative-agent/) · [`coding-agent-fork/`](./coding-agent-fork/) | Per-child KVM isolation + warmed runtime inheritance. The "fork mid-thought" story. |
| **CI test parallelism** — run 100 pytest workers from a warmed parent | [`postgres-fixture/`](./postgres-fixture/) (DB-per-test) · [`ci-parallel-pytest/`](./ci-parallel-pytest/) (worker fan-out) | Skip per-worker container cold-start + dependency install. ~50 ms / worker instead of ~3 s. |
| **Database test fixtures** — fresh, isolated postgres per test | [`postgres-fixture/`](./postgres-fixture/) | `initdb` runs **once** at parent build; every fork inherits the post-init state. ~200× faster than per-test container. |
| **Browser automation farms** — Playwright / Puppeteer fan-out at scale | [`playwright-browser/`](./playwright-browser/) | Fork warmed headless Chromium at ~10 ms instead of ~2 s cold-boot. |
| **Notebook / code interpreter** — Jupyter kernel per session | [`jupyter-kernel/`](./jupyter-kernel/) · [`e2b-codeinterpreter/`](./e2b-codeinterpreter/) | Full SciPy stack pre-imported. ~1 ms per fresh kernel. |
| **General-purpose compute fan-out** — anything that needs N warmed sandboxes | [`python-numpy/`](./python-numpy/) · [`coding-agent/`](./coding-agent/) · [`nodejs/`](./nodejs/) · [`agent-workbench/`](./agent-workbench/) | Pre-baked language runtime + canonical fan-out benchmark. |

### By integration framework (host-side Python scripts)

## Rootfs recipes
If you're plugging forkd into an existing agent framework, these
are ~150-250 lines of Python with a `--dry-run` mode so you can
verify the forkd plumbing without an LLM key.

Ready-made parent-rootfs recipes for common workbench images.
Each recipe takes a public Docker / OCI image and turns it into a
forkd parent snapshot, so you can fork N warmed children from it
in milliseconds.
| Framework | Recipe |
|---|---|
| Claude Desktop / Cursor / Cline (via MCP) | [`mcp-agent/`](./mcp-agent/) |
| CrewAI multi-agent crew | [`crewai-fanout/`](./crewai-fanout/) |
| AutoGen ConversableAgent / GroupChat | [`autogen-branch/`](./autogen-branch/) |
| OpenAI Swarm / Agents SDK | [`openai-swarm/`](./openai-swarm/) |
| LangGraph ReAct (the front-page demo) | [`langgraph-react/`](./langgraph-react/) |

The pattern is the same across recipes:
## How rootfs recipes work

Rootfs recipes turn a public Docker / OCI image into a forkd parent
snapshot. Same shape across all of them:

```bash
# 1. Build a parent rootfs from an upstream image
sudo bash recipes/<name>/build.sh

# 2. Snapshot the warmed parent (one-time per image version)
sudo forkd snapshot --tag <name> \
--kernel ./vmlinux-6.1.141 \
--kernel /var/lib/forkd/kernels/vmlinux \
--rootfs recipes/<name>/parent.ext4 \
--tap forkd-tap0

# 3. Fork N children, fan-out workload
sudo -E forkd fork --tag <name> -n 100 --per-child-netns
```
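
If you would rather drive the three steps from a script, the same commands can be assembled as argument lists. This is a minimal sketch: the flags are exactly the ones in the shell snippet above, while the function names and the idea of wrapping the CLI from Python are assumptions made for illustration, not part of forkd.

```python
import shlex


def build_cmd(name: str) -> list[str]:
    """Step 1: build the parent rootfs from the upstream image."""
    return ["sudo", "bash", f"recipes/{name}/build.sh"]


def snapshot_cmd(
    name: str,
    kernel: str = "/var/lib/forkd/kernels/vmlinux",
    tap: str = "forkd-tap0",
) -> list[str]:
    """Step 2: snapshot the warmed parent (flags copied from the snippet above)."""
    return [
        "sudo", "forkd", "snapshot",
        "--tag", name,
        "--kernel", kernel,
        "--rootfs", f"recipes/{name}/parent.ext4",
        "--tap", tap,
    ]


def fork_cmd(name: str, children: int = 100) -> list[str]:
    """Step 3: fork N children from the tagged snapshot."""
    return ["sudo", "-E", "forkd", "fork", "--tag", name,
            "-n", str(children), "--per-child-netns"]


def recipe_pipeline(name: str, children: int = 100) -> list[list[str]]:
    """All three steps, in order, for one recipe directory name."""
    return [build_cmd(name), snapshot_cmd(name), fork_cmd(name, children)]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for argv in recipe_pipeline("python-numpy", children=4):
        print(shlex.join(argv))
```

To actually run a step, pass its list to `subprocess.run(argv, check=True)`.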

The first-time `build.sh` of each recipe takes a few minutes
(pulling the Docker image + converting to ext4). The snapshot step
is ~10 s. After that, forking children is the published benchmark
cost.

### Available rootfs recipes

| Recipe | Parent image | Size | Audience |
| Recipe | Parent image | Size | Best for |
|---|---|---|---|
| [`python-numpy/`](./python-numpy/) | `python:3.12-slim` + `python3-numpy` | ~1.5 GB | The canonical fan-out demo; what the chart on the front README measures |
| [`e2b-codeinterpreter/`](./e2b-codeinterpreter/) | `e2bdev/code-interpreter` | ~600 MB | AI code-execution agents (Anthropic / OpenAI tutorials use this image). Lightest "agent ready" option |
| [`jupyter-kernel/`](./jupyter-kernel/) | `quay.io/jupyter/scipy-notebook` | ~3 GB | Code-interpreter / notebook-style agents — full SciPy stack pre-imported, ~1 ms per fresh kernel instead of ~2 s |
| [`coding-agent/`](./coding-agent/) | `python:3.12` + git + ruff + black + pytest | ~1.8 GB | SWE-style coding agents that need a real dev toolchain inside the sandbox |
| [`nodejs/`](./nodejs/) | `node:22-slim` | ~250 MB | JavaScript / TypeScript workloads (Jest, Playwright fan-out) |
| [`playwright-browser/`](./playwright-browser/) | `mcr.microsoft.com/playwright` (Node + Chromium pre-warmed) | ~2.5 GB | Browser-driving agents (computer-use, web research, UI test gen). Fork warmed headless Chromium at ~10 ms instead of ~2 s. **Alpha** |
| [`agent-workbench/`](./agent-workbench/) | `agent-infra/sandbox` (browser + VSCode + Jupyter + MCP + shell) | ~5 GB | Kitchen-sink agent workbench when you want every tool already mounted; trades a bigger memory.bin for batteries-included |
| [`postgres-fixture/`](./postgres-fixture/) | `postgres:16` (initdb done, postmaster pre-launched) | ~500 MB | Fork-per-test isolated databases; each child gets a ready-to-query postgres in ~10 ms instead of ~2 s for fresh initdb |

## Choosing a recipe

**By framework / driver (integration recipes):**
- **Claude Desktop / Cursor / Cline** → `mcp-agent/`
- **CrewAI multi-agent crew** → `crewai-fanout/`
- **AutoGen ConversableAgent / GroupChat** → `autogen-branch/`
- **OpenAI Swarm / Agents SDK** → `openai-swarm/`

**By workload (rootfs recipes):**
- **You're benchmarking** → `python-numpy/`
- **You're running an AI code interpreter** → `e2b-codeinterpreter/`
- **You need the full SciPy / notebook stack** → `jupyter-kernel/`
- **You're running a coding agent (SWE-bench style)** → `coding-agent/`
- **JS / TS only** → `nodejs/`
- **Browser-driving agent (computer-use, scraping, UI testing)** → `playwright-browser/`
- **You want browser + IDE + everything in one box** → `agent-workbench/`
- **You're running a test suite that needs an isolated DB per test** → `postgres-fixture/`
| [`python-numpy/`](./python-numpy/) | `python:3.12-slim` + `python3-numpy` | ~1.5 GB | **The canonical fan-out benchmark** — what the chart on the front README measures |
| [`postgres-fixture/`](./postgres-fixture/) | `postgres:16` (initdb done, postmaster pre-launched) | ~500 MB | **Fork-per-test isolated databases** — each child gets a ready-to-query postgres in ~10 ms vs ~2 s for fresh initdb |
| [`ci-parallel-pytest/`](./ci-parallel-pytest/) | `python:3.12-slim` + numpy/pandas/sklearn + your test suite | ~2 GB | **CI test fan-out** — parallel pytest workers without per-worker container cold-start |
| [`playwright-browser/`](./playwright-browser/) | `mcr.microsoft.com/playwright` (Node + Chromium pre-warmed) | ~2.5 GB | **Browser automation farms** — warmed headless Chromium at ~10 ms instead of ~2 s. **Alpha** |
| [`jupyter-kernel/`](./jupyter-kernel/) | `quay.io/jupyter/scipy-notebook` | ~3 GB | **Code-interpreter / notebook agents** — full SciPy stack pre-imported, ~1 ms per fresh kernel |
| [`e2b-codeinterpreter/`](./e2b-codeinterpreter/) | `e2bdev/code-interpreter` | ~600 MB | **AI code-execution agents** (Anthropic / OpenAI tutorials use this image). Lightest "agent ready" option |
| [`coding-agent/`](./coding-agent/) | `python:3.12` + git + ruff + black + pytest | ~1.8 GB | **SWE-style coding agents** that need a real dev toolchain inside the sandbox |
| [`nodejs/`](./nodejs/) | `node:22-slim` | ~250 MB | **JavaScript / TypeScript workloads** (Jest, Playwright fan-out, scraping) |
| [`agent-workbench/`](./agent-workbench/) | `agent-infra/sandbox` (browser + VSCode + Jupyter + MCP + shell) | ~5 GB | **Kitchen-sink workbench** when you want every tool already mounted |

## Notes

- Recipes are tested on Ubuntu 24.04 / Linux 6.14 / x86_64. Other distros
may need adjustments to `scripts/build-rootfs.sh`.
- The first-time `build.sh` of each recipe takes a few minutes (pulling
the Docker image + converting to ext4). The snapshot step is ~10 s.
After that, forking children is the published benchmark cost.
- Each recipe is self-contained — pick one, run it; you don't need to
understand the others.
- The "AI agent" framing on the project front page is the dominant use
case **today** but not the only one — the technology is `fork(2)` for
KVM microVMs. If your workload needs N hardware-isolated children
spawned from a warmed parent in milliseconds, forkd is the primitive.
174 changes: 174 additions & 0 deletions recipes/ci-parallel-pytest/README.md
@@ -0,0 +1,174 @@
# `ci-parallel-pytest`

**Run your pytest suite across N forkd microVMs in parallel,
without paying per-worker container cold-start or dependency
import cost.**

A typical Python CI job re-imports numpy / pandas / scikit-learn on
every fresh worker container — ~1-2 s of pure overhead before the
first test runs. With forkd, those imports live in the warmed
parent's snapshot; every fork inherits them via `mmap MAP_PRIVATE`
copy-on-write. Per-worker fixed cost drops to ~50-100 ms.
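
The copy-on-write behaviour can be seen in miniature with an ordinary file. The sketch below is only an analogy, not forkd's code: it assumes a small temporary file standing in for the parent's memory image and uses Python's `mmap.ACCESS_COPY`, which gives a private copy-on-write mapping (`MAP_PRIVATE` on Unix). Two "children" map the same file; one writes, and neither the other child nor the file on disk sees the change.

```python
import mmap
import os
import tempfile


def private_mapping_demo() -> tuple[bytes, bytes, bytes]:
    """Return (child A's view, child B's view, bytes on disk) after A writes."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"warmed parent state")  # stand-in for the parent snapshot
        path = f.name
    try:
        with open(path, "rb") as fh:
            # ACCESS_COPY: writes go to private pages, never to the file.
            child_a = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_COPY)
            child_b = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_COPY)
            child_a[:6] = b"CHILDA"  # only child A's private copy changes
            view_a = bytes(child_a)
            view_b = bytes(child_b)
            child_a.close()
            child_b.close()
        with open(path, "rb") as fh:
            on_disk = fh.read()
    finally:
        os.unlink(path)
    return view_a, view_b, on_disk


if __name__ == "__main__":
    a, b, disk = private_mapping_demo()
    print(a)     # child A sees its own write
    print(b)     # child B still sees the parent's bytes
    print(disk)  # the backing file is untouched
```

Pages that a child never writes stay shared with the parent image, which is why the already-imported libraries cost nothing extra per fork.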

## Architecture

```
          ┌──────────────────────────────────────┐
          │  parent snapshot `ci-pytest`         │
          │  python:3.12-slim                    │
          │  + pytest 8.3 numpy 2.0 pandas 2.2   │
          │  + scikit-learn 1.5                  │
          │  + your /opt/test_project            │
          │  (heavy imports already paid)        │
          └────────────────┬─────────────────────┘
                           │  mmap MAP_PRIVATE (CoW)
     ┌─────────────────────┼─────────────────────┐
     │                     │                     │
┌────▼───────┐       ┌─────▼──────┐       ┌──────▼─────┐
│  worker 1  │       │  worker 2  │       │  worker N  │
│  pytest    │       │  pytest    │       │  pytest    │
│  slice 1/N │  ...  │  slice 2/N │  ...  │  slice N/N │
└────────────┘       └────────────┘       └────────────┘
                    run in parallel
```

## What ships in this recipe

| File | What it does |
|---|---|
| [`build.sh`](./build.sh) | Builds a forkd parent rootfs: `python:3.12-slim` + pinned pytest/numpy/pandas/sklearn, the demo test project copied to `/opt/test_project`, and a pre-warm step that imports the heavy deps so they're in the snapshot's page cache |
| [`test_project/`](./test_project/) | A representative pytest project — ~30 tests across 5 files (arithmetic, numpy, pandas, sklearn, text). Replace with your own |
| [`demo.py`](./demo.py) | Fan-out driver: slices test files across N forkd workers, runs each slice in a child sandbox, reports per-worker spawn/exec timing + total wall-clock + sequential-baseline comparison |
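
One simple way to slice test files across workers is round-robin over the sorted file names. The sketch below is an assumption about the approach, not a copy of `demo.py`; the function name is hypothetical, and the file names are taken from the plan printed in the Quickstart sample output, which this scheme happens to reproduce for 4 workers.

```python
def slice_round_robin(test_files: list[str], workers: int) -> list[list[str]]:
    """Deal sorted test files out to `workers` slices, one file at a time."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    ordered = sorted(test_files)
    # Worker i gets files i, i + workers, i + 2 * workers, ...
    return [ordered[i::workers] for i in range(workers)]


if __name__ == "__main__":
    files = [
        "test_text_processing.py",
        "test_sklearn_models.py",
        "test_pandas_etl.py",
        "test_numpy_ops.py",
        "test_arithmetic.py",
    ]
    for worker, chunk in enumerate(slice_round_robin(files, 4)):
        print(f"worker {worker}: {len(chunk)} file(s): {', '.join(chunk)}")
```

Round-robin balances file counts, not run time. If one file dominates (as the sklearn slice does in the demo), splitting by measured duration would balance better.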

## When to use this

- **CI pipelines with hundreds of pytest tests** that re-import heavy
ML libs every run. The savings compound: every PR run, every
retry, every nightly.
- **PR-preview environments** where each PR needs its own clean
pytest run with fresh side-effects (DB, filesystem, env). forkd's
per-child KVM isolation means workers truly don't see each other.
- **Sharded fuzz / property testing**: split a 10 000-iteration
Hypothesis run across N microVMs without setup tax.

## When NOT to use this

- Your test suite is < 30 tests and finishes in < 2 s sequentially —
parallelism overhead exceeds the gain.
- You don't actually need per-worker isolation (e.g. pure-function
unit tests with no shared state) — `pytest -n <N>` (pytest-xdist)
in a single container is simpler.
- You can't run forkd on your CI host (managed CI like default
GitHub Actions, no KVM). For self-hosted runners with bare-metal
Linux + KVM this works great.

## Quickstart

```bash
# 1. Build the parent (one-time, ~5 min — pip install pandas + sklearn
# dominates the time)
sudo bash recipes/ci-parallel-pytest/build.sh

# 2. Snapshot the warmed parent (one-time, ~10 s)
sudo forkd snapshot --tag ci-pytest \
  --kernel /var/lib/forkd/kernels/vmlinux \
  --rootfs recipes/ci-parallel-pytest/parent.ext4 \
  --tap forkd-tap0

# 3. Fan out — 4 workers in parallel
FORKD_TOKEN=$(sudo cat /tmp/bench-pause/token) \
  python3 recipes/ci-parallel-pytest/demo.py --workers 4 \
  --sequential-baseline
```

Output from the dev box (Intel i7-12700, ext4, 2026-06-06):

```
Plan: 4 worker(s) × pytest slice off `ci-pytest`.
worker 0: 2 file(s) — test_arithmetic.py, test_text_processing.py
worker 1: 1 file(s) — test_numpy_ops.py
worker 2: 1 file(s) — test_pandas_etl.py
worker 3: 1 file(s) — test_sklearn_models.py

=== fan-out: 4 workers in parallel ===
batch spawn (4 children): 81 ms
[0] PASS exec= 232 ms files=test_arithmetic.py,test_text_processing.py
[1] PASS exec= 304 ms files=test_numpy_ops.py
[2] PASS exec= 546 ms files=test_pandas_etl.py
[3] PASS exec=1458 ms files=test_sklearn_models.py

fan-out wall-clock: 1601 ms (batch spawn=81 ms = ~20 ms/worker,
slowest worker exec=1458 ms)

=== sequential baseline: one child runs the whole suite ===
[0] PASS spawn=61 ms exec=1507 ms
sequential wall-clock: 1625 ms (fan-out speedup: 1.01×)
```

The 1.01× fan-out-vs-sequential figure is honest: this demo suite
only has ~30 tests and is dominated by one sklearn slice (1458 ms).
Fan-out shines when **your suite has many slow slices of comparable
size** — e.g. 8 sklearn-heavy slices each taking ~1.5 s would fan
out to ~1.5 s wall, vs ~12 s sequentially.
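
The arithmetic behind these figures can be sketched as a rough model: fan-out wall-clock is about the batch spawn plus the slowest slice, and sequential time is at most the sum of the slices. The function names are hypothetical and the model is a simplification. Summing slices over-estimates a real sequential run, presumably because a single process pays interpreter start-up and imports once (the one-child baseline above took 1507 ms, less than the 2540 ms sum of the four slices).

```python
def fanout_wall_ms(slice_exec_ms: list[float], batch_spawn_ms: float = 0.0) -> float:
    """Fan-out finishes when the slowest slice finishes."""
    return batch_spawn_ms + max(slice_exec_ms)


def sequential_upper_bound_ms(slice_exec_ms: list[float], spawn_ms: float = 0.0) -> float:
    """Upper bound: every slice run back to back, each paying its own start-up."""
    return spawn_ms + sum(slice_exec_ms)


def speedup(sequential_ms: float, fanout_ms: float) -> float:
    return sequential_ms / fanout_ms


if __name__ == "__main__":
    # Measured wall-clocks from the sample output.
    print(f"measured speedup: {speedup(1625, 1601):.2f}x")

    # Model applied to the demo's slices: 81 ms spawn + 1458 ms slowest slice.
    demo = [232, 304, 546, 1458]
    print(f"modelled fan-out: {fanout_wall_ms(demo, batch_spawn_ms=81):.0f} ms")

    # The hypothetical balanced suite from the text: 8 slices of ~1.5 s each.
    balanced = [1500.0] * 8
    seq = sequential_upper_bound_ms(balanced)
    fan = fanout_wall_ms(balanced)
    print(f"balanced suite: {seq:.0f} ms vs {fan:.0f} ms, {speedup(seq, fan):.1f}x")
```

The model makes the shape dependence explicit: speedup approaches the worker count only when slices are of similar length.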

**The number that matters across suite shapes is the batch spawn
cost: 81 ms for 4 children — ~20 ms per worker.** That's the
forkd-vs-container comparison: ~20 ms to start a forkd worker vs
~2-3 s to start a fresh container.

## GitHub Actions integration

Drop this into your workflow on a self-hosted runner that has forkd
+ a `ci-pytest` snapshot pre-built:

```yaml
jobs:
  test:
    runs-on: [self-hosted, linux, x64, forkd]
    steps:
      - uses: actions/checkout@v4
      - name: Refresh the parent snapshot
        run: |
          sudo cp -r ./tests /opt/test_project/tests  # mount your tests into the snap dir
          # or rebuild the parent if your deps changed:
          # sudo bash recipes/ci-parallel-pytest/build.sh
      - name: Fan out
        env:
          FORKD_TOKEN: ${{ secrets.FORKD_TOKEN }}
        run: |
          python3 recipes/ci-parallel-pytest/demo.py \
            --workers 8 \
            --snapshot-tag ci-pytest
```

For a hosted-runner setup, the equivalent is one forkd daemon on
your CI infrastructure, exposed over a port the runner can reach.

## How it compares

| Approach | Per-worker fixed cost | Notes |
|---|---|---|
| `pytest` sequential, fresh container | ~2 s container cold + ~1.5 s `import numpy/pandas/sklearn` | Each PR run / retry / nightly re-pays both |
| `pytest-xdist -n 4` in one container | ~3.5 s container cold + ~1.5 s imports (shared across workers) | Single shared kernel; one test crash takes the host down |
| `docker run` × 4 fresh containers | ~3.5 s × 4 cold-starts, parallelized | Per-container isolation, but slow to spawn |
| **forkd fan-out (this recipe)** | **~20 ms batch spawn + 0 ms imports** | Per-child KVM isolation, warmed Python deps inherited via mmap CoW |

The break-even point is roughly this: if each test slice runs much
longer than the container cold-start (~3 s), the cold-start is
amortized and container parallelism is fine. If a slice is
**comparable to or shorter than** the ~3 s container tax, forkd wins
outright. ML / data science suites that re-pay the sklearn / torch
import on every worker fall squarely in the forkd-wins zone.
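
To put numbers on the break-even, the sketch below computes what share of a worker's wall-clock is fixed start-up cost rather than test time. It assumes the approximate figures quoted above (~3 s container tax, ~20 ms per forkd worker); the function and constant names are hypothetical, and the slice lengths are illustrative.

```python
def fixed_cost_share(fixed_s: float, slice_s: float) -> float:
    """Fraction of a worker's wall-clock spent on start-up instead of tests."""
    return fixed_s / (fixed_s + slice_s)


CONTAINER_TAX_S = 3.0  # "~3 s container tax" from the text (approximate)
FORKD_SPAWN_S = 0.020  # "~20 ms per worker" from the sample run (approximate)

if __name__ == "__main__":
    for slice_s in (1.5, 10.0, 60.0):
        container = fixed_cost_share(CONTAINER_TAX_S, slice_s)
        forkd = fixed_cost_share(FORKD_SPAWN_S, slice_s)
        print(f"slice {slice_s:5.1f} s: container overhead {container:5.1%}, "
              f"forkd overhead {forkd:5.1%}")
```

With a 1.5 s slice the container spends about two thirds of its time starting up; with a 60 s slice the same tax falls below 5%, which is where container parallelism becomes good enough.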

## Caveats

- **`pip install` inside snapshots requires v0.5.1+** — the guest
kernel rebuild that landed in #226 closed #218 (CRNG starvation
blocked OpenSSL → pip hung). Confirm your kernel:
`forkd snapshot-info ci-pytest`
- **Per-worker netns is on by default** — children get their own
`lo`, no cross-talk. If your tests need to hit a shared DB, use
`--per-child-netns=false` or put the DB on the host tap.
- **Worker count vs vCPU**: forkd's per-vCPU policy is "share the
host's cores". On a 20-core host, 8 workers is comfortable; 50
is over-subscribed.