diff --git a/recipes/README.md b/recipes/README.md index fb06b97..728e972 100644 --- a/recipes/README.md +++ b/recipes/README.md @@ -1,37 +1,41 @@ # Recipes -Two flavors: +forkd is a fork-on-write microVM primitive — the "for AI agents" +framing on the front page is one prominent use case, not the +ceiling. Anything that wants **N isolated children spawned from a +warmed parent in milliseconds** fits. -1. **Integration recipes** (host-side, no rootfs build) — Python scripts - that drive forkd from an agent framework, demonstrating the - BRANCH + fanout pattern with that framework's idioms. Start here if - you want to plug forkd into CrewAI / AutoGen / Swarm / Claude - Desktop in five minutes. -2. **Rootfs recipes** (parent images) — `build.sh` scripts that turn - public OCI images into forkd parent snapshots. Start here if you - want a custom warmed image to fork from. +## Pick your starting point -## Integration recipes (host-side) +### By problem you're solving -Read the script, copy the helper, drop it into your project. Each is -~150-250 lines of Python with a `--dry-run` mode so you can verify -the forkd plumbing without an LLM key. 
- -| Recipe | Framework | Forkd-specific value | +| Problem | Recipes | What forkd buys you | |---|---|---| -| [`mcp-agent/`](./mcp-agent/) | Claude Desktop / Cursor / Cline (via MCP) | End-to-end MCP protocol verification — same JSON-RPC framing a real LLM client uses | -| [`crewai-fanout/`](./crewai-fanout/) | CrewAI | N agents on N microVMs from one warmed parent — per-agent isolation without Docker cold-start | -| [`autogen-branch/`](./autogen-branch/) | AutoGen | Forkd-backed `CodeExecutor` + mid-conversation BRANCH that fans out N alternatives from the same warmed state | -| [`openai-swarm/`](./openai-swarm/) | OpenAI Swarm / Agents SDK | Handoff = BRANCH: agent B inherits agent A's full VM state (filesystem, imports, env) on handoff | +| **AI agent fan-out** — try N approaches, branch a thinking agent | [`langgraph-react/`](./langgraph-react/) · [`crewai-fanout/`](./crewai-fanout/) · [`autogen-branch/`](./autogen-branch/) · [`openai-swarm/`](./openai-swarm/) · [`mcp-agent/`](./mcp-agent/) · [`speculative-agent/`](./speculative-agent/) · [`coding-agent-fork/`](./coding-agent-fork/) | Per-child KVM isolation + warmed runtime inheritance. The "fork mid-thought" story. | +| **CI test parallelism** — run 100 pytest workers from a warmed parent | [`postgres-fixture/`](./postgres-fixture/) (DB-per-test) · [`ci-parallel-pytest/`](./ci-parallel-pytest/) (worker fan-out) | Skip per-worker container cold-start + dependency install. ~50 ms / worker instead of ~3 s. | +| **Database test fixtures** — fresh, isolated postgres per test | [`postgres-fixture/`](./postgres-fixture/) | `initdb` runs **once** at parent build; every fork inherits the post-init state. ~200× faster than per-test container. | +| **Browser automation farms** — Playwright / Puppeteer fan-out at scale | [`playwright-browser/`](./playwright-browser/) | Fork warmed headless Chromium at ~10 ms instead of ~2 s cold-boot. 
| +| **Notebook / code interpreter** — Jupyter kernel per session | [`jupyter-kernel/`](./jupyter-kernel/) · [`e2b-codeinterpreter/`](./e2b-codeinterpreter/) | Full SciPy stack pre-imported. ~1 ms per fresh kernel. | +| **General-purpose compute fan-out** — anything that needs N warmed sandboxes | [`python-numpy/`](./python-numpy/) · [`coding-agent/`](./coding-agent/) · [`nodejs/`](./nodejs/) · [`agent-workbench/`](./agent-workbench/) | Pre-baked language runtime + canonical fan-out benchmark. | + +### By integration framework (host-side Python scripts) -## Rootfs recipes +If you're plugging forkd into an existing agent framework, these +are ~150-250 lines of Python with a `--dry-run` mode so you can +verify the forkd plumbing without an LLM key. -Ready-made parent-rootfs recipes for common workbench images. -Each recipe takes a public Docker / OCI image and turns it into a -forkd parent snapshot, so you can fork N warmed children from it -in milliseconds. +| Framework | Recipe | +|---|---| +| Claude Desktop / Cursor / Cline (via MCP) | [`mcp-agent/`](./mcp-agent/) | +| CrewAI multi-agent crew | [`crewai-fanout/`](./crewai-fanout/) | +| AutoGen ConversableAgent / GroupChat | [`autogen-branch/`](./autogen-branch/) | +| OpenAI Swarm / Agents SDK | [`openai-swarm/`](./openai-swarm/) | +| LangGraph ReAct (the front-page demo) | [`langgraph-react/`](./langgraph-react/) | -The pattern is the same across recipes: +## How rootfs recipes work + +Rootfs recipes turn a public Docker / OCI image into a forkd parent +snapshot. Same shape across all of them: ```bash # 1. Build a parent rootfs from an upstream image @@ -39,7 +43,7 @@ sudo bash recipes//build.sh # 2. 
Snapshot the warmed parent (one-time per image version) sudo forkd snapshot --tag \ - --kernel ./vmlinux-6.1.141 \ + --kernel /var/lib/forkd/kernels/vmlinux \ --rootfs recipes//parent.ext4 \ --tap forkd-tap0 @@ -47,43 +51,32 @@ sudo forkd snapshot --tag \ sudo -E forkd fork --tag -n 100 --per-child-netns ``` +The first-time `build.sh` of each recipe takes a few minutes +(pulling the Docker image + converting to ext4). The snapshot step +is ~10 s. After that, forking children is the published benchmark +cost. + ### Available rootfs recipes -| Recipe | Parent image | Size | Audience | +| Recipe | Parent image | Size | Best for | |---|---|---|---| -| [`python-numpy/`](./python-numpy/) | `python:3.12-slim` + `python3-numpy` | ~1.5 GB | The canonical fan-out demo; what the chart on the front README measures | -| [`e2b-codeinterpreter/`](./e2b-codeinterpreter/) | `e2bdev/code-interpreter` | ~600 MB | AI code-execution agents (Anthropic / OpenAI tutorials use this image). Lightest "agent ready" option | -| [`jupyter-kernel/`](./jupyter-kernel/) | `quay.io/jupyter/scipy-notebook` | ~3 GB | Code-interpreter / notebook-style agents — full SciPy stack pre-imported, ~1 ms per fresh kernel instead of ~2 s | -| [`coding-agent/`](./coding-agent/) | `python:3.12` + git + ruff + black + pytest | ~1.8 GB | SWE-style coding agents that need a real dev toolchain inside the sandbox | -| [`nodejs/`](./nodejs/) | `node:22-slim` | ~250 MB | JavaScript / TypeScript workloads (Jest, Playwright fan-out) | -| [`playwright-browser/`](./playwright-browser/) | `mcr.microsoft.com/playwright` (Node + Chromium pre-warmed) | ~2.5 GB | Browser-driving agents (computer-use, web research, UI test gen). Fork warmed headless Chromium at ~10 ms instead of ~2 s. 
**Alpha** | -| [`agent-workbench/`](./agent-workbench/) | `agent-infra/sandbox` (browser + VSCode + Jupyter + MCP + shell) | ~5 GB | Kitchen-sink agent workbench when you want every tool already mounted; trades a bigger memory.bin for batteries-included | -| [`postgres-fixture/`](./postgres-fixture/) | `postgres:16` (initdb done, postmaster pre-launched) | ~500 MB | Fork-per-test isolated databases; each child gets a ready-to-query postgres in ~10 ms instead of ~2 s for fresh initdb | - -## Choosing a recipe - -**By framework / driver (integration recipes):** -- **Claude Desktop / Cursor / Cline** → `mcp-agent/` -- **CrewAI multi-agent crew** → `crewai-fanout/` -- **AutoGen ConversableAgent / GroupChat** → `autogen-branch/` -- **OpenAI Swarm / Agents SDK** → `openai-swarm/` - -**By workload (rootfs recipes):** -- **You're benchmarking** → `python-numpy/` -- **You're running an AI code interpreter** → `e2b-codeinterpreter/` -- **You need the full SciPy / notebook stack** → `jupyter-kernel/` -- **You're running a coding agent (SWE-bench style)** → `coding-agent/` -- **JS / TS only** → `nodejs/` -- **Browser-driving agent (computer-use, scraping, UI testing)** → `playwright-browser/` -- **You want browser + IDE + everything in one box** → `agent-workbench/` -- **You're running a test suite that needs an isolated DB per test** → `postgres-fixture/` +| [`python-numpy/`](./python-numpy/) | `python:3.12-slim` + `python3-numpy` | ~1.5 GB | **The canonical fan-out benchmark** — what the chart on the front README measures | +| [`postgres-fixture/`](./postgres-fixture/) | `postgres:16` (initdb done, postmaster pre-launched) | ~500 MB | **Fork-per-test isolated databases** — each child gets a ready-to-query postgres in ~10 ms vs ~2 s for fresh initdb | +| [`ci-parallel-pytest/`](./ci-parallel-pytest/) | `python:3.12-slim` + numpy/pandas/sklearn + your test suite | ~2 GB | **CI test fan-out** — parallel pytest workers without per-worker container cold-start | +| 
[`playwright-browser/`](./playwright-browser/) | `mcr.microsoft.com/playwright` (Node + Chromium pre-warmed) | ~2.5 GB | **Browser automation farms** — warmed headless Chromium at ~10 ms instead of ~2 s. **Alpha** | +| [`jupyter-kernel/`](./jupyter-kernel/) | `quay.io/jupyter/scipy-notebook` | ~3 GB | **Code-interpreter / notebook agents** — full SciPy stack pre-imported, ~1 ms per fresh kernel | +| [`e2b-codeinterpreter/`](./e2b-codeinterpreter/) | `e2bdev/code-interpreter` | ~600 MB | **AI code-execution agents** (Anthropic / OpenAI tutorials use this image). Lightest "agent ready" option | +| [`coding-agent/`](./coding-agent/) | `python:3.12` + git + ruff + black + pytest | ~1.8 GB | **SWE-style coding agents** that need a real dev toolchain inside the sandbox | +| [`nodejs/`](./nodejs/) | `node:22-slim` | ~250 MB | **JavaScript / TypeScript workloads** (Jest, Playwright fan-out, scraping) | +| [`agent-workbench/`](./agent-workbench/) | `agent-infra/sandbox` (browser + VSCode + Jupyter + MCP + shell) | ~5 GB | **Kitchen-sink workbench** when you want every tool already mounted | ## Notes - Recipes are tested on Ubuntu 24.04 / Linux 6.14 / x86_64. Other distros may need adjustments to `scripts/build-rootfs.sh`. -- The first-time `build.sh` of each recipe takes a few minutes (pulling - the Docker image + converting to ext4). The snapshot step is ~10 s. - After that, forking children is the published benchmark cost. - Each recipe is self-contained — pick one, run it; you don't need to understand the others. +- The "AI agent" framing on the project front page is the dominant use + case **today** but not the only one — the technology is `fork(2)` for + KVM microVMs. If your workload needs N hardware-isolated children + spawned from a warmed parent in milliseconds, forkd is the primitive. 
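The `fork(2)` comparison in the note above can be made concrete with an ordinary process-level sketch. This is an analogy only: it uses Python's `os.fork` on a POSIX host, not forkd's API, and the names `fan_out` and `warmed` are invented for illustration. It shows the property the recipes rely on: state prepared once in the parent is already present in every child, and a child's writes stay private to that child.

```python
import os
import sys


def fan_out(n_children: int) -> list[int]:
    """Fork n children from one warmed parent and return their exit codes."""
    # Warm the parent once. `json` stands in for a heavy import such as numpy.
    import json

    warmed = {"module": json.__name__}

    pids = []
    for idx in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Child: memory is inherited copy-on-write, so the module and the
            # dict are already present without repeating the setup.
            ok = "json" in sys.modules and warmed["module"] == "json"
            warmed["child"] = idx  # private write, invisible to the parent
            os._exit(0 if ok else 1)
        pids.append(pid)

    codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        codes.append(os.waitstatus_to_exitcode(status))

    # The parent's copy is untouched by any child's write.
    assert "child" not in warmed
    return codes


if __name__ == "__main__":
    print(fan_out(3))
```

A process fork shares one kernel between parent and children; the recipes here apply the same idea one level down, to whole microVMs restored from a snapshot.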
diff --git a/recipes/ci-parallel-pytest/README.md b/recipes/ci-parallel-pytest/README.md new file mode 100644 index 0000000..cba1042 --- /dev/null +++ b/recipes/ci-parallel-pytest/README.md @@ -0,0 +1,174 @@ +# `ci-parallel-pytest` + +**Run your pytest suite across N forkd microVMs in parallel, +without paying per-worker container cold-start or dependency +import cost.** + +A typical Python CI job re-imports numpy / pandas / scikit-learn on +every fresh worker container — ~1-2 s of pure overhead before the +first test runs. With forkd, those imports live in the warmed +parent's snapshot; every fork inherits them via `mmap MAP_PRIVATE` +copy-on-write. Per-worker fixed cost drops to ~50-100 ms. + +## Architecture + +``` + ┌──────────────────────────────────────┐ + │ parent snapshot `ci-pytest` │ + │ python:3.12-slim │ + │ + pytest 8.3 numpy 2.0 pandas 2.2 │ + │ + scikit-learn 1.5 │ + │ + your /opt/test_project │ + │ (heavy imports already paid) │ + └────────────────┬─────────────────────┘ + │ mmap MAP_PRIVATE (CoW) + ┌─────────────────────┼─────────────────────┐ + │ │ │ + ┌────▼───────┐ ┌─────▼──────┐ ┌──────▼─────┐ + │ worker 1 │ │ worker 2 │ │ worker N │ + │ pytest │ │ pytest │ │ pytest │ + │ slice 1/N │ ... │ slice 2/N │ ... │ slice N/N │ + └────────────┘ └────────────┘ └────────────┘ + run in parallel +``` + +## What ships in this recipe + +| File | What it does | +|---|---| +| [`build.sh`](./build.sh) | Builds a forkd parent rootfs: `python:3.12-slim` + pinned pytest/numpy/pandas/sklearn, the demo test project copied to `/opt/test_project`, and a pre-warm step that imports the heavy deps so they're in the snapshot's page cache | +| [`test_project/`](./test_project/) | A representative pytest project — ~30 tests across 5 files (arithmetic, numpy, pandas, sklearn, text). 
Replace with your own | +| [`demo.py`](./demo.py) | Fan-out driver: slices test files across N forkd workers, runs each slice in a child sandbox, reports per-worker spawn/exec timing + total wall-clock + sequential-baseline comparison | + +## When to use this + +- **CI pipelines with 100s of pytest tests** that re-import heavy + ML libs every run. The savings compound: every PR run, every + retry, every nightly. +- **PR-preview environments** where each PR needs its own clean + pytest run with fresh side-effects (DB, filesystem, env). forkd's + per-child KVM isolation means workers truly don't see each other. +- **Sharded fuzz / property testing**: split a 10 000-iteration + Hypothesis run across N microVMs without setup tax. + +## When NOT to use this + +- Your test suite is < 30 tests and finishes in < 2 s sequentially — + parallelism overhead exceeds the gain. +- You don't actually need per-worker isolation (e.g. pure-function + unit tests with no shared state) — `pytest -n ` (pytest-xdist) + in a single container is simpler. +- You can't run forkd on your CI host (managed CI like default + GitHub Actions, no KVM). For self-hosted runners with bare-metal + Linux + KVM this works great. + +## Quickstart + +```bash +# 1. Build the parent (one-time, ~5 min — pip install pandas + sklearn +# dominates the time) +sudo bash recipes/ci-parallel-pytest/build.sh + +# 2. Snapshot the warmed parent (one-time, ~10 s) +sudo forkd snapshot --tag ci-pytest \ + --kernel /var/lib/forkd/kernels/vmlinux \ + --rootfs recipes/ci-parallel-pytest/parent.ext4 \ + --tap forkd-tap0 + +# 3. Fan out — 4 workers in parallel +FORKD_TOKEN=$(sudo cat /tmp/bench-pause/token) \ + python3 recipes/ci-parallel-pytest/demo.py --workers 4 \ + --sequential-baseline +``` + +Output from the dev box (Intel i7-12700, ext4, 2026-06-06): + +``` +Plan: 4 worker(s) × pytest slice off `ci-pytest`. 
+ worker 0: 2 file(s) — test_arithmetic.py, test_text_processing.py + worker 1: 1 file(s) — test_numpy_ops.py + worker 2: 1 file(s) — test_pandas_etl.py + worker 3: 1 file(s) — test_sklearn_models.py + +=== fan-out: 4 workers in parallel === + batch spawn (4 children): 81 ms + [0] PASS exec= 232 ms files=test_arithmetic.py,test_text_processing.py + [1] PASS exec= 304 ms files=test_numpy_ops.py + [2] PASS exec= 546 ms files=test_pandas_etl.py + [3] PASS exec=1458 ms files=test_sklearn_models.py + +fan-out wall-clock: 1601 ms (batch spawn=81 ms = ~20 ms/worker, + slowest worker exec=1458 ms) + +=== sequential baseline: one child runs the whole suite === + [0] PASS spawn=61 ms exec=1507 ms +sequential wall-clock: 1625 ms (fan-out speedup: 1.01×) +``` + +The 1.01× fan-out-vs-sequential figure is honest: this demo suite +only has ~30 tests and is dominated by one sklearn slice (1458 ms). +Fan-out shines when **your suite has many slow slices of comparable +size** — e.g. 8 sklearn-heavy slices each taking ~1.5 s would fan +out to ~1.5 s wall, vs ~12 s sequentially. + +**The number that matters across suite shapes is the batch spawn +cost: 81 ms for 4 children — ~20 ms per worker.** That's the +forkd-vs-container comparison: ~20 ms to start a forkd worker vs +~2-3 s to start a fresh container. 
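The arithmetic behind those two paragraphs can be written down as a rough model. This is a sketch under simplifying assumptions, not part of `demo.py`: the function names are hypothetical, fan-out wall-clock is taken as batch spawn plus the slowest slice, and sequential wall-clock as one spawn plus the sum of all slices. The model ignores readiness polling and teardown, and it overestimates the sequential case when slices share pytest startup (the measured sequential exec above, 1507 ms, is lower than the sum of the per-slice exec times).

```python
def fan_out_wall_ms(batch_spawn_ms: float, slice_exec_ms: list[float]) -> float:
    """Workers run in parallel, so the slowest slice sets the wall-clock."""
    return batch_spawn_ms + max(slice_exec_ms)


def sequential_wall_ms(spawn_ms: float, slice_exec_ms: list[float]) -> float:
    """One child runs every slice back to back."""
    return spawn_ms + sum(slice_exec_ms)


def modelled_speedup(
    batch_spawn_ms: float, spawn_ms: float, slice_exec_ms: list[float]
) -> float:
    """Sequential wall-clock divided by fan-out wall-clock."""
    return sequential_wall_ms(spawn_ms, slice_exec_ms) / fan_out_wall_ms(
        batch_spawn_ms, slice_exec_ms
    )


if __name__ == "__main__":
    # Measured demo run: the 1458 ms sklearn slice dominates the fan-out.
    demo = [232.0, 304.0, 546.0, 1458.0]
    print(f"demo fan-out model: {fan_out_wall_ms(81.0, demo):.0f} ms")

    # The hypothetical from the text: 8 slices of ~1.5 s each.
    # 160 ms batch spawn is an assumption (8 workers at ~20 ms each).
    balanced = [1500.0] * 8
    print(f"balanced fan-out model: {fan_out_wall_ms(160.0, balanced):.0f} ms")
    print(f"balanced sequential model: {sequential_wall_ms(61.0, balanced):.0f} ms")
    print(f"balanced speedup model: {modelled_speedup(160.0, 61.0, balanced):.1f}x")
```

The model makes the shape of the result visible: speedup is bounded by how evenly the work is split, because `max` over the slices, not the spawn cost, is the dominant term.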
+ +## GitHub Actions integration + +Drop this into your workflow on a self-hosted runner that has forkd ++ a `ci-pytest` snapshot pre-built: + +```yaml +jobs: + test: + runs-on: [self-hosted, linux, x64, forkd] + steps: + - uses: actions/checkout@v4 + - name: Refresh the parent snapshot + run: | + sudo cp -r ./tests /opt/test_project/tests # copy your tests into the test project dir + # or rebuild the parent if your deps changed: + # sudo bash recipes/ci-parallel-pytest/build.sh + - name: Fan out + env: + FORKD_TOKEN: ${{ secrets.FORKD_TOKEN }} + run: | + python3 recipes/ci-parallel-pytest/demo.py \ + --workers 8 \ + --snapshot-tag ci-pytest +``` + +For a hosted-runner setup, the equivalent is one forkd daemon on +your CI infrastructure, exposed over a port the runner can reach. + +## How it compares + +| Approach | Per-worker fixed cost | Notes | +|---|---|---| +| `pytest` sequential, fresh container | ~2 s container cold + ~1.5 s `import numpy/pandas/sklearn` | Each PR run / retry / nightly re-pays both | +| `pytest-xdist -n 4` in one container | ~3.5 s container cold + ~1.5 s imports (shared across workers) | Single shared kernel and filesystem; a crashing or misbehaving test can affect the other workers | +| `docker run` × 4 fresh containers | ~3.5 s × 4 cold-starts, parallelized | Per-container isolation, but slow to spawn | +| **forkd fan-out (this recipe)** | **~20 ms spawn per worker (81 ms batch of 4), imports already paid in the parent** | Per-child KVM isolation, warmed Python deps inherited via mmap CoW | + +The break-even point is roughly: if each test slice runs much +longer than a container cold-start (~3 s), the cold-start is noise +and container parallelism is fine. If your slice is **comparable to +or shorter than** the ~3 s container tax, forkd wins outright. ML / +data science suites where you re-pay sklearn / torch import on every +worker fall squarely in the forkd-wins zone. + +## Caveats + +- **`pip install` inside snapshots requires v0.5.1+**: the guest + kernel rebuild that landed in #226 closed #218 (CRNG starvation + blocked OpenSSL → pip hung).
Confirm your kernel: + `forkd snapshot-info ci-pytest` +- **Per-worker netns is on by default** — children get their own + `lo`, no cross-talk. If your tests need to hit a shared DB, use + `--per-child-netns=false` or put the DB on the host tap. +- **Worker count vs vCPU**: forkd's per-vCPU policy is "share the + host's cores". On a 20-core host, 8 workers is comfortable; 50 + is over-subscribed. diff --git a/recipes/ci-parallel-pytest/build.sh b/recipes/ci-parallel-pytest/build.sh new file mode 100644 index 0000000..cc7e3f1 --- /dev/null +++ b/recipes/ci-parallel-pytest/build.sh @@ -0,0 +1,57 @@ +#!/usr/bin/env bash +# Build a forkd parent rootfs for CI test parallelism — pytest + +# numpy/pandas/sklearn pre-imported, the demo test project under +# /opt/test_project. Children fork from this, each running a slice +# of the test suite from the warmed parent. +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)" + +IMAGE="${IMAGE:-python:3.12-slim}" +SIZE_MIB="${SIZE_MIB:-2048}" +OUT="$SCRIPT_DIR/parent.ext4" + +[ "$(id -u)" -eq 0 ] || { echo "run as root" >&2; exit 1; } + +# Heavy deps baked in so children inherit the import cost. Pinned so +# the benchmark numbers in README.md stay reproducible across builds. +PIP_PKGS="pytest==8.3.4 numpy==2.0.2 pandas==2.2.3 scikit-learn==1.5.2" + +WRAPPED_TAG="forkd-ci-pytest:tmp-$$" +TMP_CTX="$(mktemp -d)" +trap "rm -rf '$TMP_CTX' && docker image rm -f '$WRAPPED_TAG' >/dev/null 2>&1 || true" EXIT + +# Copy the test project into the build context so it's baked into +# the rootfs at /opt/test_project. Real users would `cp -r` their +# own project here instead. 
+cp -r "$SCRIPT_DIR/test_project" "$TMP_CTX/test_project" + +cat > "$TMP_CTX/Dockerfile" < dict: + data = json.dumps(body).encode() if body is not None else None + headers = {"Authorization": f"Bearer {token}"} + if body is not None: + headers["Content-Type"] = "application/json" + req = urllib.request.Request( + f"{DEFAULT_URL}{path}", data=data, method=method, headers=headers + ) + try: + with urllib.request.urlopen(req, timeout=timeout) as resp: + raw = resp.read() + return json.loads(raw) if raw else {} + except urllib.error.HTTPError as e: + body = e.read().decode("utf-8", "replace") + raise RuntimeError(f"{method} {path} → HTTP {e.code} {body[:400]}") from e + + +# The set of test files baked into /opt/test_project/tests/ in the +# `ci-pytest` snapshot. In a real CI setup this would come from +# `pytest --collect-only -q` against the user's project. +TEST_FILES = [ + "tests/test_arithmetic.py", + "tests/test_numpy_ops.py", + "tests/test_pandas_etl.py", + "tests/test_sklearn_models.py", + "tests/test_text_processing.py", +] + + +def slice_tests(n_workers: int) -> list[list[str]]: + """Round-robin assign test files to N worker slices.""" + slices: list[list[str]] = [[] for _ in range(n_workers)] + for i, f in enumerate(TEST_FILES): + slices[i % n_workers].append(f) + return [s for s in slices if s] + + +def batch_spawn(n: int, snap_tag: str, token: str) -> tuple[list[str], float]: + """One POST /v1/sandboxes with n=N. The daemon's `restore_many` + spawns all N children atomically — this avoids the + 'operation not supported after starting the microVM' race that + bites if multiple POST /v1/sandboxes calls fire concurrently + against the same snapshot. + + Returns (sandbox_ids, total_spawn_wall_clock_ms). + """ + t0 = time.monotonic() + spawned = http( + "POST", + "/v1/sandboxes", + token, + # per_child_netns: each child gets its own network namespace + # (forkd-child-) so workers don't compete for forkd-tap0. 
+ {"snapshot_tag": snap_tag, "n": n, "per_child_netns": True}, + ) + spawn_ms = (time.monotonic() - t0) * 1000 + return [s["id"] for s in spawned], spawn_ms + + +def run_pytest_in_sandbox( + idx: int, sb_id: str, files: list[str], token: str +) -> dict: + """Drive an already-spawned child: ping until ready → exec pytest + → delete. Returns per-worker timing. + """ + # Wait for the guest agent. + deadline = time.monotonic() + 30 + while time.monotonic() < deadline: + try: + http("POST", f"/v1/sandboxes/{sb_id}/ping", token, body={}, timeout=2) + break + except Exception: + time.sleep(0.1) + + cmd = "cd /opt/test_project && python3 -m pytest -v --tb=short " + " ".join(files) + args = ["sh", "-c", cmd] + t_exec = time.monotonic() + try: + result = http( + "POST", + f"/v1/sandboxes/{sb_id}/exec", + token, + {"args": args, "timeout_secs": 120}, + timeout=130, + ) + exec_ms = (time.monotonic() - t_exec) * 1000 + return { + "worker_idx": idx, + "files": files, + "exec_ms": round(exec_ms, 1), + "exit_code": result.get("exit_code", -1), + "stdout_tail": (result.get("stdout") or "").strip().split("\n")[-3:], + } + except Exception as e: + return { + "worker_idx": idx, + "files": files, + "exec_ms": round((time.monotonic() - t_exec) * 1000, 1), + "exit_code": -1, + "stdout_tail": [f"ERR: {e}"], + } + finally: + try: + http("DELETE", f"/v1/sandboxes/{sb_id}", token, timeout=15) + except Exception: + pass + + +def main() -> int: + ap = argparse.ArgumentParser() + ap.add_argument("--workers", type=int, default=4) + ap.add_argument("--snapshot-tag", default=DEFAULT_TAG) + ap.add_argument( + "--sequential-baseline", + action="store_true", + help="Also run the full suite in one child for comparison", + ) + ap.add_argument( + "--token", + default=os.environ.get("FORKD_TOKEN", ""), + help="Bearer token (or FORKD_TOKEN env)", + ) + args = ap.parse_args() + + if not args.token: + print("ERROR: set FORKD_TOKEN env or pass --token") + return 2 + + slices = slice_tests(args.workers) + 
print( + f"Plan: {len(slices)} worker(s) × pytest slice off `{args.snapshot_tag}`." + ) + for i, s in enumerate(slices): + print(f" worker {i}: {len(s)} file(s) — {', '.join(f.split('/')[-1] for f in s)}") + print() + + print(f"=== fan-out: {len(slices)} workers in parallel ===") + t_wall0 = time.monotonic() + sb_ids, batch_spawn_ms = batch_spawn(len(slices), args.snapshot_tag, args.token) + print(f" batch spawn ({len(slices)} children): {batch_spawn_ms:.0f} ms") + + with futures.ThreadPoolExecutor(max_workers=len(slices)) as pool: + results = list( + pool.map( + lambda p: run_pytest_in_sandbox(*p), + [ + (i, sb_ids[i], slices[i], args.token) + for i in range(len(slices)) + ], + ) + ) + wall_ms = (time.monotonic() - t_wall0) * 1000 + + fail = 0 + for r in results: + status = "PASS" if r["exit_code"] == 0 else f"FAIL({r['exit_code']})" + files_short = ",".join(f.split("/")[-1] for f in r["files"]) + print( + f" [{r['worker_idx']}] {status} exec={r['exec_ms']:>5.0f}ms " + f"files={files_short}" + ) + if r["exit_code"] != 0: + fail += 1 + for line in r["stdout_tail"]: + print(f" | {line}") + + exec_ms = [r["exec_ms"] for r in results] + spawn_per_worker = batch_spawn_ms / len(slices) + print() + print( + f"fan-out wall-clock: {wall_ms:.0f} ms " + f"(batch spawn={batch_spawn_ms:.0f} ms = ~{spawn_per_worker:.0f} ms/worker, " + f"slowest worker exec={max(exec_ms):.0f} ms)" + ) + + if args.sequential_baseline: + print() + print("=== sequential baseline: one child runs the whole suite ===") + t0 = time.monotonic() + seq_ids, seq_spawn_ms = batch_spawn(1, args.snapshot_tag, args.token) + seq = run_pytest_in_sandbox(0, seq_ids[0], TEST_FILES, args.token) + seq_wall_ms = (time.monotonic() - t0) * 1000 + status = "PASS" if seq["exit_code"] == 0 else f"FAIL({seq['exit_code']})" + print( + f" [0] {status} spawn={seq_spawn_ms:.0f}ms " + f"exec={seq['exec_ms']:.0f}ms" + ) + speedup = seq_wall_ms / wall_ms if wall_ms > 0 else 0 + print( + f"sequential wall-clock: {seq_wall_ms:.0f} 
ms " + f"(fan-out speedup: {speedup:.2f}×)" + ) + + return fail + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/recipes/ci-parallel-pytest/test_project/pyproject.toml b/recipes/ci-parallel-pytest/test_project/pyproject.toml new file mode 100644 index 0000000..4ea4a7e --- /dev/null +++ b/recipes/ci-parallel-pytest/test_project/pyproject.toml @@ -0,0 +1,9 @@ +[project] +name = "forkd-ci-pytest-demo" +version = "0.1.0" +description = "Pytest test suite used by the forkd ci-parallel-pytest recipe" +requires-python = ">=3.10" + +[tool.pytest.ini_options] +testpaths = ["tests"] +addopts = "-v --tb=short" diff --git a/recipes/ci-parallel-pytest/test_project/tests/test_arithmetic.py b/recipes/ci-parallel-pytest/test_project/tests/test_arithmetic.py new file mode 100644 index 0000000..b03ecd4 --- /dev/null +++ b/recipes/ci-parallel-pytest/test_project/tests/test_arithmetic.py @@ -0,0 +1,27 @@ +"""Tiny arithmetic suite — exercises pytest startup + import overhead. + +Realistic CI suites have hundreds of these; they're individually cheap +but the per-test fixed overhead (`pytest` startup + test collection + +fixture setup) is what eats wall-clock when run sequentially. 
+""" +import pytest + + +@pytest.mark.parametrize("a,b,expected", [(1, 2, 3), (5, 7, 12), (-1, 1, 0), (0, 0, 0)]) +def test_addition(a: int, b: int, expected: int) -> None: + assert a + b == expected + + +@pytest.mark.parametrize("a,b,expected", [(10, 3, 30), (-2, 4, -8), (0, 999, 0)]) +def test_multiplication(a: int, b: int, expected: int) -> None: + assert a * b == expected + + +def test_division_by_zero_raises() -> None: + with pytest.raises(ZeroDivisionError): + _ = 1 / 0 + + +def test_modulo_invariant() -> None: + for n in range(20): + assert n % 7 in range(7) diff --git a/recipes/ci-parallel-pytest/test_project/tests/test_numpy_ops.py b/recipes/ci-parallel-pytest/test_project/tests/test_numpy_ops.py new file mode 100644 index 0000000..8d96db0 --- /dev/null +++ b/recipes/ci-parallel-pytest/test_project/tests/test_numpy_ops.py @@ -0,0 +1,52 @@ +"""numpy array tests — the import alone is ~150 ms cold. + +The whole point of forkd for CI: numpy is already imported in the +parent snapshot, so children inherit the import for free. Each +child's pytest startup skips the import cost. 
+""" +import numpy as np +import pytest + + +def test_zeros_shape() -> None: + a = np.zeros((4, 5)) + assert a.shape == (4, 5) + assert a.sum() == 0 + + +def test_linspace_endpoints() -> None: + a = np.linspace(0.0, 1.0, 11) + assert a[0] == pytest.approx(0.0) + assert a[-1] == pytest.approx(1.0) + assert len(a) == 11 + + +def test_dot_product_associative() -> None: + rng = np.random.default_rng(seed=0) + a = rng.standard_normal((3, 3)) + b = rng.standard_normal((3, 3)) + c = rng.standard_normal((3, 3)) + np.testing.assert_allclose((a @ b) @ c, a @ (b @ c), rtol=1e-10) + + +def test_solve_smoke() -> None: + a = np.array([[2.0, 1.0], [1.0, 3.0]]) + b = np.array([1.0, 2.0]) + x = np.linalg.solve(a, b) + np.testing.assert_allclose(a @ x, b) + + +def test_eigvals_real_for_symmetric() -> None: + rng = np.random.default_rng(seed=1) + a = rng.standard_normal((8, 8)) + sym = (a + a.T) / 2 + vals = np.linalg.eigvalsh(sym) + assert np.allclose(vals.imag, 0.0) + assert len(vals) == 8 + + +def test_fft_inverse_round_trips() -> None: + rng = np.random.default_rng(seed=2) + a = rng.standard_normal(64) + recovered = np.fft.ifft(np.fft.fft(a)).real + np.testing.assert_allclose(a, recovered, atol=1e-10) diff --git a/recipes/ci-parallel-pytest/test_project/tests/test_pandas_etl.py b/recipes/ci-parallel-pytest/test_project/tests/test_pandas_etl.py new file mode 100644 index 0000000..8a51a43 --- /dev/null +++ b/recipes/ci-parallel-pytest/test_project/tests/test_pandas_etl.py @@ -0,0 +1,57 @@ +"""pandas DataFrame tests — `import pandas` is ~400-800 ms cold.""" +import numpy as np +import pandas as pd +import pytest + + +@pytest.fixture +def sample_df() -> pd.DataFrame: + return pd.DataFrame({ + "user": ["alice", "bob", "carol", "dave", "eve"], + "age": [29, 41, 35, 22, 58], + "score": [0.81, 0.65, 0.92, 0.74, 0.88], + }) + + +def test_dataframe_construction(sample_df: pd.DataFrame) -> None: + assert len(sample_df) == 5 + assert list(sample_df.columns) == ["user", "age", "score"] + + 
+def test_filter_by_age(sample_df: pd.DataFrame) -> None: + adults = sample_df[sample_df["age"] >= 30] + assert len(adults) == 3 + assert set(adults["user"]) == {"bob", "carol", "eve"} + + +def test_groupby_aggregation() -> None: + df = pd.DataFrame({ + "team": ["a", "a", "b", "b", "c"], + "score": [1, 2, 3, 4, 5], + }) + means = df.groupby("team")["score"].mean() + assert means["a"] == pytest.approx(1.5) + assert means["b"] == pytest.approx(3.5) + assert means["c"] == pytest.approx(5.0) + + +def test_merge_inner() -> None: + left = pd.DataFrame({"id": [1, 2, 3], "x": ["a", "b", "c"]}) + right = pd.DataFrame({"id": [2, 3, 4], "y": [20, 30, 40]}) + merged = left.merge(right, on="id", how="inner") + assert len(merged) == 2 + assert list(merged["y"]) == [20, 30] + + +def test_to_csv_roundtrip(tmp_path) -> None: + df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}) + p = tmp_path / "x.csv" + df.to_csv(p, index=False) + loaded = pd.read_csv(p) + pd.testing.assert_frame_equal(df, loaded) + + +def test_numeric_describe(sample_df: pd.DataFrame) -> None: + stats = sample_df[["age", "score"]].describe() + assert stats.loc["count", "age"] == 5 + assert stats.loc["min", "score"] == pytest.approx(0.65) diff --git a/recipes/ci-parallel-pytest/test_project/tests/test_sklearn_models.py b/recipes/ci-parallel-pytest/test_project/tests/test_sklearn_models.py new file mode 100644 index 0000000..5b91a4c --- /dev/null +++ b/recipes/ci-parallel-pytest/test_project/tests/test_sklearn_models.py @@ -0,0 +1,53 @@ +"""sklearn model tests — `import sklearn` is ~600-1200 ms cold. + +This is the worst per-test fixed cost in a typical Python ML CI: +every fresh pytest invocation re-pays it. In a forkd parent, the +import is part of the warmed snapshot. 
+""" +import numpy as np +import pytest +from sklearn.cluster import KMeans +from sklearn.datasets import make_classification +from sklearn.linear_model import LinearRegression, LogisticRegression +from sklearn.metrics import accuracy_score, r2_score +from sklearn.model_selection import train_test_split + + +def test_linear_regression_exact_fit() -> None: + rng = np.random.default_rng(seed=0) + x = rng.standard_normal((100, 3)) + coef_true = np.array([1.5, -2.0, 0.5]) + y = x @ coef_true + 7.0 + model = LinearRegression().fit(x, y) + np.testing.assert_allclose(model.coef_, coef_true, atol=1e-9) + assert model.intercept_ == pytest.approx(7.0) + assert r2_score(y, model.predict(x)) == pytest.approx(1.0) + + +def test_logistic_regression_separable() -> None: + x, y = make_classification( + n_samples=200, n_features=8, n_informative=4, random_state=42, + ) + x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.25, random_state=0) + model = LogisticRegression(max_iter=500).fit(x_tr, y_tr) + acc = accuracy_score(y_te, model.predict(x_te)) + assert acc > 0.75 + + +def test_kmeans_two_clusters() -> None: + rng = np.random.default_rng(seed=7) + cluster_a = rng.standard_normal((50, 2)) + [0, 0] + cluster_b = rng.standard_normal((50, 2)) + [10, 10] + x = np.vstack([cluster_a, cluster_b]) + km = KMeans(n_clusters=2, random_state=0, n_init=10).fit(x) + centers = sorted(km.cluster_centers_.tolist(), key=lambda c: c[0]) + assert centers[0][0] < 2.0 + assert centers[1][0] > 8.0 + + +def test_pipeline_predict_shape() -> None: + rng = np.random.default_rng(seed=3) + x = rng.standard_normal((50, 4)) + y = x @ np.array([1.0, -1.0, 0.5, 0.0]) + 2.0 + pred = LinearRegression().fit(x, y).predict(x[:10]) + assert pred.shape == (10,) diff --git a/recipes/ci-parallel-pytest/test_project/tests/test_text_processing.py b/recipes/ci-parallel-pytest/test_project/tests/test_text_processing.py new file mode 100644 index 0000000..b4ea4db --- /dev/null +++ 
b/recipes/ci-parallel-pytest/test_project/tests/test_text_processing.py @@ -0,0 +1,45 @@ +"""String / regex tests — stdlib-only, very fast per test. + +The point of including these is to mirror real CI suites: a mix of +heavy ML tests and fast unit tests. forkd's per-worker fixed cost +is amortized across whatever slice each worker gets. +""" +import re + +import pytest + + +@pytest.mark.parametrize("s,expected", [ + ("hello world", 11), + ("forkd", 5), + ("", 0), +]) +def test_string_length(s: str, expected: int) -> None: + assert len(s) == expected + + +def test_split_join_roundtrip() -> None: + s = "one two three four" + assert " ".join(s.split()) == s + + +def test_regex_email_basic() -> None: + pattern = re.compile(r"^[\w.+-]+@[\w.-]+\.[a-z]{2,}$", re.IGNORECASE) + assert pattern.match("alice@example.com") + assert pattern.match("user+tag@sub.example.co.uk") + assert not pattern.match("not-an-email") + assert not pattern.match("@example.com") + + +def test_dict_comprehension() -> None: + src = {"a": 1, "b": 2, "c": 3} + doubled = {k: v * 2 for k, v in src.items()} + assert doubled == {"a": 2, "b": 4, "c": 6} + + +def test_set_operations() -> None: + a = {1, 2, 3, 4} + b = {3, 4, 5, 6} + assert a & b == {3, 4} + assert a | b == {1, 2, 3, 4, 5, 6} + assert a - b == {1, 2}
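`slice_tests` in `demo.py` assigns files round-robin, which is why one heavy file can dominate the fan-out. A possible alternative, sketched here under assumptions, is to assign the longest files first to whichever worker currently has the least work. The function name `slice_by_duration` and the duration figures are hypothetical; in practice the timings would have to come from a previous run of your own suite.

```python
import heapq


def slice_by_duration(
    durations_ms: dict[str, float], n_workers: int
) -> list[list[str]]:
    """Greedy longest-first assignment of test files to worker slices."""
    # Min-heap of (accumulated load, worker index): the top entry is always
    # the least-loaded worker.
    heap = [(0.0, i) for i in range(n_workers)]
    heapq.heapify(heap)
    slices: list[list[str]] = [[] for _ in range(n_workers)]

    # Place the slowest files first so the small ones fill the gaps.
    by_cost = sorted(durations_ms.items(), key=lambda kv: kv[1], reverse=True)
    for name, ms in by_cost:
        load, idx = heapq.heappop(heap)
        slices[idx].append(name)
        heapq.heappush(heap, (load + ms, idx))

    # Drop empty slices, mirroring slice_tests in demo.py.
    return [s for s in slices if s]


if __name__ == "__main__":
    # Illustrative timings only, not measurements from this recipe.
    timings = {
        "tests/test_a.py": 1500.0,
        "tests/test_b.py": 600.0,
        "tests/test_c.py": 500.0,
        "tests/test_d.py": 300.0,
        "tests/test_e.py": 100.0,
    }
    for i, files in enumerate(slice_by_duration(timings, 2)):
        print(i, files)
```

Balancing at file granularity cannot split a single slow file, so a suite dominated by one file would still need that file broken up or sliced per test.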