recipes: surface non-AI use cases + ship ci-parallel-pytest #231

Merged
WaylandYang merged 2 commits into main from recipes/non-ai-surface on Jun 7, 2026

@WaylandYang
Contributor

Phase A — recipes/README.md reorg

The recipes/ directory already had non-AI use cases (postgres-fixture for DB testing, playwright-browser for browser farms, nodejs for generic JS), but the top-level README categorized recipes only by AI agent framework. A new "By problem you're solving" table makes the breadth explicit:

| Problem | Recipes |
| --- | --- |
| AI agent fan-out | langgraph-react, crewai-fanout, autogen-branch, openai-swarm, mcp-agent, speculative-agent, coding-agent-fork |
| CI test parallelism | postgres-fixture, ci-parallel-pytest (new) |
| Database test fixtures | postgres-fixture |
| Browser automation farms | playwright-browser |
| Notebook / code interpreter | jupyter-kernel, e2b-codeinterpreter |
| Generic compute | python-numpy, coding-agent, nodejs, agent-workbench |

The AI agent lens stays prominent (first row) — this isn't a rebrand, just a widening of the discovery surface.

Phase B — new recipe `ci-parallel-pytest`

```
  ┌──────────────────────────────────────┐
  │ parent snapshot ci-pytest            │
  │ python:3.12-slim + pytest + numpy    │
  │ + pandas + sklearn + your tests      │
  │ (heavy imports already paid)         │
  └───────────────────┬──────────────────┘
                      │ mmap MAP_PRIVATE (CoW)
┌──────────┬──────────┼──────────┬──────────┐
│ worker 1 │ worker 2 │ worker 3 │ worker N │
│ pytest   │ pytest   │ pytest   │ pytest   │
│ slice 1  │ slice 2  │ slice 3  │ slice N  │
└──────────┴──────────┴──────────┴──────────┘
```

A typical Python ML CI run re-pays ~1.5 s of `import numpy/pandas/sklearn` on every fresh worker container. With forkd, those imports live in the warmed parent's snapshot, and every fork inherits them via mmap CoW. The projected per-worker fixed cost drops from ~3.5 s (container cold-start + imports) to ~80 ms (forkd spawn) + 0 ms (warmed imports).

Ships:

  • `build.sh` — wraps python:3.12-slim + pinned deps + the demo test project + a prewarm step
  • `test_project/` — ~30 representative tests across 5 files (arithmetic, numpy, pandas, sklearn, text)
  • `demo.py` — fan-out driver: slices tests across N workers, runs each in a child sandbox, reports per-worker spawn/exec + total wall-clock + sequential baseline
  • `README.md` — story, when-to-use / when-not, quickstart, GitHub Actions snippet, comparison vs sequential / pytest-xdist / docker
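
The slicing step that `demo.py` is described as performing can be sketched as follows. This is a minimal illustration, not the recipe's actual code: the function name `slice_round_robin` and the round-robin strategy are assumptions, and the file names are the five test files that appear in the measured run later in this PR.

```python
def slice_round_robin(test_files: list[str], n_workers: int) -> list[list[str]]:
    """Deal test files across workers round-robin and drop empty slices.

    Hypothetical helper: the real demo.py may slice differently.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    slices: list[list[str]] = [[] for _ in range(n_workers)]
    # Sort first so the assignment is deterministic across runs.
    for i, name in enumerate(sorted(test_files)):
        slices[i % n_workers].append(name)
    return [s for s in slices if s]


files = [
    "test_arithmetic.py",
    "test_numpy_ops.py",
    "test_pandas_etl.py",
    "test_sklearn_models.py",
    "test_text_processing.py",
]
for worker, chunk in enumerate(slice_round_robin(files, 4)):
    print(worker, ",".join(chunk))
```

Round-robin by file ignores per-file cost, so one expensive file can dominate a slice; weighting by recorded durations would be a natural refinement if timing data is available.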

Numbers

The README quickstart shows projected numbers (i7-12700 / ext4):

| Approach | Wall-clock, 4 workers |
| --- | --- |
| Sequential, fresh container | ~4-5 s |
| pytest-xdist -n 4 in one container | ~3 s |
| docker × 4 fresh containers | ~5-7 s |
| forkd fan-out (this recipe) | ~1.6 s |

Real dev-box measurements will land as a follow-up commit before merge; this PR stays in draft until they do.

Test plan

  • `cargo fmt --all -- --check` n/a (no Rust changes)
  • `cargo test` n/a (no Rust changes)
  • Real numbers from dev box (pending; will be follow-up commit)
  • Visual review of recipes/README.md categorization

🤖 Generated with Claude Code

WaylandYang and others added 2 commits June 6, 2026 16:35
Phase A — reorg recipes/README.md
---------------------------------

The recipes/ directory already had postgres-fixture (DB testing),
playwright-browser (browser farms), and nodejs (generic JS runtime)
— all non-AI use cases. But the top-level README only categorized
by framework/audience under an AI agent lens, so non-AI users had
to drill into individual recipes to discover them.

New "By problem you're solving" table makes it explicit. Same
recipes, surfaced for non-AI audiences:

  AI agent fan-out         (5 recipes)
  CI test parallelism      postgres-fixture + ci-parallel-pytest (new)
  Database test fixtures   postgres-fixture
  Browser automation       playwright-browser
  Notebook / interpreter   jupyter-kernel, e2b-codeinterpreter
  Generic compute          python-numpy, coding-agent, nodejs

The AI agent lens stays prominent (first row, 5 recipes) — this
isn't a rebrand, just a widening of the discovery surface.

Phase B — new recipe: ci-parallel-pytest
----------------------------------------

The pitch: run pytest workers across N forkd microVMs and skip
per-worker container cold-start + dependency import cost. A typical
Python ML CI re-pays ~1.5 s of `import numpy/pandas/sklearn` on
every fresh worker; with forkd, that's in the warmed parent's page
cache and inherited via mmap CoW.

Ships:

  build.sh        Wraps python:3.12-slim + pinned pytest/numpy/
                  pandas/sklearn + the demo test project + a
                  prewarm step, builds the ext4 rootfs.
  test_project/   ~30 representative tests across 5 files
                  (arithmetic, numpy, pandas, sklearn, text) so
                  worker slicing has something meaningful to do.
  demo.py         Slices test files across N workers, spawns one
                  child per slice from the snapshot, runs pytest
                  inside each, reports per-worker spawn/exec
                  timing + total wall-clock + sequential baseline.
  README.md       Story, when-to-use / when-not-to-use, quickstart,
                  GitHub Actions integration snippet, comparison
                  table (sequential / pytest-xdist / docker / forkd).

Numbers in the README quickstart are illustrative (i7-12700 / ext4
projected), with a "replace your tests, re-measure" note. Real
measurement on the dev box will land as a follow-up commit before
PR merge.

Closes the "scope feels narrow" feedback by demonstrating that a
non-AI use case (CI test fan-out) ships cleanly on the same
primitive — no special daemon mode, no new API, just a recipe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

End-to-end verified on the dev box (Intel i7-12700, ext4):

  Plan: 4 worker(s) × pytest slice off `ci-pytest`.

  === fan-out: 4 workers in parallel ===
    batch spawn (4 children): 81 ms
    [0] PASS  exec= 232 ms  files=test_arithmetic.py,test_text_processing.py
    [1] PASS  exec= 304 ms  files=test_numpy_ops.py
    [2] PASS  exec= 546 ms  files=test_pandas_etl.py
    [3] PASS  exec=1458 ms  files=test_sklearn_models.py
  fan-out wall-clock: 1601 ms  (~20 ms/worker spawn)

  === sequential baseline ===
    [0] PASS  spawn=61 ms  exec=1507 ms
  sequential wall-clock: 1625 ms  (fan-out speedup: 1.01×)

Real-numbers reframing in the README:

  The 1.01× fan-out-vs-sequential ratio is honest for THIS suite —
  one sklearn slice dominates (1458 ms). Fan-out shines when suites
  have many comparable-cost slices. The cross-suite-invariant number
  to compare is the **batch spawn cost: 81 ms for 4 children =
  ~20 ms/worker**, vs ~2-3 s for a fresh container.
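
The arithmetic behind that reframing can be checked directly from the log above. The sketch below only recombines the measured values; the variable names and the "critical path" framing (batch spawn plus the slowest slice) are illustrative assumptions, not part of the recipe.

```python
# Values copied from the measured run above, in milliseconds.
batch_spawn_ms = 81
slice_exec_ms = [232, 304, 546, 1458]
fanout_wall_ms = 1601
sequential_wall_ms = 1625

# Fan-out cannot finish before the slowest slice does.
critical_path_ms = batch_spawn_ms + max(slice_exec_ms)
per_worker_spawn_ms = batch_spawn_ms / len(slice_exec_ms)
speedup = sequential_wall_ms / fanout_wall_ms
dominant_share = max(slice_exec_ms) / sum(slice_exec_ms)

print(f"critical path:        {critical_path_ms} ms")
print(f"unaccounted overhead: {fanout_wall_ms - critical_path_ms} ms")
print(f"spawn per worker:     {per_worker_spawn_ms:.2f} ms")
print(f"fan-out speedup:      {speedup:.2f}x")
print(f"slowest slice share:  {dominant_share:.0%} of summed exec time")
```

The slowest slice alone accounts for most of the fan-out wall-clock, which is why the ratio stays near 1× for this suite regardless of how cheap the spawn is.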

Two demo.py changes to land it:

  1. **Batch spawn via single POST /v1/sandboxes with n=N** instead
     of N concurrent POST calls. Concurrent calls race and trip
     Firecracker's "cannot /snapshot/load after InstanceStart"
     rejection; the daemon's `restore_many` is purpose-built for the
     batch case and atomically spawns N children with per-child netns.
  2. **`cd /opt/test_project && pytest …`** in the exec instead of
     bare `pytest`. The guest agent's exec runs from `/`; the
     Dockerfile's `WORKDIR` isn't honored at exec time, so we have
     to switch directories explicitly.
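
Both changes can be sketched without touching the network. The snippet below only builds the request body and the exec string. Everything except the endpoint path, the `n` parameter, and `/opt/test_project` is an assumption: the `snapshot` field name and both helper names are hypothetical, and the real daemon's request schema may differ.

```python
import json
import shlex

SPAWN_ENDPOINT = "/v1/sandboxes"  # endpoint named in the commit message
TEST_DIR = "/opt/test_project"    # directory named in the commit message


def batch_spawn_body(snapshot: str, n: int) -> str:
    """Build one request body asking for n children in a single call.

    "snapshot" is a hypothetical field name; only `n` appears in the source.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    return json.dumps({"snapshot": snapshot, "n": n})


def pytest_exec_command(files: list[str]) -> str:
    """Build the exec string, changing directory explicitly before pytest."""
    quoted = " ".join(shlex.quote(f) for f in files)
    return f"cd {shlex.quote(TEST_DIR)} && pytest {quoted}"


print("POST", SPAWN_ENDPOINT, batch_spawn_body("ci-pytest", 4))
print(pytest_exec_command(["test_numpy_ops.py"]))
```

Quoting each file name with `shlex.quote` keeps the command safe if a test path ever contains spaces or shell metacharacters.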

Comparison table in README updated to reflect actual measured costs
(no more `~80 ms spawn` per worker; the measured 81 ms is the batch
total, not a per-worker cost). The break-even framing is also
clarified: forkd wins when a per-worker test slice is shorter than
the ~3 s container cold-start tax, which is the case for most ML /
data-science CI suites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@WaylandYang WaylandYang marked this pull request as ready for review June 6, 2026 09:14
@WaylandYang WaylandYang merged commit c5a255e into main Jun 7, 2026
2 checks passed
@WaylandYang WaylandYang deleted the recipes/non-ai-surface branch June 7, 2026 02:36