CF artifacts: agent-loop rehydration integration test by ivarvong · Pull Request #2 · ivarvong/vfs

ivarvong · 2026-05-06T21:45:51Z

Summary

End-to-end integration test that proves the artifact-backed agent-loop story: provision a fresh CF artifacts repo + write token via the management API, seed initial state, boot agent 1 by cloning + mounting through VFS, modify the working tree via VFS.write_file, commit + push, drop all in-memory state, then rehydrate agent 2 from a fresh clone and verify session 1's edits persisted.

Bumps :exgit to pull in Exgit.Workspace + its VFS.Mountable defimpl, so writes flow through the unified %VFS{} surface (no manual blob/tree/commit composition).
Tagged :integration_network + :cloudflare; self-skips when CF_API_TOKEN / CF_ACCOUNT_ID aren't set, and deletes the provisioned repo on teardown.
When CF_BENCH_LOG=path is set, each run appends a JSONL record (timestamp, repo name, per-op timings, total) to that path. bench/cf_aggregate.exs <jsonl> reads it and prints min/p50/mean/p90/p95/p99/max per op.
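The aggregation step can be sketched as follows. This is a minimal illustration, not the contents of `bench/cf_aggregate.exs`: the module name `PercentileSketch`, the record shape (a map of op name to milliseconds, already decoded from JSONL), and the nearest-rank percentile method are all assumptions. The real script may interpolate differently.

```elixir
defmodule PercentileSketch do
  # Nearest-rank percentile for an integer p in 1..100:
  # the value at rank ceil(p * n / 100) in the sorted sample (1-based).
  def percentile(values, p) when is_list(values) and values != [] and p in 1..100 do
    sorted = Enum.sort(values)
    rank = div(p * length(sorted) + 99, 100)
    Enum.at(sorted, rank - 1)
  end

  # Groups per-run records by op and summarizes each op's samples.
  def per_op(records) do
    records
    |> Enum.flat_map(&Map.to_list/1)
    |> Enum.group_by(fn {op, _ms} -> op end, fn {_op, ms} -> ms end)
    |> Map.new(fn {op, samples} ->
      {op,
       %{
         min: Enum.min(samples),
         p50: percentile(samples, 50),
         mean: Enum.sum(samples) / length(samples),
         p99: percentile(samples, 99),
         max: Enum.max(samples)
       }}
    end)
  end
end

# Hypothetical sample data, not measurements from this PR.
records = [
  %{"clone_boot" => 200, "push_modified" => 160},
  %{"clone_boot" => 220, "push_modified" => 150},
  %{"clone_boot" => 180, "push_modified" => 170},
  %{"clone_boot" => 400, "push_modified" => 165}
]

summary = PercentileSketch.per_op(records)
IO.inspect(summary)
```

Under nearest-rank with n=50, p99 falls on rank 50, which is the maximum. That is consistent with the p99 and max columns matching in the table below, though it does not confirm which method the script uses.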

The compile-order workaround in `test_helper.exs`

Exgit.Workspace's VFS.Mountable defimpl is guarded by Code.ensure_loaded?(VFS.Mountable). Mix always builds deps before the host project's lib, so when :exgit compiles inside the vfs repo, our protocol module doesn't exist yet → the guard fails → no defimpl beam. Downstream consumers don't hit this — mix orders vfs before exgit via the optional edge declared in exgit/mix.exs. The breakage is specific to self-hosting the protocol's own repo. The test_helper compiles the defimpl source in-place once both modules are loaded.
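The guard pattern can be illustrated with a self-contained sketch. `Demo.Mountable`, `Demo.Workspace`, and `Demo.NotCompiledYet` are hypothetical stand-ins for `VFS.Mountable`, `Exgit.Workspace`, and a protocol that has not been built yet; the actual defimpl in exgit may look different.

```elixir
defprotocol Demo.Mountable do
  @doc "Hypothetical stand-in for VFS.Mountable."
  def describe(mountable)
end

defmodule Demo.Workspace do
  # Hypothetical stand-in for Exgit.Workspace.
  defstruct [:path]
end

# The implementation is only emitted when the protocol module can be loaded
# at the moment this code is compiled.
if Code.ensure_loaded?(Demo.Mountable) do
  defimpl Demo.Mountable, for: Demo.Workspace do
    def describe(workspace), do: "workspace at " <> workspace.path
  end
end

# When the protocol does not exist yet, the guard is false and the
# defimpl body above would be skipped entirely.
guard_when_missing = Code.ensure_loaded?(Demo.NotCompiledYet)

described = Demo.Mountable.describe(struct(Demo.Workspace, path: "/repo"))
IO.inspect({guard_when_missing, described})
```

In this sketch the protocol is defined before the guard runs, so the implementation exists. The self-hosting case described above corresponds to the guard evaluating to false, after which nothing re-evaluates it unless the source is compiled again.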

Per-op timings (n=50)

Caveat: measured from my laptop over home internet, not a controlled benchmark. Numbers reflect end-user RTT to Cloudflare's edge from this host; not service SLA, not steady-state from a colocated worker. p99 values with n=50 are very noisy estimators — treat the tail as "this happened once or twice in 50," not as an SLO target.

op	min	p50	mean	p90	p95	p99	max
`create_repo`	925	1129	1360	2369	2730	3084	3084
`mint_token`	227	337	360	495	535	891	891
`push_seed`	690	884	1052	1211	2458	3678	3678
`clone_boot`	158	209	220	271	290	373	373
`push_modified`	103	163	163	209	216	266	266
`clone_rehydrate`	151	229	249	316	362	590	590
`delete_repo`	676	946	1074	1376	2294	3623	3623
total	3262	4156	4479	5626	7284	9142	9142

(all values in ms)

Read of the data:

Control plane (create_repo, mint_token, delete_repo) dominates and has heavy right-tails. p99/p50 ratios are 2.7×, 2.6×, 3.8× respectively. create_repo and delete_repo p99 ≈ 3.1s and 3.6s.
Data plane is fast and tight once the repo is seeded. clone_boot, push_modified, and clone_rehydrate all stay within 100–600ms across the entire run; p99/p50 under 2.6×. Predictable enough to budget per agent turn. push_seed is the exception (see the next point).
push_seed has the worst tail — p50 884ms, p99 3.7s. First push to a freshly-created repo competes with whatever post-create work the server is still doing.
End-to-end: p50 4.2s, p95 7.3s, p99 9.1s. Rehydration is single-digit seconds.
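Steady-state turn cost (just clone_rehydrate + push_modified, no create/delete): p50 ≈ 390ms, p99 ≈ 860ms. These are sums of the per-op percentiles, so they approximate the combined cost rather than measure it directly.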
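The ratios and steady-state sums quoted in this read can be reproduced from the table with a few lines of arithmetic. The sketch below only copies the p50 and p99 columns from the n=50 table; the variable names are illustrative.

```elixir
# {p50, p99} in ms, copied from the n=50 table.
timings = %{
  create_repo: {1129, 3084},
  mint_token: {337, 891},
  push_seed: {884, 3678},
  clone_boot: {209, 373},
  push_modified: {163, 266},
  clone_rehydrate: {229, 590},
  delete_repo: {946, 3623}
}

# Tail ratio per op: how many times slower the p99 is than the median.
ratios = Map.new(timings, fn {op, {p50, p99}} -> {op, Float.round(p99 / p50, 2)} end)

# Steady-state turn: sum the per-op percentiles for the two ops involved.
{rehydrate_p50, rehydrate_p99} = timings.clone_rehydrate
{push_p50, push_p99} = timings.push_modified
steady_p50 = rehydrate_p50 + push_p50
steady_p99 = rehydrate_p99 + push_p99

IO.inspect(ratios)
IO.inspect({steady_p50, steady_p99})
# steady_p50 is 392 and steady_p99 is 856
```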

Test plan

mix test --include integration_network test/integration/cloudflare_artifacts_test.exs (with CF env vars set)
Default mix test still green (325 tests, 46 properties, 36 doctests, 0 failures)
50-run bench (CF_BENCH_LOG=...) + bench/cf_aggregate.exs aggregator
CI green

Proves the artifact-backed agent-loop story end-to-end against real CF: seed a branch, "boot" agent 1 by cloning + mounting via VFS, modify the working tree through `VFS.write_file`, commit + push back to CF, drop all in-memory state, then "rehydrate" agent 2 from a fresh clone and verify session 1's edits persisted.

Tagged `:integration_network` and `:cloudflare`. Self-skips when `CF_ARTIFACT_REMOTE` / `CF_ARTIFACT_TOKEN` aren't set. Cleans up its test branch on teardown so `ci.git` doesn't accumulate refs.

Drive-bys:
- Bump exgit to 6252d8e to pull in `Exgit.Workspace` and its `VFS.Mountable` defimpl.
- `test/test_helper.exs` loads `.env` and works around a self-host-only compile-order issue: when vfs is the host project, `:exgit` builds before our `VFS.Mountable` exists, so its `Code.ensure_loaded?`-guarded workspace defimpl is skipped; we recompile the source in-place once both modules are loaded. Downstream consumers don't hit this, because mix orders vfs before exgit via the optional edge in exgit/mix.exs.
- Gitignore `.env`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Switches the integration test from "long-lived ci.git + ambient bearer" to provisioning a fresh repo + minting a write-scoped token per run via the `Exgit.CloudflareArtifacts` management client. This covers the control plane (create_repo, mint_token, delete_repo) on top of the data plane (push, clone), which is the full agent-loop boot sequence.

Each network op is wrapped in `:timer.tc` and printed. Across 10 runs:

op	min	median	mean	p95	max
create_repo	954	1120	1293	1703	2105
mint_token	242	283	515	514	2345
push_seed	690	856	1291	2962	3300
clone_boot	184	203	217	252	298
push_modified	121	165	170	199	207
clone_rehydrate	230	264	279	325	357
delete_repo	724	1006	1160	1598	1990
total	3875	4614	4926	6182	6730

(all values in ms)

create_repo and delete_repo dominate (~1s each at the median). The steady-state git data plane is fast (clone and push_modified medians under 300ms); the first push_seed is slower at a median of 856ms. End-to-end rehydration round-trip is single-digit seconds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
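Wrapping each op in `:timer.tc` might look like the sketch below. The `TimingSketch` module, the `timed/3` helper, and the stand-in closures are hypothetical; the test in this PR may structure its timing differently. `:timer.tc/1` itself returns `{microseconds, result}`.

```elixir
defmodule TimingSketch do
  # Runs `fun`, records its wall-clock duration in whole milliseconds under `op`,
  # and returns the function's result alongside the updated timings map.
  def timed(timings, op, fun) when is_map(timings) and is_function(fun, 0) do
    {micros, result} = :timer.tc(fun)
    {result, Map.put(timings, op, div(micros, 1000))}
  end
end

# Stand-ins for network calls; real ops would hit the CF API or the git remote.
{repo, timings} = TimingSketch.timed(%{}, :create_repo, fn -> "demo-repo" end)

{:ok, timings} =
  TimingSketch.timed(timings, :clone_boot, fn ->
    Process.sleep(5)
    :ok
  end)

total = timings |> Map.values() |> Enum.sum()
IO.inspect(Map.put(timings, :total, total))
```

Threading the timings map through each call keeps the per-op numbers and the total in one place, which makes it straightforward to print them or serialize them as one record per run.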

When `CF_BENCH_LOG=path` is set, the test appends one JSON line per run (timestamp, repo name, per-op timings, total) to that path. When unset, it falls back to the human-readable stdout summary. `bench/cf_aggregate.exs <jsonl>` reads the file and prints min / median / mean / p90 / p95 / p99 / max per op. Replaces the previous "loop the test, grep IO.puts, awk a histogram" workflow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


ivarvong and others added 4 commits May 6, 2026 17:45

Silence unused-var warning in cf_aggregate

d4c6f2b

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ivarvong wants to merge 4 commits into main from cf-artifacts-rehydration-demo
