CF artifacts: agent-loop rehydration integration test#2

Open
ivarvong wants to merge 4 commits into main from cf-artifacts-rehydration-demo

@ivarvong ivarvong commented May 6, 2026

Summary

End-to-end integration test that proves the artifact-backed agent-loop story: provision a fresh CF artifacts repo + write token via the management API, seed initial state, boot agent 1 by cloning + mounting through VFS, modify the working tree via VFS.write_file, commit + push, drop all in-memory state, then rehydrate agent 2 from a fresh clone and verify session 1's edits persisted.

  • Bumps :exgit to pull in Exgit.Workspace + its VFS.Mountable defimpl, so writes flow through the unified %VFS{} surface (no manual blob/tree/commit composition).
  • Tagged :integration_network + :cloudflare; self-skips when CF_API_TOKEN / CF_ACCOUNT_ID aren't set, and deletes the provisioned repo on teardown.
  • When CF_BENCH_LOG=path is set, each run appends a JSONL record (timestamp, repo name, per-op timings, total) to that path. bench/cf_aggregate.exs <jsonl> reads it and prints min/p50/mean/p90/p95/p99/max per op.
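
The aggregation step can be sketched as follows. This is a minimal sketch, not the contents of bench/cf_aggregate.exs: the module name `BenchStats` is hypothetical, and the nearest-rank percentile definition is an assumption, since the description does not say which method the script uses. It takes one op's timings in ms, as they would be collected from the JSONL records, and returns the same seven statistics.

```elixir
defmodule BenchStats do
  @moduledoc """
  Hypothetical helper: summary statistics over one op's timings (ms).
  Percentiles use the nearest-rank method, which is an assumption here.
  """

  # Nearest-rank percentile: the value at 1-based rank ceil(p * n / 100)
  # in the sorted sample. Integer arithmetic avoids float rounding.
  def percentile(samples, p) when is_list(samples) and samples != [] and p in 0..100 do
    sorted = Enum.sort(samples)
    rank = max(1, div(p * length(sorted) + 99, 100))
    Enum.at(sorted, rank - 1)
  end

  # Returns the statistics in the order the report prints them.
  def summarize(samples) when is_list(samples) and samples != [] do
    %{
      min: Enum.min(samples),
      p50: percentile(samples, 50),
      mean: Enum.sum(samples) / length(samples),
      p90: percentile(samples, 90),
      p95: percentile(samples, 95),
      p99: percentile(samples, 99),
      max: Enum.max(samples)
    }
  end
end

# Usage with made-up timings (not measurements from this PR):
IO.inspect(BenchStats.summarize([120, 95, 310, 150, 101]))
```

Under this definition, p99 of a 50-sample run is the 50th sorted value, i.e. the maximum. That would be consistent with p99 and max coinciding in the n=50 table, though it does not confirm the script's actual method.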

The compile-order workaround in test_helper.exs

Exgit.Workspace's VFS.Mountable defimpl is guarded by Code.ensure_loaded?(VFS.Mountable). Mix always builds deps before the host project's lib, so when :exgit compiles inside the vfs repo, our protocol module doesn't exist yet → the guard fails → no defimpl beam. Downstream consumers don't hit this — mix orders vfs before exgit via the optional edge declared in exgit/mix.exs. The breakage is specific to self-hosting the protocol's own repo. The test_helper compiles the defimpl source in-place once both modules are loaded.
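
The ordering problem can be reproduced in isolation. The script below is a self-contained sketch under stated assumptions, not the actual test_helper.exs: `Demo.Mountable` and `Demo.Workspace` are hypothetical stand-ins for `VFS.Mountable` and `Exgit.Workspace`, and sources are held in strings instead of files so that the compile order is explicit.

```elixir
# Stand-in for the struct the dependency defines (Exgit.Workspace in the PR).
defmodule Demo.Workspace do
  defstruct [:root]
end

# Source of a guarded defimpl, shaped like the one described above.
guarded_impl = """
if Code.ensure_loaded?(Demo.Mountable) do
  defimpl Demo.Mountable, for: Demo.Workspace do
    def mount(workspace), do: {:ok, workspace.root}
  end
end
"""

# Source of the protocol, which the host project compiles later.
protocol = """
defprotocol Demo.Mountable do
  def mount(target)
end
"""

impl_module = Module.concat(Demo.Mountable, Demo.Workspace)

# 1. Dependency compiles first: protocol missing, guard false, no impl module.
Code.compile_string(guarded_impl)
IO.inspect(Code.ensure_loaded?(impl_module), label: "impl after first pass")

# 2. Host project compiles: the protocol now exists.
Code.compile_string(protocol)

# 3. Workaround: compile the same defimpl source again; the guard now passes.
Code.compile_string(guarded_impl)
IO.inspect(Code.ensure_loaded?(impl_module), label: "impl after recompile")

# struct/2 builds the struct at runtime, so this also works in a script.
IO.inspect(Demo.Mountable.mount(struct(Demo.Workspace, root: "/repo")))
```

The real helper would compile the defimpl source file shipped inside the :exgit dependency; that path is not given in this description, so it is left out here.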

Per-op timings (n=50)

Caveat: measured from my laptop over home internet, not a controlled benchmark. Numbers reflect end-user RTT to Cloudflare's edge from this host, not a service SLA and not steady-state latency from a colocated worker. With n=50, p99 is a very noisy estimator (it coincides with the max in every row below), so treat the tail as "this happened once or twice in 50," not as an SLO target.

op                 min   p50  mean   p90   p95   p99   max
create_repo        925  1129  1360  2369  2730  3084  3084
mint_token         227   337   360   495   535   891   891
push_seed          690   884  1052  1211  2458  3678  3678
clone_boot         158   209   220   271   290   373   373
push_modified      103   163   163   209   216   266   266
clone_rehydrate    151   229   249   316   362   590   590
delete_repo        676   946  1074  1376  2294  3623  3623
total             3262  4156  4479  5626  7284  9142  9142

(all values in ms)

Read of the data:

  • Control plane (create_repo, mint_token, delete_repo) dominates and has heavy right tails. p99/p50 ratios are 2.7×, 2.6×, 3.8× respectively. create_repo and delete_repo p99 are 3.1s and 3.6s.
  • Data plane after the seed push (clone_boot, push_modified, clone_rehydrate) is fast and tight. All three ops stay within 100–600ms across the entire run, with p99/p50 under 2.6×. Predictable enough to budget per agent turn.
  • push_seed has the worst tail: p50 884ms, p99 3.7s (4.2×). A likely cause is that the first push to a freshly created repo competes with whatever post-create work the server is still doing.
  • End-to-end: p50 4.2s, p95 7.3s, p99 9.1s. Rehydration is single-digit seconds.
  • Steady-state turn cost (just clone_rehydrate + push_modified, no create/delete): p50 ≈ 390ms, p99 ≈ 860ms, obtained by summing the per-op percentiles, which only approximates the percentile of the combined latency.
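
The ratios and the steady-state figures can be rechecked from the table with a few lines. This is a minimal sketch: the module name `TailCheck` is hypothetical, the {p50, p99} pairs are copied from the n=50 table above, and "tail ratio" here means p99 divided by p50.

```elixir
defmodule TailCheck do
  # {p50, p99} in ms, copied from the n=50 table above.
  @timings [
    create_repo: {1129, 3084},
    mint_token: {337, 891},
    push_seed: {884, 3678},
    clone_boot: {209, 373},
    push_modified: {163, 266},
    clone_rehydrate: {229, 590},
    delete_repo: {946, 3623}
  ]

  def timings, do: @timings

  # Tail ratio: p99 divided by p50, rounded to one decimal place.
  def ratio(op) do
    {p50, p99} = Keyword.fetch!(@timings, op)
    Float.round(p99 / p50, 1)
  end

  # Steady-state turn: clone_rehydrate + push_modified. Summing per-op
  # percentiles only approximates the percentile of the combined latency.
  def steady_state do
    {r50, r99} = Keyword.fetch!(@timings, :clone_rehydrate)
    {m50, m99} = Keyword.fetch!(@timings, :push_modified)
    {r50 + m50, r99 + m99}
  end
end

for {op, _} <- TailCheck.timings() do
  IO.puts("#{op}: #{TailCheck.ratio(op)}x")
end

IO.inspect(TailCheck.steady_state(), label: "steady-state {p50, p99} ms")
```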

Test plan

  • mix test --include integration_network test/integration/cloudflare_artifacts_test.exs (with CF env vars set)
  • Default mix test still green (325 tests, 46 properties, 36 doctests, 0 failures)
  • 50-run bench (CF_BENCH_LOG=...) + bench/cf_aggregate.exs aggregator
  • CI green

ivarvong and others added 4 commits May 6, 2026 17:45
Proves the artifact-backed agent-loop story end-to-end against real CF:
seed a branch, "boot" agent 1 by cloning + mounting via VFS, modify the
working tree through `VFS.write_file`, commit + push back to CF, drop
all in-memory state, then "rehydrate" agent 2 from a fresh clone and
verify session 1's edits persisted.

Tagged `:integration_network` and `:cloudflare`. Self-skips when
`CF_ARTIFACT_REMOTE` / `CF_ARTIFACT_TOKEN` aren't set. Cleans up its
test branch on teardown so `ci.git` doesn't accumulate refs.

Drive-bys:
- Bump exgit to 6252d8e to pull in `Exgit.Workspace` and its
  `VFS.Mountable` defimpl.
- `test/test_helper.exs` loads `.env` and works around a self-host-only
  compile-order issue: when vfs is the host project, `:exgit` builds
  before our `VFS.Mountable` exists, so its `Code.ensure_loaded?`-guarded
  workspace defimpl is skipped; we recompile the source in-place once
  both modules are loaded. Downstream consumers don't hit this — mix
  orders vfs before exgit via the optional edge in exgit/mix.exs.
- Gitignore `.env`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switches the integration test from "long-lived ci.git + ambient bearer"
to provisioning a fresh repo + minting a write-scoped token per run via
the `Exgit.CloudflareArtifacts` management client. This covers the
control plane (create_repo, mint_token, delete_repo) on top of the data
plane (push, clone), which is the full agent-loop boot sequence.

Each network op is wrapped in `:timer.tc` and printed. Across 10 runs:

  op                  min     median    mean      p95      max
  create_repo         954     1120      1293      1703     2105 ms
  mint_token          242     283       515       514      2345 ms
  push_seed           690     856       1291      2962     3300 ms
  clone_boot          184     203       217       252      298 ms
  push_modified       121     165       170       199      207 ms
  clone_rehydrate     230     264       279       325      357 ms
  delete_repo         724     1006      1160      1598     1990 ms
  total               3875    4614      4926      6182     6730 ms

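Control-plane ops dominate (~1s each for create_repo and delete_repo);
the git data plane is fast once the repo is seeded (median <300ms per
op; push_seed itself has a median of 856ms). End-to-end rehydration
round-trip is single-digit seconds.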

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When `CF_BENCH_LOG=path` is set, the test appends one JSON line per
run (timestamp, repo name, per-op timings, total) to that path. When
unset, it falls back to the human-readable stdout summary.

`bench/cf_aggregate.exs <jsonl>` reads the file and prints min /
median / mean / p90 / p95 / p99 / max per op. Replaces the previous
"loop the test, grep IO.puts, awk a histogram" workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>