CF artifacts: agent-loop rehydration integration test #2
Open
ivarvong wants to merge 4 commits into
Proves the artifact-backed agent-loop story end-to-end against real CF: seed a branch, "boot" agent 1 by cloning + mounting via VFS, modify the working tree through `VFS.write_file`, commit + push back to CF, drop all in-memory state, then "rehydrate" agent 2 from a fresh clone and verify session 1's edits persisted.

Tagged `:integration_network` and `:cloudflare`. Self-skips when `CF_ARTIFACT_REMOTE` / `CF_ARTIFACT_TOKEN` aren't set. Cleans up its test branch on teardown so `ci.git` doesn't accumulate refs.

Drive-bys:
- Bump exgit to 6252d8e to pull in `Exgit.Workspace` and its `VFS.Mountable` defimpl.
- `test/test_helper.exs` loads `.env` and works around a self-host-only compile-order issue: when vfs is the host project, `:exgit` builds before our `VFS.Mountable` exists, so its `Code.ensure_loaded?`-guarded workspace defimpl is skipped; we recompile the source in-place once both modules are loaded. Downstream consumers don't hit this: mix orders vfs before exgit via the optional edge in exgit/mix.exs.
- Gitignore `.env`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
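The compile-order issue in the first commit can be reproduced in isolation. The following is a minimal sketch, not the real exgit or vfs source: `Demo.Mountable` and `Demo.Workspace` are hypothetical stand-ins, and `Code.compile_string/1` simulates the dependency being compiled before and after the protocol exists.

```elixir
# Hypothetical names throughout; this only illustrates the guard pattern.
# Silence the "redefining module" warning caused by the deliberate recompile.
Code.put_compiler_option(:ignore_module_conflict, true)

# Stand-in for the dependency's source file: a struct plus a guarded defimpl.
dep_source = ~S"""
defmodule Demo.Workspace do
  defstruct [:root]
end

if Code.ensure_loaded?(Demo.Mountable) do
  defimpl Demo.Mountable, for: Demo.Workspace do
    def mount(ws), do: {:ok, ws.root}
  end
end
"""

# 1. The "dependency" compiles first. The protocol does not exist yet,
#    so the guard is false and no implementation module is produced.
Code.compile_string(dep_source)
impl_before? = Code.ensure_loaded?(Demo.Mountable.Demo.Workspace)

# 2. The "host project" compiles afterwards and defines the protocol.
Code.compile_string(~S"""
defprotocol Demo.Mountable do
  def mount(target)
end
""")

# 3. Workaround: compile the dependency source again now that both exist.
Code.compile_string(dep_source)
impl_after? = Code.ensure_loaded?(Demo.Mountable.Demo.Workspace)

# Dispatch through the protocol now reaches the implementation.
workspace = struct(Demo.Workspace, root: "/srv/agent")
result = apply(Demo.Mountable, :mount, [workspace])

IO.inspect({impl_before?, impl_after?, result})
```

The same ordering explains why downstream consumers are unaffected: when the protocol module is compiled before the dependency, step 1 already sees the protocol and the guard passes on the first build.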
Switches the integration test from "long-lived ci.git + ambient bearer" to provisioning a fresh repo + minting a write-scoped token per run via the `Exgit.CloudflareArtifacts` management client. This covers the control plane (create_repo, mint_token, delete_repo) on top of the data plane (push, clone), which is the full agent-loop boot sequence.

Each network op is wrapped in `:timer.tc` and printed. Across 10 runs (all values in ms):

| op | min | median | mean | p95 | max |
|---|---|---|---|---|---|
| create_repo | 954 | 1120 | 1293 | 1703 | 2105 |
| mint_token | 242 | 283 | 515 | 514 | 2345 |
| push_seed | 690 | 856 | 1291 | 2962 | 3300 |
| clone_boot | 184 | 203 | 217 | 252 | 298 |
| push_modified | 121 | 165 | 170 | 199 | 207 |
| clone_rehydrate | 230 | 264 | 279 | 325 | 357 |
| delete_repo | 724 | 1006 | 1160 | 1598 | 1990 |
| total | 3875 | 4614 | 4926 | 6182 | 6730 |

Control-plane ops dominate (create_repo and delete_repo take ~1s each); the git data plane is fast once the repo is seeded (clone and push_modified medians under 300ms). End-to-end rehydration round-trip is single-digit seconds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
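Wrapping each op in `:timer.tc` might look like the sketch below. `Bench.timed/3` is a hypothetical helper, not the code in this PR, and the sleeps stand in for the real network calls.

```elixir
defmodule Bench do
  # Runs `fun`, then records its wall-clock duration in milliseconds under `op`.
  # :timer.tc/1 returns {elapsed_microseconds, result}.
  def timed(timings, op, fun) when is_function(fun, 0) do
    {micros, result} = :timer.tc(fun)
    {result, Map.put(timings, op, div(micros, 1000))}
  end
end

# Placeholder work instead of create_repo / clone; replace with real calls.
{repo, timings} =
  Bench.timed(%{}, :create_repo, fn ->
    Process.sleep(20)
    :fake_repo
  end)

{_clone, timings} = Bench.timed(timings, :clone_boot, fn -> Process.sleep(5) end)

for {op, ms} <- timings, do: IO.puts("#{op}: #{ms} ms")
IO.puts("total: #{timings |> Map.values() |> Enum.sum()} ms")
```

Threading the `timings` map through each call keeps the per-op numbers and the total in one place, which is what a per-run summary needs.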
When `CF_BENCH_LOG=path` is set, the test appends one JSON line per run (timestamp, repo name, per-op timings, total) to that path. When unset, it falls back to the human-readable stdout summary.

`bench/cf_aggregate.exs <jsonl>` reads the file and prints min / median / mean / p90 / p95 / p99 / max per op. Replaces the previous "loop the test, grep IO.puts, awk a histogram" workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
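The aggregation step can be illustrated with a small sketch. This is not the contents of `bench/cf_aggregate.exs`: the `BenchStats` module is hypothetical, the percentile method (nearest rank) is an assumption, JSON decoding is skipped by starting from already-decoded maps, and the timings are made up for the example.

```elixir
defmodule BenchStats do
  # Nearest-rank percentile over a sorted, non-empty list.
  # Integer arithmetic computes ceil(p * n / 100) without float rounding.
  def percentile(sorted, p) when p in 1..100 do
    rank = div(p * length(sorted) + 99, 100)
    Enum.at(sorted, rank - 1)
  end

  # Summary statistics for one op's samples (milliseconds).
  def summarize(samples) do
    sorted = Enum.sort(samples)

    %{
      min: hd(sorted),
      p50: percentile(sorted, 50),
      mean: Enum.sum(sorted) / length(sorted),
      p90: percentile(sorted, 90),
      p95: percentile(sorted, 95),
      p99: percentile(sorted, 99),
      max: List.last(sorted)
    }
  end

  # Takes one map of op => ms per run and returns op => summary.
  def by_op(runs) do
    runs
    |> Enum.flat_map(&Map.to_list/1)
    |> Enum.group_by(fn {op, _ms} -> op end, fn {_op, ms} -> ms end)
    |> Map.new(fn {op, samples} -> {op, summarize(samples)} end)
  end
end

# Made-up per-run timings standing in for decoded JSONL records.
runs = [
  %{"create_repo" => 1000, "clone_boot" => 200},
  %{"create_repo" => 1200, "clone_boot" => 210},
  %{"create_repo" => 900, "clone_boot" => 190},
  %{"create_repo" => 1100, "clone_boot" => 205},
  %{"create_repo" => 3000, "clone_boot" => 195}
]

summary = BenchStats.by_op(runs)

for {op, s} <- summary do
  IO.puts("#{op}: min=#{s.min} p50=#{s.p50} mean=#{s.mean} p95=#{s.p95} max=#{s.max}")
end
```

With only a handful of runs, the upper percentiles collapse onto the maximum under nearest rank, which is one reason a larger sample is needed before reading p99 figures.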
Summary
End-to-end integration test that proves the artifact-backed agent-loop story: provision a fresh CF artifacts repo + write token via the management API, seed initial state, boot agent 1 by cloning + mounting through VFS, modify the working tree via `VFS.write_file`, commit + push, drop all in-memory state, then rehydrate agent 2 from a fresh clone and verify session 1's edits persisted.

- Bumps `:exgit` to pull in `Exgit.Workspace` + its `VFS.Mountable` defimpl, so writes flow through the unified `%VFS{}` surface (no manual blob/tree/commit composition).
- Tagged `:integration_network` + `:cloudflare`; self-skips when `CF_API_TOKEN` / `CF_ACCOUNT_ID` aren't set, and deletes the provisioned repo on teardown.
- When `CF_BENCH_LOG=path` is set, each run appends a JSONL record (timestamp, repo name, per-op timings, total) to that path. `bench/cf_aggregate.exs <jsonl>` reads it and prints min/p50/mean/p90/p95/p99/max per op.

The compile-order workaround in `test_helper.exs`

`Exgit.Workspace`'s `VFS.Mountable` defimpl is guarded by `Code.ensure_loaded?(VFS.Mountable)`. Mix always builds deps before the host project's lib, so when `:exgit` compiles inside the vfs repo, our protocol module doesn't exist yet, the guard fails, and no defimpl beam is produced. Downstream consumers don't hit this: mix orders vfs before exgit via the `optional` edge declared in `exgit/mix.exs`. The breakage is specific to self-hosting the protocol's own repo. The test_helper compiles the defimpl source in-place once both modules are loaded.

Per-op timings (n=50)

[Table: per-op timings in ms for `create_repo`, `mint_token`, `push_seed`, `clone_boot`, `push_modified`, `clone_rehydrate`, `delete_repo`]

Read of the data:

- Control plane (`create_repo`, `mint_token`, `delete_repo`) dominates and has heavy right tails. p99/p50 ratios are 2.7×, 2.6×, 3.8× respectively. `create_repo` and `delete_repo` p99 ≈ 3s.
- Data plane (`clone`, `push`) is fast and tight. All four ops 100–600ms across the entire run; p99/p50 under 2.6×. Predictable enough to budget per agent turn.
- `push_seed` has the worst tail: p50 884ms, p99 3.7s. First push to a freshly-created repo competes with whatever post-create work the server is still doing.
- `clone_rehydrate` + `push_modified` (no create/delete): p50 ≈ 390ms, p99 ≈ 860ms.

Test plan

- `mix test --include integration_network test/integration/cloudflare_artifacts_test.exs` (with CF env vars set)
- `mix test` still green (325 tests, 46 properties, 36 doctests, 0 failures)
- JSONL logging (`CF_BENCH_LOG=...`) + `bench/cf_aggregate.exs` aggregator