test: live e2e tests + Test - Live workflow (port of pytest -m live) #34
Merged
Prerequisite to removing the Python client: replace the Python live tier
(linux-live-tests.yml, test_cli_live.py, test_drivers_live.py) with a Rust port.
- tests/live.rs: per-agent end-to-end tests that run the real `agentcap run`
binary (via CARGO_BIN_EXE) against a live OpenAI-compatible server, asserting
the wire path — run.json shape, completed_turns, captures landed, and pi's
streamed JSONL trace. Subsumes both Python live files (CLI e2e + per-driver).
`#[ignore]`d so `cargo test` stays hermetic; each test skips (passes) when no
server is reachable. opencode omitted (same reason as the Python skip).
- .github/workflows/live.yml ("Test - Live"): ports the proven Python live
setup — install podman, cache + download the Qwen3-1.7B GGUF, cache the
rootless image store, spawn the pinned llama.cpp server, then
`cargo test --test live -- --ignored` (serial). Sandbox images build on
demand via the binary.
Gated by AGENTCAP_TEST_LLM_URL (else a :8000/:8080 probe). Verified locally:
fmt/clippy green, full suite hermetic (live ignored), live tests skip-pass with
no server. The live workflow itself needs podman + GGUF, so it's validated in CI.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…default
reqwest::blocking defaults to a 30s total-request timeout; a slow streamed generation (e.g. an agent turn on a CPU runner) blows past it, so the proxy's upstream read errors mid-stream and the agent's turn never completes. Cap at a generous-but-finite 900s instead (synth follow-up: 300s). Restores the live workflow to all agents.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
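A minimal sketch of the timeout choice in that commit, using only the standard library. The `UpstreamCall` enum and `upstream_timeout` function are hypothetical names, and treating "synth follow-up" as a separate request kind is an assumption. Only the 30s default and the 900s/300s caps come from the commit message.

```rust
use std::time::Duration;

/// Hypothetical request kinds, introduced only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UpstreamCall {
    /// A streamed agent turn, which can be slow on a CPU runner.
    AgentTurn,
    /// Assumed meaning of "synth follow-up" in the commit message.
    SynthFollowUp,
}

/// The default total-request timeout the commit message attributes to
/// `reqwest::blocking`.
const REQWEST_BLOCKING_DEFAULT: Duration = Duration::from_secs(30);

/// Generous-but-finite caps, per the commit message.
fn upstream_timeout(call: UpstreamCall) -> Duration {
    match call {
        UpstreamCall::AgentTurn => Duration::from_secs(900),
        UpstreamCall::SynthFollowUp => Duration::from_secs(300),
    }
}
```

With reqwest, the chosen value would typically be passed to `reqwest::blocking::Client::builder().timeout(...)` when the client is built; where the real proxy does this is not shown in the PR text.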
…s not on `run`)
hermes' base prompt exceeds the tiny CI model's budget and bails before any model call; the Python suite only ran hermes at the driver level with ignore_rules/toolsets, which `agentcap run` doesn't expose. Keep pi + goose, which cover the full stack across both trace mechanisms.
Ports agentcap's live tier to Rust, the prerequisite to removing the Python client. Replaces `test_cli_live.py` + `test_drivers_live.py` + `linux-live-tests.yml`.

`tests/live.rs`

Per-agent end-to-end tests that run the real `agentcap run` binary (via `CARGO_BIN_EXE_agentcap`) against a live OpenAI-compatible server and assert the wire path (not task quality):
- `run.json` shape (agent/model/upstream/turns), `completed_turns == 1`, a minted `session_id`;
- captures landed under `<run>/captures/`;
- pi's streamed `.jsonl` trace.

This single e2e per agent subsumes both Python live files (the CLI run test and the per-driver tests: `completed_turns == 1` + captures ⇒ the agent reached the model through the proxy and the turn succeeded). Covers pi, hermes, goose; opencode is omitted for the same reason it's `@pytest.mark.skip`'d (1.15.x doesn't pick up the baked `agent.minimal`).

Gating: `#[ignore]`d so `cargo test` stays hermetic; resolves the server from `AGENTCAP_TEST_LLM_URL` (else a `:8000`/`:8080` probe) and skips (passes) when none is reachable.

`.github/workflows/live.yml` ("Test - Live")

Ports the proven Python live setup: install podman, cache + download the pinned Qwen3-1.7B GGUF, cache the rootless image store, spawn the pinned `llama.cpp:server` container (wait for `/v1/models`), set `AGENTCAP_TEST_LLM_URL`, then `cargo test --test live -- --ignored --test-threads=1`. Per-agent sandbox images build on demand via the binary (cached across runs). Triggers on push/PR/dispatch like the Python one.

Validated locally

fmt/clippy green; full suite hermetic (live shows as ignored); `cargo test --test live -- --ignored` skip-passes with no server. The `Test - Live` run is its first real execution; I'll watch it and fix anything that trips. Once it's green, the Python live tier can be removed in the cutover.

🤖 Generated with Claude Code
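The gating described above (explicit `AGENTCAP_TEST_LLM_URL`, else a probe of `:8000`/`:8080`, else skip) could look roughly like the sketch below. It is not the code in `tests/live.rs`: the function names `resolve_llm_url` and `live_server`, the loopback host, the 300 ms connect timeout, and the `http://` URL shape are all assumptions made for illustration.

```rust
use std::env;
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Picks the server URL: a non-empty explicit override wins; otherwise the
/// first loopback port that accepts a TCP connection. Returns `None` when
/// nothing answers, which a live test would treat as "skip (pass)".
fn resolve_llm_url(explicit: Option<String>, ports: &[u16]) -> Option<String> {
    if let Some(url) = explicit.filter(|u| !u.trim().is_empty()) {
        return Some(url);
    }
    ports.iter().copied().find_map(|port| {
        // Assumed probe: a plain TCP connect to 127.0.0.1 with a short timeout.
        let addr = SocketAddr::from(([127, 0, 0, 1], port));
        TcpStream::connect_timeout(&addr, Duration::from_millis(300))
            .ok()
            .map(|_| format!("http://127.0.0.1:{port}"))
    })
}

/// Hypothetical helper a live test could call first: reads the override from
/// the environment and falls back to probing ports 8000 and 8080.
fn live_server() -> Option<String> {
    resolve_llm_url(env::var("AGENTCAP_TEST_LLM_URL").ok(), &[8000, 8080])
}
```

An `#[ignore]`d test would then start with something like `let Some(url) = live_server() else { return; };`, so that a missing server yields a pass rather than a failure.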