test: live e2e tests + Test - Live workflow (port of pytest -m live) by dacorvo · Pull Request #34 · huggingface/agentcap

dacorvo · 2026-06-25T08:34:37Z

Ports agentcap's live tier to Rust — the prerequisite to removing the Python client. Replaces test_cli_live.py + test_drivers_live.py + linux-live-tests.yml.

`tests/live.rs`

Per-agent end-to-end tests that run the real `agentcap run` binary (via `CARGO_BIN_EXE_agentcap`) against a live OpenAI-compatible server and assert the wire path (not task quality):

- `run.json` shape (`agent`/`model`/`upstream`/`turns`), `completed_turns == 1`, and a minted `session_id`;
- captures landed in `<run>/captures/`;
- for pi, the streamed `.jsonl` trace.

This single end-to-end test per agent subsumes both Python live files (the CLI run test and the per-driver tests): `completed_turns == 1` plus landed captures implies the agent reached the model through the proxy and the turn succeeded. Covers pi, hermes, and goose; opencode is omitted for the same reason it is `@pytest.mark.skip`'d in Python (opencode 1.15.x doesn't pick up the baked `agent.minimal`).
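A minimal sketch of the wire-path assertions, assuming `run.json` has already been parsed into a struct (the real test presumably deserializes the JSON). `RunRecord`, its field types, and `check_wire_path` are hypothetical; `turns` and the pi `.jsonl` trace check are not modeled. In the real integration test the binary path would come from `env!("CARGO_BIN_EXE_agentcap")`, which Cargo sets when building integration tests for a crate that defines that binary.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical, already-parsed view of `run.json`. Field names follow the PR
/// text; the types are assumptions, and `turns` is omitted.
struct RunRecord {
    agent: String,
    model: String,
    upstream: String,
    completed_turns: u32,
    session_id: Option<String>,
}

/// Checks the wire path only (not task quality): right agent, model/upstream
/// recorded, exactly one completed turn, a minted session id, and at least one
/// entry under `<run>/captures/`.
fn check_wire_path(run: &RunRecord, expected_agent: &str, run_dir: &Path) -> Result<(), String> {
    if run.agent != expected_agent {
        return Err(format!("agent: expected {expected_agent}, got {}", run.agent));
    }
    if run.model.is_empty() || run.upstream.is_empty() {
        return Err("run.json is missing model or upstream".to_string());
    }
    if run.completed_turns != 1 {
        return Err(format!("completed_turns: expected 1, got {}", run.completed_turns));
    }
    // A missing or empty session id means none was minted.
    if run.session_id.as_deref().map_or(true, str::is_empty) {
        return Err("no session_id was minted".to_string());
    }
    let captures = run_dir.join("captures");
    let n = fs::read_dir(&captures)
        .map_err(|e| format!("{}: {e}", captures.display()))?
        .count();
    if n == 0 {
        return Err(format!("no captures landed in {}", captures.display()));
    }
    Ok(())
}
```

Returning `Result<(), String>` instead of panicking inside the helper lets a caller attach the run directory or agent name before failing the test.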

Gating: `#[ignore]`d so `cargo test` stays hermetic; each test resolves the server from `AGENTCAP_TEST_LLM_URL` (else a :8000/:8080 probe) and skips (passes) when none is reachable.
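The gating logic might look roughly like this hedged sketch. The env var name and the 8000/8080 fallback ports come from the PR; `resolve_llm_url`, `tcp_probe`, and the `http://127.0.0.1:<port>` URL shape are hypothetical. The PR doesn't say whether an explicitly configured URL is also probed, so this sketch trusts it as given. The env lookup and the probe are injected so the resolution logic can be exercised without a server.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

const LLM_URL_ENV: &str = "AGENTCAP_TEST_LLM_URL";
const PROBE_PORTS: [u16; 2] = [8000, 8080];

/// Returns the base URL of the server to test against, or None (meaning: skip).
/// An explicit env var wins; otherwise the first local port `reachable` accepts.
fn resolve_llm_url(
    env: impl Fn(&str) -> Option<String>,
    reachable: impl Fn(SocketAddr) -> bool,
) -> Option<String> {
    if let Some(url) = env(LLM_URL_ENV).filter(|u| !u.is_empty()) {
        return Some(url);
    }
    PROBE_PORTS.iter().find_map(|&port| {
        let addr = SocketAddr::from(([127, 0, 0, 1], port));
        reachable(addr).then(|| format!("http://{addr}"))
    })
}

/// Real probe: a short TCP connect, which only shows that something is listening.
fn tcp_probe(addr: SocketAddr) -> bool {
    TcpStream::connect_timeout(&addr, Duration::from_millis(200)).is_ok()
}

#[test]
#[ignore] // opt in with: cargo test --test live -- --ignored
fn live_gate_sketch() {
    let Some(url) = resolve_llm_url(|k| std::env::var(k).ok(), tcp_probe) else {
        eprintln!("no LLM server reachable; skipping");
        return; // an early return is reported as a pass
    };
    eprintln!("would run `agentcap run` against {url}");
}
```

Skipping by early return (rather than panicking) is what makes the suite "skip-pass" on a machine with no server, at the cost of a silent pass if CI forgets to start one.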

`.github/workflows/live.yml` — "Test - Live"

Ports the proven Python live setup: install podman, cache + download the pinned Qwen3-1.7B GGUF, cache the rootless image store, spawn the pinned `llama.cpp:server` container (waiting for `/v1/models`), set `AGENTCAP_TEST_LLM_URL`, then run `cargo test --test live -- --ignored --test-threads=1`. Per-agent sandbox images are built on demand by the binary (cached across runs). Triggers on push/PR/dispatch, like the Python workflow.
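The "wait for `/v1/models`" step is a readiness poll. The workflow's own wait loop isn't shown here, so the following is only a hypothetical Rust analogue (`wait_until` and `models_endpoint_ok` are illustrative names): it sends a plain HTTP/1.1 `GET /v1/models` and treats a 200 status line as "ready", which is an assumption about what the workflow checks.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `ready` every `interval` until it returns true (-> true)
/// or `timeout` has elapsed (-> false).
fn wait_until(timeout: Duration, interval: Duration, mut ready: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if ready() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}

/// One readiness probe: GET /v1/models and check for an HTTP 200 status line.
fn models_endpoint_ok(addr: SocketAddr) -> bool {
    let Ok(mut stream) = TcpStream::connect_timeout(&addr, Duration::from_secs(1)) else {
        return false;
    };
    let _ = stream.set_read_timeout(Some(Duration::from_secs(2)));
    let request = format!("GET /v1/models HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n");
    if stream.write_all(request.as_bytes()).is_err() {
        return false;
    }
    // "HTTP/1.1 200" is exactly 12 bytes.
    let mut status = [0u8; 12];
    stream.read_exact(&mut status).is_ok()
        && status.starts_with(b"HTTP/1.")
        && &status[9..] == b"200"
}
```

Polling a model-listing endpoint rather than just the TCP port avoids starting the tests while the container is up but the server is not yet answering requests.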

Validated locally

fmt/clippy green; full suite hermetic (live tests show as ignored); `cargo test --test live -- --ignored` skip-passes with no server.

⚠️ The live workflow itself (podman + GGUF + real inference) can only be exercised in CI — I can't run podman/GGUF here. This PR's Test - Live run is its first real execution; I'll watch it and fix anything that trips. Once it's green, the Python live tier can be removed in the cutover.

🤖 Generated with Claude Code

Prerequisite to removing the Python client: replace the Python live tier (linux-live-tests.yml, test_cli_live.py, test_drivers_live.py) with a Rust port.

- tests/live.rs: per-agent end-to-end tests that run the real `agentcap run` binary (via CARGO_BIN_EXE) against a live OpenAI-compatible server, asserting the wire path: run.json shape, completed_turns, captures landed, and pi's streamed JSONL trace. Subsumes both Python live files (CLI e2e + per-driver). `#[ignore]`d so `cargo test` stays hermetic; each test skips (passes) when no server is reachable. opencode omitted (same reason as the Python skip).
- .github/workflows/live.yml ("Test - Live"): ports the proven Python live setup (install podman, cache + download the Qwen3-1.7B GGUF, cache the rootless image store, spawn the pinned llama.cpp server), then runs `cargo test --test live -- --ignored` (serial). Sandbox images build on demand via the binary. Gated by AGENTCAP_TEST_LLM_URL (else a :8000/:8080 probe).

Verified locally: fmt/clippy green, full suite hermetic (live ignored), live tests skip-pass with no server. The live workflow itself needs podman + GGUF, so it's validated in CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…default reqwest::blocking defaults to a 30s total-request timeout; a slow streamed generation (e.g. an agent turn on a CPU runner) blows past it, so the proxy's upstream read errors mid-stream and the agent's turn never completes. Cap at a generous-but-finite 900s instead (synth follow-up: 300s). Restores the live workflow to all agents. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
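Why a total-request timeout breaks a slow stream can be shown with a small std-only model. This is not reqwest's implementation (in reqwest the cap is configured through `ClientBuilder::timeout`); `DeadlineReader` and `SlowStream` are hypothetical stand-ins. One deadline is fixed when the request starts, so a body that is still trickling in when it passes fails mid-stream, even though every individual read made progress.

```rust
use std::io::{self, Read};
use std::thread;
use std::time::{Duration, Instant};

/// Fails any read once `cap` has elapsed since construction,
/// modeling a total-request timeout applied to a streamed body.
struct DeadlineReader<R> {
    inner: R,
    deadline: Instant,
}

impl<R: Read> DeadlineReader<R> {
    fn new(inner: R, cap: Duration) -> Self {
        Self { inner, deadline: Instant::now() + cap }
    }
}

impl<R: Read> Read for DeadlineReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if Instant::now() >= self.deadline {
            return Err(io::Error::new(io::ErrorKind::TimedOut, "total request timeout exceeded"));
        }
        self.inner.read(buf)
    }
}

/// A slow upstream, e.g. token-by-token generation on a CPU runner:
/// yields one byte per read after sleeping `delay`.
struct SlowStream {
    remaining: usize,
    delay: Duration,
}

impl Read for SlowStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.remaining == 0 || buf.is_empty() {
            return Ok(0);
        }
        thread::sleep(self.delay);
        buf[0] = b'x';
        self.remaining -= 1;
        Ok(1)
    }
}
```

With a cap shorter than the whole generation, the read fails partway through with some bytes already delivered, which matches the "upstream read errors mid-stream" symptom; a generous-but-finite cap lets the stream finish while still bounding a hung upstream.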

…s not on `run`) hermes' base prompt exceeds the tiny CI model's budget and bails before any model call; the Python suite only ran hermes at the driver level with ignore_rules/toolsets, which `agentcap run` doesn't expose. Keep pi + goose, which cover the full stack across both trace mechanisms.

dacorvo and others added 5 commits June 25, 2026 08:34

test(live): add failure diagnostics; scope CI to pi while debugging

e7ac69c

fix(proxy): lower per-request timeout cap 900s -> 300s

22dbc39

dacorvo merged commit 4eab149 into main Jun 25, 2026
9 checks passed

dacorvo deleted the rust-live-tests branch June 26, 2026 08:11
