fix(pipeline): align subagent-mode suite fallback with CLI mode by christso · Pull Request #1151 · EntityProcess/agentv

christso · 2026-04-23T05:11:41Z

Summary

Follow-up to #1148. Two connected changes:

1. Align subagent-mode suite fallback with CLI mode (code)

Before this PR, `pipeline input` and `pipeline run` resolved the suite (evalset) directory name from `suite.metadata?.name` only. If the eval.yaml had no `name` field, `suite.metadata` was undefined and the artifact layout skipped the suite segment entirely, writing to `<run-dir>/<test-id>/...` directly.

CLI mode (`agentv eval`, via `artifact-writer.ts:buildArtifactSubdir` consuming `test.suite`) already falls back through `metadata.name` → eval-file basename (stripping `.eval.yaml`) → `'eval'`. This fallback is applied by the loaders (`yaml-parser.ts:317-324`, `jsonl-parser.ts:165`) and attached to every test as `test.suite`.
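The fallback chain can be sketched as a small TypeScript function. This is a minimal illustration, assuming the loader has the raw `metadata.name` and the eval file path at hand; `resolveSuiteName` is a hypothetical name, and the real logic in `yaml-parser.ts`/`jsonl-parser.ts` may differ in details (for example, how non-`.eval.yaml` file names are handled).

```typescript
import * as path from "node:path";

// Hypothetical helper (not the actual loader code): applies the fallback order
// metadata.name, then the eval-file basename with ".eval.yaml" stripped, then "eval".
const EVAL_SUFFIX = ".eval.yaml";

function resolveSuiteName(
  metadataName: string | undefined,
  evalFilePath: string | undefined,
): string {
  // 1. An explicit `name` in eval.yaml wins.
  if (metadataName !== undefined && metadataName.trim() !== "") {
    return metadataName;
  }
  // 2. Otherwise derive the name from the eval file's basename.
  if (evalFilePath !== undefined) {
    const base = path.basename(evalFilePath);
    // Assumption: only ".eval.yaml" is stripped; other extensions are left as-is.
    const stem = base.endsWith(EVAL_SUFFIX) ? base.slice(0, -EVAL_SUFFIX.length) : base;
    if (stem !== "") {
      return stem;
    }
  }
  // 3. Last resort.
  return "eval";
}

console.log(resolveSuiteName(undefined, "evals/no-name.eval.yaml")); // no-name
```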

Fix: switch `pipeline input:137` and `pipeline run:161` to read `tests[0]?.suite`. Both modes now produce the same `<run-dir>/<evalset>/<test-id>/` layout. `pipeline bench` and `pipeline grade` consume `manifest.suite` (written by input/run), so they pick up the change automatically; no consumer edits needed.
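A before/after sketch of the suite lookup, assuming simplified types: `LoadedTest`, `SuiteMetadata`, `suiteBefore`, `suiteAfter`, and `artifactDir` are hypothetical names used only for illustration. With an unnamed eval file, the old lookup yields no suite and the path loses its evalset segment, while the new lookup picks up the loader-resolved name, matching the two layouts in the red/green evidence.

```typescript
import * as path from "node:path";

// Hypothetical, minimal shapes; the real agentv types carry more fields.
interface LoadedTest {
  id: string;
  suite?: string; // set by the loaders via the fallback chain
}
interface SuiteMetadata {
  name?: string;
}

// Before the fix: the suite segment came from metadata only.
function suiteBefore(metadata: SuiteMetadata | undefined): string | undefined {
  return metadata?.name;
}

// After the fix: read the loader-resolved suite from the first test.
function suiteAfter(tests: LoadedTest[]): string | undefined {
  return tests[0]?.suite;
}

// Builds <run-dir>/<evalset>/<test-id>, dropping the evalset segment when
// no suite name is available (the pre-fix behaviour for unnamed eval files).
function artifactDir(runDir: string, suite: string | undefined, testId: string): string {
  return suite ? path.posix.join(runDir, suite, testId) : path.posix.join(runDir, testId);
}

const tests: LoadedTest[] = [{ id: "test-01", suite: "no-name" }];
console.log(artifactDir("/tmp/testout", suiteBefore(undefined), "test-01")); // /tmp/testout/test-01
console.log(artifactDir("/tmp/testout", suiteAfter(tests), "test-01")); // /tmp/testout/no-name/test-01
```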

Red/green evidence (eval.yaml with no name field):

Before:

/tmp/testout/
├── manifest.json  ← no "suite" field
└── test-01/
    └── code_graders/

After:

/tmp/testout/
├── manifest.json  ← "suite": "no-name"
└── no-name/       ← derived from no-name.eval.yaml basename
    └── test-01/
        └── code_graders/

Regression test: `apps/cli/test/commands/eval/pipeline/input.test.ts`, "falls back to eval file basename for suite directory when name is absent".

2. Doc cleanups in agentv-bench skill

Three small clarifications that came up while reviewing #1148:

- grader.md:21: fix the stale `bench-dir` example (`results/export/` → `results/runs/`) and clarify that the `<evalset>` segment is always present (after change 1 in this PR).
- grader.md Field Descriptions: split into fields consumed by `pipeline bench` (`score`, `assertions[]`) vs. fields kept for traceability (`summary`, `claims`, `user_notes_summary`, `eval_feedback`). Per `bench.ts:86-107`, only the first two are merged. Also removed the orphaned `execution_metrics`/`timing` entries that #1148 dropped from the JSON example but not from the description list.
- SKILL.md Phase 1: added a one-liner on how orchestrators detect tests that don't need Phase 2 (check whether `<test-id>/llm_graders/` has any `.json` files). `pipeline input` only populates it for `llm-grader` assertions (per `input.ts:239-263`).
- subagent-pipeline.md: updated the artifact tree to reflect the always-present `<evalset>` segment.
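The Phase 1 check described for SKILL.md can be sketched with Node's `fs` module. This is an illustration under stated assumptions: `testNeedsLlmGrading` and `testsNeedingPhase2` are hypothetical names, and the input is taken to be the per-test directory `<run-dir>/<evalset>/<test-id>`.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: returns true when <test-dir>/llm_graders/ exists and
// holds at least one .json file. Since `pipeline input` only writes grader
// inputs there for llm-grader assertions, a test without any returns false
// and can skip Phase 2.
function testNeedsLlmGrading(testDir: string): boolean {
  const graderDir = path.join(testDir, "llm_graders");
  if (!fs.existsSync(graderDir) || !fs.statSync(graderDir).isDirectory()) {
    return false;
  }
  return fs.readdirSync(graderDir).some((entry) => entry.endsWith(".json"));
}

// Usage: filter an evalset directory down to the test ids that need Phase 2.
function testsNeedingPhase2(suiteDir: string): string[] {
  return fs
    .readdirSync(suiteDir, { withFileTypes: true })
    .filter((d) => d.isDirectory() && testNeedsLlmGrading(path.join(suiteDir, d.name)))
    .map((d) => d.name);
}
```

Checking for `.json` files rather than mere directory existence keeps the check correct even if an empty `llm_graders/` directory is created for a test.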

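The bench-vs-traceability split in grader.md can be illustrated as a projection over the grader JSON. Only the field names come from the description above; the `GraderOutput`/`BenchEntry` types and `toBenchEntry` are hypothetical, and the element type of `assertions` is left as `unknown` because it is not specified here. The actual merge lives in `bench.ts:86-107`.

```typescript
// Hypothetical shape of a grader result file. Field names follow grader.md;
// the types are assumptions.
interface GraderOutput {
  score: number;
  assertions: unknown[];
  // Kept on disk for traceability, not merged by `pipeline bench`:
  summary?: string;
  claims?: unknown[];
  user_notes_summary?: string;
  eval_feedback?: unknown;
}

// The subset that `pipeline bench` consumes.
type BenchEntry = Pick<GraderOutput, "score" | "assertions">;

// Project a grader result onto the bench fields, dropping everything else.
function toBenchEntry(output: GraderOutput): BenchEntry {
  const { score, assertions } = output;
  return { score, assertions };
}

const graded: GraderOutput = {
  score: 0.75,
  assertions: [{ text: "mentions the refund policy", passed: true }],
  summary: "Mostly correct.",
};
console.log(JSON.stringify(toBenchEntry(graded))); // {"score":0.75,"assertions":[{"text":"mentions the refund policy","passed":true}]}
```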
Test plan

bun run build — passed (pre-push hook)
bun run typecheck — passed (pre-push hook)
bun run lint — passed (pre-push hook)
bun run test — passed (pre-push hook)
bun run validate:examples — passed (pre-push hook)
Manual red/green UAT with no-name.eval.yaml — documented above
Regression test added for the no-name fallback

🤖 Generated with Claude Code

Follow-up to #1148. Three small clarifications:

- grader.md: fix stale `bench-dir` example (`results/export/` → `results/runs/`) and clarify that `bench-dir` already includes the `<evalset>` segment. The output path spec in Step 9 assumes this.
- grader.md: annotate Field Descriptions to distinguish fields consumed by `pipeline bench` (`score`, `assertions[]`) from fields kept on disk for traceability. Also remove `execution_metrics` and `timing` from the list; #1148 dropped them from the JSON example but the descriptions still referenced them.
- SKILL.md Phase 1: add a one-liner on how orchestrators detect which tests need Phase 2: check whether `<test-id>/llm_graders/` has any `.json` files. `pipeline input` only populates it for `llm-grader` assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages · 2026-04-23T05:12:00Z

Deploying agentv with Cloudflare Pages

Latest commit:	`d4c8f02`
Status:	✅ Deploy successful!
Preview URL:	https://721eec9a.agentv.pages.dev
Branch Preview URL:	https://docs-grader-followup-stale-r.agentv.pages.dev


Before this change, `pipeline input` and `pipeline run` resolved the suite (evalset) directory name from `suite.metadata?.name` only. If the eval.yaml had no `name` field, `suite.metadata` was undefined and the artifact layout skipped the suite segment entirely, writing to `<run-dir>/<test-id>/...` directly.

CLI mode (`agentv eval`, via `artifact-writer.ts:buildArtifactSubdir` consuming `test.suite`) already falls back through `metadata.name` → eval-file basename (stripping `.eval.yaml`) → `'eval'`. This fallback is applied by the loaders (`yaml-parser.ts:317-324`, `jsonl-parser.ts:165`) and attached to every test as `test.suite`.

Switch `pipeline input` and `pipeline run` to read `tests[0]?.suite` so subagent-mode runs produce the same `<evalset>/<test-id>/` layout CLI mode produces. `pipeline bench` and `pipeline grade` consume `manifest.suite`, which is written by these two, so they pick up the change automatically; no consumer edits needed.

Docs in the agentv-bench skill updated to match: `<evalset>` is now always present in the artifact tree, not conditional.

Regression test covers the no-`name` case: previously resolved to `<run-dir>/<test-id>/`, now resolves to `<run-dir>/no-name/<test-id>/`, matching `no-name.eval.yaml`'s basename.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

christso marked this pull request as ready for review April 23, 2026 05:12

christso changed the title ~~docs(agentv-bench): clean up stale grader references after #1148~~ fix(pipeline): align subagent-mode suite fallback with CLI mode Apr 23, 2026

christso merged commit 8116897 into main Apr 23, 2026
4 checks passed

christso deleted the docs/grader-followup-stale-refs branch April 23, 2026 07:15

christso merged 2 commits into main from docs/grader-followup-stale-refs
