
fix(pipeline): align subagent-mode suite fallback with CLI mode#1151

Merged
christso merged 2 commits into main from docs/grader-followup-stale-refs
Apr 23, 2026


christso (Collaborator) commented Apr 23, 2026

Summary

Follow-up to #1148. Two connected changes:

1. Align subagent-mode suite fallback with CLI mode (code)

Before this PR, `pipeline input` and `pipeline run` resolved the suite (evalset) directory name from `suite.metadata?.name` only. If the eval.yaml had no `name` field, `suite.metadata` was undefined and the artifact layout skipped the suite segment entirely, writing directly to `<run-dir>/<test-id>/...`.

CLI mode (`agentv eval`, via `artifact-writer.ts:buildArtifactSubdir` consuming `test.suite`) already falls back through `metadata.name` → eval-file basename (stripping `.eval.yaml`) → `'eval'`. The loaders apply this fallback (`yaml-parser.ts:317-324`, `jsonl-parser.ts:165`) and attach the result to every test as `test.suite`.
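The fallback chain above can be sketched as follows. This is a minimal illustration, not the loaders' code: `resolveSuiteName` is a hypothetical helper, it only strips the `.eval.yaml` suffix named here (how `jsonl-parser.ts` derives its basename is not covered), and it assumes the `'eval'` default applies whenever neither a non-empty `name` nor a usable basename is available.

```typescript
import { basename } from "node:path";

const EVAL_SUFFIX = ".eval.yaml";

// Hypothetical helper mirroring the documented order:
// metadata.name -> eval-file basename (minus ".eval.yaml") -> 'eval'.
function resolveSuiteName(
  metadataName: string | undefined,
  evalFilePath: string | undefined,
): string {
  // 1. An explicit, non-empty `name` in eval.yaml wins.
  if (metadataName) {
    return metadataName;
  }
  // 2. Otherwise derive the name from the eval file's basename.
  if (evalFilePath) {
    const file = basename(evalFilePath);
    const stem = file.endsWith(EVAL_SUFFIX)
      ? file.slice(0, -EVAL_SUFFIX.length)
      : file;
    if (stem !== "") {
      return stem;
    }
  }
  // 3. Last resort: a fixed default (assumed trigger conditions).
  return "eval";
}

console.log(resolveSuiteName(undefined, "evals/no-name.eval.yaml")); // no-name
```

With no `name` field, `no-name.eval.yaml` yields `no-name`, which matches the directory in the red/green evidence below.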

Fix: switch `pipeline input` (line 137) and `pipeline run` (line 161) to read `tests[0]?.suite`. Both modes now produce the same `<run-dir>/<evalset>/<test-id>/` layout. `pipeline bench` and `pipeline grade` consume `manifest.suite` (written by `input`/`run`), so they pick up the change automatically; no consumer edits are needed.
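A minimal sketch of the before/after difference. The `LoadedSuite`/`LoadedTest` shapes and the `artifactDir` helper are simplified, hypothetical stand-ins; the real writer is `artifact-writer.ts:buildArtifactSubdir`, whose signature is not shown in this PR.

```typescript
import { posix } from "node:path";

// Hypothetical shapes; the real types in the codebase may differ.
interface LoadedTest {
  id: string;
  suite: string; // set by the loaders via the fallback chain
}
interface LoadedSuite {
  metadata?: { name?: string };
  tests: LoadedTest[];
}

// Builds <run-dir>/[<evalset>/]<test-id>; an undefined suite drops the segment.
function artifactDir(runDir: string, suite: string | undefined, testId: string): string {
  return suite ? posix.join(runDir, suite, testId) : posix.join(runDir, testId);
}

// Old behaviour: only the explicit `name` field was consulted.
function suiteDirBefore(s: LoadedSuite): string | undefined {
  return s.metadata?.name;
}

// New behaviour: reuse the suite name the loaders already attached to each test.
function suiteDirAfter(s: LoadedSuite): string | undefined {
  return s.tests[0]?.suite;
}

// An eval.yaml with no `name` field (so no metadata), loaded from no-name.eval.yaml.
const noName: LoadedSuite = {
  tests: [{ id: "test-01", suite: "no-name" }],
};

console.log(artifactDir("/tmp/testout", suiteDirBefore(noName), "test-01")); // /tmp/testout/test-01
console.log(artifactDir("/tmp/testout", suiteDirAfter(noName), "test-01")); // /tmp/testout/no-name/test-01
```

Reading `tests[0]` relies on every test in a suite carrying the same `test.suite`, which holds if the loaders attach one resolved name per eval file.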

Red/green evidence (eval.yaml with no name field):

Before:

/tmp/testout/
├── manifest.json  ← no "suite" field
└── test-01/
    └── code_graders/

After:

/tmp/testout/
├── manifest.json  ← "suite": "no-name"
└── no-name/       ← derived from no-name.eval.yaml basename
    └── test-01/
        └── code_graders/

Regression test: `apps/cli/test/commands/eval/pipeline/input.test.ts`, case "falls back to eval file basename for suite directory when name is absent".

2. Doc cleanups in agentv-bench skill

Four small clarifications that came up while reviewing #1148:

  • `grader.md:21`: fix the stale `bench-dir` example (`results/export/` → `results/runs/`) and clarify that the `<evalset>` segment is always present (after change 1 above).
  • `grader.md` Field Descriptions: split into fields consumed by `pipeline bench` (`score`, `assertions[]`) vs. fields kept for traceability (`summary`, `claims`, `user_notes_summary`, `eval_feedback`). Per `bench.ts:86-107`, only the first two are merged. Also removed the orphaned `execution_metrics`/`timing` entries that #1148 dropped from the JSON example but not from the description list.
  • `SKILL.md` Phase 1: added a one-liner explaining how orchestrators detect tests that don't need Phase 2 (check whether `<test-id>/llm_graders/` has any `.json` files). `pipeline input` only populates it for `llm-grader` assertions (per `input.ts:239-263`).
  • `subagent-pipeline.md`: updated the artifact tree to reflect the always-present evalset segment.
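To illustrate the consumed-vs-traceability split, here is a sketch of the projection `pipeline bench` is described as doing. `GraderOutput`, `BenchInput`, and `toBenchInput` are hypothetical: only the field names come from `grader.md`, and the value types (a numeric `score`, opaque `assertions[]` entries) are assumptions.

```typescript
// Hypothetical shape of a grader JSON file; only the field names come from grader.md.
interface GraderOutput {
  score: number;
  assertions: unknown[]; // element shape is not described in this PR
  summary?: string;
  claims?: unknown;
  user_notes_summary?: string;
  eval_feedback?: unknown;
}

// The subset `pipeline bench` merges, per bench.ts:86-107.
type BenchInput = Pick<GraderOutput, "score" | "assertions">;

function toBenchInput(raw: GraderOutput): BenchInput {
  // Traceability fields stay on disk but are deliberately not carried forward.
  return { score: raw.score, assertions: raw.assertions };
}

// Made-up example data; the assertion object shape is illustrative only.
const graded: GraderOutput = {
  score: 0.5,
  assertions: [{ text: "example assertion", passed: true }],
  summary: "Partially correct",
  eval_feedback: "Assertion wording is ambiguous",
};

console.log(Object.keys(toBenchInput(graded)).join(", ")); // score, assertions
```

Because the traceability fields are never read by `bench`, grader authors can extend them without affecting scores.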

Test plan

  • bun run build — passed (pre-push hook)
  • bun run typecheck — passed (pre-push hook)
  • bun run lint — passed (pre-push hook)
  • bun run test — passed (pre-push hook)
  • bun run validate:examples — passed (pre-push hook)
  • Manual red/green UAT with no-name.eval.yaml — documented above
  • Regression test added for the no-name fallback

🤖 Generated with Claude Code

Follow-up to #1148. Three small clarifications:

- grader.md: fix stale `bench-dir` example (`results/export/` →
  `results/runs/`) and clarify that `bench-dir` already includes the
  `<evalset>` segment. The output path spec in Step 9 assumes this.

- grader.md: annotate Field Descriptions to distinguish fields
  consumed by `pipeline bench` (`score`, `assertions[]`) from fields
  kept on disk for traceability. Also remove `execution_metrics` and
  `timing` from the list — #1148 dropped them from the JSON example
  but the descriptions still referenced them.

- SKILL.md Phase 1: add a one-liner on how orchestrators detect
  which tests need Phase 2 — check whether `<test-id>/llm_graders/`
  has any `.json` files. `pipeline input` only populates it for
  `llm-grader` assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
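The Phase 2 check described in the last bullet could look roughly like this. It is a sketch under assumptions: `hasLlmGraderInputs` and `testNeedsPhase2` are hypothetical names, and treating a missing `llm_graders/` directory the same as an empty one is an assumption, not something this PR states.

```typescript
import { readdirSync } from "node:fs";
import { join } from "node:path";

// Pure check: does a directory listing contain any grader input files?
function hasLlmGraderInputs(entries: readonly string[]): boolean {
  return entries.some((name) => name.endsWith(".json"));
}

// Filesystem wrapper (hypothetical): a missing llm_graders/ directory means "no Phase 2".
function testNeedsPhase2(testDir: string): boolean {
  try {
    return hasLlmGraderInputs(readdirSync(join(testDir, "llm_graders")));
  } catch (err) {
    // Only swallow "directory does not exist"; surface anything else.
    if ((err as { code?: string }).code === "ENOENT") return false;
    throw err;
  }
}
```

An orchestrator would call `testNeedsPhase2` for each `<test-id>` directory under `<run-dir>/<evalset>/` and dispatch grader subagents only where it returns `true`.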
cloudflare-workers-and-pages Bot commented Apr 23, 2026

Deploying agentv with Cloudflare Pages

Latest commit: d4c8f02
Status: ✅  Deploy successful!
Preview URL: https://721eec9a.agentv.pages.dev
Branch Preview URL: https://docs-grader-followup-stale-r.agentv.pages.dev


christso marked this pull request as ready for review April 23, 2026 05:12
Before this change, `pipeline input` and `pipeline run` resolved the
suite (evalset) directory name from `suite.metadata?.name` only. If the
eval.yaml had no `name` field, `suite.metadata` was undefined and the
artifact layout skipped the suite segment entirely — writing to
`<run-dir>/<test-id>/...` directly.

CLI mode (`agentv eval`, via `artifact-writer.ts:buildArtifactSubdir`
consuming `test.suite`) already falls back through `metadata.name` →
eval-file basename (stripping `.eval.yaml`) → `'eval'`. This fallback
is applied by the loaders (`yaml-parser.ts:317-324`,
`jsonl-parser.ts:165`) and attached to every test as `test.suite`.

Switch `pipeline input` and `pipeline run` to read `tests[0]?.suite`
so subagent-mode runs produce the same `<evalset>/<test-id>/` layout
that CLI mode produces. `pipeline bench` and `pipeline grade` consume
`manifest.suite`, which these two commands write, so they pick up the
change automatically; no consumer edits are needed.

Docs in the agentv-bench skill updated to match: `<evalset>` is now
always present in the artifact tree, not conditional.

Regression test covers the no-`name` case — previously resolved to
`<run-dir>/<test-id>/`, now resolves to `<run-dir>/no-name/<test-id>/`
matching `no-name.eval.yaml`'s basename.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
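Why `pipeline bench` and `pipeline grade` need no edits can be sketched as follows: a consumer that derives paths from `manifest.suite` changes its output only because the manifest content changes. `Manifest` and `testDirFromManifest` are hypothetical; only the `suite` field is taken from this PR.

```typescript
import { posix } from "node:path";

// Hypothetical minimal manifest shape; only the `suite` field is described in this PR.
interface Manifest {
  suite?: string;
}

// A downstream stage locating a test's artifacts; this code is identical before and after the fix.
function testDirFromManifest(runDir: string, manifestJson: string, testId: string): string {
  const manifest = JSON.parse(manifestJson) as Manifest;
  // Before the fix, `suite` could be absent, so the evalset segment was skipped.
  return manifest.suite
    ? posix.join(runDir, manifest.suite, testId)
    : posix.join(runDir, testId);
}

console.log(testDirFromManifest("/tmp/testout", "{}", "test-01")); // /tmp/testout/test-01
console.log(testDirFromManifest("/tmp/testout", '{"suite":"no-name"}', "test-01")); // /tmp/testout/no-name/test-01
```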
christso changed the title from "docs(agentv-bench): clean up stale grader references after #1148" to "fix(pipeline): align subagent-mode suite fallback with CLI mode" on Apr 23, 2026
christso merged commit 8116897 into main on Apr 23, 2026
4 checks passed
christso deleted the docs/grader-followup-stale-refs branch April 23, 2026 07:15