fix(pipeline): align subagent-mode suite fallback with CLI mode #1151
Merged
Follow-up to #1148. Three small clarifications:

- grader.md: fix stale `bench-dir` example (`results/export/` → `results/runs/`) and clarify that `bench-dir` already includes the `<evalset>` segment. The output path spec in Step 9 assumes this.
- grader.md: annotate Field Descriptions to distinguish fields consumed by `pipeline bench` (`score`, `assertions[]`) from fields kept on disk for traceability. Also remove `execution_metrics` and `timing` from the list: #1148 dropped them from the JSON example, but the descriptions still referenced them.
- SKILL.md Phase 1: add a one-liner on how orchestrators detect which tests need Phase 2, namely by checking whether `<test-id>/llm_graders/` has any `.json` files. `pipeline input` only populates it for `llm-grader` assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
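A minimal sketch of the Phase 1 check described above, assuming the orchestrator runs under Node.js with filesystem access to the run directory. `needsPhase2` and `partitionTests` are hypothetical helper names, not part of agentv; the sketch also assumes each test directory sits directly under the evalset directory and treats a missing `llm_graders/` directory the same as an empty one.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: a test needs Phase 2 (LLM grading) only if
// <test-dir>/llm_graders/ holds at least one .json file.
// A missing llm_graders/ directory is treated like an empty one.
function needsPhase2(testDir: string): boolean {
  const graderDir = path.join(testDir, "llm_graders");
  if (!fs.existsSync(graderDir)) return false;
  return fs
    .readdirSync(graderDir, { withFileTypes: true })
    .some((entry) => entry.isFile() && entry.name.endsWith(".json"));
}

// Hypothetical helper: split the test directories under one evalset
// directory into those that still need Phase 2 and those that are done.
function partitionTests(evalsetDir: string): { phase2: string[]; done: string[] } {
  const phase2: string[] = [];
  const done: string[] = [];
  for (const entry of fs.readdirSync(evalsetDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue; // skip stray files next to test dirs
    const testDir = path.join(evalsetDir, entry.name);
    (needsPhase2(testDir) ? phase2 : done).push(entry.name);
  }
  return { phase2, done };
}
```

Checking for `.json` files rather than for the directory alone keeps the check correct even if an empty `llm_graders/` directory is ever created for a test with only code graders.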
Deploying agentv with

|  |  |
|---|---|
| Latest commit: | d4c8f02 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://721eec9a.agentv.pages.dev |
| Branch Preview URL: | https://docs-grader-followup-stale-r.agentv.pages.dev |
Before this change, `pipeline input` and `pipeline run` resolved the suite (evalset) directory name from `suite.metadata?.name` only. If the eval.yaml had no `name` field, `suite.metadata` was undefined and the artifact layout skipped the suite segment entirely, writing to `<run-dir>/<test-id>/...` directly.

CLI mode (`agentv eval`, via `artifact-writer.ts:buildArtifactSubdir` consuming `test.suite`) already falls back through `metadata.name` → eval-file basename (stripping `.eval.yaml`) → `'eval'`. This fallback is applied by the loaders (`yaml-parser.ts:317-324`, `jsonl-parser.ts:165`) and attached to every test as `test.suite`.

Switch `pipeline input` and `pipeline run` to read `tests[0]?.suite` so subagent-mode runs produce the same `<evalset>/<test-id>/` layout that CLI mode produces. `pipeline bench` and `pipeline grade` consume `manifest.suite`, which these two commands write, so they pick up the change automatically; no consumer edits are needed.

Docs in the agentv-bench skill updated to match: `<evalset>` is now always present in the artifact tree, not conditional.

Regression test covers the no-`name` case: it previously resolved to `<run-dir>/<test-id>/` and now resolves to `<run-dir>/no-name/<test-id>/`, matching `no-name.eval.yaml`'s basename.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
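A minimal TypeScript sketch of the fallback chain and the resulting artifact path, assuming Node's `path` module. `resolveSuiteName`, `manifestSuite`, `testArtifactDir`, the `LoadedTest` shape, and the `results/runs/run-1` / `greeting` values are hypothetical illustrations, not the actual code in the loaders or `artifact-writer.ts`; only the fallback order (`metadata.name` → basename minus `.eval.yaml` → `'eval'`) and the `tests[0]?.suite` read come from the description above.

```typescript
import * as path from "node:path";

// Hypothetical shapes: only the fields this sketch reads.
interface SuiteMetadata {
  name?: string;
}

interface LoadedTest {
  id: string;
  suite: string; // assumed to be filled in by the loader
}

const EVAL_SUFFIX = ".eval.yaml";

// Fallback chain: metadata.name -> eval-file basename (minus ".eval.yaml") -> "eval".
function resolveSuiteName(
  metadata: SuiteMetadata | undefined,
  evalFilePath?: string,
): string {
  if (metadata?.name) return metadata.name;
  const fileName = evalFilePath ? path.basename(evalFilePath) : "";
  const stem = fileName.endsWith(EVAL_SUFFIX)
    ? fileName.slice(0, -EVAL_SUFFIX.length)
    : fileName;
  return stem || "eval";
}

// After the fix: read the suite from the first loaded test instead of
// suite.metadata?.name, so a missing `name` no longer drops the segment.
function manifestSuite(tests: readonly LoadedTest[]): string | undefined {
  return tests[0]?.suite;
}

// Builds <run-dir>/<evalset>/<test-id>/ with the platform separator.
function testArtifactDir(runDir: string, suite: string, testId: string): string {
  return path.join(runDir, suite, testId);
}

// Usage with an eval file that has no `name` field:
const suite = resolveSuiteName(undefined, "evals/no-name.eval.yaml"); // "no-name"
const tests: LoadedTest[] = [{ id: "greeting", suite }];
console.log(testArtifactDir("results/runs/run-1", manifestSuite(tests) ?? "eval", "greeting"));
```

Resolving the name once at load time and reading it back from the tests means both modes derive the `<evalset>` segment from the same value instead of maintaining two fallback rules.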
This was referenced Apr 23, 2026
Summary
Follow-up to #1148. Two connected changes:
1. Align subagent-mode suite fallback with CLI mode (code)
Before this PR, `pipeline input` and `pipeline run` resolved the suite (evalset) directory name from `suite.metadata?.name` only. If the eval.yaml had no `name` field, `suite.metadata` was undefined and the artifact layout skipped the suite segment entirely, writing to `<run-dir>/<test-id>/...` directly.

CLI mode (`agentv eval`, via `artifact-writer.ts:buildArtifactSubdir` consuming `test.suite`) already falls back through `metadata.name` → eval-file basename (stripping `.eval.yaml`) → `'eval'`. This fallback is applied by the loaders (`yaml-parser.ts:317-324`, `jsonl-parser.ts:165`) and attached to every test as `test.suite`.

Fix: switch `pipeline input:137` and `pipeline run:161` to read `tests[0]?.suite`. Both modes now produce the same `<run-dir>/<evalset>/<test-id>/` layout. `pipeline bench` and `pipeline grade` consume `manifest.suite` (written by `input`/`run`), so they pick up the change automatically; no consumer edits needed.

Red/green evidence (eval.yaml with no `name` field):

Before:

After:

Regression test: `apps/cli/test/commands/eval/pipeline/input.test.ts`, "falls back to eval file basename for suite directory when name is absent".

2. Doc cleanups in agentv-bench skill
Three small clarifications that came up while reviewing #1148:
- `grader.md:21`: fixed stale `bench-dir` example (`results/export/` → `results/runs/`) and clarified that the evalset segment is always present (after the fix in part 1 above).
- `grader.md` Field Descriptions: split into fields consumed by `pipeline bench` (`score`, `assertions[]`) vs. fields kept for traceability (`summary`, `claims`, `user_notes_summary`, `eval_feedback`). Per `bench.ts:86-107`, only the first two are merged. Also removed orphaned `execution_metrics`/`timing` entries that #1148 dropped from the JSON example but not from the description list.
- `SKILL.md` Phase 1: added a one-liner on how orchestrators detect tests that don't need Phase 2, namely by checking whether `<test-id>/llm_graders/` has any `.json` files. `pipeline input` only populates it for `llm-grader` assertions (per `input.ts:239-263`).
- `subagent-pipeline.md`: updated the artifact tree to reflect the always-present evalset segment.

Test plan
- `bun run build`: passed (pre-push hook)
- `bun run typecheck`: passed (pre-push hook)
- `bun run lint`: passed (pre-push hook)
- `bun run test`: passed (pre-push hook)
- `bun run validate:examples`: passed (pre-push hook)
- `no-name.eval.yaml`: documented above
- `name` fallback

🤖 Generated with Claude Code