
feat(config): add hooks.pre_run for pre-eval environment injection #1150

Merged
christso merged 7 commits into main from feat/1149-pre-run-hook on Apr 23, 2026

christso (Collaborator) commented Apr 23, 2026

Summary

  • Adds hooks.pre_run to .agentv/config.yaml and hooks.preRun to agentv.config.ts (defineConfig)
  • Runs a shell command before the eval starts; parses stdout for env vars and injects them into process.env
  • Existing env vars are never overwritten — process env always wins
  • Stderr from the hook is forwarded to the user; non-zero exit aborts the eval
  • Bonus fix: --dry-run mock response now satisfies all LLM grader schemas — no more parse errors when running end-to-end harness tests with graders

Closes #1149

What was implemented

New file: packages/core/src/evaluation/hooks.ts

  • parseEnvOutput(stdout) — parses export KEY="value" and KEY=value lines from stdout
  • runPreRunHook(command) — spawns via sh -c, captures stdout, parses and injects env vars, forwards stderr
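
To make the accepted stdout formats concrete, here is a minimal sketch of a parser that handles export KEY="value" and KEY=value lines. It is an illustration only: the function name mirrors the PR, but the body, the regular expression, and the quote-stripping rule are assumptions, and the real parseEnvOutput in hooks.ts may behave differently at the edges.

```typescript
// Hypothetical sketch of an env-output parser; not the code from hooks.ts.
export function parseEnvOutput(stdout: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const rawLine of stdout.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === "" || line.startsWith("#")) continue;
    // Optional `export ` prefix, a shell-style key, then everything after the first `=`.
    const match = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$/.exec(line);
    if (!match) continue; // Invalid line: ignored in this sketch.
    const [, key = "", raw = ""] = match;
    // Strip one pair of matching surrounding quotes, if present.
    const first = raw[0];
    const quoted =
      raw.length >= 2 && (first === '"' || first === "'") && raw.endsWith(first);
    env[key] = quoted ? raw.slice(1, -1) : raw;
  }
  return env;
}
```

Because the key pattern cannot contain `=`, a line such as URL=a=b keeps everything after the first `=` as the value.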

Config schema changes

  • packages/core/src/evaluation/config.ts: added hooks.preRun (optional string) to AgentVConfigSchema
  • packages/core/src/evaluation/loaders/config-loader.ts: added HooksConfig type, parseHooksConfig(), wired into loadConfig()
  • plugins/agentv-dev/skills/agentv-eval-builder/references/config-schema.json: added hooks.pre_run for YAML IDE autocomplete
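
As a rough illustration of how the snake_case YAML key could map onto the camelCase config field, the sketch below reads hooks.pre_run from an already-parsed YAML object. Only the names HooksConfig, pre_run, and preRun come from this PR; the function name parseHooksConfigSketch, its signature, and the choice to throw on a non-string value are assumptions.

```typescript
// Hypothetical shape: only the field names pre_run / preRun are taken from the PR.
export interface HooksConfig {
  preRun?: string;
}

// Assumed behaviour: return undefined when no hooks block exists,
// and reject values that are not non-empty strings.
export function parseHooksConfigSketch(raw: unknown): HooksConfig | undefined {
  if (typeof raw !== "object" || raw === null) return undefined;
  const hooks = (raw as Record<string, unknown>)["hooks"];
  if (typeof hooks !== "object" || hooks === null) return undefined;
  const preRun = (hooks as Record<string, unknown>)["pre_run"];
  if (preRun === undefined) return {};
  if (typeof preRun !== "string" || preRun.trim() === "") {
    throw new Error("hooks.pre_run must be a non-empty string");
  }
  return { preRun };
}
```

Given the object a YAML parser would produce for hooks.pre_run: "sh /tmp/test-agentv-hook.sh", this returns { preRun: "sh /tmp/test-agentv-hook.sh" }.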

Wired into CLI

  • apps/cli/src/commands/eval/run-eval.ts: calls runPreRunHook after version check, before normalizeOptions — so secrets are available for all subsequent env lookups
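
The hook behaviour described in this PR (spawn via sh -c, capture stdout, forward stderr, abort on non-zero exit, never overwrite existing variables) can be sketched with Node's child_process module. This is an assumption-laden illustration, not the merged code: the name runPreRunHookSketch, the synchronous spawn, the injectable env parameter, and the returned count are all hypothetical.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical sketch; the real runPreRunHook may be async and structured differently.
export function runPreRunHookSketch(
  command: string,
  env: Record<string, string | undefined> = process.env,
): number {
  // stdout is captured for parsing; stderr goes straight to the user's terminal.
  const result = spawnSync("sh", ["-c", command], {
    encoding: "utf8",
    stdio: ["ignore", "pipe", "inherit"],
  });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    // A non-zero exit (or death by signal) aborts the run.
    throw new Error(`Pre-run hook exited with code ${result.status}`);
  }
  let injected = 0;
  for (const line of result.stdout.split("\n")) {
    const m = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$/.exec(line.trim());
    if (!m) continue;
    const [, key = "", raw = ""] = m;
    if (env[key] !== undefined) continue; // Existing values always win.
    env[key] = raw.replace(/^(["'])(.*)\1$/, "$2"); // Drop matching outer quotes.
    injected++;
  }
  return injected;
}
```

Calling this before any option normalization means later lookups of process.env see the injected values, which is the ordering the bullet above describes.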

Precedence

YAML config hooks.pre_run takes priority over TS config hooks.preRun, matching the existing pattern for other settings.
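
The precedence rule amounts to a simple fallback. The helper below is a hypothetical illustration (resolvePreRun is not a name from the codebase) that assumes both sources have already been loaded into optional strings.

```typescript
// Hypothetical helper: the YAML value wins whenever it is defined,
// otherwise the TS config value is used.
export function resolvePreRun(
  yamlPreRun: string | undefined,
  tsPreRun: string | undefined,
): string | undefined {
  return yamlPreRun ?? tsPreRun;
}
```

With both sources set, resolvePreRun("sh a.sh", "sh b.sh") yields "sh a.sh"; with only the TS config set, it yields the TS value.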

Dry-run mock response fix

  • apps/cli/src/commands/eval/targets.ts + run-eval.ts: changed dry-run mock response from {"answer":"Mock dry-run response"} (invalid grader response) to {"score":1,"assertions":[],"checks":[],"overall_reasoning":"dry-run mock"} — satisfies all three LLM grader schemas (freeform, rubric, score-range)
  • packages/core/src/evaluation/graders/llm-grader.ts: exported scoreRangeEvaluationSchema
  • packages/core/test/evaluation/graders/dry-run-mock-response.test.ts: regression test verifying all three schema validations
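
To show why the old mock failed and the new one passes, the sketch below checks a JSON string for the four fields present in the new mock response. It does not reproduce the actual grader schemas in llm-grader.ts; the helper name mistypedFields and the per-field type checks are assumptions inferred from the new mock's shape and the "expected number ... path score" error in the UAT log.

```typescript
// Hypothetical field checks inferred from the new mock response; the real
// freeform, rubric, and score-range schemas may require different subsets.
const expectedFields: Record<string, (v: unknown) => boolean> = {
  score: (v) => typeof v === "number",
  assertions: (v) => Array.isArray(v),
  checks: (v) => Array.isArray(v),
  overall_reasoning: (v) => typeof v === "string",
};

// Returns the names of fields that are missing or have the wrong type.
export function mistypedFields(json: string): string[] {
  const parsed = JSON.parse(json) as Record<string, unknown>;
  return Object.entries(expectedFields)
    .filter(([name, isValid]) => !isValid(parsed[name]))
    .map(([name]) => name);
}

export const OLD_MOCK = '{"answer":"Mock dry-run response"}';
export const NEW_MOCK =
  '{"score":1,"assertions":[],"checks":[],"overall_reasoning":"dry-run mock"}';
```

Under these assumed checks, mistypedFields(NEW_MOCK) is empty, while mistypedFields(OLD_MOCK) includes "score", matching the parse error the old response produced.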

Test plan

  • 12 unit tests for parseEnvOutput covering dotenv, shell-export, quoted/unquoted, equals-in-value, comments, blanks, invalid lines
  • 4 regression tests for dry-run mock response schema compatibility (freeform, rubric, score-range)
  • All 2275 existing tests pass (bun run test)
  • Build passes (bun run build)
  • Lint passes (bun run lint)
  • Pre-push hook (build + typecheck + lint + test + validate-examples) passes

Red/green UAT evidence

Pre-run hook

Red (before — no hook support):

Artifact directory: .agentv/results/runs/default/...
Using target: llm → llm-dry-run
0/7   🔄 code-review-javascript | llm → llm-dry-run
...

No hook output — hooks.pre_run field did not exist.

Green (with hooks.pre_run: "sh /tmp/test-agentv-hook.sh" in .agentv/config.yaml):

Running pre-run hook: sh /tmp/test-agentv-hook.sh
Pre-run hook injected 2 environment variable(s).
Artifact directory: .agentv/results/runs/default/...
Using target: llm → llm-dry-run
0/7   🔄 code-review-javascript | llm → llm-dry-run
...

Hook fires, 2 env vars injected (HOOK_FIRED=1, TEST_SECRET=hello-from-hook), eval proceeds normally.

Dry-run LLM grader fix

Red (before fix):

⚠ LLM grader "llm-grader" failed after 3 attempts (Failed to parse evaluator response after 3 attempts and 1 structure-fix attempt: [
  { "code": "invalid_type", "expected": "number", "received": "undefined", "path": ["score"], "message": "Required" }
]) — skipped
1/7   ⚠️ code-review-javascript | llm → llm-dry-run | 0% FAIL
2/7   ⚠️ feature-proposal-brainstorm | llm → llm-dry-run | 0% FAIL
...

Green (after fix):

1/7   ✅ code-review-javascript | llm → llm-dry-run | 100% PASS
2/7   ✅ feature-proposal-brainstorm | llm → llm-dry-run | 100% PASS
...
RESULT: FAIL  (6/7 scored >= 80%, mean: 93%)

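No LLM grader parse errors: every test runs cleanly through the full grader pipeline. The overall RESULT: FAIL line reports that one of the seven tests scored below 80%; it is not a parse error.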

Adds a hooks.pre_run field to both agentv.config.ts (as hooks.preRun)
and .agentv/config.yaml (as hooks.pre_run) that runs a shell command
before an eval starts and injects its exported env vars into process.env.

- New hooks.ts utility: parseEnvOutput + runPreRunHook
- Parses both `export KEY="value"` and `KEY=value` stdout formats
- Only injects keys not already set in process.env (existing env wins)
- Forwards stderr to process.stderr; non-zero exit aborts the eval
- Wired into runEvalCommand before normalizeOptions so secrets are
  available for all subsequent config and env lookups
- JSON schema updated for .agentv/config.yaml IDE autocomplete

Closes #1149

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
cloudflare-workers-and-pages Bot commented Apr 23, 2026

Deploying agentv with Cloudflare Pages

Latest commit: 30ee637
Status: ✅  Deploy successful!
Preview URL: https://332cce72.agentv.pages.dev
Branch Preview URL: https://feat-1149-pre-run-hook.agentv.pages.dev


christso and others added 6 commits April 23, 2026 12:24
… claim tracking

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
--dry-run previously returned '{"answer":"Mock dry-run response"}' which
caused LLM graders to fail with 'Required: score' parse errors after 3
attempts. The mock response now satisfies all three grader schemas
(freeform, rubric, score-range) so --dry-run works end-to-end including
grader plumbing without real LLM calls.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ypoint

Hook now runs once per agentv invocation (not once per eval run), covering
all commands including interactive mode. Equivalent to the project-level
wrapper script pattern from the end user's perspective.
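
One way to get once-per-invocation behaviour regardless of how many commands call into the hook is a module-level guard. This is only a sketch of the idea: the commit achieves the effect by moving the call to the CLI entrypoint, and the name runHookOnce and the boolean flag here are hypothetical.

```typescript
// Hypothetical guard: module state lives for the lifetime of the process,
// so the hook body executes at most once per CLI invocation.
let hookHasRun = false;

export function runHookOnce(run: () => void): boolean {
  if (hookHasRun) return false; // Already ran in this process: skip.
  hookHasRun = true;
  run();
  return true;
}
```

The flag is set before run() is called, so a hook that throws is not retried within the same process, which is consistent with a failing hook aborting the run.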

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@christso christso marked this pull request as ready for review April 23, 2026 03:16
@christso christso merged commit 12c1dd9 into main Apr 23, 2026
4 checks passed
@christso christso deleted the feat/1149-pre-run-hook branch April 23, 2026 03:16