Summary
When running multiple eval files in a single agentv eval run invocation against the claude-cli target, the provider crashes after ~4 tests with a SessionStart hook error. All subsequent tests fail with the same error.
Reproduction
# This fails after ~4 tests:
agentv eval run --target claude-cli --workers 1 "evals/hivespec/*.eval.yaml"
# This works — each invocation gets a fresh session:
agentv eval run --target claude-cli --workers 1 evals/hivespec/hs-claim.eval.yaml
agentv eval run --target claude-cli --workers 1 evals/hivespec/hs-explore.eval.yaml
# etc.
Error
{"type":"system","subtype":"hook_started","hook_id":"...","hook_name":"SessionStart:startup","hook_event":"SessionStart","session_id":"..."}
Claude CLI exits with code 1. The error is a raw JSON event from the session hook, not a proper error message.
Observed behavior
- Tests 1-4: all PASS (1.000)
- Tests 5-17: all ERROR with the same SessionStart hook failure
- When the same 17 tests are run as 5 separate invocations (one per eval file), all complete successfully
Expected behavior
All 17 tests should complete when run in a single invocation. The claude-cli provider should clean up sessions between tests and handle hook errors gracefully.
Investigation starting points
- Claude CLI provider: `packages/core/src/evaluation/providers/` — find the claude-cli provider implementation
- Check how sessions are created and torn down between test cases
- Check if there's a session cleanup or process kill between tests
- Compare with pi-cli provider which handles 17 sequential tests without issues
Acceptance signals
Non-goals
- Parallel Claude CLI sessions (workers > 1) — that's a separate concern
- Changing how Claude CLI hooks work — the fix should be in agentv's provider layer
Summary
When running multiple eval files in a single
agentv eval runinvocation against theclaude-clitarget, the provider crashes after ~4 tests with aSessionStarthook error. All subsequent tests fail with the same error.Reproduction
Error
{"type":"system","subtype":"hook_started","hook_id":"...","hook_name":"SessionStart:startup","hook_event":"SessionStart","session_id":"..."}Claude CLI exits with code 1. The error is a raw JSON event from the session hook, not a proper error message.
Observed behavior
Expected behavior
All 17 tests should complete when run in a single invocation. The claude-cli provider should clean up sessions between tests and handle hook errors gracefully.
Investigation starting points
Acceptance signals
Non-goals