
feat(cli): extend --retry-errors to resume missing cases from crashed runs #976

Merged
christso merged 3 commits into main from feat/975-retry-resume-missing
Apr 8, 2026

@christso christso commented Apr 8, 2026

Closes #975

Summary

  • Extends --retry-errors to also re-run test cases that are missing from the previous output (e.g., due to a crash or Ctrl+C mid-run), not just execution_error cases
  • Uses a micromatch negation filter to exclude completed (non-error) test IDs, so any test not in the previous output runs naturally
  • No new checkpoint system, CLI commands, or flags — reuses existing index.jsonl artifacts
  • Matrix-safe: only excludes test IDs where ALL results across targets are non-error
  • Escapes glob metacharacters in test IDs for correct literal matching

How it works

Previously --retry-errors built an inclusion filter matching only error test IDs. Now it builds an exclusion filter (!{completed-1,completed-2,...}) that skips already-completed tests. This naturally includes:

  1. Tests that errored (execution_error)
  2. Tests that never ran (missing from output due to crash)
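
The switch from inclusion to exclusion can be sketched roughly as follows. This is a minimal illustration, not the PR's code: only the name `buildExclusionFilter` and the `!{a,b,...}` pattern shape come from the description above. The function body, the `escapeGlob` helper, the exact metacharacter set, and the single-ID special case are assumptions made for the example.

```typescript
// Hypothetical sketch; the actual buildExclusionFilter in this PR may differ.

// Assumed set of glob metacharacters to escape (backslash first, then the rest).
const GLOB_META = /[\\*?[\]{}()!+@,]/g;

/** Backslash-escape glob metacharacters so a test ID is matched literally. */
function escapeGlob(id: string): string {
  return id.replace(GLOB_META, "\\$&");
}

/**
 * Build a negated pattern that excludes every completed test ID.
 * Returns undefined when nothing completed, meaning "run everything".
 */
function buildExclusionFilter(completedIds: string[]): string | undefined {
  if (completedIds.length === 0) return undefined;
  const escaped = completedIds.map(escapeGlob);
  // Assumption: a brace group with a single alternative may not expand,
  // so a lone ID is negated directly instead of being wrapped in braces.
  if (escaped.length === 1) return `!${escaped[0]}`;
  return `!{${escaped.join(",")}}`;
}

console.log(buildExclusionFilter(["completed-1", "completed-2"]));
```

Because the pattern names only what to skip, a test ID that never reached the previous output needs no special handling: it simply is not excluded.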

Test plan

  • Unit tests for loadFullyCompletedTestIds(), buildExclusionFilter() (12 tests pass)
  • Build, typecheck, lint all pass
  • Manual e2e: partial run (2/5 completed) → --retry-errors runs 3 missing cases, merges 2 preserved
  • Manual e2e: all completed → prints "No execution errors or missing cases. Nothing to retry."
  • Manual e2e: matrix case (same test ID ok on one target, errored on another) → re-runs on all targets (conservative)

🤖 Generated with Claude Code

christso and others added 2 commits April 8, 2026 13:16
… runs (#975)

Previously --retry-errors only re-ran execution_error test cases. If the
process crashed mid-run, cases that never started were lost. Now uses an
exclusion filter (negating completed test IDs) so both error cases AND
missing cases are re-run, enabling crash recovery without a new checkpoint
system.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages Bot commented Apr 8, 2026

Deploying agentv with Cloudflare Pages

Latest commit: 9b392e8
Status: ✅  Deploy successful!
Preview URL: https://83b6df70.agentv.pages.dev
Branch Preview URL: https://feat-975-retry-resume-missin.agentv.pages.dev


christso commented Apr 8, 2026

E2E Verification Results

Test 1: Crash recovery — missing cases are re-run

Red (before): --retry-errors on a partial output (2/5 tests completed) would print "No execution errors found. Nothing to retry." and exit, losing the 3 missing cases.

Green (after): --retry-errors on the same partial output:

Skipping 2 already-completed test(s).
0/3   🔄 code-quality-multi-eval | llm
0/3   🔄 summary-task | llm
0/3   🔄 summary-multi-criteria-score-ranges-proposed | llm
...
Merged 2 non-error result(s) from previous output.
Total tests: 5

Correctly ran the 3 missing cases and merged the 2 preserved results.

Test 2: Error cases still re-run (backward compat)

--retry-errors on output with execution_error cases:

Found 3 execution-error test(s): code-explanation-simple, technical-writing-detailed, code-quality-multi-eval
Skipping 1 already-completed test(s).
0/4   🔄 code-explanation-simple | llm
...
Merged 1 non-error result(s) from previous output.
Total tests: 5

Error cases re-run, completed cases preserved, total correct.

Test 3: All completed — nothing to retry

--retry-errors on output where all 5 tests passed:

Skipping 5 already-completed test(s).
All tests completed successfully in the previous run. Nothing to retry.

Exits cleanly without re-running anything.

1. Matrix safety: loadFullyCompletedTestIds only excludes test IDs where
   ALL results are non-error. If case-1 succeeded on target A but errored
   on target B, case-1 re-runs on both targets (conservative).

2. Glob escaping: buildExclusionFilter escapes micromatch metacharacters
   in test IDs so IDs with *, ?, [], {}, ! match literally.

3. Early-return message: changed from "All tests completed successfully"
   to "No execution errors or missing cases" since non-error results
   include quality_failure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
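
The matrix-safety rule in point 1 can be illustrated with a short sketch. It assumes a hypothetical record shape (`testId`, `target`, `executionStatus`) and takes the JSONL text as a string, since the real index.jsonl field names and the signature of `loadFullyCompletedTestIds` are not shown in this PR. Skipping an unparseable line (for example one truncated by a crash) is also an assumption of the example.

```typescript
// Hypothetical record shape; the real index.jsonl fields may be named differently.
interface ResultRecord {
  testId: string;
  target: string;
  executionStatus: string; // e.g. "ok", "quality_failure", "execution_error"
}

/**
 * Return the test IDs whose results are non-error on EVERY target.
 * An ID with even one execution_error result is left out, so it re-runs.
 */
function fullyCompletedTestIds(jsonl: string): string[] {
  const allNonError = new Map<string, boolean>();
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let record: ResultRecord;
    try {
      record = JSON.parse(trimmed) as ResultRecord;
    } catch {
      continue; // assumption: ignore a partially written line
    }
    const nonError = record.executionStatus !== "execution_error";
    const previous = allNonError.get(record.testId) ?? true;
    allNonError.set(record.testId, previous && nonError);
  }
  return Array.from(allNonError.entries())
    .filter(([, ok]) => ok)
    .map(([id]) => id);
}

const sample = [
  '{"testId":"case-1","target":"A","executionStatus":"ok"}',
  '{"testId":"case-1","target":"B","executionStatus":"execution_error"}',
  '{"testId":"case-2","target":"A","executionStatus":"quality_failure"}',
  '{"testId":"case-2","target":"B","executionStatus":"ok"}',
].join("\n");

console.log(fullyCompletedTestIds(sample));
```

In this sample, case-1 is not treated as completed because it errored on target B, while case-2 is, since quality_failure counts as a non-error result.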
@christso christso marked this pull request as ready for review April 8, 2026 14:35
@christso christso merged commit 5d28cf7 into main Apr 8, 2026
4 checks passed
@christso christso deleted the feat/975-retry-resume-missing branch April 8, 2026 14:35

Development

Successfully merging this pull request may close these issues.

feat: mid-run checkpointing for crash recovery in eval orchestrator