Objective
Add a "Stop run" affordance — both a UI button on /jobs/:runId and graceful CLI signal handling — so users can interrupt a long-running eval without orphaning the subprocess or losing partial results. Today there is no programmatic stop:
- Studio: the launch endpoint stores process: ChildProcess per run but exposes no DELETE/stop route. Closing the browser tab leaves the CLI subprocess running until it completes naturally.
- CLI: no top-level SIGINT/SIGTERM handler. Ctrl+C hard-kills the eval mid-test. The only child.kill() calls in the codebase live inside agent providers (claude-cli, codex-cli, pi-cli), which terminate their own per-test subprocess on timeout; none of them belong to the orchestrator handling a user interrupt.
This pairs naturally with the resume feature shipped in #1220: today the workflow for "I want to bail on this run" is kill the terminal → resume in Studio. With a Stop button it becomes click Stop → click Resume, all without leaving the browser.
Current state — what already works
- Per-test results are flushed row-by-row into index.jsonl as tests complete, so any partial state is durable on disk and resumable. The "stop" feature does not need to invent persistence; it only needs graceful termination.
- eval-runner.ts already retains a process: ChildProcess reference per Studio-launched run, so the server can call process.kill('SIGTERM') once an endpoint is added.
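A small TypeScript sketch of append-only JSONL shows why partial state survives a stop. Each completed test is one line. A reader that skips unparsable lines recovers the completed set even if the last write was cut off. The row fields (testId, execution_status) and the helper names appendRow, completedTestIdsFromText and completedTestIds are hypothetical. The real index.jsonl schema and the resume logic from #1220 may differ.

```typescript
import { appendFileSync, existsSync, readFileSync } from "node:fs";

// Hypothetical row shape: only the fields this sketch needs. The real
// index.jsonl schema is defined by the CLI and will carry more.
interface ResultRow {
  testId: string;
  execution_status: string;
}

// Append one completed test as a single JSON line. Once appendFileSync
// returns, the bytes have been handed to the OS, so the row survives the
// process being killed (though not a power loss, which would need an fsync).
function appendRow(indexPath: string, row: ResultRow): void {
  appendFileSync(indexPath, JSON.stringify(row) + "\n", "utf8");
}

// Collect ids of tests that already have a row. A half-written trailing line
// (e.g. from a hard kill mid-write) fails to parse and is skipped, so that
// test would be treated as not completed.
function completedTestIdsFromText(text: string): Set<string> {
  const ids = new Set<string>();
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const row = JSON.parse(line) as Partial<ResultRow>;
      if (typeof row.testId === "string") ids.add(row.testId);
    } catch {
      // Truncated or corrupt line: ignore it.
    }
  }
  return ids;
}

// File-level wrapper: a missing index simply means nothing has completed yet.
function completedTestIds(indexPath: string): Set<string> {
  return existsSync(indexPath)
    ? completedTestIdsFromText(readFileSync(indexPath, "utf8"))
    : new Set<string>();
}
```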
Proposed changes
1. CLI signal handler
Register SIGINT / SIGTERM handlers at the top of apps/cli/src/commands/eval/run-eval.ts (or wherever the orchestrator entry point lives):
- On first signal: set a stopRequested flag, allow in-flight tests to finish (they're already isolated), then exit cleanly with a non-zero code distinguishable from "crashed."
- On second signal: hard exit (so users can still escape if a test is hung).
- Print a concise message:
  Stop requested — waiting for N in-flight test(s) to finish (Ctrl+C again to force-quit).
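Here is a minimal TypeScript sketch of the two-stage policy and of threading the flag into a worker pool. It assumes the orchestrator runs tests through a promise-based pool. Everything here is hypothetical and not taken from run-eval.ts: createStopController, runPool, runEval, the concurrency of 4, and exit code 130. The property it shows is that the flag is checked only before a worker claims its next test, so in-flight tests finish while nothing new starts.

```typescript
// Hypothetical "stopped by user" exit code. 130 (128 + SIGINT's number 2) is the
// usual shell convention; the issue only asks for a non-zero code distinct from a crash.
const EXIT_STOPPED = 130;

interface StopController {
  readonly stopRequested: boolean;
  handleSignal(signal: string): void;
}

// Two-stage policy: the first signal only sets the flag; the second force-quits.
function createStopController(opts: {
  inFlightCount: () => number;
  forceExit: (code: number) => void;
  log: (msg: string) => void;
}): StopController {
  let stopRequested = false;
  return {
    get stopRequested() {
      return stopRequested;
    },
    handleSignal(signal: string) {
      if (!stopRequested) {
        stopRequested = true;
        opts.log(
          `Stop requested — waiting for ${opts.inFlightCount()} in-flight test(s) to finish (Ctrl+C again to force-quit).`,
        );
        return;
      }
      opts.log(`Received ${signal} again; force-quitting.`);
      opts.forceExit(EXIT_STOPPED);
    },
  };
}

// Shared counter so the stop message can report N.
const pool = { inFlight: 0 };

// Worker pool that checks the flag before claiming each test: running tests are
// never interrupted, but no new test starts after a stop request.
async function runPool<T>(
  tests: readonly T[],
  concurrency: number,
  runTest: (test: T) => Promise<void>,
  shouldStop: () => boolean,
): Promise<{ started: number; skipped: number }> {
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < tests.length && !shouldStop()) {
      const test = tests[next++] as T;
      pool.inFlight++;
      try {
        await runTest(test);
      } finally {
        pool.inFlight--;
      }
    }
  };
  const workers = Array.from({ length: Math.min(concurrency, tests.length) }, () => worker());
  await Promise.all(workers);
  return { started: next, skipped: tests.length - next };
}

// Entry-point wiring: register handlers for the duration of the run, then remove them.
async function runEval<T>(tests: readonly T[], runTest: (test: T) => Promise<void>): Promise<number> {
  const controller = createStopController({
    inFlightCount: () => pool.inFlight,
    forceExit: (code) => process.exit(code),
    log: (msg) => console.error(msg),
  });
  const onSignal = (signal: string) => controller.handleSignal(signal);
  process.on("SIGINT", onSignal);
  process.on("SIGTERM", onSignal);
  try {
    await runPool(tests, 4, runTest, () => controller.stopRequested);
  } finally {
    process.off("SIGINT", onSignal);
    process.off("SIGTERM", onSignal);
  }
  return controller.stopRequested ? EXIT_STOPPED : 0;
}
```

On the graceful path an entry point would set process.exitCode = await runEval(...) rather than call process.exit(). That lets pending stdout and file writes drain before Node exits. Only the forced second-signal path exits immediately.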
2. Studio API: DELETE /api/eval/run/:id
Add a route that:
- 404s if the run id is unknown.
- 403s in read-only mode (matches the existing guard on POST).
- 409s (or 200 with {stopped: false}) if the run is already terminal.
- Otherwise calls run.process?.kill('SIGTERM'), sets run.status = 'stopping', and returns 202.
The existing child.on('close') handler will flip the status to failed/finished when the CLI exits.
Add a benchmark-scoped variant, DELETE /api/benchmarks/:benchmarkId/eval/run/:id, matching the existing pattern.
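This is a framework-agnostic TypeScript sketch of the stop handler's decision logic. It assumes runs live in a Map keyed by id and that the status union includes 'stopping'. The names stopRun and matchStopRoute, the response bodies, and the choice of 409 over 200 {stopped: false} are illustrative assumptions, not the existing server's API. Wiring it into the real router is left to whatever serve already uses.

```typescript
// Assumed status union; the real type in eval-runner.ts may differ.
type RunStatus = "starting" | "running" | "stopping" | "finished" | "failed";

// Anything with a kill() method; a node:child_process ChildProcess fits structurally.
interface Killable {
  kill(signal?: string): boolean;
}

interface Run {
  id: string;
  status: RunStatus;
  process?: Killable;
}

interface RouteResult {
  status: number;
  body: unknown;
}

const TERMINAL_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>(["finished", "failed"]);

function stopRun(runs: Map<string, Run>, id: string, readOnly: boolean): RouteResult {
  // Same guard as POST; checked first so read-only mode never reveals which ids exist.
  if (readOnly) return { status: 403, body: { error: "read-only mode" } };
  const run = runs.get(id);
  if (!run) return { status: 404, body: { error: `unknown run: ${id}` } };
  if (TERMINAL_STATUSES.has(run.status)) return { status: 409, body: { stopped: false } };
  // A repeated DELETE must not signal again: the CLI treats a second signal as force-quit.
  if (run.status !== "stopping") {
    run.process?.kill("SIGTERM");
    run.status = "stopping";
  }
  // The existing child.on('close') handler later flips status to finished/failed.
  return { status: 202, body: { stopped: true } };
}

// Both the plain and the benchmark-scoped paths resolve to the same handler.
const STOP_ROUTES = [
  /^\/api\/eval\/run\/([^/]+)$/,
  /^\/api\/benchmarks\/[^/]+\/eval\/run\/([^/]+)$/,
];

function matchStopRoute(method: string, path: string): string | undefined {
  if (method !== "DELETE") return undefined;
  for (const re of STOP_ROUTES) {
    const m = re.exec(path);
    const id = m?.[1];
    if (id) return decodeURIComponent(id);
  }
  return undefined;
}
```

The sketch decides one behavior explicitly: a repeated DELETE on a 'stopping' run is a no-op rather than a second SIGTERM. Sending the signal again would turn a graceful stop into a force-quit.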
3. UI: "Stop run" button on /jobs/:runId
In apps/studio/src/routes/jobs/$runId.tsx:
- Render a destructive-style button (red outline) when status === 'starting' or 'running' and the page is not in read-only mode.
- On click: send DELETE /api/eval/run/:id and optimistically flip the status indicator to "Stopping…".
- After the run hits a terminal state, the existing UI already updates correctly.
- Disable in read-only mode (UI-level, the API also 403s).
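This TypeScript sketch of the click handler assumes the page holds the run status in local state via a setter. The names requestStop, stopUrl and FetchLike are hypothetical. The fetch function is injected so the optimistic flip and rollback can be tested without a browser. In the component it would be called as requestStop(fetch, runId, status, setStatus).

```typescript
// Assumed status union; the real type in the Studio route may differ.
type RunStatus = "starting" | "running" | "stopping" | "finished" | "failed";

// Minimal fetch shape so the helper is testable; the browser's fetch fits it.
type FetchLike = (url: string, init: { method: string }) => Promise<{ status: number }>;

// Builds the plain or benchmark-scoped stop URL.
function stopUrl(runId: string, benchmarkId?: string): string {
  const run = encodeURIComponent(runId);
  return benchmarkId
    ? `/api/benchmarks/${encodeURIComponent(benchmarkId)}/eval/run/${run}`
    : `/api/eval/run/${run}`;
}

// Optimistically shows "Stopping…", then sends the DELETE. On anything other
// than 202 (e.g. 409 because the run already ended, or a network error) it
// restores the previous status and lets the page's normal updates take over.
async function requestStop(
  fetchFn: FetchLike,
  runId: string,
  previous: RunStatus,
  setStatus: (status: RunStatus) => void,
  benchmarkId?: string,
): Promise<boolean> {
  setStatus("stopping"); // runs synchronously, before the first await
  try {
    const res = await fetchFn(stopUrl(runId, benchmarkId), { method: "DELETE" });
    if (res.status === 202) return true;
  } catch {
    // Network failure: fall through to the rollback below.
  }
  setStatus(previous);
  return false;
}
```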
4. Tests
- Server: in apps/cli/test/commands/results/serve.test.ts, add cases for unknown id (404), read-only (403), and a happy-path stop using a fake long-running child.
- CLI: a small test that sends SIGINT to a multi-test eval run and asserts (a) the exit code is the "stopped" sentinel and (b) index.jsonl contains the rows for tests completed before the signal.
- UI: test a pure helper, shouldShowStopButton(status, isReadOnly), that answers "should the stop button render?"
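The UI helper named above is small enough to sketch in full. The sketch assumes a RunStatus union of 'starting' | 'running' | 'stopping' | 'finished' | 'failed', though the real type in Studio may differ. It also gives a table of cases that a unit test could iterate over.

```typescript
// Assumed status union; the real run-status type in Studio may have other members.
type RunStatus = "starting" | "running" | "stopping" | "finished" | "failed";

// Render Stop only for 'starting' or 'running', and never in read-only mode
// (the API would 403 anyway). 'stopping' and terminal states hide the button.
function shouldShowStopButton(status: RunStatus, isReadOnly: boolean): boolean {
  if (isReadOnly) return false;
  return status === "starting" || status === "running";
}

// [status, isReadOnly, expected] cases a unit test could iterate over.
const cases: Array<[RunStatus, boolean, boolean]> = [
  ["starting", false, true],
  ["running", false, true],
  ["running", true, false],
  ["stopping", false, false],
  ["finished", false, false],
  ["failed", false, false],
];
```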
Acceptance signals
- … index.jsonl containing all tests completed before the signal.
- DELETE /api/eval/run/:id exists and is 403-guarded in read-only mode; the benchmark-scoped variant works the same way.
- … /jobs/:runId while running, hidden when terminal, hidden in read-only mode.
- … /runs/:runId, shows the partial run with at least one execution_status: ok row and the Resume run button from feat(studio): expose eval resumability — API + Resume action on run detail #1220 visible.
- … main, killing the terminal mid-eval is the only way to stop; green = on this branch, the Stop button on /jobs/<id> terminates cleanly and the partial run is resumable in one click.
Non-goals
- No "Pause" semantics. Stop fully terminates; resume is the way to continue.
- No queue management. This is for one running job at a time; multi-job orchestration is out of scope.
- No SIGINT-to-grader translation. If a grader is mid-flight when the signal arrives, let it finish or time out per existing rules.
Related
- … packages/core/src/evaluation/providers/{claude-cli,codex-cli,pi-cli}.ts
Estimate
~half a day. CLI signal handling is the biggest unknown (the flag has to be threaded through the worker pool); the UI + API changes are small.