Skip to content

feat(studio): add Stop run button + graceful CLI interrupt — pairs with eval resume #1222

@christso

Description

@christso

Objective

Add a "Stop run" affordance — both a UI button on /jobs/:runId and graceful CLI signal handling — so users can interrupt a long-running eval without orphaning the subprocess or losing partial results. Today there is no programmatic stop:

  • Studio: the launch endpoint stores process: ChildProcess per run but exposes no DELETE/stop route. Closing the browser tab leaves the CLI subprocess running until it completes naturally.
  • CLI: no top-level SIGINT/SIGTERM handler. Ctrl+C hard-kills the eval mid-test. The only child.kill() calls in the codebase live inside agent providers (claude-cli, codex-cli, pi-cli) terminating their own per-test subprocess on timeout — not the orchestrator handling user interrupt.

This pairs naturally with the resume feature shipped in #1220: today the workflow for "I want to bail on this run" is kill the terminal → resume in Studio. With a Stop button it becomes click Stop → click Resume, all without leaving the browser.

Current state — what already works

  • Per-test results are flushed row-by-row into index.jsonl as tests complete, so any partial state is durable on disk and resumable. The "stop" feature does not need to invent persistence — only graceful termination.
  • eval-runner.ts already retains a process: ChildProcess reference per Studio-launched run, so the server can process.kill('SIGTERM') once an endpoint is added.

Proposed changes

1. CLI signal handler

Register SIGINT / SIGTERM handlers at the top of apps/cli/src/commands/eval/run-eval.ts (or wherever the orchestrator entry point lives):

  • On first signal: set a stopRequested flag, allow in-flight tests to finish (they're already isolated), then exit cleanly with a non-zero code distinguishable from "crashed."
  • On second signal: hard exit (so users can still escape if a test is hung).
  • Print a concise message: Stop requested — waiting for N in-flight test(s) to finish (Ctrl+C again to force-quit).

2. Studio API: DELETE /api/eval/run/:id

Add a route that:

  • 404s if the run id is unknown.
  • 403s in read-only mode (matches the existing guard on POST).
  • 409s (or 200 with {stopped: false}) if the run is already terminal.
  • Otherwise calls run.process?.kill('SIGTERM'), sets run.status = 'stopping', returns 202.

The existing child.on('close') handler will flip the status to failed/finished when the CLI exits.

Add benchmark-scoped variant DELETE /api/benchmarks/:benchmarkId/eval/run/:id matching the existing pattern.

3. UI: "Stop run" button on /jobs/:runId

In apps/studio/src/routes/jobs/$runId.tsx:

  • Render a destructive-style button (red outline) when status === 'starting' or 'running' and not in read-only mode.
  • On click: DELETE /api/eval/run/:id, optimistic-flip the status indicator to "Stopping…".
  • After the run hits a terminal state, the existing UI already updates correctly.
  • Disable in read-only mode (UI-level, the API also 403s).

4. Tests

  • Server: in apps/cli/test/commands/results/serve.test.ts, add cases for unknown id (404), read-only (403), and a happy-path stop using a fake long-running child.
  • CLI: a small test that sends SIGINT to a multi-test eval run and asserts (a) exit code is the "stopped" sentinel and (b) index.jsonl contains the rows for tests completed before the signal.
  • UI: pure helper for "should the stop button render?" — shouldShowStopButton(status, isReadOnly).

Acceptance signals

  • CLI: SIGINT during a multi-test eval produces a clean exit and a partial index.jsonl containing all tests completed before the signal.
  • CLI: a second SIGINT within 1s force-quits.
  • Server: DELETE /api/eval/run/:id exists and is 403-guarded in read-only mode; benchmark-scoped variant works the same.
  • UI: a "Stop run" button renders on /jobs/:runId while running, hidden when terminal, hidden in read-only.
  • UI: clicking Stop, then navigating to the originating /runs/:runId, shows the partial run with at least one execution_status: ok row and the Resume run button from feat(studio): expose eval resumability — API + Resume action on run detail #1220 visible.
  • Manual red/green: red = on main, killing terminal mid-eval is the only way to stop; green = on this branch, the Stop button on /jobs/<id> terminates cleanly and the partial run is resumable in one click.

Non-goals

  • No "Pause" semantics. Stop fully terminates; resume is the way to continue.
  • No queue management. This is for one running job at a time — multi-job orchestration is out of scope.
  • No SIGINT-to-grader translation. If a grader is mid-flight when the signal arrives, let it finish or time out per existing rules.

Related

Estimate

~half a day. CLI signal handling is the biggest unknown (need to thread the flag through the worker pool); the UI + API changes are small.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions