Create runner and job API by epompeii · Pull Request #659 · bencherdev/bencher

epompeii · 2026-02-05T05:57:50Z

Create a bare metal runner and job API

github-actions · 2026-02-05T06:14:51Z

Bencher Report

Branch	u/ep/jobs
Testbed	ubuntu-22.04

Click to view all benchmark results

Benchmark	Latency	Benchmark Result microseconds (µs) (Result Δ%)	Upper Boundary microseconds (µs) (Limit %)
Adapter::Json	📈 view plot 🚷 view threshold	3.76 µs (+10.22%) Baseline: 3.41 µs	4.56 µs (82.44%)
Adapter::Magic (JSON)	📈 view plot 🚷 view threshold	3.68 µs (+7.58%) Baseline: 3.42 µs	4.51 µs (81.50%)
Adapter::Magic (Rust)	📈 view plot 🚷 view threshold	25.62 µs (-0.49%) Baseline: 25.75 µs	29.53 µs (86.78%)
Adapter::Rust	📈 view plot 🚷 view threshold	2.96 µs (+6.06%) Baseline: 2.79 µs	3.14 µs (94.37%)
Adapter::RustBench	📈 view plot 🚷 view threshold	2.96 µs (+5.77%) Baseline: 2.80 µs	3.14 µs (94.29%)
head_version_insert/batch/10	📈 view plot 🚷 view threshold	95.13 µs (+8.10%) Baseline: 88.00 µs	104.21 µs (91.29%)
head_version_insert/batch/100	📈 view plot 🚷 view threshold	235.11 µs (+4.13%) Baseline: 225.78 µs	248.62 µs (94.57%)
head_version_insert/batch/255	📈 view plot 🚷 view threshold	461.92 µs (+2.49%) Baseline: 450.69 µs	492.70 µs (93.75%)
head_version_insert/batch/50	📈 view plot 🚷 view threshold	157.26 µs (+5.97%) Baseline: 148.40 µs	170.27 µs (92.36%)
threshold_query/join/10	📈 view plot 🚷 view threshold	139.25 µs (+5.07%) Baseline: 132.54 µs	155.41 µs (89.60%)
threshold_query/join/20	📈 view plot 🚷 view threshold	152.80 µs (+3.82%) Baseline: 147.18 µs	167.49 µs (91.23%)
threshold_query/join/5	📈 view plot 🚷 view threshold	131.77 µs (+5.14%) Baseline: 125.33 µs	148.86 µs (88.52%)
threshold_query/join/50	📈 view plot 🚷 view threshold	192.91 µs (+2.40%) Baseline: 188.39 µs	212.40 µs (90.83%)

🐰 View full continuous benchmarking report in Bencher

github-actions · 2026-02-05T06:45:30Z

🤖 Claude Code Review

PR: #659
Base: devel
Head: u/ep/jobs
Commit: bb2ba5bdd642026d63c0497d997edd1f52f9c3b0

I have now reviewed a substantial portion of this PR. Let me compile the review.

Pull Request Review: Runner System Implementation

Overview

This PR implements a bare metal benchmark runner system for Bencher Plus. It's a significant feature addition (~10,000 lines) that introduces:

Server-scoped runners that can execute jobs from any project
Job queue with tier-based priority scheduling
WebSocket-based heartbeat and status updates
Token-based runner authentication

Architecture Assessment

The design follows well-established patterns:

Pull-based architecture where runners claim jobs from the API
State machine for job lifecycle: pending → claimed → running → completed/failed/canceled
SHA-256 hashed tokens for runner authentication (tokens never stored in plain text)
Per-job timeout tasks for stale job recovery instead of periodic reapers

Positive Observations

Security: Runner tokens use a proper prefix (bencher_runner_) with SHA-256 hashing. Tokens are shown once at creation.
Concurrency Control: The tier-based scheduling with NOT EXISTS subqueries for concurrency limits is well-implemented.
Testing: Comprehensive test coverage including:
- WebSocket channel tests
- Job lifecycle tests
- Priority scheduling tests
- Concurrency limit tests
Feature Gating: Properly gated behind the plus feature flag.
Documentation: Excellent RUNNER_DESIGN.md and updated CLAUDE.md.

Issues & Concerns

High Priority

Missing newline at end of migration file (lib/bencher_schema/migrations/2026-02-02-120000_runner/up.sql:62):

-CREATE INDEX index_job_report_id ON job(report_id);
\ No newline at end of file
+CREATE INDEX index_job_report_id ON job(report_id);

TOCTOU race in claim query (plus/api_runners/src/jobs.rs:256-294): The code correctly acquires the write lock before the SELECT, but the WHERE clause on the UPDATE only checks status.eq(JobStatus::Pending). If SQLite serialization isn't guaranteed, this could be strengthened. However, given SQLite's serialized writes and the lock held for the entire operation, this is likely fine.
Output storage not implemented (plus/api_runners/src/channel.rs:399):
```
// TODO: Store output somewhere (job table or separate results table)
drop(output);
```
The design doc mentions this is an open question. Benchmark output is silently dropped.

Medium Priority

Billing logic not implemented (plus/api_runners/src/channel.rs:352):
```
// TODO: Billing logic - check elapsed minutes and bill to Stripe
```
OTEL counter missing (plus/bencher_otel/src/api_meter.rs): RunnerRequestMax counter is referenced but only the variants are defined - need to verify this integrates correctly.
Reconnection to canceled job (plus/api_runners/src/channel.rs:89-94): The channel allows reconnection for Claimed | Running status, but if a job transitions to Canceled externally before reconnect, the runner won't be able to open a channel to acknowledge. This is documented in test test_channel_cancelled_acknowledgment - the acknowledgment path via WebSocket is not available for already-canceled jobs.

Low Priority

Magic numbers could be constants (plus/api_runners/src/jobs.rs:117-119):
```
const PRIORITY_UNLIMITED: i32 = 200;
const PRIORITY_FREE: i32 = 100;
```
These are good, but the design doc mentions 0/100/200/300 tiers. Consider defining all tier thresholds.
Error message inconsistency: Some error messages use periods, others don't. E.g., "Missing or invalid Authorization header" vs "Invalid runner token format".

Compliance with CLAUDE.md

✅ Uses #[expect(...)] instead of #[allow(...)]
✅ No select! macros - uses futures-concurrency
✅ Dependencies in workspace Cargo.toml
✅ Types in bencher_json/bencher_valid
✅ Migration properly structured
✅ Both Dockerfiles updated
✅ Plus feature gating applied correctly

Missing Items from Design Doc

The design doc has several unchecked TODO items:

Result storage
Output storage from WebSocket messages
Retry policy
OCI auth for runners

These are acknowledged as "Open Questions" and appropriate for a phased implementation.

Recommendations

Fix the missing newline in the migration file
Add a follow-up issue for implementing output storage before the runner feature is usable end-to-end
Consider adding billing metrics even if Stripe integration comes later
Run cargo check --no-default-features to ensure build works without Plus features
Run cargo gen-types to verify OpenAPI spec is up to date (appears to be included based on openapi.json changes)

Summary

This is a well-architected implementation of a complex feature. The code quality is high, with good separation of concerns, comprehensive testing, and proper security practices. The main gaps (output storage, billing) are acknowledged TODOs for future phases.

Verdict: Approve with minor fix for the missing newline in the migration file.

_{Model: claude-opus-4-5-20251101}

epompeii self-assigned this Feb 5, 2026

epompeii added the claude label Feb 5, 2026

epompeii added 26 commits February 6, 2026 06:40

runner_design

694689b

runner_migration

b0f54cd

model

d71c10f

runners

f45e798

job_update

850442f

api_runners

cb253e2

otel

5325c85

clippy

fce62b2

runner_tokern

c3116db

jobs_api

ca6e110

gen_types

0512868

from_request

ed70265

claude_review

54c80cd

long_poll

1de3a0c

claude_review

90bdf4e

dockerfile

af303d0

rate_limiting

4931008

ws

b421887

tests

2e5a4e9

claude_review

074e015

claude_md_update

78139a0

claude_review

743e961

rate_limiting

a7a3c76

job_cleanup

35a6dcc

json_job

4babb8f

event_cancelled

e1d9f66

epompeii added 4 commits February 6, 2026 06:40

job_spec

1f45446

job_spec_impl

ddca893

concurrenct_plan

cf8271b

concurrent_limit_impl

1dc6282

epompeii force-pushed the u/ep/jobs branch from 0a0dffc to 1dc6282 Compare February 6, 2026 06:41

epompeii added 4 commits February 7, 2026 13:27

review

f025550

claude

088a0a4

claude_2

a482e5a

claude_3

bb2ba5b

epompeii force-pushed the u/ep/jobs branch from eecd184 to bb2ba5b Compare February 7, 2026 19:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create runner and job API#659

Create runner and job API#659
epompeii wants to merge 34 commits intodevelfrom
u/ep/jobs

epompeii commented Feb 5, 2026

Uh oh!

github-actions bot commented Feb 5, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 5, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

epompeii commented Feb 5, 2026

Uh oh!

github-actions bot commented Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bencher Report

Uh oh!

github-actions bot commented Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🤖 Claude Code Review

Pull Request Review: Runner System Implementation

Overview

Architecture Assessment

Positive Observations

Issues & Concerns

High Priority

Medium Priority

Low Priority

Compliance with CLAUDE.md

Missing Items from Design Doc

Recommendations

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Feb 5, 2026 •

edited

Loading

github-actions bot commented Feb 5, 2026 •

edited

Loading