Skip to content

Create runner and job API#659

Draft
epompeii wants to merge 34 commits intodevelfrom
u/ep/jobs
Draft

Create runner and job API#659
epompeii wants to merge 34 commits intodevelfrom
u/ep/jobs

Conversation

@epompeii
Copy link
Member

@epompeii epompeii commented Feb 5, 2026

Create a bare metal runner and job API

@epompeii epompeii self-assigned this Feb 5, 2026
@github-actions
Copy link

github-actions bot commented Feb 5, 2026

🐰 Bencher Report

Branchu/ep/jobs
Testbedubuntu-22.04
Click to view all benchmark results
BenchmarkLatencyBenchmark Result
microseconds (µs)
(Result Δ%)
Upper Boundary
microseconds (µs)
(Limit %)
Adapter::Json📈 view plot
🚷 view threshold
3.76 µs
(+10.22%)Baseline: 3.41 µs
4.56 µs
(82.44%)
Adapter::Magic (JSON)📈 view plot
🚷 view threshold
3.68 µs
(+7.58%)Baseline: 3.42 µs
4.51 µs
(81.50%)
Adapter::Magic (Rust)📈 view plot
🚷 view threshold
25.62 µs
(-0.49%)Baseline: 25.75 µs
29.53 µs
(86.78%)
Adapter::Rust📈 view plot
🚷 view threshold
2.96 µs
(+6.06%)Baseline: 2.79 µs
3.14 µs
(94.37%)
Adapter::RustBench📈 view plot
🚷 view threshold
2.96 µs
(+5.77%)Baseline: 2.80 µs
3.14 µs
(94.29%)
head_version_insert/batch/10📈 view plot
🚷 view threshold
95.13 µs
(+8.10%)Baseline: 88.00 µs
104.21 µs
(91.29%)
head_version_insert/batch/100📈 view plot
🚷 view threshold
235.11 µs
(+4.13%)Baseline: 225.78 µs
248.62 µs
(94.57%)
head_version_insert/batch/255📈 view plot
🚷 view threshold
461.92 µs
(+2.49%)Baseline: 450.69 µs
492.70 µs
(93.75%)
head_version_insert/batch/50📈 view plot
🚷 view threshold
157.26 µs
(+5.97%)Baseline: 148.40 µs
170.27 µs
(92.36%)
threshold_query/join/10📈 view plot
🚷 view threshold
139.25 µs
(+5.07%)Baseline: 132.54 µs
155.41 µs
(89.60%)
threshold_query/join/20📈 view plot
🚷 view threshold
152.80 µs
(+3.82%)Baseline: 147.18 µs
167.49 µs
(91.23%)
threshold_query/join/5📈 view plot
🚷 view threshold
131.77 µs
(+5.14%)Baseline: 125.33 µs
148.86 µs
(88.52%)
threshold_query/join/50📈 view plot
🚷 view threshold
192.91 µs
(+2.40%)Baseline: 188.39 µs
212.40 µs
(90.83%)
🐰 View full continuous benchmarking report in Bencher

@epompeii epompeii added the claude label Feb 5, 2026
@github-actions
Copy link

github-actions bot commented Feb 5, 2026

🤖 Claude Code Review

PR: #659
Base: devel
Head: u/ep/jobs
Commit: bb2ba5bdd642026d63c0497d997edd1f52f9c3b0


I have now reviewed a substantial portion of this PR. Let me compile the review.

Pull Request Review: Runner System Implementation

Overview

This PR implements a bare metal benchmark runner system for Bencher Plus. It's a significant feature addition (~10,000 lines) that introduces:

  • Server-scoped runners that can execute jobs from any project
  • Job queue with tier-based priority scheduling
  • WebSocket-based heartbeat and status updates
  • Token-based runner authentication

Architecture Assessment

The design follows well-established patterns:

  • Pull-based architecture where runners claim jobs from the API
  • State machine for job lifecycle: pending → claimed → running → completed/failed/canceled
  • SHA-256 hashed tokens for runner authentication (tokens never stored in plain text)
  • Per-job timeout tasks for stale job recovery instead of periodic reapers

Positive Observations

  1. Security: Runner tokens use a proper prefix (bencher_runner_) with SHA-256 hashing. Tokens are shown once at creation.

  2. Concurrency Control: The tier-based scheduling with NOT EXISTS subqueries for concurrency limits is well-implemented.

  3. Testing: Comprehensive test coverage including:

    • WebSocket channel tests
    • Job lifecycle tests
    • Priority scheduling tests
    • Concurrency limit tests
  4. Feature Gating: Properly gated behind the plus feature flag.

  5. Documentation: Excellent RUNNER_DESIGN.md and updated CLAUDE.md.

Issues & Concerns

High Priority

  1. Missing newline at end of migration file (lib/bencher_schema/migrations/2026-02-02-120000_runner/up.sql:62):

    -CREATE INDEX index_job_report_id ON job(report_id);
    \ No newline at end of file
    +CREATE INDEX index_job_report_id ON job(report_id);
  2. TOCTOU race in claim query (plus/api_runners/src/jobs.rs:256-294): The code correctly acquires the write lock before the SELECT, but the WHERE clause on the UPDATE only checks status.eq(JobStatus::Pending). If SQLite serialization isn't guaranteed, this could be strengthened. However, given SQLite's serialized writes and the lock held for the entire operation, this is likely fine.

  3. Output storage not implemented (plus/api_runners/src/channel.rs:399):

    // TODO: Store output somewhere (job table or separate results table)
    drop(output);

    The design doc mentions this is an open question. Benchmark output is silently dropped.

Medium Priority

  1. Billing logic not implemented (plus/api_runners/src/channel.rs:352):

    // TODO: Billing logic - check elapsed minutes and bill to Stripe
  2. OTEL counter missing (plus/bencher_otel/src/api_meter.rs): RunnerRequestMax counter is referenced but only the variants are defined - need to verify this integrates correctly.

  3. Reconnection to canceled job (plus/api_runners/src/channel.rs:89-94): The channel allows reconnection for Claimed | Running status, but if a job transitions to Canceled externally before reconnect, the runner won't be able to open a channel to acknowledge. This is documented in test test_channel_cancelled_acknowledgment - the acknowledgment path via WebSocket is not available for already-canceled jobs.

Low Priority

  1. Magic numbers could be constants (plus/api_runners/src/jobs.rs:117-119):

    const PRIORITY_UNLIMITED: i32 = 200;
    const PRIORITY_FREE: i32 = 100;

    These are good, but the design doc mentions 0/100/200/300 tiers. Consider defining all tier thresholds.

  2. Error message inconsistency: Some error messages use periods, others don't. E.g., "Missing or invalid Authorization header" vs "Invalid runner token format".

Compliance with CLAUDE.md

✅ Uses #[expect(...)] instead of #[allow(...)]
✅ No select! macros - uses futures-concurrency
✅ Dependencies in workspace Cargo.toml
✅ Types in bencher_json/bencher_valid
✅ Migration properly structured
✅ Both Dockerfiles updated
✅ Plus feature gating applied correctly

Missing Items from Design Doc

The design doc has several unchecked TODO items:

  • Result storage
  • Output storage from WebSocket messages
  • Retry policy
  • OCI auth for runners

These are acknowledged as "Open Questions" and appropriate for a phased implementation.

Recommendations

  1. Fix the missing newline in the migration file
  2. Add a follow-up issue for implementing output storage before the runner feature is usable end-to-end
  3. Consider adding billing metrics even if Stripe integration comes later
  4. Run cargo check --no-default-features to ensure build works without Plus features
  5. Run cargo gen-types to verify OpenAPI spec is up to date (appears to be included based on openapi.json changes)

Summary

This is a well-architected implementation of a complex feature. The code quality is high, with good separation of concerns, comprehensive testing, and proper security practices. The main gaps (output storage, billing) are acknowledged TODOs for future phases.

Verdict: Approve with minor fix for the missing newline in the migration file.


Model: claude-opus-4-5-20251101

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant