Skip to content

Add Bencher Bare Metal Runner#655

Draft
epompeii wants to merge 58 commits intou/ep/jobsfrom
u/ep/runner
Draft

Add Bencher Bare Metal Runner#655
epompeii wants to merge 58 commits intou/ep/jobsfrom
u/ep/runner

Conversation

@epompeii
Copy link
Member

@epompeii epompeii commented Feb 1, 2026

This changeset adds the Bencher Bare Metal Runner (runner). It uses a custom pico-VMM based on Firecracker to run portably on any 64 bit Linux machine (x86 and ARM) with virtually no overhead while still securely isolating the workload.

@epompeii epompeii self-assigned this Feb 1, 2026
@epompeii epompeii added the claude label Feb 1, 2026
@github-actions
Copy link

github-actions bot commented Feb 1, 2026

🤖 Claude Code Review

PR: #655
Base: devel
Head: u/ep/runner
Commit: 56919678651ad2f047ec6bdd52e03c82dcdcff0b


Based on my comprehensive review of this PR, here is my assessment:


Pull Request Review: Runner System Implementation

Overview

This PR adds a comprehensive bare metal benchmark runner system (~24,000 lines added across 130 files) to Bencher. It includes:

  • Server-side runner management API (plus/api_runners/)
  • Runner daemon with Firecracker VM isolation (plus/bencher_runner/)
  • OCI image pulling and unpacking (plus/bencher_oci/)
  • Database schema for runners and jobs
  • WebSocket channel for job heartbeats and status updates

Code Quality & Best Practices

Strengths ✓

  • Good separation of concerns: API endpoints, database models, and JSON types are properly separated
  • Comprehensive test coverage: Integration tests in plus/api_runners/tests/ cover happy paths and edge cases
  • Type safety: Uses typed IDs (RunnerId, JobId), validated types (ImageDigest, SourceIp), and proper state enums
  • Feature gating: All Plus-only code is behind the plus feature flag as required
  • Documentation: Design documented in RUNNER_DESIGN.md and CLAUDE.md updated with runner context

Concerns ⚠

  1. image_digest.rs:87 - Validation accepts uppercase hex chars but tests suggest lowercase expected:

    hex.len() == 64 && hex.chars().all(|c| c.is_ascii_hexdigit())

    is_ascii_hexdigit() accepts A-F but OCI spec requires lowercase. The test at line 100 uses uppercase which will pass but may cause issues with registries expecting lowercase.

  2. channel.rs:158-164 - Heartbeat timeout spawns duplicate tasks without cancellation:

    spawn_heartbeat_timeout(log, remaining, connection_clone, job_id);

    Consider adding a cancellation token to prevent accumulating timeout tasks on reconnection.

  3. Missing newline at EOF in lib/bencher_schema/migrations/2026-02-02-120000_runner/up.sql

Security Concerns

Good Security Practices ✓

  • Token hashing: Runner tokens are SHA-256 hashed before storage (runners.rs:297)
  • Token format validation: Prefix check (bencher_runner_) prevents use of other token types
  • Authorization verification: Jobs can only be updated by the runner that claimed them (jobs.rs:271-274)
  • Rate limiting: Per-runner rate limiting implemented (runner.rs)
  • Digest verification: OCI blob digests are verified after download (unpack.rs)

Concerns ⚠

  1. layer.rs:428-432 - Tar extraction with set_overwrite(true) could allow path traversal if tar contains ../ paths. While the OCI spec should prevent this, consider using Archive::unpack_in() with set_preserve_mtime(false) for safety, or validate paths don't escape target_dir.

  2. cgroup.rs:230-244 - Block device discovery exposes all devices to I/O limits. This is informational but could leak information about host configuration to logs.

  3. firecracker/client.rs - Unix socket client doesn't validate response integrity. Consider adding timeouts and response size limits.

Performance Implications

Good ✓

  • Efficient job claiming: Uses indexed query with priority DESC, created ASC ordering
  • Long-polling: Claim endpoint uses 1-second polling interval (reasonable)
  • Cgroup v2: Proper use of memory OOM group kill and swap disable for accurate benchmarks

Concerns ⚠

  1. jobs.rs:102-108 - Long-poll loop acquires DB connection on each iteration. Consider using a single connection lease with periodic release.

  2. No connection pooling for the runner API client (daemon/api_client.rs) - each request creates new connections.

Database Schema

Good ✓

  • Proper foreign key relationships with ON DELETE CASCADE/RESTRICT
  • Partial index for pending jobs (WHERE status = 0)
  • Indexes for concurrency limit checks

Concern ⚠

  • Missing index on job.runner_id for efficient job lookups by runner

Compliance with CLAUDE.md

✓ All dependency versions in workspace Cargo.toml
✓ Plus features properly gated
✓ Database macros used correctly (auth_conn!, write_conn!)
cargo gen-types should be run (OpenAPI updated)
⚠ Verify cargo check --no-default-features passes
⚠ Missing #[expect(...)] explanations in some places (e.g., jobs.rs:25-26 uses diesel::alias!)

Recommendations

  1. High Priority: Validate tar paths in layer.rs don't escape target directory
  2. Medium Priority: Add cancellation tokens to heartbeat timeout tasks
  3. Medium Priority: Add index on job.runner_id column
  4. Low Priority: Normalize hex digests to lowercase in ImageDigest::from_str
  5. Low Priority: Add newline at end of migration SQL file

Model: claude-opus-4-5-20251101

@github-actions
Copy link

github-actions bot commented Feb 1, 2026

🐰 Bencher Report

Branchu/ep/runner
Testbedubuntu-22.04
Click to view all benchmark results
BenchmarkLatencyBenchmark Result
microseconds (µs)
(Result Δ%)
Upper Boundary
microseconds (µs)
(Limit %)
Adapter::Json📈 view plot
🚷 view threshold
3.85 µs
(+12.72%)Baseline: 3.41 µs
4.57 µs
(84.19%)
Adapter::Magic (JSON)📈 view plot
🚷 view threshold
3.69 µs
(+8.09%)Baseline: 3.42 µs
4.51 µs
(81.87%)
Adapter::Magic (Rust)📈 view plot
🚷 view threshold
25.48 µs
(-1.02%)Baseline: 25.75 µs
29.53 µs
(86.30%)
Adapter::Rust📈 view plot
🚷 view threshold
2.87 µs
(+2.82%)Baseline: 2.79 µs
3.13 µs
(91.71%)
Adapter::RustBench📈 view plot
🚷 view threshold
2.87 µs
(+2.63%)Baseline: 2.80 µs
3.13 µs
(91.70%)
head_version_insert/batch/10📈 view plot
🚷 view threshold
90.82 µs
(+3.71%)Baseline: 87.57 µs
97.61 µs
(93.04%)
head_version_insert/batch/100📈 view plot
🚷 view threshold
229.93 µs
(+2.07%)Baseline: 225.26 µs
241.38 µs
(95.26%)
head_version_insert/batch/255📈 view plot
🚷 view threshold
447.21 µs
(-0.45%)Baseline: 449.22 µs
484.93 µs
(92.22%)
head_version_insert/batch/50📈 view plot
🚷 view threshold
149.65 µs
(+1.36%)Baseline: 147.64 µs
161.04 µs
(92.93%)
threshold_query/join/10📈 view plot
🚷 view threshold
131.90 µs
(+0.08%)Baseline: 131.80 µs
150.30 µs
(87.76%)
threshold_query/join/20📈 view plot
🚷 view threshold
147.76 µs
(+0.74%)Baseline: 146.68 µs
163.72 µs
(90.25%)
threshold_query/join/5📈 view plot
🚷 view threshold
125.51 µs
(+0.64%)Baseline: 124.71 µs
144.45 µs
(86.89%)
threshold_query/join/50📈 view plot
🚷 view threshold
187.85 µs
(-0.02%)Baseline: 187.88 µs
210.11 µs
(89.40%)
🐰 View full continuous benchmarking report in Bencher

@epompeii epompeii changed the base branch from devel to u/ep/jobs February 6, 2026 06:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant