Add Bencher Bare Metal Runner by epompeii · Pull Request #655 · bencherdev/bencher

epompeii · 2026-02-01T16:18:13Z

This changeset adds the Bencher Bare Metal Runner (runner). It uses a custom pico-VMM based on Firecracker to run portably on any 64 bit Linux machine (x86 and ARM) with virtually no overhead while still securely isolating the workload.

github-actions · 2026-02-01T16:26:37Z

🤖 Claude Code Review

PR: #655
Base: devel
Head: u/ep/runner
Commit: 56919678651ad2f047ec6bdd52e03c82dcdcff0b

Based on my comprehensive review of this PR, here is my assessment:

Pull Request Review: Runner System Implementation

Overview

This PR adds a comprehensive bare metal benchmark runner system (~24,000 lines added across 130 files) to Bencher. It includes:

Server-side runner management API (plus/api_runners/)
Runner daemon with Firecracker VM isolation (plus/bencher_runner/)
OCI image pulling and unpacking (plus/bencher_oci/)
Database schema for runners and jobs
WebSocket channel for job heartbeats and status updates

Code Quality & Best Practices

Strengths ✓

Good separation of concerns: API endpoints, database models, and JSON types are properly separated
Comprehensive test coverage: Integration tests in plus/api_runners/tests/ cover happy paths and edge cases
Type safety: Uses typed IDs (RunnerId, JobId), validated types (ImageDigest, SourceIp), and proper state enums
Feature gating: All Plus-only code is behind the plus feature flag as required
Documentation: Design documented in RUNNER_DESIGN.md and CLAUDE.md updated with runner context

Concerns ⚠

image_digest.rs:87 - Validation accepts uppercase hex chars but tests suggest lowercase expected:
```
hex.len() == 64 && hex.chars().all(|c| c.is_ascii_hexdigit())
```
is_ascii_hexdigit() accepts A-F but OCI spec requires lowercase. The test at line 100 uses uppercase which will pass but may cause issues with registries expecting lowercase.
channel.rs:158-164 - Heartbeat timeout spawns duplicate tasks without cancellation:
```
spawn_heartbeat_timeout(log, remaining, connection_clone, job_id);
```
Consider adding a cancellation token to prevent accumulating timeout tasks on reconnection.
Missing newline at EOF in lib/bencher_schema/migrations/2026-02-02-120000_runner/up.sql

Security Concerns

Good Security Practices ✓

Token hashing: Runner tokens are SHA-256 hashed before storage (runners.rs:297)
Token format validation: Prefix check (bencher_runner_) prevents use of other token types
Authorization verification: Jobs can only be updated by the runner that claimed them (jobs.rs:271-274)
Rate limiting: Per-runner rate limiting implemented (runner.rs)
Digest verification: OCI blob digests are verified after download (unpack.rs)

Concerns ⚠

layer.rs:428-432 - Tar extraction with set_overwrite(true) could allow path traversal if tar contains ../ paths. While the OCI spec should prevent this, consider using Archive::unpack_in() with set_preserve_mtime(false) for safety, or validate paths don't escape target_dir.
cgroup.rs:230-244 - Block device discovery exposes all devices to I/O limits. This is informational but could leak information about host configuration to logs.
firecracker/client.rs - Unix socket client doesn't validate response integrity. Consider adding timeouts and response size limits.

Performance Implications

Good ✓

Efficient job claiming: Uses indexed query with priority DESC, created ASC ordering
Long-polling: Claim endpoint uses 1-second polling interval (reasonable)
Cgroup v2: Proper use of memory OOM group kill and swap disable for accurate benchmarks

Concerns ⚠

jobs.rs:102-108 - Long-poll loop acquires DB connection on each iteration. Consider using a single connection lease with periodic release.
No connection pooling for the runner API client (daemon/api_client.rs) - each request creates new connections.

Database Schema

Good ✓

Proper foreign key relationships with ON DELETE CASCADE/RESTRICT
Partial index for pending jobs (WHERE status = 0)
Indexes for concurrency limit checks

Concern ⚠

Missing index on job.runner_id for efficient job lookups by runner

Compliance with CLAUDE.md

✓ All dependency versions in workspace Cargo.toml
✓ Plus features properly gated
✓ Database macros used correctly (auth_conn!, write_conn!)
✓ cargo gen-types should be run (OpenAPI updated)
⚠ Verify cargo check --no-default-features passes
⚠ Missing #[expect(...)] explanations in some places (e.g., jobs.rs:25-26 uses diesel::alias!)

Recommendations

High Priority: Validate tar paths in layer.rs don't escape target directory
Medium Priority: Add cancellation tokens to heartbeat timeout tasks
Medium Priority: Add index on job.runner_id column
Low Priority: Normalize hex digests to lowercase in ImageDigest::from_str
Low Priority: Add newline at end of migration SQL file

_{Model: claude-opus-4-5-20251101}

github-actions · 2026-02-01T16:31:02Z

Bencher Report

Branch	u/ep/runner
Testbed	ubuntu-22.04

Click to view all benchmark results

Benchmark	Latency	Benchmark Result microseconds (µs) (Result Δ%)	Upper Boundary microseconds (µs) (Limit %)
Adapter::Json	📈 view plot 🚷 view threshold	3.85 µs (+12.72%) Baseline: 3.41 µs	4.57 µs (84.19%)
Adapter::Magic (JSON)	📈 view plot 🚷 view threshold	3.69 µs (+8.09%) Baseline: 3.42 µs	4.51 µs (81.87%)
Adapter::Magic (Rust)	📈 view plot 🚷 view threshold	25.48 µs (-1.02%) Baseline: 25.75 µs	29.53 µs (86.30%)
Adapter::Rust	📈 view plot 🚷 view threshold	2.87 µs (+2.82%) Baseline: 2.79 µs	3.13 µs (91.71%)
Adapter::RustBench	📈 view plot 🚷 view threshold	2.87 µs (+2.63%) Baseline: 2.80 µs	3.13 µs (91.70%)
head_version_insert/batch/10	📈 view plot 🚷 view threshold	90.82 µs (+3.71%) Baseline: 87.57 µs	97.61 µs (93.04%)
head_version_insert/batch/100	📈 view plot 🚷 view threshold	229.93 µs (+2.07%) Baseline: 225.26 µs	241.38 µs (95.26%)
head_version_insert/batch/255	📈 view plot 🚷 view threshold	447.21 µs (-0.45%) Baseline: 449.22 µs	484.93 µs (92.22%)
head_version_insert/batch/50	📈 view plot 🚷 view threshold	149.65 µs (+1.36%) Baseline: 147.64 µs	161.04 µs (92.93%)
threshold_query/join/10	📈 view plot 🚷 view threshold	131.90 µs (+0.08%) Baseline: 131.80 µs	150.30 µs (87.76%)
threshold_query/join/20	📈 view plot 🚷 view threshold	147.76 µs (+0.74%) Baseline: 146.68 µs	163.72 µs (90.25%)
threshold_query/join/5	📈 view plot 🚷 view threshold	125.51 µs (+0.64%) Baseline: 124.71 µs	144.45 µs (86.89%)
threshold_query/join/50	📈 view plot 🚷 view threshold	187.85 µs (-0.02%) Baseline: 187.88 µs	210.11 µs (89.40%)

🐰 View full continuous benchmarking report in Bencher

epompeii self-assigned this Feb 1, 2026

epompeii added the claude label Feb 1, 2026

epompeii force-pushed the u/ep/runner branch from 11c8362 to d6a8f80 Compare February 1, 2026 16:25

epompeii force-pushed the devel branch from fd81639 to cecd603 Compare February 2, 2026 07:07

epompeii added 24 commits February 6, 2026 06:48

CLAUDE.md

e336b72

bencher_runner

0811ceb

test_runner

32809c2

entrypoint_cmd

a94699b

host_cid

c066bb7

arm

78cb79f

vsock

5bc2f8f

vsock_tests

6ab523e

oci_client

2854c48

oci_image

082912f

virtio_vsock

dbf8134

bencher_mock

d3cca46

self_review

b0751a5

kernel

394fe4b

bencher_vmm

b5635f2

github_actions

f7df8b1

split_vsock

d112870

lima

697dc38

runner

ee728c6

bencher_init

958f792

bencher_runner_with_init

1fc3ec1

cleanup_stale

803ad6a

tmp_bencher

e097183

kernel_hashes

969403a

epompeii added 24 commits February 6, 2026 06:50

checkpoint

6df5cde

checkpoint

0caa2ef

checkpoint

da16ebc

checkpoint

a91e458

checkpoint

7174620

checkpoint

071e963

checkpoint

a44aef2

plan

de3eff8

ext4

f9dfca9

tests

60aaae8

rw

0a5b4b8

test

fc5a52b

20_20

af6679c

27/27

f1dbea8

firecracker

9821979

rm_bencher_vmm

19a5c37

single_bin

a67bb5f

tuning

fc0aa56

tests

0344dad

runner_design

1a5fb67

runner_api

89689f8

runner_design

ef42aa5

updated_runner

1d6e8a2

cpu_layout

5691967

epompeii force-pushed the u/ep/runner branch from d6a8f80 to 5691967 Compare February 6, 2026 06:57

epompeii changed the base branch from devel to u/ep/jobs February 6, 2026 06:57

epompeii added 4 commits February 7, 2026 13:34

review

0075c6d

claude

1ac926d

claude_2

ae974b3

claude_3

865b7e7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Bencher Bare Metal Runner#655

Add Bencher Bare Metal Runner#655
epompeii wants to merge 58 commits intou/ep/jobsfrom
u/ep/runner

epompeii commented Feb 1, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 1, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 1, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

epompeii commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🤖 Claude Code Review

Pull Request Review: Runner System Implementation

Overview

Code Quality & Best Practices

Strengths ✓

Concerns ⚠

Security Concerns

Good Security Practices ✓

Concerns ⚠

Performance Implications

Good ✓

Concerns ⚠

Database Schema

Good ✓

Concern ⚠

Compliance with CLAUDE.md

Recommendations

Uh oh!

github-actions bot commented Feb 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bencher Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

epompeii commented Feb 1, 2026 •

edited

Loading

github-actions bot commented Feb 1, 2026 •

edited

Loading

github-actions bot commented Feb 1, 2026 •

edited

Loading