A Cargo workspace of from-scratch Rust concurrency primitives, built to be read, benchmarked, and reasoned about rather than to replace mature libraries like Rayon or Tokio. Each crate ships its own correctness tests, benchmark suite, and a long-form README walking through the design decisions, the unsafe code, and the memory-ordering reasoning behind it.
The two crates are deliberately not just neighbors: thread-pool-runtime's LockFreeQueuePool baseline is backed directly by lock-free-queue's MpmcQueue, so the workspace also answers a concrete question — does swapping the mutex behind a single-queue thread pool for a lock-free queue actually change anything? (Short answer, with numbers: see thread-pool-runtime's benchmarks.)

| Crate | Package | What it is |
|---|---|---|
| `lock-free-queue` | `lfqueue` | A bounded, lock-free MPMC ring-buffer queue built on atomics and explicit memory ordering (Vyukov's sequence-number design), plus an SPSC fast path, an unbounded Michael-Scott queue (epoch-based reclamation via crossbeam-epoch), and a mutex-protected baseline. All are benchmarked against each other across thread counts, contention levels, and CAS backoff strategies, and checked under both ThreadSanitizer and Loom's exhaustive interleaving model. |
| `thread-pool-runtime` | `rust-thread-pool-runtime` | A work-stealing task execution runtime: per-priority work-stealing queues (crossbeam-deque), panic-safe JoinHandles, cooperative cancellation, and a dependency-graph (DAG) scheduler built on top. Ships a benchmark suite comparing five execution strategies, including a baseline pool backed directly by `lfqueue::MpmcQueue`. |

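
For readers who have not met the sequence-number design described above, the following is a minimal, std-only sketch of a bounded MPMC ring buffer in that style. It is not the crate's `MpmcQueue`: the names `SketchQueue` and `Slot` are hypothetical, and cache-line padding, backoff, and the SPSC fast path are left out. Each slot carries a sequence number that tells producers and consumers whose turn it is, so no separate full/empty flag is needed.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};

/// One ring-buffer cell: a turn counter plus storage for a value.
struct Slot<T> {
    seq: AtomicUsize,
    value: UnsafeCell<MaybeUninit<T>>,
}

/// Hypothetical illustration type, not the crate's `MpmcQueue`.
pub struct SketchQueue<T> {
    slots: Box<[Slot<T>]>,
    mask: usize,
    head: AtomicUsize, // next position to pop
    tail: AtomicUsize, // next position to push
}

// Safety: a slot's value is only touched by the thread that won the CAS for
// its position, and is handed over through the Release/Acquire pair on `seq`.
unsafe impl<T: Send> Send for SketchQueue<T> {}
unsafe impl<T: Send> Sync for SketchQueue<T> {}

impl<T> SketchQueue<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity >= 2 && capacity.is_power_of_two());
        let slots = (0..capacity)
            .map(|i| Slot {
                seq: AtomicUsize::new(i), // slot i is ready for push number i
                value: UnsafeCell::new(MaybeUninit::uninit()),
            })
            .collect::<Vec<_>>()
            .into_boxed_slice();
        Self { slots, mask: capacity - 1, head: AtomicUsize::new(0), tail: AtomicUsize::new(0) }
    }

    /// Returns the value back in `Err` when the queue is full.
    pub fn push(&self, value: T) -> Result<(), T> {
        let mut pos = self.tail.load(Ordering::Relaxed);
        loop {
            let slot = &self.slots[pos & self.mask];
            let seq = slot.seq.load(Ordering::Acquire);
            let diff = seq.wrapping_sub(pos) as isize;
            if diff == 0 {
                // Slot is free for this position: try to claim it.
                match self.tail.compare_exchange_weak(
                    pos, pos.wrapping_add(1), Ordering::Relaxed, Ordering::Relaxed,
                ) {
                    Ok(_) => {
                        unsafe { (*slot.value.get()).write(value) };
                        // Publish the write; consumers expect seq == pos + 1.
                        slot.seq.store(pos.wrapping_add(1), Ordering::Release);
                        return Ok(());
                    }
                    Err(current) => pos = current,
                }
            } else if diff < 0 {
                return Err(value); // slot still holds an unconsumed value: full
            } else {
                pos = self.tail.load(Ordering::Relaxed); // another producer got ahead
            }
        }
    }

    pub fn pop(&self) -> Option<T> {
        let mut pos = self.head.load(Ordering::Relaxed);
        loop {
            let slot = &self.slots[pos & self.mask];
            let seq = slot.seq.load(Ordering::Acquire);
            let diff = seq.wrapping_sub(pos.wrapping_add(1)) as isize;
            if diff == 0 {
                match self.head.compare_exchange_weak(
                    pos, pos.wrapping_add(1), Ordering::Relaxed, Ordering::Relaxed,
                ) {
                    Ok(_) => {
                        let value = unsafe { (*slot.value.get()).assume_init_read() };
                        // Free the slot for the producer one lap ahead.
                        slot.seq.store(pos.wrapping_add(self.mask).wrapping_add(1), Ordering::Release);
                        return Some(value);
                    }
                    Err(current) => pos = current,
                }
            } else if diff < 0 {
                return None; // nothing published at this position yet: empty
            } else {
                pos = self.head.load(Ordering::Relaxed); // another consumer got ahead
            }
        }
    }
}

impl<T> Drop for SketchQueue<T> {
    fn drop(&mut self) {
        // Drop any values that were pushed but never popped.
        while self.pop().is_some() {}
    }
}
```

The capacity is restricted to a power of two so that `pos & mask` can replace a modulo, and to at least 2 because with a single slot the "published" and "free for the next lap" sequence values would coincide.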
See each crate's own README for architecture diagrams, the reasoning behind every design decision, full benchmark tables, and a testing-strategy writeup (including ThreadSanitizer runs for the queue):
- `lock-free-queue/README.md`: CAS and the ABA problem, why sequence numbers replace a full/empty flag, acquire/release pairing explained operation-by-operation, false sharing and cache-line padding, throughput/latency benchmarks, and the bounded-vs-unbounded / lock-free-vs-wait-free tradeoffs.
- `thread-pool-runtime/README.md`: the three-stage work-stealing search (local → injector → peer steal), shutdown/lifetime correctness under nested spawns, panic-safe result delivery, the DAG scheduler's `'static` workaround via channels, and a five-strategy benchmark comparison (single-threaded, thread-per-task, global-queue, work-stealing, lock-free-queue-backed).

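
To make "panic-safe result delivery" concrete, here is a small std-only sketch of the general idea: run the task under `catch_unwind` and send either the value or the panic payload back over a channel, so the caller sees an `Err` instead of hanging or crashing. This is an assumption about the shape of the technique, not the runtime's actual `JoinHandle`; `SketchHandle`, `spawn_task`, and `panic_message` are hypothetical names, and a dedicated thread stands in for a pool worker.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc;
use std::thread;

/// Hypothetical handle: receives either the task's value or its panic payload.
pub struct SketchHandle<T> {
    rx: mpsc::Receiver<thread::Result<T>>,
}

impl<T> SketchHandle<T> {
    /// Blocks until the task finishes; a panic inside the task becomes `Err`.
    pub fn join(self) -> Result<T, String> {
        match self.rx.recv() {
            Ok(Ok(value)) => Ok(value),
            Ok(Err(payload)) => Err(panic_message(payload)),
            // The sender was dropped without sending anything.
            Err(_) => Err("task was dropped before completing".to_string()),
        }
    }
}

/// Extracts a readable message from the two common panic payload types.
fn panic_message(payload: Box<dyn Any + Send>) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "non-string panic payload".to_string()
    }
}

/// Runs `f` on its own thread (standing in for a pool worker) and returns a
/// handle that reports panics instead of propagating them.
pub fn spawn_task<T, F>(f: F) -> SketchHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // catch_unwind keeps the worker alive and captures the payload.
        let result = panic::catch_unwind(AssertUnwindSafe(f));
        // The receiver may already be gone; that is not an error here.
        let _ = tx.send(result);
    });
    SketchHandle { rx }
}
```

With this sketch, `spawn_task(|| 21 * 2).join()` yields the computed value, while a task that panics yields an `Err` carrying the panic message. In a real pool the same wrapping would happen inside the worker loop, so one failing task cannot take a worker thread down with it.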
```
.
├── Cargo.toml                  # workspace manifest
├── lock-free-queue/
│   ├── src/                    # MpmcQueue, MutexQueue baseline, cache padding
│   ├── benchmarks/             # throughput / latency / padding benches + run_all.sh
│   └── tests/                  # functional + concurrent stress correctness tests
└── thread-pool-runtime/
    ├── src/                    # Runtime, worker, work-stealing, cancellation, DAG scheduler
    ├── benches/                # criterion comparison across 5 execution strategies
    ├── examples/               # basic_pool, task_handle, priority_tasks, dependency_graph,
    │                           # queue_pool_pipeline, parallel_map_reduce
    └── tests/                  # 25 integration tests across 6 files
```
To clone, build, and try the workspace:

```bash
git clone https://github.com/czhao-dev/systems-concurrency-runtime.git
cd systems-concurrency-runtime
cargo build --workspace
cargo test --workspace
cargo bench -p rust-thread-pool-runtime   # criterion comparison across execution strategies
cargo run -p rust-thread-pool-runtime --example basic_pool
```

The full set of checks (build, tests, lints, formatting, benchmarks):

```bash
cargo build --workspace
cargo test --workspace
cargo clippy --workspace --all-targets --all-features -- -D warnings
cargo fmt --all -- --check
cargo bench -p rust-thread-pool-runtime
```

To run the lock-free queue's own micro-benchmarks (separate from the thread pool's criterion suite) or its ThreadSanitizer correctness checks, see the Building and Running section of its README.

Concurrency primitives are usually presented in isolation — a queue, a thread pool — without showing how one becomes a building block of the other. This workspace is structured so that relationship is visible and testable: lock-free-queue is a standalone, independently benchmarked data structure, and thread-pool-runtime consumes it as a real path dependency (lfqueue = { path = "../lock-free-queue" }) to back one of its baseline pools, alongside the mutex-protected baseline it's compared against. The result is a small, concrete answer to a real systems question — when does a lock-free queue actually help at the head of a thread pool, and when does the simpler mutex hold its own — rather than an assertion that lock-free is unconditionally faster.
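
One way to picture the "swap the queue behind a single-queue pool" experiment is a pool that is generic over its queue. The sketch below is a std-only illustration under stated assumptions, not the workspace's code: the `JobQueue` trait, `MutexJobQueue`, and `SingleQueuePool` are hypothetical names, and only the mutex backend is shown. A lock-free backend would implement the same trait on top of a queue such as `lfqueue::MpmcQueue`, whose exact API is not reproduced here.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical abstraction: everything the pool needs from its queue.
pub trait JobQueue: Send + Sync + 'static {
    fn push(&self, job: Job);
    fn pop(&self) -> Option<Job>;
}

/// Mutex-protected baseline backend.
#[derive(Default)]
pub struct MutexJobQueue(Mutex<VecDeque<Job>>);

impl JobQueue for MutexJobQueue {
    fn push(&self, job: Job) {
        self.0.lock().unwrap().push_back(job);
    }
    fn pop(&self) -> Option<Job> {
        self.0.lock().unwrap().pop_front()
    }
}

/// A single shared queue feeding a fixed set of workers.
pub struct SingleQueuePool<Q: JobQueue> {
    queue: Arc<Q>,
    stop: Arc<AtomicBool>,
    workers: Vec<thread::JoinHandle<()>>,
}

impl<Q: JobQueue> SingleQueuePool<Q> {
    pub fn new(queue: Q, threads: usize) -> Self {
        let queue = Arc::new(queue);
        let stop = Arc::new(AtomicBool::new(false));
        let workers = (0..threads)
            .map(|_| {
                let (queue, stop) = (Arc::clone(&queue), Arc::clone(&stop));
                thread::spawn(move || loop {
                    match queue.pop() {
                        Some(job) => job(),
                        // Exit only once the queue is empty and stop was requested.
                        None if stop.load(Ordering::Acquire) => break,
                        // Busy-poll for simplicity; a real pool would park.
                        None => thread::yield_now(),
                    }
                })
            })
            .collect();
        Self { queue, stop, workers }
    }

    pub fn spawn(&self, job: impl FnOnce() + Send + 'static) {
        self.queue.push(Box::new(job));
    }

    /// Lets the workers drain the queue, then joins them.
    pub fn finish(self) {
        self.stop.store(true, Ordering::Release);
        for worker in self.workers {
            let _ = worker.join();
        }
    }
}
```

Because the worker loop only ever calls `push` and `pop`, the backend can be swapped without touching the pool, which is what makes a like-for-like benchmark between the mutex and lock-free variants possible. This sketch busy-polls when idle and does not catch panics in jobs, both of which a production pool would handle differently.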
This project is licensed under the MIT License. See LICENSE for details.