A compact Rust implementation of the Bitcask log-structured key/value storage model.
Bitcask is the storage design behind Riak's low-latency key/value engine: writes are appended to data files, reads are served through an in-memory key directory, and old immutable files can be memory mapped for fast lookup. This repository implements those core ideas in a small, readable Rust crate.
Reference: Bitcask: A Log-Structured Hash Table for Fast Key/Value Data
- Append-only data files with fixed binary record headers and CRC32 validation.
- In-memory key directory mapping UUID keys to file id, offset, and value length.
- Memory-mapped read path that returns borrowed bytes without copying value payloads.
- Startup recovery by scanning data files and rebuilding the latest key directory state.
- Hint-file loading support for faster rebuilds when `.hint` files are present.
- Active-file rotation based on configurable `max_file_size`.
- Lock-file protection with blocking `open` and non-blocking `try_open` modes.
- Focused test coverage for basic persistence behavior and data-file rotation.
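As a rough illustration of size-based rotation, the sketch below tracks the active file id and write offset and moves to a new file when the next record would not fit. The `ActiveFile` and `Placement` types and the `reserve` method are hypothetical names invented for this example, and the choice to rotate before writing (rather than after) is an assumption, not necessarily what the crate does.

```rust
/// Hypothetical bookkeeping for the active data file (not the crate's real type).
#[derive(Debug, PartialEq)]
pub struct ActiveFile {
    pub file_id: u64,
    pub offset: u64,
}

/// Where a record landed: which file, and at which byte offset.
#[derive(Debug, PartialEq)]
pub struct Placement {
    pub file_id: u64,
    pub offset: u64,
}

impl ActiveFile {
    /// Reserve `record_len` bytes, rotating to a fresh file first if the
    /// record would push the active file past `max_file_size`.
    pub fn reserve(&mut self, record_len: u64, max_file_size: u64) -> Placement {
        // Rotate only when the file already holds data, so a single oversized
        // record is still placed instead of triggering endless rotation.
        if self.offset > 0 && self.offset + record_len > max_file_size {
            self.file_id += 1;
            self.offset = 0;
        }
        let placement = Placement {
            file_id: self.file_id,
            offset: self.offset,
        };
        self.offset += record_len;
        placement
    }
}
```

Because the key directory stores a file id together with an offset, every placement handed out this way stays valid after rotation: older files are never rewritten, only superseded.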
This is a systems-programming project meant to make storage-engine tradeoffs concrete:
- binary layout design with `zerocopy`
- crash-aware append-only recovery
- file locking and single-writer ownership
- `mmap`-backed immutable data access
- careful offset/length accounting across file rotation
- small public API over a lower-level persistence model
The code is intentionally compact enough to audit while still exercising real database internals.
This crate currently targets Rust nightly because it uses nightly-only standard-library features.
rustup toolchain install nightly
cargo +nightly test
cargo +nightly run --example hello_world

use uuid::Uuid;

fn main() -> anyhow::Result<()> {
    let database = bitcask::open(
        "./my_database",
        bitcask::Options {
            max_file_size: 2 * 1024 * 1024 * 1024,
        },
    )?;
    let id = Uuid::now_v7();
    database.put(id, b"Hello, World!")?;
    let value = database.get(id).unwrap();
    println!("{}", str::from_utf8(&value)?);
    Ok(())
}

pub fn open<P>(directory_path: P, options: Options) -> Result<Bitcask>;
pub fn try_open<P>(directory_path: P, options: Options) -> Result<Bitcask>;

impl Bitcask {
    pub fn put(&self, key: Uuid, value: &[u8]) -> Result<()>;
    pub fn get(&self, key: Uuid) -> Option<MmapBytes>;
}

`get` returns `MmapBytes`, a lightweight view into a memory-mapped data file. That keeps the read path small and avoids copying stored values into a new buffer.
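To make the zero-copy read path concrete, here is a minimal sketch of a key-directory lookup that hands back a borrowed slice instead of an owned buffer. It is not the crate's code: `Store` and `Location` are hypothetical types, plain `Vec<u8>` buffers stand in for memory-mapped files, and `u128` stands in for `Uuid` so the example needs only the standard library.

```rust
use std::collections::HashMap;

/// Hypothetical key-directory entry: where a value lives on disk.
pub struct Location {
    pub file_id: usize,
    pub offset: usize,
    pub len: usize,
}

/// Hypothetical store. Each `Vec<u8>` stands in for one memory-mapped data file.
pub struct Store {
    pub files: Vec<Vec<u8>>,
    pub key_dir: HashMap<u128, Location>,
}

impl Store {
    /// Look the key up in memory, then borrow the value bytes in place.
    /// No value bytes are copied; the slice lives as long as `&self`.
    pub fn get(&self, key: u128) -> Option<&[u8]> {
        let location = self.key_dir.get(&key)?;
        let file = self.files.get(location.file_id)?;
        let end = location.offset.checked_add(location.len)?;
        // `get` on a slice returns None instead of panicking when the
        // range is out of bounds.
        file.get(location.offset..end)
    }
}
```

A lookup costs one hash-map probe plus a bounds-checked slice, which is the property the Bitcask design relies on for predictable read latency.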
Data files are named with monotonically increasing numeric ids:
000000.data
000001.data
000002.data
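The naming scheme can be sketched with a pair of helpers that format and parse the zero-padded ids. The function names below are hypothetical, and the six-digit width is inferred from the file names listed above rather than taken from the crate's source.

```rust
/// Format a data-file name from its numeric id, zero-padded to six digits.
pub fn data_file_name(file_id: u64) -> String {
    format!("{file_id:06}.data")
}

/// Parse the numeric id back out of a name such as `000001.data`.
/// Returns None for anything that is not an all-digit stem plus `.data`.
pub fn parse_data_file_id(name: &str) -> Option<u64> {
    let stem = name.strip_suffix(".data")?;
    if stem.is_empty() || !stem.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    stem.parse().ok()
}

/// Keep only data files from a directory listing and return their ids,
/// oldest first.
pub fn sorted_data_file_ids<'a>(names: impl IntoIterator<Item = &'a str>) -> Vec<u64> {
    let mut ids: Vec<u64> = names
        .into_iter()
        .filter_map(parse_data_file_id)
        .collect();
    ids.sort_unstable();
    ids
}
```

Sorting the parsed ids numerically, rather than sorting the names as strings, keeps the order correct even if an id ever grows past the padded width.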
Each value is written as:
DataEntry { crc, timestamp, key, value_len } + value bytes
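The sketch below shows one way such a record could be serialized and checked. The field widths (`u32` CRC, `u64` timestamp, 16-byte key, `u32` value length), the little-endian byte order, and the choice to compute the CRC over every byte after the CRC field are all assumptions made for illustration; the crate defines its real layout with `zerocopy`, and `encode_record` and `verify_record` are hypothetical helpers.

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
pub fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in bytes {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Assumed header size: crc (4) + timestamp (8) + key (16) + value_len (4).
pub const HEADER_LEN: usize = 4 + 8 + 16 + 4;

/// Serialize one record as header followed by value bytes.
pub fn encode_record(timestamp: u64, key: [u8; 16], value: &[u8]) -> Vec<u8> {
    assert!(value.len() <= u32::MAX as usize, "value too large for a u32 length");
    let mut record = Vec::with_capacity(HEADER_LEN + value.len());
    record.extend_from_slice(&[0u8; 4]); // placeholder for the CRC
    record.extend_from_slice(&timestamp.to_le_bytes());
    record.extend_from_slice(&key);
    record.extend_from_slice(&(value.len() as u32).to_le_bytes());
    record.extend_from_slice(value);
    // The CRC covers everything after the CRC field itself.
    let crc = crc32(&record[4..]);
    record[..4].copy_from_slice(&crc.to_le_bytes());
    record
}

/// Check that the stored CRC matches the rest of the record.
pub fn verify_record(record: &[u8]) -> bool {
    if record.len() < HEADER_LEN {
        return false;
    }
    let stored = u32::from_le_bytes([record[0], record[1], record[2], record[3]]);
    stored == crc32(&record[4..])
}
```

Putting the checksum first and having it cover the length field means a torn write that truncates the value, or garbage that decodes to a plausible length, both fail verification.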
On startup, the database:
- Acquires a `LOCK` file to prevent multiple processes from opening the same store for writes.
- Lists and sorts existing `.data` files.
- Loads older files as immutable memory maps.
- Rebuilds the key directory from `.hint` files when available, otherwise from `.data` records.
- Scans the active file up to the last valid CRC-checked record and resumes appending there.
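The scan-and-rebuild step above can be sketched as a loop over an in-memory image of one data file. This is a simplified illustration, not the crate's implementation: it assumes a 32-byte little-endian header (`u32` CRC, `u64` timestamp, 16-byte key, `u32` value length) with the CRC covering all bytes after the CRC field, and `Location`, `encode_record`, and `scan_data_file` are hypothetical names.

```rust
use std::collections::HashMap;

/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
pub fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in bytes {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Assumed header size: crc (4) + timestamp (8) + key (16) + value_len (4).
pub const HEADER_LEN: usize = 32;

/// Hypothetical key-directory entry pointing at the value bytes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Location {
    pub file_id: u64,
    pub value_offset: u64,
    pub value_len: u32,
}

/// Build one record in the assumed layout (used here to produce test data).
pub fn encode_record(timestamp: u64, key: [u8; 16], value: &[u8]) -> Vec<u8> {
    assert!(value.len() <= u32::MAX as usize, "value too large for a u32 length");
    let mut record = vec![0u8; 4]; // placeholder for the CRC
    record.extend_from_slice(&timestamp.to_le_bytes());
    record.extend_from_slice(&key);
    record.extend_from_slice(&(value.len() as u32).to_le_bytes());
    record.extend_from_slice(value);
    let crc = crc32(&record[4..]);
    record[..4].copy_from_slice(&crc.to_le_bytes());
    record
}

/// Scan one data-file image, inserting every valid record into `key_dir`.
/// Later records overwrite earlier ones, so the newest write per key wins.
/// Returns the offset just past the last valid record, where appends resume.
pub fn scan_data_file(
    file_id: u64,
    data: &[u8],
    key_dir: &mut HashMap<[u8; 16], Location>,
) -> usize {
    let mut offset = 0;
    while data.len() - offset >= HEADER_LEN {
        let header = &data[offset..offset + HEADER_LEN];
        let stored_crc = u32::from_le_bytes([header[0], header[1], header[2], header[3]]);
        let value_len = u32::from_le_bytes([header[28], header[29], header[30], header[31]]);
        let end = offset + HEADER_LEN + value_len as usize;
        // Stop at a truncated or corrupt record: everything after it is
        // treated as a torn tail from an interrupted write.
        if end > data.len() || crc32(&data[offset + 4..end]) != stored_crc {
            break;
        }
        let mut key = [0u8; 16];
        key.copy_from_slice(&header[12..28]);
        let value_offset = (offset + HEADER_LEN) as u64;
        key_dir.insert(key, Location { file_id, value_offset, value_len });
        offset = end;
    }
    offset
}
```

Calling `scan_data_file` on each file in ascending id order leaves the key directory pointing at the most recent value for every key, and the return value for the active file is the position at which new records can safely be appended.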
Implemented:
- UUID-keyed `put` and `get`
- append-only persistence
- active-file rotation
- memory-mapped immutable reads
- key-directory rebuild on open
- CRC validation during data-file scans
- lock-file based process exclusion
Not yet implemented:
- merge/compaction of stale records
- public delete/tombstone API
- hint-file generation
- transactions or batch writes
- stable-Rust support
src/lib.rs               Core Bitcask implementation and tests
examples/hello_world.rs  Minimal end-to-end usage example
Cargo.toml               Crate manifest and dependency list
cargo +nightly test
cargo +nightly fmt --check
cargo +nightly clippy --all-targets --all-features -- -D warnings

The current unit tests cover the basic write/read path and rotation across multiple data files.