bitcask

A compact Rust implementation of the Bitcask log-structured key/value storage model.

Bitcask is the storage design behind Riak's low-latency key/value engine: writes are appended to data files, reads are served through an in-memory key directory, and older immutable files can be memory-mapped for fast lookup. This repository implements those core ideas in a small, readable Rust crate.

Reference: Bitcask: A Log-Structured Hash Table for Fast Key/Value Data

Highlights

  • Append-only data files with fixed binary record headers and CRC32 validation.
  • In-memory key directory mapping UUID keys to file id, offset, and value length.
  • Memory-mapped read path that returns borrowed bytes without copying value payloads.
  • Startup recovery by scanning data files and rebuilding the latest key directory state.
  • Hint-file loading support for faster rebuilds when .hint files are present.
  • Active-file rotation based on configurable max_file_size.
  • Lock-file protection with blocking open and non-blocking try_open modes.
  • Focused test coverage for basic persistence behavior and data-file rotation.
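
As a rough illustration of the rotation highlight, the sketch below decides where the next record lands given max_file_size. The ActiveFile struct, the reserve function, and the exact policy (rotate before a write that would exceed the limit, never rotate an empty file) are assumptions made for this example, not the crate's actual internals.

```rust
/// Hypothetical bookkeeping for the file currently receiving appends.
struct ActiveFile {
    file_id: u64,
    len: u64,
}

/// Reserves space for a record of `record_len` bytes and returns the
/// (file_id, offset) it should be written at. If the record would push the
/// active file past `max_file_size`, a new file id is started first.
/// An empty file is never rotated, so an oversized record still gets a home.
fn reserve(active: &mut ActiveFile, record_len: u64, max_file_size: u64) -> (u64, u64) {
    if active.len > 0 && active.len + record_len > max_file_size {
        active.file_id += 1;
        active.len = 0;
    }
    let offset = active.len;
    active.len += record_len;
    (active.file_id, offset)
}
```

Returning the file id together with the offset is what lets the key directory keep pointing at the right file after a rotation.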

Why this project exists

This is a systems-programming project meant to make storage-engine tradeoffs concrete:

  • binary layout design with zerocopy
  • crash-aware append-only recovery
  • file locking and single-writer ownership
  • mmap-backed immutable data access
  • careful offset/length accounting across file rotation
  • small public API over a lower-level persistence model

The code is intentionally compact enough to audit while still exercising real database internals.

Quick Start

This crate currently targets Rust nightly because it uses nightly-only standard-library features.

rustup toolchain install nightly
cargo +nightly test
cargo +nightly run --example hello_world

Example

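use uuid::Uuid;

fn main() -> anyhow::Result<()> {
    // Open (or create) the store; this blocks until the directory lock is free.
    let database = bitcask::open(
        "./my_database",
        bitcask::Options {
            // Rotate the active data file once it reaches 2 GiB.
            max_file_size: 2 * 1024 * 1024 * 1024,
        },
    )?;

    let id = Uuid::now_v7();

    database.put(id, b"Hello, World!")?;

    // get returns an Option; the key was just written, so it must be present.
    let value = database.get(id).expect("value was just written");

    println!("{}", str::from_utf8(&value)?);

    Ok(())
}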

API Surface

pub fn open<P>(directory_path: P, options: Options) -> Result<Bitcask>;
pub fn try_open<P>(directory_path: P, options: Options) -> Result<Bitcask>;

impl Bitcask {
    pub fn put(&self, key: Uuid, value: &[u8]) -> Result<()>;
    pub fn get(&self, key: Uuid) -> Option<MmapBytes>;
}

get returns MmapBytes, a lightweight view into a memory-mapped data file. That keeps the read path small and avoids copying stored values into a new buffer.

Storage Model

Data files are named with monotonically increasing numeric ids:

000000.data
000001.data
000002.data
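
A minimal sketch of that naming scheme, assuming ids are zero-padded to six digits as in the listing above. The helper names (data_file_name, parse_data_file_id, sorted_data_file_ids) are hypothetical and chosen for this example only.

```rust
use std::path::Path;

/// Formats a numeric file id as a zero-padded data-file name, e.g. 1 -> "000001.data".
fn data_file_name(file_id: u64) -> String {
    format!("{file_id:06}.data")
}

/// Extracts the numeric id from a data-file name.
/// Returns None for anything that is not `<digits>.data` (LOCK, .hint files, ...).
fn parse_data_file_id(name: &str) -> Option<u64> {
    let path = Path::new(name);
    if path.extension()?.to_str()? != "data" {
        return None;
    }
    path.file_stem()?.to_str()?.parse().ok()
}

/// Picks the data files out of a directory listing and sorts their ids
/// from oldest to newest; the last id would be the active file.
fn sorted_data_file_ids<'a>(names: impl IntoIterator<Item = &'a str>) -> Vec<u64> {
    let mut ids: Vec<u64> = names.into_iter().filter_map(parse_data_file_id).collect();
    ids.sort_unstable();
    ids
}
```

Sorting the parsed numbers rather than the raw file names keeps the order correct even if an id ever outgrows the padding width.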

Each value is written as:

DataEntry { crc, timestamp, key, value_len } + value bytes
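
To make the layout concrete, here is a dependency-free sketch of encoding and validating one record. The field widths (u32 crc, u64 timestamp, 16-byte key, u32 value_len), the little-endian byte order, and the choice to checksum everything after the crc field are assumptions for illustration; the crate defines its real header with zerocopy and may differ on any of these.

```rust
use std::convert::{TryFrom, TryInto};

/// Assumed header size: crc (4) + timestamp (8) + key (16) + value_len (4).
const HEADER_LEN: usize = 4 + 8 + 16 + 4;

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320), written out by
/// hand so the sketch needs no external crate.
fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in bytes {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Serializes header + value, then back-fills the CRC over everything after it.
fn encode_record(timestamp: u64, key: [u8; 16], value: &[u8]) -> Vec<u8> {
    let value_len = u32::try_from(value.len()).expect("value too large for a u32 length");
    let mut record = Vec::with_capacity(HEADER_LEN + value.len());
    record.extend_from_slice(&[0u8; 4]); // CRC placeholder
    record.extend_from_slice(&timestamp.to_le_bytes());
    record.extend_from_slice(&key);
    record.extend_from_slice(&value_len.to_le_bytes());
    record.extend_from_slice(value);
    let crc = crc32(&record[4..]);
    record[..4].copy_from_slice(&crc.to_le_bytes());
    record
}

/// Parses one record from the front of `bytes`.
/// Returns None if the record is truncated or its CRC does not match.
fn decode_record(bytes: &[u8]) -> Option<(u64, [u8; 16], &[u8])> {
    let header = bytes.get(..HEADER_LEN)?;
    let stored_crc = u32::from_le_bytes(header[0..4].try_into().ok()?);
    let timestamp = u64::from_le_bytes(header[4..12].try_into().ok()?);
    let key: [u8; 16] = header[12..28].try_into().ok()?;
    let value_len = u32::from_le_bytes(header[28..32].try_into().ok()?) as usize;
    let end = HEADER_LEN.checked_add(value_len)?;
    let record = bytes.get(..end)?;
    if crc32(&record[4..]) != stored_crc {
        return None;
    }
    Some((timestamp, key, &record[HEADER_LEN..]))
}
```

Because the value is returned as a slice of the input buffer, the same decode step works unchanged over a memory-mapped file without copying the payload.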

On startup, the database:

  1. Acquires a LOCK file to prevent multiple processes from opening the same store for writes.
  2. Lists and sorts existing .data files.
  3. Loads older files as immutable memory maps.
  4. Rebuilds the key directory from .hint files when available, otherwise from .data records.
  5. Scans the active file up to the last valid CRC-checked record and resumes appending there.
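
Steps 4 and 5 can be sketched over in-memory buffers standing in for data files. Everything here is illustrative: the 32-byte header layout, the Location struct, and the scan_file, rebuild, and append_record helpers are assumptions rather than the crate's code, and hint files are not modeled.

```rust
use std::collections::HashMap;
use std::convert::TryInto;

/// Assumed header: crc (4) + timestamp (8) + key (16) + value_len (4).
const HEADER_LEN: usize = 32;

/// Where the latest value for a key lives.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Location {
    file_id: u64,
    value_offset: u64,
    value_len: u32,
}

/// Bitwise CRC-32 (IEEE), inlined to keep the sketch dependency-free.
fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in bytes {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            crc = (crc >> 1) ^ (0xEDB8_8320 & (crc & 1).wrapping_neg());
        }
    }
    !crc
}

/// Appends one record (timestamp fixed at 0 here) to an in-memory "file".
fn append_record(file: &mut Vec<u8>, key: [u8; 16], value: &[u8]) {
    let start = file.len();
    file.extend_from_slice(&[0u8; 4]); // CRC placeholder
    file.extend_from_slice(&0u64.to_le_bytes());
    file.extend_from_slice(&key);
    file.extend_from_slice(&(value.len() as u32).to_le_bytes());
    file.extend_from_slice(value);
    let crc = crc32(&file[start + 4..]);
    file[start..start + 4].copy_from_slice(&crc.to_le_bytes());
}

/// Walks one file record by record, overwriting older key directory entries.
/// Returns the offset just past the last record whose CRC checked out.
fn scan_file(file_id: u64, bytes: &[u8], key_dir: &mut HashMap<[u8; 16], Location>) -> usize {
    let mut offset = 0;
    while let Some(header) = bytes.get(offset..offset + HEADER_LEN) {
        let stored_crc = u32::from_le_bytes(header[0..4].try_into().unwrap());
        let key: [u8; 16] = header[12..28].try_into().unwrap();
        let value_len = u32::from_le_bytes(header[28..32].try_into().unwrap());
        let end = offset + HEADER_LEN + value_len as usize;
        // A torn or corrupted tail ends the scan; earlier records stay valid.
        match bytes.get(offset + 4..end) {
            Some(body) if crc32(body) == stored_crc => {}
            _ => break,
        }
        let value_offset = (offset + HEADER_LEN) as u64;
        key_dir.insert(key, Location { file_id, value_offset, value_len });
        offset = end;
    }
    offset
}

/// Scans files in ascending id order so later writes win. The offset returned
/// for the last (active) file is where appending would resume.
fn rebuild(files: &[(u64, Vec<u8>)]) -> (HashMap<[u8; 16], Location>, usize) {
    let mut key_dir = HashMap::new();
    let mut append_offset = 0;
    for (file_id, bytes) in files {
        append_offset = scan_file(*file_id, bytes, &mut key_dir);
    }
    (key_dir, append_offset)
}
```

Stopping at the first record that fails its CRC means a partially written tail from a crash is simply overwritten by the next append.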

Current Scope

Implemented:

  • UUID-keyed put and get
  • append-only persistence
  • active-file rotation
  • memory-mapped immutable reads
  • key-directory rebuild on open
  • CRC validation during data-file scans
  • lock-file based process exclusion

Not yet implemented:

  • merge/compaction of stale records
  • public delete/tombstone API
  • hint-file generation
  • transactions or batch writes
  • stable-Rust support

Project Layout

src/lib.rs              Core Bitcask implementation and tests
examples/hello_world.rs Minimal end-to-end usage example
Cargo.toml              Crate manifest and dependency list

Verification

cargo +nightly test
cargo +nightly fmt --check
cargo +nightly clippy --all-targets --all-features -- -D warnings

The current unit tests cover the basic write/read path and rotation across multiple data files.
