Merged
55 changes: 33 additions & 22 deletions README.md
@@ -2,11 +2,11 @@

[![Test](https://github.com/IsaacCheng9/kv-engine/actions/workflows/test.yml/badge.svg)](https://github.com/IsaacCheng9/kv-engine/actions/workflows/test.yml)

A C++23 LSM-tree key-value store with crash recovery and a gRPC API
supporting point operations and server-streaming range scans.
A C++23 LSM-tree key-value store with crash recovery and a gRPC API supporting
point operations and server-streaming range scans.

Modelled after LevelDB and RocksDB, with the LSM-tree design from O'Neil
et al. (1996).
Modelled after LevelDB and RocksDB, with the LSM-tree design from O'Neil et al.
(1996).

## Key Features

@@ -23,13 +23,12 @@ et al. (1996).
- **SSTable reader cache** – parsed readers stay resident for each file's
lifetime and serve concurrent `get()` callers via positioned reads,
eliminating per-lookup open and index-parse cost
- **Per-SSTable Bloom filter** – probabilistic membership test consulted
before the binary search on `get()`, short-circuiting lookups for keys
guaranteed not to be in the file (no false negatives, ~1% false positive
rate)
- **Per-SSTable Bloom filter** – probabilistic membership test consulted before
the binary search on `get()`, short-circuiting lookups for keys guaranteed not
to be in the file (no false negatives, ~1% false positive rate)
- **Key range pruning** – cached min/max keys let `get()` skip SSTables whose
key range cannot contain the lookup key, avoiding the Bloom check and
binary search entirely
key range cannot contain the lookup key, avoiding the Bloom check and binary
search entirely
- **gRPC API** – `Put` / `Get` / `Delete` as unary RPCs and `Scan` as
server-streaming, with snapshot semantics isolating in-flight scans from
concurrent writes, flushes, and compactions
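
As a rough illustration of how the key range check and the Bloom filter can gate a point lookup before any binary search runs, here is a minimal sketch. Every name in it (`BloomFilter`, `SSTableMeta`, `classify_lookup`, `make_meta`) is hypothetical and not taken from kv-engine, and the sizing (10 bits per key, 7 hash probes, which gives roughly a 1% false positive rate in theory) and the double-hashing scheme are assumptions rather than the project's actual parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical Bloom filter; not the kv-engine implementation.
class BloomFilter {
 public:
  BloomFilter(std::size_t num_bits, std::size_t num_hashes)
      : bits_(num_bits, false), num_hashes_(num_hashes) {
    assert(num_bits > 0 && num_hashes > 0);
  }

  void add(std::string_view key) {
    const auto [h1, h2] = hashes(key);
    for (std::size_t i = 0; i < num_hashes_; ++i) {
      bits_[(h1 + i * h2) % bits_.size()] = true;
    }
  }

  // false => key is definitely absent; true => key may be present.
  bool may_contain(std::string_view key) const {
    const auto [h1, h2] = hashes(key);
    for (std::size_t i = 0; i < num_hashes_; ++i) {
      if (!bits_[(h1 + i * h2) % bits_.size()]) return false;
    }
    return true;
  }

 private:
  // Double hashing: derive every probe position from two base hashes.
  static std::pair<std::uint64_t, std::uint64_t> hashes(std::string_view key) {
    const std::uint64_t h1 = std::hash<std::string_view>{}(key);
    const std::uint64_t rotated = (h1 >> 33) | (h1 << 31);
    const std::uint64_t h2 = (rotated * 0x9E3779B97F4A7C15ULL) | 1ULL;
    return {h1, h2};
  }

  std::vector<bool> bits_;
  std::size_t num_hashes_;
};

// Hypothetical per-file metadata kept resident alongside a cached reader.
struct SSTableMeta {
  std::string min_key;
  std::string max_key;
  BloomFilter filter;
};

enum class Probe { SkippedByRange, SkippedByBloom, NeedsSearch };

// Cheapest check first: range pruning, then Bloom, then the real search.
Probe classify_lookup(const SSTableMeta& table, std::string_view key) {
  if (key < table.min_key || key > table.max_key) return Probe::SkippedByRange;
  if (!table.filter.may_contain(key)) return Probe::SkippedByBloom;
  return Probe::NeedsSearch;
}

// Builds metadata from a non-empty, sorted key list (assumed sizing).
SSTableMeta make_meta(const std::vector<std::string>& sorted_keys) {
  assert(!sorted_keys.empty());
  SSTableMeta meta{sorted_keys.front(), sorted_keys.back(),
                   BloomFilter(sorted_keys.size() * 10, 7)};
  for (const std::string& key : sorted_keys) meta.filter.add(key);
  return meta;
}
```

A key that was added always reaches `NeedsSearch`, because the filter never reports a false negative. A key outside `[min_key, max_key]` is rejected without touching the filter at all.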
@@ -38,6 +37,20 @@ et al. (1996).

- Raft consensus for distributed replication across multiple nodes

## Performance

Measured on an M1 Max in a Release build. Full numbers in
[`docs/2026_05_05_grpc_with_scan_baseline.txt`](docs/2026_05_05_grpc_with_scan_baseline.txt).

| Workload | Throughput | Latency (p50) | Notes |
| ------------------- | -------------: | ------------: | ---------------------------------------------------------------- |
| Memtable read | 2.6M ops/sec | 0.33 µs | Hot in-memory path |
| SSTable read | 114k ops/sec | 8.54 µs | Cached reader + Bloom filter + range pruning |
| Negative lookup | 73k ops/sec | 13.58 µs | All read-path optimisations short-circuit |
| Write (`put`) | 16k ops/sec | 42 µs | `fsync`-bound on the WAL |
| gRPC unary read | 7.3k ops/sec | ~130 µs | Loopback overhead vs direct in-process call |
| gRPC streaming scan | ~117k rows/sec | ~8.5 µs/row | ~15x amortisation vs unary (HTTP/2 framing paid once per stream) |

## Architecture

```mermaid
@@ -46,10 +59,13 @@ flowchart TD
grpc -->|engine API| engine[Engine]
engine -->|writes| wal[Write-Ahead Log]
engine -->|writes / reads| memtable[Memtable<br/>sorted in-memory]
engine -->|scan creates| snapshot[Scan Snapshot<br/>memtable copy + SSTable readers]
memtable -->|flush when full| l0[L0 SSTables<br/>overlapping key ranges]
l0 -->|background compaction| l1[L1 SSTables<br/>merged, deduplicated]
engine -.reads.-> l0
engine -.reads.-> l1
engine -.point reads.-> l0
engine -.point reads.-> l1
snapshot -.range reads.-> l0
snapshot -.range reads.-> l1
wal -.replay on startup.-> memtable
```
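
The write and recovery edges in the diagram (engine to WAL, engine to memtable, WAL replay on startup) can be sketched as below. This is an in-memory toy under stated assumptions: `TinyEngine`, `WalRecord`, and `Memtable` are hypothetical names, the "log" is a `std::vector` rather than an `fsync`ed file, and flushing to SSTables is left out entirely.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the kv-engine API.
struct WalRecord {
  std::string key;
  std::optional<std::string> value;  // nullopt marks a delete (tombstone)
};

using Memtable = std::map<std::string, std::optional<std::string>>;

class TinyEngine {
 public:
  // Startup: rebuild the memtable by replaying the log in write order.
  explicit TinyEngine(std::vector<WalRecord> wal = {}) : wal_(std::move(wal)) {
    for (const WalRecord& record : wal_) memtable_[record.key] = record.value;
  }

  void put(const std::string& key, const std::string& value) {
    wal_.push_back({key, value});  // 1. log first (a real WAL would fsync here)
    memtable_[key] = value;        // 2. then apply to the sorted memtable
  }

  void remove(const std::string& key) {
    wal_.push_back({key, std::nullopt});
    memtable_[key] = std::nullopt;  // tombstone shadows any older value
  }

  // Returns nullopt for both missing and deleted keys.
  std::optional<std::string> get(const std::string& key) const {
    const auto it = memtable_.find(key);
    if (it == memtable_.end()) return std::nullopt;
    return it->second;
  }

  const std::vector<WalRecord>& wal() const { return wal_; }

 private:
  std::vector<WalRecord> wal_;
  Memtable memtable_;
};
```

Constructing a second `TinyEngine` from the first one's `wal()` stands in for a restart: the replayed memtable holds the same live keys and tombstones as before the "crash".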

@@ -145,17 +161,12 @@ the scan don't change what it yields. Tombstones are collapsed and shadowed
older versions of a key are discarded; the caller sees only the newest live
value per key in `[start_key, end_key)` order.
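
The merge rule described above (newest version wins, tombstones hide older values, results ordered over `[start_key, end_key)`) can be sketched as follows. The `Source` alias and `scan` function are hypothetical, each source is modelled as a plain `std::map`, and the whole range is materialised in one pass; a real engine would more likely stream a k-way merge over iterators.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One sorted run (memtable copy or SSTable); nullopt is a tombstone.
// Hypothetical model for illustration only.
using Source = std::map<std::string, std::optional<std::string>>;

// `newest_first` must be ordered from most recent source to oldest.
std::vector<std::pair<std::string, std::string>> scan(
    const std::vector<Source>& newest_first, const std::string& start_key,
    const std::string& end_key) {
  Source merged;
  for (const Source& source : newest_first) {
    for (auto it = source.lower_bound(start_key);
         it != source.end() && it->first < end_key; ++it) {
      // try_emplace never overwrites, so the first (newest) version sticks.
      merged.try_emplace(it->first, it->second);
    }
  }

  std::vector<std::pair<std::string, std::string>> rows;
  for (const auto& [key, value] : merged) {
    if (value) rows.emplace_back(key, *value);  // drop keys whose newest version is a tombstone
  }
  return rows;
}
```

Because the tombstone is recorded in `merged` before older sources are visited, a deleted key stays hidden even when an older SSTable still holds a value for it.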

### Performance

On loopback (no real network RTT), gRPC adds ~130 µs round-trip vs direct
in-process engine calls – HTTP/2 framing + protobuf serialise/deserialise +
kernel TCP loopback. See the `grpc_*` rows in
`docs/2026_05_05_grpc_with_scan_baseline.txt` for full numbers.
### Why Server-Streaming for `Scan`

Streaming RPCs amortise that overhead: `grpc_scan` measures ~8.5 µs per row vs
~130 µs per unary call. Server-streaming pays the HTTP/2 framing cost once per
stream rather than once per row, so the per-operation gRPC tax shrinks ~15x
for range queries. This is the argument for using server-streaming `Scan` over
Streaming RPCs amortise gRPC overhead: `grpc_scan` measures **~8.5 µs per row vs
~130 µs per unary call.** Server-streaming pays the HTTP/2 framing cost once per
stream rather than once per row, so the **per-operation gRPC tax shrinks ~15x
for range queries.** This is the argument for using server-streaming `Scan` over
a cursor-based unary API for `Scan`-shaped workloads.
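
To make the ~15x figure concrete, the arithmetic can be written out as a small cost model. It plugs in the two measured numbers quoted above and assumes the worst case for the unary alternative: one row per call. A cursor API that returned a page of rows per call would narrow the gap. The struct and function names are hypothetical.

```cpp
#include <cassert>

// Per-operation costs in microseconds, taken from the figures quoted above.
struct RpcCosts {
  double unary_us_per_call;  // ~130 µs per unary round trip
  double stream_us_per_row;  // ~8.5 µs per streamed row
};

// Total time if every row needs its own unary call (assumed worst case).
constexpr double unary_total_us(const RpcCosts& costs, long rows) {
  assert(rows >= 0);
  return costs.unary_us_per_call * static_cast<double>(rows);
}

// Total time if all rows arrive on one server-streaming RPC.
constexpr double stream_total_us(const RpcCosts& costs, long rows) {
  assert(rows >= 0);
  return costs.stream_us_per_row * static_cast<double>(rows);
}

// How many times cheaper a streamed row is than a unary call.
constexpr double amortisation_factor(const RpcCosts& costs) {
  return costs.unary_us_per_call / costs.stream_us_per_row;
}
```

With `RpcCosts{130.0, 8.5}`, a 10,000-row range costs about 1.3 s as individual unary calls versus about 0.085 s streamed, and the ratio 130 / 8.5 works out to roughly 15.3.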

## Benchmarks