From ab39a6d519d6373208628846e1649a5888e4b16f Mon Sep 17 00:00:00 2001
From: Isaac Cheng <47993930+IsaacCheng9@users.noreply.github.com>
Date: Mon, 11 May 2026 00:13:46 +0100
Subject: [PATCH 1/3] fix: Show `Scan` snapshot in README architecture diagram

---
 README.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 9e08965..55d77fb 100644
--- a/README.md
+++ b/README.md
@@ -46,10 +46,13 @@ flowchart TD
     grpc -->|engine API| engine[Engine]
     engine -->|writes| wal[Write-Ahead Log]
     engine -->|writes / reads| memtable[Memtable
sorted in-memory]
+    engine -->|scan creates| snapshot[Scan Snapshot
memtable copy + SSTable readers]
     memtable -->|flush when full| l0[L0 SSTables
overlapping key ranges]
     l0 -->|background compaction| l1[L1 SSTables
merged, deduplicated]
-    engine -.reads.-> l0
-    engine -.reads.-> l1
+    engine -.point reads.-> l0
+    engine -.point reads.-> l1
+    snapshot -.range reads.-> l0
+    snapshot -.range reads.-> l1
     wal -.replay on startup.-> memtable
 ```
 

From 3f0e45ff00e376b8497272f55acd66c9967b1f84 Mon Sep 17 00:00:00 2001
From: Isaac Cheng <47993930+IsaacCheng9@users.noreply.github.com>
Date: Mon, 11 May 2026 00:18:28 +0100
Subject: [PATCH 2/3] fix: Add performance table derived from benchmark results

---
 README.md | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index 55d77fb..d97ac2a 100644
--- a/README.md
+++ b/README.md
@@ -2,11 +2,11 @@
 
 [![Test](https://github.com/IsaacCheng9/kv-engine/actions/workflows/test.yml/badge.svg)](https://github.com/IsaacCheng9/kv-engine/actions/workflows/test.yml)
 
-A C++23 LSM-tree key-value store with crash recovery and a gRPC API
-supporting point operations and server-streaming range scans.
+A C++23 LSM-tree key-value store with crash recovery and a gRPC API supporting
+point operations and server-streaming range scans.
 
-Modelled after LevelDB and RocksDB, with the LSM-tree design from O'Neil
-et al. (1996).
+Modelled after LevelDB and RocksDB, with the LSM-tree design from O'Neil et al.
+(1996).
 
 ## Key Features
 
@@ -23,13 +23,12 @@ et al. (1996).
 - **SSTable reader cache** – parsed readers stay resident for each file's
   lifetime and serve concurrent `get()` callers via positioned reads,
   eliminating per-lookup open and index-parse cost
-- **Per-SSTable Bloom filter** – probabilistic membership test consulted
-  before the binary search on `get()`, short-circuiting lookups for keys
-  guaranteed not to be in the file (no false negatives, ~1% false positive
-  rate)
+- **Per-SSTable Bloom filter** – probabilistic membership test consulted before
+  the binary search on `get()`, short-circuiting lookups for keys guaranteed not
+  to be in the file (no false negatives, ~1% false positive rate)
 - **Key range pruning** – cached min/max keys let `get()` skip SSTables whose
-  key range cannot contain the lookup key, avoiding the Bloom check and
-  binary search entirely
+  key range cannot contain the lookup key, avoiding the Bloom check and binary
+  search entirely
 - **gRPC API** – `Put` / `Get` / `Delete` as unary RPCs and `Scan` as
   server-streaming, with snapshot semantics isolating in-flight scans from
   concurrent writes, flushes, and compactions
@@ -38,6 +37,20 @@ et al. (1996).
 
 - Raft consensus for distributed replication across multiple nodes
 
+## Performance
+
+Measured on M1 Max in Release build. Full numbers in
+[`docs/2026_05_05_grpc_with_scan_baseline.txt`](docs/2026_05_05_grpc_with_scan_baseline.txt).
+
+| Workload            |     Throughput | Latency (p50) | Notes                                                            |
+| ------------------- | -------------: | ------------: | ---------------------------------------------------------------- |
+| Memtable read       |   2.6M ops/sec |       0.33 µs | Hot in-memory path                                               |
+| SSTable read        |   114k ops/sec |       8.54 µs | Cached reader + Bloom filter + range pruning                     |
+| Negative lookup     |    73k ops/sec |      13.58 µs | All read-path optimisations short-circuit                        |
+| Write (`put`)       |    16k ops/sec |         42 µs | `fsync`-bound on the WAL                                         |
+| gRPC unary read     |   7.3k ops/sec |       ~130 µs | Loopback overhead vs direct in-process call                      |
+| gRPC streaming scan | ~117k rows/sec |   ~8.5 µs/row | ~15x amortisation vs unary (HTTP/2 framing paid once per stream) |
+
 ## Architecture
 
 ```mermaid

From 9a30e9378e7f9456843c2c5d655a266efc7c056b Mon Sep 17 00:00:00 2001
From: Isaac Cheng <47993930+IsaacCheng9@users.noreply.github.com>
Date: Mon, 11 May 2026 00:18:47 +0100
Subject: [PATCH 3/3] fix: Explain why we used server-streaming for gRPC `Scan`

---
 README.md | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index d97ac2a..13c2543 100644
--- a/README.md
+++ b/README.md
@@ -161,17 +161,12 @@ the scan don't change what it yields.
 Tombstones are collapsed and shadowed older versions of a key are discarded; the
 caller sees only the newest live value per key in `[start_key, end_key)` order.
 
-### Performance
+### Why Server-Streaming for `Scan`
 
-On loopback (no real network RTT), gRPC adds ~130 µs round-trip vs direct
-in-process engine calls – HTTP/2 framing + protobuf serialise/deserialise +
-kernel TCP loopback. See the `grpc_*` rows in
-`docs/2026_05_05_grpc_with_scan_baseline.txt` for full numbers.
-
-Streaming RPCs amortise that overhead: `grpc_scan` measures ~8.5 µs per row vs
-~130 µs per unary call. Server-streaming pays the HTTP/2 framing cost once per
-stream rather than once per row, so the per-operation gRPC tax shrinks ~15x
-for range queries. This is the argument for using server-streaming `Scan` over
+Streaming RPCs amortise gRPC overhead: `grpc_scan` measures **~8.5 µs per row vs
+~130 µs per unary call.** Server-streaming pays the HTTP/2 framing cost once per
+stream rather than once per row, so the **per-operation gRPC tax shrinks ~15x
+for range queries.** This is the argument for using server-streaming `Scan` over
 a cursor-based unary API for `Scan`-shaped workloads.
 
 ## Benchmarks
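The read-path bullets in patch 2 describe three checks applied in order on `get()`: key range pruning, then the per-SSTable Bloom filter, then binary search. What follows is a minimal C++23 sketch of that order under stated assumptions, not the project's code. `BloomFilter`, `SSTableView`, and `may_contain` are hypothetical names. The table is an in-memory sorted vector, not on-disk blocks served by cached readers via positioned reads. The 10-bits-per-key / 7-probe sizing is also an assumption: the standard estimate (1 - e^(-kn/m))^k gives roughly 0.8% at those settings, in line with the ~1% the README quotes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical Bloom filter: `num_probes` bit positions per key, derived by
// double hashing over an m-bit array. No false negatives; false positives
// are possible.
class BloomFilter {
 public:
  explicit BloomFilter(std::size_t expected_keys, std::size_t bits_per_key = 10,
                       int num_probes = 7)
      : bits_(std::max<std::size_t>(64, expected_keys * bits_per_key), false),
        num_probes_(num_probes) {
    assert(num_probes_ > 0);
  }

  void add(std::string_view key) {
    const auto [h1, h2] = hashes(key);
    for (int i = 0; i < num_probes_; ++i) bits_[slot(h1, h2, i)] = true;
  }

  // false => key is definitely absent; true => key may be present.
  bool may_contain(std::string_view key) const {
    const auto [h1, h2] = hashes(key);
    for (int i = 0; i < num_probes_; ++i)
      if (!bits_[slot(h1, h2, i)]) return false;
    return true;
  }

 private:
  static std::pair<std::uint64_t, std::uint64_t> hashes(std::string_view key) {
    const std::uint64_t h1 = std::hash<std::string_view>{}(key);
    std::uint64_t h2 = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : key) {
      h2 ^= c;
      h2 *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
    return {h1, h2 | 1};  // force a non-zero step between probes
  }

  std::size_t slot(std::uint64_t h1, std::uint64_t h2, int i) const {
    return static_cast<std::size_t>(
        (h1 + static_cast<std::uint64_t>(i) * h2) % bits_.size());
  }

  std::vector<bool> bits_;
  int num_probes_;
};

// Hypothetical in-memory stand-in for one SSTable: sorted entries plus the
// cached min/max keys and Bloom filter that the read path consults.
struct SSTableView {
  std::vector<std::pair<std::string, std::string>> entries;  // sorted by key
  BloomFilter bloom;  // declared after `entries`, so sized from it below
  std::string min_key, max_key;

  explicit SSTableView(std::vector<std::pair<std::string, std::string>> sorted)
      : entries(std::move(sorted)), bloom(entries.size()) {
    for (const auto& entry : entries) bloom.add(entry.first);
    if (!entries.empty()) {
      min_key = entries.front().first;
      max_key = entries.back().first;
    }
  }

  std::optional<std::string> get(std::string_view key) const {
    // 1. Key range pruning: skips the Bloom check and the search entirely.
    if (entries.empty() || key < std::string_view(min_key) ||
        key > std::string_view(max_key))
      return std::nullopt;
    // 2. Bloom filter: a negative answer is definitive.
    if (!bloom.may_contain(key)) return std::nullopt;
    // 3. Binary search over the sorted entries.
    auto it = std::lower_bound(entries.begin(), entries.end(), key,
                               [](const auto& entry, std::string_view k) {
                                 return std::string_view(entry.first) < k;
                               });
    if (it != entries.end() && std::string_view(it->first) == key)
      return it->second;
    return std::nullopt;
  }
};
```

Both early exits can only ever answer "definitely absent". A `true` from `may_contain` still falls through to the binary search, which is why Bloom false positives cost a wasted search but never a wrong answer.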
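Patch 3's context lines state what a `Scan` yields: tombstones collapsed, shadowed older versions discarded, and only the newest live value per key in `[start_key, end_key)` order. One way to produce that from a snapshot is a k-way merge over sorted runs, sketched below. The sketch rests on a few assumptions. The snapshot is modelled as a list of runs ordered newest-first (memtable copy, then L0 files newest-first, then L1), and `std::nullopt` marks a tombstone. `merge_scan`, `Run`, `Entry`, and `Row` are hypothetical names. A real server-streaming handler would also write each row to the stream rather than collecting a vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical snapshot layout: each run is sorted by key with unique keys,
// and std::nullopt marks a tombstone (a deleted key).
using Entry = std::pair<std::string, std::optional<std::string>>;
using Run = std::vector<Entry>;
using Row = std::pair<std::string, std::string>;

// `sources` must be ordered newest-first. Returns the newest live value per
// key in [start_key, end_key), in key order.
std::vector<Row> merge_scan(const std::vector<Run>& sources,
                            const std::string& start_key,
                            const std::string& end_key) {
  assert(!(end_key < start_key) && "inverted range");

  struct Cursor {
    std::size_t source;  // index into `sources`; lower index = newer run
    std::size_t pos;     // position within that run
  };
  auto key_of = [&](const Cursor& c) -> const std::string& {
    return sources[c.source][c.pos].first;
  };
  // priority_queue is a max-heap, so invert the ordering: the top is the
  // smallest key and, for equal keys, the newest (lowest-index) source.
  auto later = [&](const Cursor& a, const Cursor& b) {
    if (key_of(a) != key_of(b)) return key_of(a) > key_of(b);
    return a.source > b.source;
  };
  std::priority_queue<Cursor, std::vector<Cursor>, decltype(later)> heap(later);

  auto push_if_valid = [&](Cursor c) {
    if (c.pos < sources[c.source].size()) heap.push(c);
  };

  // Seek every run to its first key >= start_key.
  for (std::size_t s = 0; s < sources.size(); ++s) {
    auto it = std::lower_bound(
        sources[s].begin(), sources[s].end(), start_key,
        [](const Entry& e, const std::string& k) { return e.first < k; });
    push_if_valid({s, static_cast<std::size_t>(it - sources[s].begin())});
  }

  std::vector<Row> out;
  while (!heap.empty()) {
    Cursor top = heap.top();
    heap.pop();
    const Entry& newest = sources[top.source][top.pos];
    if (newest.first >= end_key) break;  // exclusive upper bound
    const std::string key = newest.first;
    if (newest.second) out.emplace_back(key, *newest.second);  // skip tombstones
    push_if_valid({top.source, top.pos + 1});
    // Discard shadowed older versions of the same key.
    while (!heap.empty() && key_of(heap.top()) == key) {
      Cursor older = heap.top();
      heap.pop();
      push_if_valid({older.source, older.pos + 1});
    }
  }
  return out;
}
```

Ordering the heap on (key, source index) means the first cursor popped for any key belongs to the newest run. That one entry decides the outcome (live value or tombstone), and every older cursor on the same key is simply advanced past it.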
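The ~15x figure in patch 3 and the streaming-scan throughput in the patch 2 table both follow from the per-operation latencies already quoted (~130 µs per unary call, ~8.5 µs per streamed row):

```math
\frac{130\ \mu\mathrm{s}}{8.5\ \mu\mathrm{s}} \approx 15.3,
\qquad
\frac{1\ \mathrm{row}}{8.5\ \mu\mathrm{s}} \approx 117.6\mathrm{k}\ \mathrm{rows/s}
```

The second ratio is consistent with the ~117k rows/sec reported for the gRPC streaming scan.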