diff --git a/README.md b/README.md
index 9e08965..13c2543 100644
--- a/README.md
+++ b/README.md
@@ -2,11 +2,11 @@
[](https://github.com/IsaacCheng9/kv-engine/actions/workflows/test.yml)
-A C++23 LSM-tree key-value store with crash recovery and a gRPC API
-supporting point operations and server-streaming range scans.
+A C++23 LSM-tree key-value store with crash recovery and a gRPC API supporting
+point operations and server-streaming range scans.
-Modelled after LevelDB and RocksDB, with the LSM-tree design from O'Neil
-et al. (1996).
+Modelled after LevelDB and RocksDB, with the LSM-tree design from O'Neil et al.
+(1996).
## Key Features
@@ -23,13 +23,12 @@ et al. (1996).
- **SSTable reader cache** – parsed readers stay resident for each file's
lifetime and serve concurrent `get()` callers via positioned reads,
eliminating per-lookup open and index-parse cost
-- **Per-SSTable Bloom filter** – probabilistic membership test consulted
- before the binary search on `get()`, short-circuiting lookups for keys
- guaranteed not to be in the file (no false negatives, ~1% false positive
- rate)
+- **Per-SSTable Bloom filter** – probabilistic membership test consulted before
+ the binary search on `get()`, short-circuiting lookups for keys guaranteed not
+ to be in the file (no false negatives, ~1% false positive rate)
- **Key range pruning** – cached min/max keys let `get()` skip SSTables whose
- key range cannot contain the lookup key, avoiding the Bloom check and
- binary search entirely
+ key range cannot contain the lookup key, avoiding the Bloom check and binary
+ search entirely
- **gRPC API** – `Put` / `Get` / `Delete` as unary RPCs and `Scan` as
server-streaming, with snapshot semantics isolating in-flight scans from
concurrent writes, flushes, and compactions
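
The read-path bullets above describe an order of checks for a point lookup: key range pruning first, then the Bloom filter, then the binary search. Below is a minimal C++ sketch of that ordering, not the engine's actual code. `TableSketch`, `make_table`, `bloom_may_contain`, `table_get`, and the constants `kBitsPerKey` and `kNumProbes` are hypothetical names and values chosen for illustration. The sketch assumes a double-hashing scheme over `std::hash`, which may differ from what the engine uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one cached SSTable reader; not the engine's real type.
struct TableSketch {
    std::string min_key;                                       // smallest key in the file
    std::string max_key;                                       // largest key in the file
    std::vector<bool> bloom_bits;                              // Bloom filter bit array
    std::vector<std::pair<std::string, std::string>> entries;  // sorted by key
};

// Assumed parameters: about 10 bits per key with 7 probes gives a false
// positive rate of roughly 1%.
constexpr std::size_t kBitsPerKey = 10;
constexpr std::size_t kNumProbes = 7;

// Double hashing: probe i lands on (h1 + i * h2) mod m.
inline std::size_t probe_index(const std::string& key, std::size_t i, std::size_t m) {
    const std::size_t h1 = std::hash<std::string>{}(key);
    const std::size_t h2 = std::hash<std::string>{}(key + "#") | 1;
    return (h1 + i * h2) % m;
}

// Builds the cached metadata (min/max keys, Bloom bits) from sorted entries.
inline TableSketch make_table(std::vector<std::pair<std::string, std::string>> sorted_entries) {
    assert(!sorted_entries.empty());
    TableSketch t;
    t.min_key = sorted_entries.front().first;
    t.max_key = sorted_entries.back().first;
    t.bloom_bits.assign(sorted_entries.size() * kBitsPerKey, false);
    for (const auto& entry : sorted_entries) {
        for (std::size_t i = 0; i < kNumProbes; ++i) {
            t.bloom_bits[probe_index(entry.first, i, t.bloom_bits.size())] = true;
        }
    }
    t.entries = std::move(sorted_entries);
    return t;
}

// Returns false only if the key is definitely absent (no false negatives).
inline bool bloom_may_contain(const TableSketch& t, const std::string& key) {
    for (std::size_t i = 0; i < kNumProbes; ++i) {
        if (!t.bloom_bits[probe_index(key, i, t.bloom_bits.size())]) {
            return false;
        }
    }
    return true;
}

// Point lookup: range pruning, then Bloom filter, then binary search.
inline std::optional<std::string> table_get(const TableSketch& t, const std::string& key) {
    if (key < t.min_key || key > t.max_key) {
        return std::nullopt;  // pruned: key is outside [min_key, max_key]
    }
    if (!bloom_may_contain(t, key)) {
        return std::nullopt;  // Bloom filter says the key is definitely absent
    }
    const auto it = std::lower_bound(
        t.entries.begin(), t.entries.end(), key,
        [](const auto& entry, const std::string& k) { return entry.first < k; });
    if (it != t.entries.end() && it->first == key) {
        return it->second;
    }
    return std::nullopt;  // Bloom false positive: the search finds nothing
}
```

With a table built from `apple`, `mango`, and `pear`, a lookup for `zebra` is rejected by the range check alone, while `banana` falls inside the range and is rejected by either the Bloom filter or the binary search. Each check is cheaper than the one after it, which is why the cheapest runs first.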
@@ -38,6 +37,20 @@ et al. (1996).
- Raft consensus for distributed replication across multiple nodes
+## Performance
+
+Measured on an M1 Max in a Release build. Full numbers are in
+[`docs/2026_05_05_grpc_with_scan_baseline.txt`](docs/2026_05_05_grpc_with_scan_baseline.txt).
+
+| Workload | Throughput | Latency (p50) | Notes |
+| ------------------- | -------------: | ------------: | ---------------------------------------------------------------- |
+| Memtable read | 2.6M ops/sec | 0.33 µs | Hot in-memory path |
+| SSTable read | 114k ops/sec | 8.54 µs | Cached reader + Bloom filter + range pruning |
+| Negative lookup | 73k ops/sec | 13.58 µs | All read-path optimisations short-circuit |
+| Write (`put`) | 16k ops/sec | 42 µs | `fsync`-bound on the WAL |
+| gRPC unary read | 7.3k ops/sec | ~130 µs | Loopback overhead vs direct in-process call |
+| gRPC streaming scan | ~117k rows/sec | ~8.5 µs/row | ~15x amortisation vs unary (HTTP/2 framing paid once per stream) |
+
## Architecture
```mermaid
@@ -46,10 +59,13 @@ flowchart TD
grpc -->|engine API| engine[Engine]
engine -->|writes| wal[Write-Ahead Log]
engine -->|writes / reads| memtable[Memtable<br/>sorted in-memory]
+ engine -->|scan creates| snapshot[Scan Snapshot<br/>memtable copy + SSTable readers]
memtable -->|flush when full| l0[L0 SSTables<br/>overlapping key ranges]
l0 -->|background compaction| l1[L1 SSTables<br/>merged, deduplicated]
- engine -.reads.-> l0
- engine -.reads.-> l1
+ engine -.point reads.-> l0
+ engine -.point reads.-> l1
+ snapshot -.range reads.-> l0
+ snapshot -.range reads.-> l1
wal -.replay on startup.-> memtable
```
@@ -145,17 +161,12 @@ the scan don't change what it yields. Tombstones are collapsed and shadowed
older versions of a key are discarded; the caller sees only the newest live
value per key in `[start_key, end_key)` order.
-### Performance
-
-On loopback (no real network RTT), gRPC adds ~130 µs round-trip vs direct
-in-process engine calls – HTTP/2 framing + protobuf serialise/deserialise +
-kernel TCP loopback. See the `grpc_*` rows in
-`docs/2026_05_05_grpc_with_scan_baseline.txt` for full numbers.
+### Why Server-Streaming for `Scan`
-Streaming RPCs amortise that overhead: `grpc_scan` measures ~8.5 µs per row vs
-~130 µs per unary call. Server-streaming pays the HTTP/2 framing cost once per
-stream rather than once per row, so the per-operation gRPC tax shrinks ~15x
-for range queries. This is the argument for using server-streaming `Scan` over
+Streaming RPCs amortise gRPC overhead: `grpc_scan` measures **~8.5 µs per row vs
+~130 µs per unary call.** Server-streaming pays the HTTP/2 framing cost once per
+stream rather than once per row, so the **per-operation gRPC tax shrinks ~15x
+for range queries.** This is the argument for using server-streaming `Scan` over
a cursor-based unary API for `Scan`-shaped workloads.
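
As a rough illustration of the ~15x figure, the arithmetic can be written out using the two latencies quoted above. This is a back-of-the-envelope sketch, not a benchmark. It assumes a naive comparison in which the unary alternative fetches one row per call, and the function names are hypothetical.

```cpp
#include <cstddef>

// Latencies quoted in this README, in microseconds.
constexpr double kUnaryCallUs = 130.0;  // one gRPC unary round trip on loopback
constexpr double kStreamRowUs = 8.5;    // per-row cost inside a server-streaming scan

// Assumption: a unary-based scan issues one call per row.
constexpr double unary_scan_cost_us(std::size_t rows) {
    return static_cast<double>(rows) * kUnaryCallUs;
}

// Streaming scan cost, using the measured per-row figure.
constexpr double streaming_scan_cost_us(std::size_t rows) {
    return static_cast<double>(rows) * kStreamRowUs;
}

// Ratio of per-row costs: 130 / 8.5, a little over 15.
constexpr double amortisation_factor() {
    return kUnaryCallUs / kStreamRowUs;
}
```

For a 1,000-row range this gives 130,000 µs for row-at-a-time unary calls and 8,500 µs for a single stream. A cursor API that returns pages of rows per call would narrow the gap, so the factor is best read as an upper bound for that design.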
## Benchmarks