feat(keydir): store offsets and revalidate CRC on every get (#4) by robertoberto · Pull Request #28 · deepcausa/datawal

robertoberto · 2026-05-21T13:31:32Z

Summary

Switches DataWal's in-memory keydir from HashMap<Vec<u8>, Vec<u8>> (full payload cached) to HashMap<Vec<u8>, RecordRef> (segment + offset + len) backed by a small LRU fd-pool. Every get now does one pread and revalidates the full record CRC before returning.

This trades RAM for I/O. A DataWal with N keys and average payload P used to hold N * P bytes of payload in memory; it now holds N RecordRefs (~40 bytes each) and at most FdPool::DEFAULT_CAPACITY open fds. Soak driver smoke continues to hold RSS stable at ~5.6 MB under ~1 GB of write traffic.

Closes #4.

What this is, what it is not

Is	Is not
`HashMap<Vec<u8>, RecordRef>` + fd-pool LRU	An mmap path or page-cache layer
`pread` + CRC revalidate on every `get`	An optimistic / unchecked read path
Lazy `open` (no payload materialisation)	A removal of the in-memory keydir
Single-threaded `&mut self` semantics	A concurrent reader API (`RecordLogReader` from PR #27 covers that)

What ships

crates/datawal-core/src/fd_pool.rs (new, ~205 LOC + 6 unit tests)
- Tiny VecDeque-backed LRU keyed by segment_id.
- read_at(dir, segment_id, offset, len) uses pread (FileExt::read_exact_at) on Unix, falls back to seek+read on non-Unix.
- Default capacity 16; configurable via with_capacity. Capacity 0 disables caching (single replaceable slot) and is used by tests.
crates/datawal-core/src/datawal.rs (rewritten, ~340 LOC)
- Field map is now HashMap<Vec<u8>, RecordRef>.
- open rebuilds the keydir lazily via scan_iter (PR feat: add RecordLog::scan_iter for record-level lazy scanning #23 / bench: add criterion benchmarks for record log and datawal #15), never materialising payloads.
- get(&mut self, key) reads the record bytes through the fd-pool, decodes them, checks record_type == Put, and validates that the consumed byte count matches RecordRef.len. Truncation, CRC mismatch, and structural decode errors are reported with explicit segment/offset context.
- New helper ref_of(&self, key) -> Option<RecordRef> exposes the keydir entry without touching disk.
crates/datawal-core/tests/keydir_by_offset.rs (new, 245 LOC, 8 tests)
- get after drop+reopen, CRC revalidation on get, ref_of tracks deletes, ref_of advances on overwrite, get after rotation / compaction, fd reuse across 500 repeated gets (Linux-only /proc/self/fd assertion), items() revalidates CRC, compact_to preserves live state via the offset keydir.

Breaking changes (0.1.x semver-acceptable)

Four DataWal methods change signature:

Method	Before	After
`get`	`&self`	`&mut self`
`items`	`&self -> Vec<...>`	`&mut self -> Result<Vec<...>>`
`compact_to`	`&self -> Result<...>`	`&mut self -> Result<...>`
`export_jsonl`	`&self -> Result<()>`	`&mut self -> Result<()>`

All four perform reads against the underlying segments and must mutate the fd-pool. &mut self is the honest signature; no interior-mutability hack. Documented in the module rustdoc.

All in-tree callers (tests, examples, benches, CLI cmd_get, soak driver) are updated to bind handles as let mut. No semantic change in any of them.

What stays the same

Wire format frozen at WIRE_VERSION = 1. Six corpus fixtures unchanged.
Recovery is still longest valid prefix. Tail truncation on the active segment is non-fatal, CRC mismatch on a sealed segment is a hard error during scan_iter.
Single-writer, single-process invariant via fs2 exclusive lock on .lock is unchanged.
Compaction stays snapshot-style (source untouched).
MSRV stays at 1.75.0.
Public surface adds exactly one method (DataWal::ref_of). No new types: RecordRef was already re-exported.

Local validation

Gate	Result
`cargo fmt --all -- --check`	clean
`cargo clippy --workspace --all-targets -- -D warnings`	clean
`cargo test --workspace --all-targets`	137 passed, 1 ignored
`RUSTDOCFLAGS='-D warnings' cargo doc --workspace --no-deps`	clean
`cargo bench --workspace --no-run`	clean
`cargo publish --dry-run -p datawal --allow-dirty`	clean

Out of scope (deferred)

No fd-pool tuning APIs on DataWal (default 16 is plenty for current workloads; revisit when benchmarks demand it).
No pread_record low-level escape hatch.
No mmap path. pread keeps semantics simple and matches what the CRC check expects.
No benchmark refresh in this PR -- saved for the 0.1.4 release PR (Roadmap: crash injection and soak testing #8 in the kit roadmap).

Switches `DataWal`'s in-memory keydir from `HashMap<Vec<u8>, Vec<u8>>` (full payload cached) to `HashMap<Vec<u8>, RecordRef>` (segment + offset + len) backed by a small LRU fd-pool. Every `get` now does one `pread` and revalidates the full record CRC before returning. This trades RAM for I/O. A DataWal with N keys and average payload P used to hold N * P bytes of payload in memory; it now holds N RecordRefs (~40 bytes each) and at most `FdPool::DEFAULT_CAPACITY` open fds. Soak driver smoke continues to hold RSS stable at ~5.6 MB under 1 GB of write traffic. What ships ---------- * `crates/datawal-core/src/fd_pool.rs` (new, ~205 LOC + 6 unit tests) - Tiny VecDeque-backed LRU keyed by `segment_id`. - `read_at(dir, segment_id, offset, len)` uses `pread` (`FileExt::read_exact_at`) on Unix, falls back to seek+read on non-Unix. - Default capacity 16, configurable via `with_capacity`. - Capacity 0 disables caching (single replaceable slot) and is used by tests. * `crates/datawal-core/src/datawal.rs` (rewritten, ~340 LOC) - Field `map` is now `HashMap<Vec<u8>, RecordRef>`. - `open` rebuilds the keydir lazily via `scan_iter` (PR #23), never materialising payloads. - `get(&mut self, key)` reads the record bytes through the fd-pool, decodes them, checks `record_type == Put`, and validates that the consumed byte count matches `RecordRef.len`. Truncation, CRC mismatch, and structural decode errors are reported with explicit segment/offset context. - New helper `ref_of(&self, key) -> Option<RecordRef>` exposes the keydir entry without touching disk. * `crates/datawal-core/tests/keydir_by_offset.rs` (new, 245 LOC, 8 tests) - get after drop+reopen, CRC revalidation on get, ref_of tracks deletes, ref_of advances on overwrite, get after rotation / compaction, fd reuse across 500 repeated gets (Linux-only /proc/self/fd assertion), items() revalidates CRC, compact_to preserves live state via the offset keydir. Breaking changes (0.1.x semver-acceptable per owner) ---------------------------------------------------- Four `DataWal` methods change signature: * `get` : `&self` -> `&mut self` * `items` : `&self -> Vec<...>` -> `&mut self -> Result<Vec<...>>` * `compact_to` : `&self -> Result<...>` -> `&mut self -> Result<...>` * `export_jsonl`: `&self -> Result<()>` -> `&mut self -> Result<()>` All four perform reads against the underlying segments and must mutate the fd-pool. `&mut self` is the honest signature; no interior-mutability hack. Documented in the module rustdoc. All in-tree callers (tests, examples, benches, CLI `cmd_get`, soak driver) are updated to bind handles as `let mut`. No semantic change in any of them. What stays the same ------------------- * Wire format frozen at WIRE_VERSION=1. Six corpus fixtures unchanged. * Recovery is still longest valid prefix; tail truncation on the active segment is non-fatal, CRC mismatch on a sealed segment is a hard error during scan_iter. * Single-writer, single-process invariant via `fs2` exclusive lock on `.lock` is unchanged. * Compaction stays snapshot-style (source untouched). * MSRV stays at 1.75.0. * Public surface adds exactly one method (`DataWal::ref_of`). No new types: `RecordRef` was already re-exported. Local validation ---------------- cargo fmt --all -- --check : clean cargo clippy --workspace --all-targets -- -D warnings : clean cargo test --workspace --all-targets : 137 passed, 1 ignored cargo doc --workspace --no-deps : clean (RUSTDOCFLAGS=-D warnings) cargo bench --workspace --no-run : clean cargo publish --dry-run -p datawal --allow-dirty : clean Out of scope (deferred) ----------------------- * No fd-pool tuning APIs on `DataWal` (default 16 is plenty for the current workloads; revisit when benchmarks demand it). * No `pread_record` low-level escape hatch. * No mmap path. `pread` keeps semantics simple and matches what the CRC check expects. * No benchmark refresh in this PR -- saved for the 0.1.4 release PR (#8). Refs #4.

The fd_pool_reuses_fd_across_repeated_gets test asserted that the process-level fd count grew by no more than 4 across 500 gets. The GHA MSRV 1.75 runner reliably grows by 7 due to harness/runner fds unrelated to the pool, which has DEFAULT_CAPACITY=16 and at most one segment opened in this scenario. Relax the bound to capacity + slack (32) so the test signals only when the pool genuinely fails to reuse, not when the runner happens to hold extra fds. Local re-run: passes.

Replaces the TBD scaffold in docs/benchmarks/v0.1.4-reference.md with the numbers from the criterion run on the dia host (DATAWAL_BENCH_DIR on ext4 with relatime,discard, NVMe-PLP). Stack: ext4 with relatime,discard, on a single NVMe-PLP raw fsync(64B) median 1785 us, p10 1720 us, p90 1858 us Highlights vs alpha (where comparable): - append_no_fsync: ~same as alpha (codec path unchanged). - append_fsync_each: ~1.78 ms (alpha reported ~3.7 us; see below). - datawal_get: ~700-780 ns (alpha ~30-46 ns). Expected: PR #28 keydir-by-offset adds one pread + CRC32C verify per get. - datawal_open_rebuild: ~25.6 ms at 100k (alpha ~39 ms). Faster by design: offset-keydir does not materialise values on replay. - compact_to: 3-9x slower than alpha. Two factors, both expected: (1) real-fsync floor, (2) PR #28 re-reads each live value via fd-pool LRU instead of copying from memory. Bench-dir caveat (the reason the alpha numbers differed so much in fsync-bearing tables): a raw-fsync sanity check on the same machine gave 0.59 us on tmpfs vs 1785 us on ext4. The alpha-reference single-digit-microsecond fsyncs are only consistent with a tmpfs bench dir (i.e. DATAWAL_BENCH_DIR unset, defaulting to /tmp). The v0.1.4 run was collected with DATAWAL_BENCH_DIR pointing at the real ext4 mount, so the new numbers reflect honest durability costs. A retrospective note has been added at the top of v0.1.0-alpha-reference.md flagging this; historical numbers in that doc are kept verbatim as a record of what was published. Code in this PR: docs only. No source / config / Cargo.* changes. The release/0.1.4 branch is now ready for merge + tag v0.1.4 push once the rest of the PR review wraps.

robertoberto added 2 commits May 21, 2026 13:30

robertoberto merged commit ef1530f into main May 21, 2026
7 checks passed

robertoberto deleted the feat/keydir-offset branch May 21, 2026 13:39

robertoberto mentioned this pull request May 21, 2026

chore: release datawal 0.1.4 (first non-alpha) #30

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(keydir): store offsets and revalidate CRC on every get (#4)#28

feat(keydir): store offsets and revalidate CRC on every get (#4)#28
robertoberto merged 2 commits into
mainfrom
feat/keydir-offset

robertoberto commented May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

robertoberto commented May 21, 2026

Summary

What this is, what it is not

What ships

Breaking changes (0.1.x semver-acceptable)

What stays the same

Local validation

Out of scope (deferred)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant