feat(keydir): store offsets and revalidate CRC on every get (#4)#28
Merged
Conversation
Switches `DataWal`'s in-memory keydir from
`HashMap<Vec<u8>, Vec<u8>>` (full payload cached) to
`HashMap<Vec<u8>, RecordRef>` (segment + offset + len) backed by a
small LRU fd-pool. Every `get` now does one `pread` and revalidates
the full record CRC before returning.
This trades RAM for I/O. A DataWal with N keys and average payload P
used to hold N * P bytes of payload in memory; it now holds N
RecordRefs (~40 bytes each) and at most `FdPool::DEFAULT_CAPACITY`
open fds. Soak driver smoke continues to hold RSS stable at ~5.6 MB
under 1 GB of write traffic.
What ships
----------
* `crates/datawal-core/src/fd_pool.rs` (new, ~205 LOC + 6 unit tests)
- Tiny VecDeque-backed LRU keyed by `segment_id`.
- `read_at(dir, segment_id, offset, len)` uses `pread`
(`FileExt::read_exact_at`) on Unix, falls back to seek+read on
non-Unix.
- Default capacity 16, configurable via `with_capacity`.
- Capacity 0 disables caching (single replaceable slot) and is
used by tests.
* `crates/datawal-core/src/datawal.rs` (rewritten, ~340 LOC)
- Field `map` is now `HashMap<Vec<u8>, RecordRef>`.
- `open` rebuilds the keydir lazily via `scan_iter` (PR #23),
never materialising payloads.
- `get(&mut self, key)` reads the record bytes through the
fd-pool, decodes them, checks `record_type == Put`, and
validates that the consumed byte count matches `RecordRef.len`.
Truncation, CRC mismatch, and structural decode errors are
reported with explicit segment/offset context.
- New helper `ref_of(&self, key) -> Option<RecordRef>` exposes
the keydir entry without touching disk.
* `crates/datawal-core/tests/keydir_by_offset.rs` (new, 245 LOC, 8
tests)
- get after drop+reopen, CRC revalidation on get, ref_of tracks
deletes, ref_of advances on overwrite, get after rotation /
compaction, fd reuse across 500 repeated gets (Linux-only
/proc/self/fd assertion), items() revalidates CRC, compact_to
preserves live state via the offset keydir.
Breaking changes (0.1.x semver-acceptable per owner)
----------------------------------------------------
Four `DataWal` methods change signature:
* `get` : `&self` -> `&mut self`
* `items` : `&self -> Vec<...>` -> `&mut self -> Result<Vec<...>>`
* `compact_to` : `&self -> Result<...>` -> `&mut self -> Result<...>`
* `export_jsonl`: `&self -> Result<()>` -> `&mut self -> Result<()>`
All four perform reads against the underlying segments and must
mutate the fd-pool. `&mut self` is the honest signature; no
interior-mutability hack. Documented in the module rustdoc.
All in-tree callers (tests, examples, benches, CLI `cmd_get`,
soak driver) are updated to bind handles as `let mut`. No semantic
change in any of them.
What stays the same
-------------------
* Wire format frozen at WIRE_VERSION=1. Six corpus fixtures
unchanged.
* Recovery is still longest valid prefix; tail truncation on the
active segment is non-fatal, CRC mismatch on a sealed segment is
a hard error during scan_iter.
* Single-writer, single-process invariant via `fs2` exclusive lock
on `.lock` is unchanged.
* Compaction stays snapshot-style (source untouched).
* MSRV stays at 1.75.0.
* Public surface adds exactly one method (`DataWal::ref_of`). No
new types: `RecordRef` was already re-exported.
Local validation
----------------
cargo fmt --all -- --check : clean
cargo clippy --workspace --all-targets -- -D warnings : clean
cargo test --workspace --all-targets : 137 passed, 1 ignored
cargo doc --workspace --no-deps : clean (RUSTDOCFLAGS=-D warnings)
cargo bench --workspace --no-run : clean
cargo publish --dry-run -p datawal --allow-dirty : clean
Out of scope (deferred)
-----------------------
* No fd-pool tuning APIs on `DataWal` (default 16 is plenty for
the current workloads; revisit when benchmarks demand it).
* No `pread_record` low-level escape hatch.
* No mmap path. `pread` keeps semantics simple and matches what the
CRC check expects.
* No benchmark refresh in this PR -- saved for the 0.1.4 release PR
(#8).
Refs #4.
The fd_pool_reuses_fd_across_repeated_gets test asserted that the process-level fd count grew by no more than 4 across 500 gets. The GHA MSRV 1.75 runner reliably grows by 7 due to harness/runner fds unrelated to the pool, which has DEFAULT_CAPACITY=16 and at most one segment opened in this scenario. Relax the bound to capacity + slack (32) so the test signals only when the pool genuinely fails to reuse, not when the runner happens to hold extra fds. Local re-run: passes.
robertoberto
added a commit
that referenced
this pull request
May 21, 2026
Replaces the TBD scaffold in docs/benchmarks/v0.1.4-reference.md with the numbers from the criterion run on the dia host (DATAWAL_BENCH_DIR on ext4 with relatime,discard, NVMe-PLP). Stack: ext4 with relatime,discard, on a single NVMe-PLP raw fsync(64B) median 1785 us, p10 1720 us, p90 1858 us Highlights vs alpha (where comparable): - append_no_fsync: ~same as alpha (codec path unchanged). - append_fsync_each: ~1.78 ms (alpha reported ~3.7 us; see below). - datawal_get: ~700-780 ns (alpha ~30-46 ns). Expected: PR #28 keydir-by-offset adds one pread + CRC32C verify per get. - datawal_open_rebuild: ~25.6 ms at 100k (alpha ~39 ms). Faster by design: offset-keydir does not materialise values on replay. - compact_to: 3-9x slower than alpha. Two factors, both expected: (1) real-fsync floor, (2) PR #28 re-reads each live value via fd-pool LRU instead of copying from memory. Bench-dir caveat (the reason the alpha numbers differed so much in fsync-bearing tables): a raw-fsync sanity check on the same machine gave 0.59 us on tmpfs vs 1785 us on ext4. The alpha-reference single-digit-microsecond fsyncs are only consistent with a tmpfs bench dir (i.e. DATAWAL_BENCH_DIR unset, defaulting to /tmp). The v0.1.4 run was collected with DATAWAL_BENCH_DIR pointing at the real ext4 mount, so the new numbers reflect honest durability costs. A retrospective note has been added at the top of v0.1.0-alpha-reference.md flagging this; historical numbers in that doc are kept verbatim as a record of what was published. Code in this PR: docs only. No source / config / Cargo.* changes. The release/0.1.4 branch is now ready for merge + tag v0.1.4 push once the rest of the PR review wraps.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Switches
DataWal's in-memory keydir fromHashMap<Vec<u8>, Vec<u8>>(full payload cached) toHashMap<Vec<u8>, RecordRef>(segment + offset + len) backed by a small LRU fd-pool. Everygetnow does onepreadand revalidates the full record CRC before returning.This trades RAM for I/O. A
DataWalwithNkeys and average payloadPused to holdN * Pbytes of payload in memory; it now holdsNRecordRefs (~40 bytes each) and at mostFdPool::DEFAULT_CAPACITYopen fds. Soak driver smoke continues to hold RSS stable at ~5.6 MB under ~1 GB of write traffic.Closes #4.
What this is, what it is not
HashMap<Vec<u8>, RecordRef>+ fd-pool LRUpread+ CRC revalidate on everygetopen(no payload materialisation)&mut selfsemanticsRecordLogReaderfrom PR #27 covers that)What ships
crates/datawal-core/src/fd_pool.rs(new, ~205 LOC + 6 unit tests)VecDeque-backed LRU keyed bysegment_id.read_at(dir, segment_id, offset, len)usespread(FileExt::read_exact_at) on Unix, falls back to seek+read on non-Unix.with_capacity. Capacity 0 disables caching (single replaceable slot) and is used by tests.crates/datawal-core/src/datawal.rs(rewritten, ~340 LOC)mapis nowHashMap<Vec<u8>, RecordRef>.openrebuilds the keydir lazily viascan_iter(PR feat: add RecordLog::scan_iter for record-level lazy scanning #23 / bench: add criterion benchmarks for record log and datawal #15), never materialising payloads.get(&mut self, key)reads the record bytes through the fd-pool, decodes them, checksrecord_type == Put, and validates that the consumed byte count matchesRecordRef.len. Truncation, CRC mismatch, and structural decode errors are reported with explicit segment/offset context.ref_of(&self, key) -> Option<RecordRef>exposes the keydir entry without touching disk.crates/datawal-core/tests/keydir_by_offset.rs(new, 245 LOC, 8 tests)/proc/self/fdassertion),items()revalidates CRC,compact_topreserves live state via the offset keydir.Breaking changes (0.1.x semver-acceptable)
Four
DataWalmethods change signature:get&self&mut selfitems&self -> Vec<...>&mut self -> Result<Vec<...>>compact_to&self -> Result<...>&mut self -> Result<...>export_jsonl&self -> Result<()>&mut self -> Result<()>All four perform reads against the underlying segments and must mutate the fd-pool.
&mut selfis the honest signature; no interior-mutability hack. Documented in the module rustdoc.All in-tree callers (tests, examples, benches, CLI
cmd_get, soak driver) are updated to bind handles aslet mut. No semantic change in any of them.What stays the same
WIRE_VERSION = 1. Six corpus fixtures unchanged.scan_iter.fs2exclusive lock on.lockis unchanged.DataWal::ref_of). No new types:RecordRefwas already re-exported.Local validation
cargo fmt --all -- --checkcargo clippy --workspace --all-targets -- -D warningscargo test --workspace --all-targetsRUSTDOCFLAGS='-D warnings' cargo doc --workspace --no-depscargo bench --workspace --no-runcargo publish --dry-run -p datawal --allow-dirtyOut of scope (deferred)
DataWal(default 16 is plenty for current workloads; revisit when benchmarks demand it).pread_recordlow-level escape hatch.preadkeeps semantics simple and matches what the CRC check expects.