
examples: add long-running soak driver with consistency check (#22) #26

Merged
robertoberto merged 1 commit into main from harden/soak on May 21, 2026

@robertoberto
Contributor

Summary

Lands the soak slot of the Tier 1 hardening track from umbrella #22. Adds a long-running DataWal driver that exercises put / delete / fsync / compact under a wall-clock budget, samples Linux RSS / fd / segment counts to a CSV, and asserts the live keydir matches an in-memory oracle after a drop + reopen cycle.
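
The final assertion can be pictured as a plain map comparison. The sketch below is illustrative only: it assumes both the oracle and the reopened store can be viewed as key-to-value maps, and `Verdict`, `check_against_oracle` and `exit_code` are hypothetical names, not items from soak.rs. Exit codes 0 and 1 follow the convention stated in this PR; code 2 (setup error) is raised before any comparison and is not modeled here.

```rust
use std::collections::BTreeMap;

/// Result of comparing the reopened store with the in-memory oracle.
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Equal,
    Different {
        missing: usize,    // keys in the oracle but absent after reopen
        unexpected: usize, // keys present after reopen but not in the oracle
        mismatched: usize, // keys present on both sides with different values
    },
}

/// Compare the oracle against the live entries read back after drop + reopen.
/// Both sides are modeled as key -> value maps (an assumption of this sketch).
pub fn check_against_oracle(
    oracle: &BTreeMap<Vec<u8>, Vec<u8>>,
    reopened: &BTreeMap<Vec<u8>, Vec<u8>>,
) -> Verdict {
    let mut missing = 0;
    let mut mismatched = 0;
    for (key, expected) in oracle {
        match reopened.get(key) {
            None => missing += 1,
            Some(got) if got != expected => mismatched += 1,
            Some(_) => {}
        }
    }
    let unexpected = reopened
        .keys()
        .filter(|key| !oracle.contains_key(*key))
        .count();

    if missing == 0 && mismatched == 0 && unexpected == 0 {
        Verdict::Equal
    } else {
        Verdict::Different { missing, unexpected, mismatched }
    }
}

/// Map the verdict to a process exit code: 0 iff equal, 1 if different.
pub fn exit_code(verdict: &Verdict) -> i32 {
    match verdict {
        Verdict::Equal => 0,
        Verdict::Different { .. } => 1,
    }
}
```

Counting the three kinds of divergence separately, rather than returning a bare boolean, makes a failed run easier to triage from the log alone.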

Refs #22 hardening umbrella PR-4.

What this is, what it isn't

  • It is a runnable example, not a benchmark and not a CI job. The example only has to compile; running it is opt-in.
  • It is single-threaded by design because the crate is single-writer.
  • There is no QPS target, no SLA, no concurrency dimension, no power-loss simulation, no filesystem-corruption injection. Those are explicit Tier 2 items and stay out of scope here.

What ships

  • crates/datawal-core/examples/soak.rs (~495 LOC). Two modes (--mode synthetic / --mode real), env-var driven, CSV output at ${DATAWAL_SOAK_LOG_DIR}/soak.csv. Inner loop: weighted 70/25/5 small/medium/large stream selection, ~95% put / ~5% delete, explicit fsync every 1k ops, compact_to + swap-and-reopen every ROTATE_EVERY * COMPACT_EVERY_ROTATIONS ops (defaults 5000 × 4 = 20000). Final consistency check compares the reopened keydir against the in-memory oracle; exits 0 iff equal, 1 if different, 2 on setup error.
  • crates/datawal-core/examples/gen_soak_fixtures.rs (~180 LOC). Deterministic SplitMix64-seeded JSONL generator. Three committed fixtures under tests/fixtures/soak/ (small=100×512B, medium=100×3KiB, large=20×64KiB; ~2.2MB total). Same seeds reproduce the same bytes.
  • docs/soak.md. Documents synthetic vs real, env vars, CSV columns, exit codes, what the soak is and is not.
  • scripts/run_soak_real.sh. Refuses to run without explicit DATAWAL_SOAK_INPUT_{SMALL,MEDIUM,LARGE} pointing at readable files.
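
As a rough illustration of the inner-loop arithmetic described for soak.rs above, the sketch below maps uniform draws to the 70/25/5 stream weights and the ~95/5 put/delete split, and decides when fsync and compaction fall due. All names (`Stream`, `Op`, `pick_stream`, `pick_op`, `due_after`) are hypothetical and the draws are assumed to come from some uniform PRNG; the real example may structure this differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stream {
    Small,
    Medium,
    Large,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Put(Stream),
    Delete,
}

/// Map a uniform draw to a stream with 70/25/5 weights.
/// Reducing modulo 100 has negligible bias for a 64-bit draw.
pub fn pick_stream(roll: u64) -> Stream {
    match roll % 100 {
        0..=69 => Stream::Small,
        70..=94 => Stream::Medium,
        _ => Stream::Large,
    }
}

/// Roughly 5% of operations are deletes; the rest are puts on a weighted stream.
pub fn pick_op(op_roll: u64, stream_roll: u64) -> Op {
    if op_roll % 100 < 5 {
        Op::Delete
    } else {
        Op::Put(pick_stream(stream_roll))
    }
}

/// Maintenance work that falls due once `op_count` operations have completed.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Due {
    pub fsync: bool,
    pub compact: bool,
}

/// fsync every 1,000 ops; compact every rotate_every * compact_every_rotations ops.
pub fn due_after(op_count: u64, rotate_every: u64, compact_every_rotations: u64) -> Due {
    let compact_period = rotate_every * compact_every_rotations;
    Due {
        fsync: op_count % 1_000 == 0,
        // Guard against a zero period so a misconfigured run cannot divide by zero.
        compact: compact_period != 0 && op_count % compact_period == 0,
    }
}
```

With the defaults (5000 and 4), `due_after` reports a compaction on every 20000th operation, matching the figure quoted above.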

What stays the same

  • WIRE_VERSION = 1. Corpus fixtures untouched.
  • Public surface unchanged. No new types, no new methods, no new constants.
  • No new dependency. Uses existing workspace serde, serde_json, base64, anyhow.
  • CI matrix unchanged. The compile gate already covers --examples.
  • TLA+ models untouched.

Env vars

| Variable | Default | Notes |
| --- | --- | --- |
| DATAWAL_SOAK_DURATION | 1800 (sec) | Wall-clock budget. |
| DATAWAL_SOAK_ROTATE_EVERY | 5000 | Ops between rotations. |
| DATAWAL_SOAK_COMPACT_EVERY_ROTATIONS | 4 | Rotations between compactions. |
| DATAWAL_SOAK_PROGRESS_SECS | 60 | Seconds between CSV rows. |
| DATAWAL_SOAK_LOG_DIR | /tmp | Where soak.csv goes. |
| DATAWAL_SOAK_WORK_DIR | ${TMPDIR}/datawal-soak | Live store directory. |
| DATAWAL_SOAK_INPUT_{SMALL,MEDIUM,LARGE} | | Required in --mode real. |
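
One way such defaults might be resolved is sketched below. It is not taken from soak.rs: `SoakConfig`, `parse_u64` and `load_config` are hypothetical names, the variable lookup is passed in as a closure so the logic can be exercised without touching the process environment, and DATAWAL_SOAK_WORK_DIR and the input paths are left out because their fallback behavior is not specified here.

```rust
use std::time::Duration;

/// Numeric and path settings with the defaults listed in the table above.
#[derive(Debug, PartialEq, Eq)]
pub struct SoakConfig {
    pub duration: Duration,
    pub rotate_every: u64,
    pub compact_every_rotations: u64,
    pub progress: Duration,
    pub log_dir: String,
}

/// Use the default when the variable is unset; reject values that are not integers.
fn parse_u64(name: &str, raw: Option<String>, default: u64) -> Result<u64, String> {
    match raw {
        None => Ok(default),
        Some(s) => s
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("{name}: invalid value {s:?}: {e}")),
    }
}

/// Build the configuration from any name -> value lookup.
/// An Err here would correspond to the setup-error exit code (2).
pub fn load_config(lookup: impl Fn(&str) -> Option<String>) -> Result<SoakConfig, String> {
    let num = |name: &str, default: u64| parse_u64(name, lookup(name), default);
    Ok(SoakConfig {
        duration: Duration::from_secs(num("DATAWAL_SOAK_DURATION", 1800)?),
        rotate_every: num("DATAWAL_SOAK_ROTATE_EVERY", 5000)?,
        compact_every_rotations: num("DATAWAL_SOAK_COMPACT_EVERY_ROTATIONS", 4)?,
        progress: Duration::from_secs(num("DATAWAL_SOAK_PROGRESS_SECS", 60)?),
        log_dir: lookup("DATAWAL_SOAK_LOG_DIR").unwrap_or_else(|| "/tmp".to_string()),
    })
}
```

In a real binary the lookup would typically be `|k| std::env::var(k).ok()`.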

CSV columns: elapsed_s,rss_kb,fds,segments,live_keys,puts,deletes,rotates,compacts,bytes_written. rss_kb and fds are read from /proc/self/{status,fd} and are blank on non-Linux.
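
A minimal sketch of how the two /proc-backed columns could be sampled is shown below. It assumes the RSS figure comes from the `VmRSS:` line of /proc/self/status, which is the usual Linux field but is not confirmed by this PR; the function names are hypothetical. On systems without /proc the reads fail, the samplers return `None`, and the CSV cell stays blank.

```rust
use std::fs;

/// Extract the VmRSS value (in kB) from the text of /proc/self/status.
pub fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|n| n.parse().ok())
}

/// Resident set size in kB, or None where /proc is unavailable.
pub fn sample_rss_kb() -> Option<u64> {
    parse_vm_rss_kb(&fs::read_to_string("/proc/self/status").ok()?)
}

/// Number of open file descriptors, or None where /proc is unavailable.
/// The count includes the descriptor opened to read the directory itself.
pub fn sample_fd_count() -> Option<usize> {
    Some(fs::read_dir("/proc/self/fd").ok()?.count())
}

/// Render an optional sample as a CSV cell; a missing sample becomes an empty cell.
pub fn csv_cell<T: ToString>(value: Option<T>) -> String {
    value.map(|v| v.to_string()).unwrap_or_default()
}
```

Keeping the parser separate from the file read lets the parsing be tested on any platform with a canned string.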

Local validation

  • cargo fmt --all -- --check — clean
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo test --workspace --all-targets — 124 tests green
  • RUSTDOCFLAGS='-D warnings' cargo doc --workspace --no-deps — clean
  • cargo build --workspace --examples — clean
  • 8-second synthetic smoke (rotate_every=200, compact_every_rotations=2): RSS stable at ~5.6 MB, fd count stable at 7, segment count stable at 1 under steady-state compaction, live-keys band 91–95, ~1 GB written, final consistency check OK, exit 0.

Out of scope (Tier 2)

  • dm-flakey injection
  • differential comparison against a second implementation
  • memory-leak instrumentation across the run
  • concurrent / multi-thread soak
  • CI invocation

Adds a soak example for DataWal that exercises put / delete / fsync /
compact over a wall-clock budget, samples Linux RSS / fd / segment
counts to a CSV, and asserts that the live keydir matches an in-memory
oracle after a drop + reopen cycle.

The soak is intentionally not part of CI and not part of the default
test suite. It only has to compile; running it is opt-in. There is no
claim about durability under power loss, OS crash, or hardware failure
and no QPS target.

Refs #22 hardening umbrella PR-4.

What ships:
- crates/datawal-core/examples/soak.rs (~495 LOC). Two modes
  (synthetic / real), env-var driven, CSV output. Mode `real` reads
  three JSONL files via DATAWAL_SOAK_INPUT_{SMALL,MEDIUM,LARGE}.
  Mode `synthetic` uses an in-process PRNG. Both share the same
  weighted-put / 5%-delete / fsync-every-1k / compact-every-N-ops
  inner loop. Final consistency check compares reopened keydir
  against the oracle; exit 0 iff equal.
- crates/datawal-core/examples/gen_soak_fixtures.rs (~180 LOC).
  Deterministic SplitMix64-seeded JSONL generator. Committed
  fixtures under tests/fixtures/soak/ are reproducible byte for
  byte from the same seeds (small=100x512B, medium=100x3KiB,
  large=20x64KiB; ~2.2MB total).
- docs/soak.md. Documents synthetic vs real, env vars, CSV
  columns, exit codes, what the soak is and is not.
- scripts/run_soak_real.sh. Refuses to run without explicit
  DATAWAL_SOAK_INPUT_{SMALL,MEDIUM,LARGE} pointing at readable
  files. No internal paths baked in.
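
For illustration, a SplitMix64-seeded payload generator might look like the sketch below. The mixing constants are the standard published SplitMix64 ones; everything else (`SplitMix64` as a struct, `fill`, `fixture_payloads`) is a hypothetical shape, not the code in gen_soak_fixtures.rs, and the JSONL/base64 encoding step is omitted.

```rust
/// Minimal SplitMix64 generator: a 64-bit counter plus a fixed mixing function.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advance the state and return the next 64-bit output.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Produce exactly `len` pseudo-random bytes.
    pub fn fill(&mut self, len: usize) -> Vec<u8> {
        let mut out = Vec::with_capacity(len + 8);
        while out.len() < len {
            out.extend_from_slice(&self.next_u64().to_le_bytes());
        }
        out.truncate(len);
        out
    }
}

/// Generate `count` payloads of `size` bytes each from a single seed.
/// The same (seed, count, size) always yields the same bytes.
pub fn fixture_payloads(seed: u64, count: usize, size: usize) -> Vec<Vec<u8>> {
    let mut rng = SplitMix64::new(seed);
    (0..count).map(|_| rng.fill(size)).collect()
}
```

Because the generator has no hidden state beyond the seed, regenerating a fixture with the same seed, count and size reproduces it byte for byte.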

What stays the same:
- WIRE_VERSION = 1, six corpus fixtures untouched.
- Public surface unchanged. No new types, no new methods.
- No new runtime dependencies. The example uses serde, serde_json,
  base64 and anyhow, all already in workspace.dependencies.
- CI signals unchanged. Examples compile via cargo build but are
  not executed.

Local validation:
- cargo fmt --all -- --check               clean
- cargo clippy --workspace --all-targets   clean
- cargo test --workspace --all-targets     124 tests green
- cargo doc --workspace --no-deps          clean (RUSTDOCFLAGS='-D warnings')
- cargo build --workspace --examples       clean
- 8-second synthetic smoke: RSS stable at 5.6 MB, fd count 7,
  segment count 1, live keys oscillating in 90-95 band,
  ~1 GB written, final check OK, exit 0.

Not in this PR:
- dm-flakey injection
- differential comparison against a second implementation
- memory-leak instrumentation
- concurrent fuzz
- CI invocation of the soak

Those remain on the Tier 2 hardening shortlist.
robertoberto merged commit b69071e into main on May 21, 2026
7 checks passed
robertoberto deleted the harden/soak branch on May 21, 2026 at 12:37