Skip to content

feat(reader): add RecordLogReader for lockless concurrent inspection (#5)#27

Merged
robertoberto merged 1 commit into
mainfrom
feat/reader-api
May 21, 2026
Merged

feat(reader): add RecordLogReader for lockless concurrent inspection (#5)#27
robertoberto merged 1 commit into
mainfrom
feat/reader-api

Conversation

@robertoberto
Copy link
Copy Markdown
Contributor

Summary

Adds RecordLogReader, a read-only handle that can be opened on a
RecordLog directory in parallel with a live writer in the same or
another process, without contending for the writer's fs2
advisory lock.

The reader snapshots the segment id list at open(). Iteration is
performed via the existing RecordIter and obeys the same recovery
contract as the writer.

What this is, what it isn't

Is Concurrent inspection. Lockless reader. Snapshot-at-open.
Is not Live tail. Per-byte snapshot of the active segment. Substitute for the writer's own recovery.

The reader is intentionally cheap: no lock acquisition, no directory
creation, no fd hygiene tricks. It is suitable for local recoverable
logs where JSONL would otherwise be used.

What ships

  • src/record_log_reader.rs — new RecordLogReader { dir, snapshot_ids }
    with open(), dir(), segments(), scan_iter(). ~165 LOC.
  • src/record_log.rsRecordIter::new now pub(crate) so the new
    module can construct iterators. No behaviour change for existing
    callers. (+6 lines).
  • src/lib.rs — re-exports RecordLogReader. (+2 lines).
  • tests/reader_api.rsnine integration tests covering open
    semantics, snapshot freeze across rotate, intentional pinning of
    segment-level snapshot granularity, scan_iter idempotence,
    unfsynced tail tolerance, and hard error on corrupted sealed
    segment. ~277 LOC.
  • formal/ReadWhileWrite.tla + .cfg — TLA+ model for
    read-while-write. Six invariants. ~190 + 12 LOC.
  • formal/README.md — documents the new model, removes the
    "NOT modelled" line, adds run command.
  • .github/workflows/ci.yml — adds TLC ReadWhileWrite as a fourth
    step in the formal job, mirroring the existing pattern.

Total: 4 new files + 4 modified, +690 / -4 LOC.

What stays the same

  • Wire format frozen at WIRE_VERSION = 1. Six corpus fixtures
    unchanged.
  • Single-writer single-process invariant on RecordLog. The writer's
    fs2 lock is unaffected; readers do not acquire any lock.
  • Longest-valid-prefix recovery. The reader reuses RecordIter and
    inherits the per-segment behaviour.
  • Snapshot-style compaction. The reader is independent of compaction.
  • MSRV 1.75.0. No new dependencies.

Public surface delta

pub use record_log_reader::RecordLogReader;

One additive type. Aligned with the kit 0.1.4 release.

Snapshot semantics (pinned)

Snapshot is at segment-id granularity. Records appended to a
segment that was already in the snapshot at open() time will be
visible
to the reader when it reaches that segment. This is
documented in the module-level rustdoc and pinned by a dedicated
test (reader_observes_writer_appends_within_snapshot_segments),
and modelled in ReadWhileWrite.tla by snapshotting
durable \o buffered at open.

Local validation

Gate Result
cargo fmt --all -- --check clean
cargo clippy --workspace --all-targets -- -D warnings clean
cargo test --workspace --all-targets 131 tests pass (incl. 9 new)
RUSTDOCFLAGS='-D warnings' cargo doc --workspace --no-deps clean
cargo bench --workspace --no-run builds
cargo publish --dry-run -p datawal --allow-dirty clean

TLC ReadWhileWrite is exercised in CI; the workbench has no Java 21
to run it locally.

TLA+ model summary

ReadWhileWrite.tla extends the durability model with reader state:

  • Variables added: readerOpen, snapshot, yielded.
  • Reader actions: DoOpenReader (snapshots durable \o buffered),
    DoReaderStep (yields the next element of snapshot),
    DoCloseReader (clears reader state).
  • Invariants:
    • YieldedIsPrefix — the reader's yielded sequence is always a
      prefix of its snapshot.
    • NoSpuriousYield — every yielded element was appended.
    • ReaderBoundedByOpenLen(yielded) <= Len(snapshot).
    • ClosedReaderHasNoState.

Deliberately not re-modelled here: crash (covered by RecordLog.tla),
concurrent writers (single-writer by contract).

Out of scope

  • Live tail / notification on new records.
  • Cross-process reader fuzz (in-process exercises the same code path).
  • Per-byte snapshot of the active segment.

Closes #5.

)

Adds RecordLogReader, a read-only handle that can be opened on a
RecordLog directory in parallel with a live writer in the same or
another process, without contending for the writer's fs2 advisory
lock.

The reader snapshots the segment id list at open(). Iteration is
performed via the existing RecordIter and obeys the same recovery
contract as the writer: a truncated tail on the last segment yields
a clean end and is reflected in the RecoveryReport; a CRC mismatch
or truncated frame in a sealed (non-active) segment is a hard error.

What ships
----------

* src/record_log_reader.rs: new RecordLogReader { dir, snapshot_ids }
  with open(), dir(), segments(), and scan_iter(). open() does not
  create the directory and does not acquire any lock; it errors on a
  missing or non-directory path. scan_iter() can be called repeatedly
  and returns a fresh iterator over the same snapshot each time.
* src/record_log.rs: RecordIter::new is now pub(crate) so the reader
  module can construct iterators against the snapshotted segment ids.
  No behaviour change for existing callers.
* src/lib.rs: re-exports RecordLogReader. Public surface delta is one
  additive type.
* tests/reader_api.rs: nine integration tests covering open semantics
  (missing path, non-dir path, empty dir, locked-by-writer), snapshot
  freeze across rotate, intentional pinning of segment-level snapshot
  granularity (within-snapshotted-segment appends remain visible),
  scan_iter idempotence across repeated calls, tolerance of unfsynced
  tail on the last segment, and hard error on a corrupted sealed
  segment.
* formal/ReadWhileWrite.tla and ReadWhileWrite.cfg: a new TLA+ model
  for read-while-write. Writer actions append + fsync as in
  RecordLog.tla. Reader actions snapshot the recoverable prefix at
  open, step through it record-by-record, and close. Invariants:
  YieldedIsPrefix(snapshot), NoSpuriousYield, ReaderBoundedByOpen,
  ClosedReaderHasNoState, plus type and self-prefix sanity.
* formal/README.md: documents the new model, removes it from the
  "NOT modelled" list, and adds a fourth java/TLC invocation example.
* .github/workflows/ci.yml: adds TLC ReadWhileWrite as a fourth step
  in the formal job, mirroring the existing pattern, and uploads the
  log to the tlc-logs artifact. Job display name updated.

What stays the same
-------------------

* Wire format frozen at WIRE_VERSION = 1. Six corpus fixtures unchanged.
* Single-writer single-process invariant on RecordLog. The writer's
  fs2 lock is unaffected; readers do not acquire any lock.
* Longest-valid-prefix recovery contract. The reader reuses the
  existing RecordIter and inherits the same per-segment behaviour.
* Snapshot-style compaction. The reader is independent of compaction.
* MSRV 1.75.0. No new dependencies.

Public surface delta
--------------------

+ pub use record_log_reader::RecordLogReader

Snapshot is at segment-id granularity. Records appended to a segment
that was already in the snapshot at open time will be visible to the
reader when it gets to that segment. This is documented in the
module-level rustdoc and pinned by a dedicated test
(reader_observes_writer_appends_within_snapshot_segments), and is
modelled by snapshotting durable \o buffered at open time in
ReadWhileWrite.tla.

Local validation
----------------

* cargo fmt --all -- --check: clean
* cargo clippy --workspace --all-targets -- -D warnings: clean
* cargo test --workspace --all-targets: 131 tests pass (including 9
  new in tests/reader_api.rs)
* RUSTDOCFLAGS='-D warnings' cargo doc --workspace --no-deps: clean
* cargo bench --workspace --no-run: builds
* cargo publish --dry-run -p datawal --allow-dirty: clean

TLC ReadWhileWrite is exercised in CI; the workbench has no Java 21
to run it locally.

Out of scope
------------

* Live tail / notification on new records. The reader is snapshot at
  open by design; a follow-on iterator-from-offset variant can be
  added without changing this contract.
* Cross-process concurrent reader fuzz. The reader's lack of lock
  acquisition means a remote process can hold the writer lock; the
  in-process test exercises the same code path.
* Per-byte snapshot of the active segment. Segment-id snapshot
  is sufficient for inspection and keeps the reader trivially cheap.

Closes #5.
@robertoberto robertoberto merged commit 0d3cae8 into main May 21, 2026
7 checks passed
@robertoberto robertoberto deleted the feat/reader-api branch May 21, 2026 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Roadmap: reader API and ReadWhileWrite model

1 participant