Skip to content

perf(datastore): enable WAL journal mode#616

Merged
ErikBjare merged 2 commits into
ActivityWatch:masterfrom
0xbrayo:wal-mode-miscs
Jun 12, 2026
Merged

perf(datastore): enable WAL journal mode#616
ErikBjare merged 2 commits into
ActivityWatch:masterfrom
0xbrayo:wal-mode-miscs

Conversation

@0xbrayo

@0xbrayo 0xbrayo commented Jun 12, 2026

Copy link
Copy Markdown
Member

Re-lands the WAL change that was split out of #615 so that PR could merge, together with a fix for the race that made test_datastore_reload fail on the ubuntu runner when this was briefly part of #615 (run 27410891058).

Commits

perf(datastore): enable WAL journal mode with synchronous=FULL

Each batch commit becomes a single sequential WAL append+fsync where delete mode paid two fsyncs plus rollback-journal file create/delete churn, and commits no longer block readers — groundwork for serving reads from parallel read-only connections later (the existing TODO at the top of worker.rs).

  • No db migration involved: journal_mode=WAL is set at connection open and persists in the db header; any SQLite ≥ 3.7.0 (2010) can still open the file.
  • In-memory datastores (tests, --no-db) ignore it and keep journal_mode=memory.
  • Verified on a 98 MB / 544k-event production snapshot: WAL active, clean checkpoint on shutdown, integrity_check ok, heartbeat throughput +4% on top of the Performance improvements #615 numbers (5,572 → 5,815/s), reads unchanged.

fix(datastore): ack ForceCommit/Close only after the transaction commits

This is the actual cause of the ubuntu CI failure, a pre-existing bug that WAL unmasked rather than a WAL problem: the worker acked ForceCommit/Close inside the request loop, but tx.commit() only runs after breaking out of it — so force_commit() returned while the data was still uncommitted. test_datastore_reload calls force_commit(), reopens the datastore, and the new instance populates its bucket-metadata cache from a snapshot that could predate the commit. The rollback journal's locking happened to serialize the second reader behind the commit; in WAL readers never block on the writer, so the stale read became reachable (macOS won the race, ubuntu lost it).

The acks for these two commands are now held until tx.commit() returns, and a failed commit is propagated to the caller as an error instead of a silent success — this also fixes close() acknowledging shutdown before the final commit was durable. All other commands (heartbeats in particular) are still acked immediately and never wait on the batch commit. test_datastore_reload passed 50/50 stress iterations with the fix; full workspace suite green.

Why synchronous=FULL and not NORMAL?

Raised by @ErikBjare on #615 — the usual "WAL+NORMAL for many small writes" advice doesn't transfer here because the worker already does group commit at the application level: all writes accumulate in one transaction committed every 15 s / 100 events / on bucket change. The per-commit fsync that NORMAL exists to eliminate happens ~4×/minute, not per heartbeat — it isn't in the per-write path at all (the +4% above was measured with FULL, hammering writes ~1000× faster than real watchers).

What NORMAL would cost: the WAL is then only synced at checkpoints, and autocheckpoint triggers on page count (1000 pages), not time — at AW's trickle write rate a power loss could drop hours of activity since the last checkpoint, vs ≤15 s today. Bounding that would need a time-based wal_checkpoint, i.e. fsyncing on a timer — which is what FULL already does, given commits are timer-driven. FULL keeps exactly the durability semantics the server has always had (delete mode's default is FULL), with strictly fewer syncs than before.

Readers are unaffected either way: in WAL the commit fsync holds only the write lock, and there's a single application-level writer.

0xbrayo added 2 commits June 12, 2026 14:11
In delete mode every batch commit paid two fsyncs plus rollback-journal
file create/delete churn; in WAL a commit is a single sequential WAL
append+fsync. It also stops commits from blocking future reader
connections, which is groundwork for serving reads in parallel with
the writer.

synchronous=FULL is set explicitly so each commit is durable the moment
it returns, same as the delete-mode default. The overall durability
window is unchanged: the worker already batches events into one
transaction committed every 15 s / 100 events / bucket change, and that
policy is untouched.

In-memory datastores (tests, --no-db) ignore the pragma and keep
journal_mode=memory. Verified on a 98 MB production snapshot: WAL
active, clean checkpoint on shutdown, integrity ok, heartbeat
throughput +4% (5572 -> 5815/s), reads unchanged.
The worker acked ForceCommit and Close inside the request loop and only
committed the batch transaction after breaking out of it, so callers
were told their data was committed before it was. Under the rollback
journal this was masked by file locking, but with WAL readers never
block on the writer: a caller could force_commit, reopen the database,
and read a pre-commit snapshot. This is exactly how
test_datastore_reload failed on the ubuntu CI runner (bucket metadata
cache populated from a snapshot taken before the forced commit landed).

Hold the ack for these two commands until tx.commit() returns, and
propagate a commit failure to the caller as an error instead of a
silent success. Heartbeats and other commands are still acked
immediately, since they must not wait for the batch commit.
@greptile-apps

greptile-apps Bot commented Jun 12, 2026

Copy link
Copy Markdown

Greptile Summary

This PR enables SQLite WAL journal mode on every file-based connection and fixes a pre-existing race where ForceCommit/Close acks were sent before tx.commit() ran — a bug that WAL's non-blocking reader semantics made reliably observable in CI.

  • WAL enablement (lines 142–157): PRAGMA journal_mode=WAL is set at connection open with synchronous=FULL; in-memory databases silently stay in memory mode, and any other failure to activate WAL degrades gracefully to a logged warning rather than a panic.
  • Deferred ack for ForceCommit/Close (lines 197–265): the response sender is now held in deferred_ack and flushed only after tx.commit() succeeds; on commit failure an InternalError is returned to the caller instead of a silent success, fixing the stale-snapshot read in test_datastore_reload and the dropped final commit in close().

Confidence Score: 5/5

Safe to merge. The deferred-ack change eliminates a commit-before-ack inversion that was always a bug; WAL activation is additive and falls back gracefully.

Both changes are narrowly scoped to worker.rs. The race fix is correct: deferred_ack is set atomically before breaking the inner loop, the ack is sent only after tx.commit() returns, and the error path propagates commit failures to callers rather than swallowing them. The WAL path is guarded against unsupported modes with a warn-and-continue fallback, synchronous=FULL preserves existing durability semantics, and in-memory databases are explicitly excluded from the WAL warning. The PR description documents the durability tradeoff analysis thoroughly, and the fix was stress-tested 50 iterations.

No files require special attention. The single changed file has clear, well-commented logic and the control-flow changes are easy to follow.

Important Files Changed

Filename Overview
aw-datastore/src/worker.rs Two changes: WAL journal mode enabled at connection open with a graceful warn-and-continue fallback for in-memory/unsupported cases; and the ForceCommit/Close ack is now correctly deferred until after tx.commit() to fix a pre-existing race that WAL made observable.

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant W as Worker Thread
    participant DB as SQLite WAL

    note over C,DB: Before this PR - race condition
    C->>W: ForceCommit or Close
    W->>W: handle_request sets commit or quit flag
    W-->>C: ACK sent immediately before commit
    C->>DB: Reopen datastore reads pre-commit snapshot
    W->>DB: tx.commit() runs too late

    note over C,DB: After this PR - race fixed
    C->>W: ForceCommit or Close
    W->>W: handle_request sets commit or quit flag
    W->>W: store deferred_ack and break inner loop
    W->>DB: tx.commit()
    DB-->>W: Ok
    W-->>C: ACK sent after durable commit
    C->>DB: Reopen datastore reads committed snapshot

    note over C,DB: Commit failure path
    W->>DB: tx.commit()
    DB-->>W: Err
    W-->>C: Err InternalError propagated to caller
Loading

Reviews (1): Last reviewed commit: "fix(datastore): ack ForceCommit/Close on..." | Re-trigger Greptile

@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 75.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.09%. Comparing base (656f3c9) to head (30c914b).
⚠️ Report is 60 commits behind head on master.

Files with missing lines Patch % Lines
aw-datastore/src/worker.rs 75.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #616      +/-   ##
==========================================
+ Coverage   70.81%   77.09%   +6.27%     
==========================================
  Files          51       62      +11     
  Lines        2916     4951    +2035     
==========================================
+ Hits         2065     3817    +1752     
- Misses        851     1134     +283     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ErikBjare ErikBjare merged commit d7349c2 into ActivityWatch:master Jun 12, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants