perf(datastore): enable WAL journal mode by 0xbrayo · Pull Request #616 · ActivityWatch/aw-server-rust

0xbrayo · 2026-06-12T11:16:53Z

Re-lands the WAL change that was split out of #615 so that PR could merge, together with a fix for the race that made test_datastore_reload fail on the ubuntu runner when this was briefly part of #615 (run 27410891058).

Commits

perf(datastore): enable WAL journal mode with synchronous=FULL

Each batch commit becomes a single sequential WAL append+fsync where delete mode paid two fsyncs plus rollback-journal file create/delete churn, and commits no longer block readers — groundwork for serving reads from parallel read-only connections later (the existing TODO at the top of worker.rs).

No db migration involved: journal_mode=WAL is set at connection open and persists in the db header; any SQLite ≥ 3.7.0 (2010) can still open the file.
In-memory datastores (tests, --no-db) ignore it and keep journal_mode=memory.
Verified on a 98 MB / 544k-event production snapshot: WAL active, clean checkpoint on shutdown, integrity_check ok, heartbeat throughput +4% on top of the Performance improvements #615 numbers (5,572 → 5,815/s), reads unchanged.

fix(datastore): ack ForceCommit/Close only after the transaction commits

This is the actual cause of the ubuntu CI failure, a pre-existing bug that WAL unmasked rather than a WAL problem: the worker acked ForceCommit/Close inside the request loop, but tx.commit() only runs after breaking out of it — so force_commit() returned while the data was still uncommitted. test_datastore_reload calls force_commit(), reopens the datastore, and the new instance populates its bucket-metadata cache from a snapshot that could predate the commit. The rollback journal's locking happened to serialize the second reader behind the commit; in WAL readers never block on the writer, so the stale read became reachable (macOS won the race, ubuntu lost it).

The acks for these two commands are now held until tx.commit() returns, and a failed commit is propagated to the caller as an error instead of a silent success — this also fixes close() acknowledging shutdown before the final commit was durable. All other commands (heartbeats in particular) are still acked immediately and never wait on the batch commit. test_datastore_reload passed 50/50 stress iterations with the fix; full workspace suite green.

Why `synchronous=FULL` and not `NORMAL`?

Raised by @ErikBjare on #615 — the usual "WAL+NORMAL for many small writes" advice doesn't transfer here because the worker already does group commit at the application level: all writes accumulate in one transaction committed every 15 s / 100 events / on bucket change. The per-commit fsync that NORMAL exists to eliminate happens ~4×/minute, not per heartbeat — it isn't in the per-write path at all (the +4% above was measured with FULL, hammering writes ~1000× faster than real watchers).

What NORMAL would cost: the WAL is then only synced at checkpoints, and autocheckpoint triggers on page count (1000 pages), not time — at AW's trickle write rate a power loss could drop hours of activity since the last checkpoint, vs ≤15 s today. Bounding that would need a time-based wal_checkpoint, i.e. fsyncing on a timer — which is what FULL already does, given commits are timer-driven. FULL keeps exactly the durability semantics the server has always had (delete mode's default is FULL), with strictly fewer syncs than before.

Readers are unaffected either way: in WAL the commit fsync holds only the write lock, and there's a single application-level writer.

In delete mode every batch commit paid two fsyncs plus rollback-journal file create/delete churn; in WAL a commit is a single sequential WAL append+fsync. It also stops commits from blocking future reader connections, which is groundwork for serving reads in parallel with the writer. synchronous=FULL is set explicitly so each commit is durable the moment it returns, same as the delete-mode default. The overall durability window is unchanged: the worker already batches events into one transaction committed every 15 s / 100 events / bucket change, and that policy is untouched. In-memory datastores (tests, --no-db) ignore the pragma and keep journal_mode=memory. Verified on a 98 MB production snapshot: WAL active, clean checkpoint on shutdown, integrity ok, heartbeat throughput +4% (5572 -> 5815/s), reads unchanged.

The worker acked ForceCommit and Close inside the request loop and only committed the batch transaction after breaking out of it, so callers were told their data was committed before it was. Under the rollback journal this was masked by file locking, but with WAL readers never block on the writer: a caller could force_commit, reopen the database, and read a pre-commit snapshot. This is exactly how test_datastore_reload failed on the ubuntu CI runner (bucket metadata cache populated from a snapshot taken before the forced commit landed). Hold the ack for these two commands until tx.commit() returns, and propagate a commit failure to the caller as an error instead of a silent success. Heartbeats and other commands are still acked immediately, since they must not wait for the batch commit.

greptile-apps · 2026-06-12T11:22:53Z

Greptile Summary

This PR enables SQLite WAL journal mode on every file-based connection and fixes a pre-existing race where ForceCommit/Close acks were sent before tx.commit() ran — a bug that WAL's non-blocking reader semantics made reliably observable in CI.

WAL enablement (lines 142–157): PRAGMA journal_mode=WAL is set at connection open with synchronous=FULL; in-memory databases silently stay in memory mode, and any other failure to activate WAL degrades gracefully to a logged warning rather than a panic.
Deferred ack for ForceCommit/Close (lines 197–265): the response sender is now held in deferred_ack and flushed only after tx.commit() succeeds; on commit failure an InternalError is returned to the caller instead of a silent success, fixing the stale-snapshot read in test_datastore_reload and the dropped final commit in close().

Confidence Score: 5/5

Safe to merge. The deferred-ack change eliminates a commit-before-ack inversion that was always a bug; WAL activation is additive and falls back gracefully.

Both changes are narrowly scoped to worker.rs. The race fix is correct: deferred_ack is set atomically before breaking the inner loop, the ack is sent only after tx.commit() returns, and the error path propagates commit failures to callers rather than swallowing them. The WAL path is guarded against unsupported modes with a warn-and-continue fallback, synchronous=FULL preserves existing durability semantics, and in-memory databases are explicitly excluded from the WAL warning. The PR description documents the durability tradeoff analysis thoroughly, and the fix was stress-tested 50 iterations.

No files require special attention. The single changed file has clear, well-commented logic and the control-flow changes are easy to follow.

Important Files Changed

Filename	Overview
aw-datastore/src/worker.rs	Two changes: WAL journal mode enabled at connection open with a graceful warn-and-continue fallback for in-memory/unsupported cases; and the ForceCommit/Close ack is now correctly deferred until after tx.commit() to fix a pre-existing race that WAL made observable.

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant W as Worker Thread
    participant DB as SQLite WAL

    note over C,DB: Before this PR - race condition
    C->>W: ForceCommit or Close
    W->>W: handle_request sets commit or quit flag
    W-->>C: ACK sent immediately before commit
    C->>DB: Reopen datastore reads pre-commit snapshot
    W->>DB: tx.commit() runs too late

    note over C,DB: After this PR - race fixed
    C->>W: ForceCommit or Close
    W->>W: handle_request sets commit or quit flag
    W->>W: store deferred_ack and break inner loop
    W->>DB: tx.commit()
    DB-->>W: Ok
    W-->>C: ACK sent after durable commit
    C->>DB: Reopen datastore reads committed snapshot

    note over C,DB: Commit failure path
    W->>DB: tx.commit()
    DB-->>W: Err
    W-->>C: Err InternalError propagated to caller

_{Reviews (1): Last reviewed commit: "fix(datastore): ack ForceCommit/Close on..." | Re-trigger Greptile}

codecov · 2026-06-12T11:24:32Z

Codecov Report

❌ Patch coverage is 75.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.09%. Comparing base (656f3c9) to head (30c914b).
⚠️ Report is 60 commits behind head on master.

Files with missing lines	Patch %	Lines
aw-datastore/src/worker.rs	75.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #616      +/-   ##
==========================================
+ Coverage   70.81%   77.09%   +6.27%     
==========================================
  Files          51       62      +11     
  Lines        2916     4951    +2035     
==========================================
+ Hits         2065     3817    +1752     
- Misses        851     1134     +283

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

0xbrayo added 2 commits June 12, 2026 14:11

ErikBjare merged commit d7349c2 into ActivityWatch:master Jun 12, 2026
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

perf(datastore): enable WAL journal mode#616

perf(datastore): enable WAL journal mode#616
ErikBjare merged 2 commits into
ActivityWatch:masterfrom
0xbrayo:wal-mode-miscs

0xbrayo commented Jun 12, 2026

Uh oh!

greptile-apps Bot commented Jun 12, 2026

Uh oh!

codecov Bot commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Uh oh!

Conversation

0xbrayo commented Jun 12, 2026

Commits

Why synchronous=FULL and not NORMAL?

Uh oh!

greptile-apps Bot commented Jun 12, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

codecov Bot commented Jun 12, 2026

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Why `synchronous=FULL` and not `NORMAL`?