exp 168: synchronous fast path for uncontended writer mutex #162
danReynolds wants to merge 4 commits into
`await _mutex.lock()` always paid a microtask hop, even when the lock was uncontended, because `lock()` is `async` and awaiting its Future suspends the caller for one microtask before it resumes. Add `Mutex.lockSync()`, which returns `null` when the lock is acquired without waiting, and rewrite `Writer.execute` / `Writer.executeBatch` as non-`async` around it. The slow path defers to new `_executeSlow` / `_executeBatchSlow` helpers that mirror the existing await-then-try/finally body, so the contended semantics are preserved bit for bit.

`Writer.locked` (transactions) is intentionally left unchanged so that `writer_pipelining`'s `transaction-guardrail` row acts as a clean control.

Focused `benchmark/experiments/writer_pipelining.dart`, three paired passes with stash/pop side-flipping (medians, ms):

| shape | baseline | candidate | delta |
|---|---|---|---|
| sequential-awaited (2000 writes) | 32.686 | 30.868 | -5.6% |
| concurrent-burst (10 x 200 writes) | 25.272 | 24.607 | -2.6% |
| transaction-guardrail (50 tx x 10) | 4.282 | 4.240 | -1.0% |

Per-pass deltas were monotonic in direction on the two changed paths (-4% to -9% sequential; -2.6% to -4.8% concurrent); the transaction guardrail stayed inside ±2% across all three passes.

Test update: the `close()`-during-contention case used to rely on back-to-back `db.execute(...)` calls all parking on the lock for one microtask while `close()` set `_closed`. With the fast path, those sends all reach the worker's port FIFO before `close()` runs, so the test now uses a long-running transaction to force the queue. A new test pins the new fast-path semantics: writes submitted synchronously before `close()` succeed.

Aggregate: benchmark/profile/results/exp-168-uncontended-write-mutex-fast-path.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
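The ordering change behind the test update can be reproduced without the library. This is a hypothetical sketch: `ToyWriter`, `executeAsync`, `executeSync`, `outcome`, and the `port` list are illustrative stand-ins (not resqlite's `Writer`), and `await Future<void>.value()` stands in for awaiting an uncontended `lock()`. Awaiting even an already-completed Future suspends the `async` caller for a microtask, so a synchronous `close()` runs first; the non-`async` shape sends before it returns.

```dart
/// Hypothetical stand-in for the writer; not resqlite's Writer.
/// `port` records what would reach the worker's SendPort, in FIFO order.
class ToyWriter {
  final List<String> port = [];
  bool closed = false;

  /// Old shape: `async` plus an `await` on an already-completed Future,
  /// standing in for awaiting an uncontended `lock()`.
  Future<void> executeAsync(String sql) async {
    await Future<void>.value(); // still suspends the caller for one microtask
    if (closed) throw StateError('Database is closed.');
    port.add(sql);
  }

  /// New shape: not `async`, so the send happens before this call returns.
  /// Failures are delivered through the returned Future, never thrown.
  Future<void> executeSync(String sql) {
    if (closed) return Future<void>.error(StateError('Database is closed.'));
    port.add(sql);
    return Future<void>.value();
  }

  void close() {
    closed = true;
    port.add('CloseRequest');
  }
}

/// Maps a write's Future to 'ok' or 'closed' so no error goes unhandled.
Future<String> outcome(Future<void> write) =>
    write.then((_) => 'ok', onError: (Object _) => 'closed');
```

Calling `executeAsync('A')`, `executeAsync('B')`, then `close()` leaves `port` as `[CloseRequest]` and both writes fail; the same sequence with `executeSync` leaves `[A, B, CloseRequest]` and both succeed, which mirrors the semantics the new test pins.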
Adopt the conventional spelling used by Java
`Lock.tryLock()`, Rust `Mutex::try_lock`, Python
`Lock.acquire(blocking=False)`, and Go 1.18+
`sync.Mutex.TryLock`. The new API returns `bool` instead of
`Future<void>?`, which reads more naturally at call sites:
    if (!_mutex.tryLock()) {
      return _executeSlow(...);
    }
    // hold lock, send, unlock
The slow-path helpers no longer take a waiter parameter — they
just `await _mutex.lock()` directly. Semantics, tests, and
focused benchmark numbers are unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
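For reference, a minimal sketch of a FIFO mutex with a `tryLock()` of this shape. It is an assumption, not the project's `lib/src/mutex.dart` (whose internals are not shown here); `SketchMutex`, `isLocked`, and the hand-off-on-unlock policy are illustrative. The check-and-claim in `tryLock()` needs no atomics because an isolate runs one synchronous call at a time.

```dart
import 'dart:async';
import 'dart:collection';

/// Hypothetical FIFO mutex illustrating the tryLock()/lock()/unlock() split.
class SketchMutex {
  bool _held = false;
  final Queue<Completer<void>> _waiters = Queue<Completer<void>>();

  bool get isLocked => _held;

  /// Claims the lock synchronously if it is free. Nothing else in the
  /// isolate can run between the check and the claim.
  bool tryLock() {
    if (_held) return false;
    _held = true;
    return true;
  }

  /// Contended path: the returned Future completes once this caller owns the lock.
  Future<void> lock() {
    if (tryLock()) return Future<void>.value();
    final waiter = Completer<void>();
    _waiters.add(waiter);
    return waiter.future;
  }

  /// Releases the lock, handing ownership straight to the oldest waiter
  /// (if any) so a later tryLock() cannot jump the queue.
  void unlock() {
    if (!_held) throw StateError('unlock() called while not locked');
    if (_waiters.isEmpty) {
      _held = false;
    } else {
      _waiters.removeFirst().complete(); // _held stays true for the waiter
    }
  }
}
```

Handing ownership directly to the next waiter, instead of clearing the flag and letting waiters race, is one design choice among several; it keeps contended writers in FIFO order while the uncontended caller still takes the lock without any `await`.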
`_executeSlow` / `_executeBatchSlow` named the helpers for the dispatch reason (vs. the fast path) rather than what they do. The actual job is: await the writer mutex, then run an execute that mirrors the fast-path body. Rename to `_executeAwaitingLock` / `_executeBatchAwaitingLock` so the name describes the work, and update the matching doc and test references. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* exp 168: synchronous uncontended writer mutex fast-path (rejected)

  Tested `Mutex.tryLock` + non-`async` `Writer.execute` / `executeBatch` as the request-side counterpart to exp 151's rejected response-side `Completer<T>.sync()` attempt. Hypothesis: dropping the uncontended `await _mutex.lock()` microtask hop plus the async wrapper's implicit reply await would move the Single Inserts (100 sequential) lane.

  Paired focused + release runs (3 alternating pairs each):
  - Single Inserts (100 sequential): 3.099 → 3.153 ms median (+1.7%)
  - writer_pipelining sequential-awaited (2000 writes): 34.047 → 34.723 ms median (+2.0%)
  - Concurrent Single Inserts (100 concurrent): 1.231 → 1.135 ms (-7.8%)
  - writer_pipelining concurrent-burst (10×200): 27.524 → 26.981 ms (-2.0%)
  - transaction-guardrail (50 tx × 10): 4.611 → 4.770 ms (+3.4%)

  The primary sequential lanes moved within ±2%, in the wrong direction. The only positive signal (Concurrent Single Inserts) is already owned by exp 159 at -58% to -61%. Rejected as a request-scheduling change below the current signal. No runtime code kept; the doc and signals carry the durable evidence so the same idea is not retried without a Dart-runtime change or a workload where main-isolate scheduling dominates writer-side execution.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* exp 168: add JOURNAL entry on rejecting symmetric-side scheduling tweaks

  Records the lesson that exp 151 (response-side) and exp 168 (request-side) produced the same verdict under the same gate; the candidate that follows a scheduling-shape rejection should change the mechanism, not the side.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Regenerate experiment history after exp 168 rebase

* Renumber exp 168 (tryLock request-side mutex) -> exp 170

  Collided with two other exp-168 PRs (the lockSync win #162 keeps 168, the Database runtime-cache rejection becomes 171). Pure renumber: file, README row, JOURNAL link, and signals.json key/archive/narrative.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* exp 168: reject Database resolved-runtime cache

  Mirror exp 159's `_sendPort` cache one layer up: a sync-readable `_resolvedRuntime` field on `Database` so post-open `select` / `selectBytes` / `execute` / `executeBatch` / `transaction` skip the `await _runtime` microtask hop. Two order-flipped passes on writer_pipelining.dart produced alternating-sign deltas inside per-round variance (sequential-awaited -2.3% / +2.5%, transaction-guardrail -7.5% / +6.1%, concurrent-burst +4-5% in both passes), so the ~1-2 µs per-call hop sits at or below the focused-harness floor. No runtime code kept.

  Database-layer microtask hop trimming above the writer no longer moves exp 159's sequential-write residual floor; the next reduction candidate must reduce round-trip count (group commit) or change transport, not chase hops.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Renumber exp 168 (Database resolved-runtime cache) -> exp 171

  Third of the three-way exp-168 collision (lockSync win #162 keeps 168, tryLock rejection became 170). Renames the experiment + profile-aggregate files and updates the README row and signals.json key/archive/narrative.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
exp 168 review cleanup: the fast/slow × execute/batch split wrote the lock/check/send/unlock body four times, with unlock in two sites per non-async path. Factor it into one generic `_sendUnderLock<T>`: a single try/catch/finally whose `finally` runs after `send()` is evaluated, so the lock is still released at send time (not reply time) from one place, and a synchronous failure becomes `Future.error` so the public contract never throws synchronously. `execute` / `executeBatch` are now ~5 lines each: the `tryLock` fast path returns `_sendUnderLock(send)` directly (the win is preserved: still non-`async`, no microtask hop); the contended path is one `_mutex.lock().then`. No public API or behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Review cleanup pushed (459433b)

Collapsed the fast/slow × execute/batch structure (4 functions, the lock/check/send/unlock body written 4×, unlock in 2 sites per non-async path) into one generic helper:

```dart
Future<ExecuteResponse> execute(String sql, [params, id]) {
  Future<ExecuteResponse> send() => executeInTransaction(sql, params, id);
  if (_mutex.tryLock()) return _sendUnderLock(send); // fast: non-async, no hop
  return _mutex.lock().then((_) => _sendUnderLock(send)); // contended: one line
}

Future<T> _sendUnderLock<T>(Future<T> Function() send) {
  try {
    if (_closed) throw ResqliteConnectionException('Database is closed.');
    return send(); // finally runs AFTER this → unlock at send time
  } catch (e, st) {
    return Future.error(e, st); // preserve 'never throws synchronously'
  } finally {
    _mutex.unlock(); // one unlock site for both paths
  }
}
```

Why the shape is irreducible, not arbitrary: the win requires

Verification:
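A self-contained sketch of the same shape, runnable outside the library. Everything in it is hypothetical: `SketchWriter` stands in for `Writer`, `StateError` for `ResqliteConnectionException`, `pendingReplies` for the worker's reply channel, and the small mutex is an assumption rather than resqlite's `Mutex`. It exercises the two properties the review calls out: the lock is free again as soon as `execute` returns while the reply is still pending, and a closed writer reports failure through the returned Future instead of throwing.

```dart
import 'dart:async';
import 'dart:collection';

/// Hypothetical minimal FIFO mutex (not resqlite's Mutex).
class SketchMutex {
  bool _held = false;
  final Queue<Completer<void>> _waiters = Queue<Completer<void>>();

  bool tryLock() {
    if (_held) return false;
    _held = true;
    return true;
  }

  Future<void> lock() {
    if (tryLock()) return Future<void>.value();
    final waiter = Completer<void>();
    _waiters.add(waiter);
    return waiter.future;
  }

  void unlock() {
    if (_waiters.isEmpty) {
      _held = false;
    } else {
      _waiters.removeFirst().complete(); // ownership passes to the next waiter
    }
  }
}

/// Hypothetical writer with the review snippet's execute/_sendUnderLock shape.
class SketchWriter {
  final SketchMutex mutex = SketchMutex();
  final List<Completer<String>> pendingReplies = []; // simulated worker replies
  bool _closed = false;

  void close() => _closed = true;

  Future<String> execute(String sql) {
    // "Sending" records a pending reply; the worker would complete it later.
    Future<String> send() {
      final reply = Completer<String>();
      pendingReplies.add(reply);
      return reply.future;
    }

    if (mutex.tryLock()) return _sendUnderLock(send); // fast path: no await
    return mutex.lock().then((_) => _sendUnderLock(send)); // contended path
  }

  Future<T> _sendUnderLock<T>(Future<T> Function() send) {
    try {
      if (_closed) throw StateError('Database is closed.');
      return send(); // `finally` runs after send() is evaluated
    } catch (e, st) {
      return Future<T>.error(e, st); // never throw synchronously
    } finally {
      mutex.unlock(); // released at send time, not reply time
    }
  }
}
```

Folding both paths into one helper means the `try`/`finally` and its single `unlock()` are written once; the fast path stays a plain synchronous call, and only the contended path goes through `.then`.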
Hypothesis
Exp 159 made the writer's request path synchronous through `SendPort.send` and pipelined concurrent standalone writes by releasing the write lock at send time. It still relied on `await _mutex.lock()` to acquire the mutex, even when no transaction was holding it. `Mutex.lock` is `async`, so awaiting its Future suspends the caller for one microtask before it resumes: every uncontended `db.execute` / `db.executeBatch` paid that hop on its way to `SendPort.send`.

Adding `Mutex.lockSync()` (returns `null` when the lock can be claimed without waiting) and rewriting `Writer.execute` / `Writer.executeBatch` as non-`async` should remove that hop. A Dart isolate runs its code on a single thread, so checking the completer and claiming the slot within one synchronous call is safe: no other code can run between the check and the claim.
Approach
- `Mutex.lockSync()`: additive, opt-in. Returns `null` (lock acquired synchronously) or the `Future<void>` from `lock()` (the caller should `await` it). The existing `lock()` / `unlock()` / `run()` are unchanged.
- `Writer.execute` / `Writer.executeBatch`: no longer `async`. Fast path: synchronous lock, `_closed` check, `executeInTransaction` / `executeBatchInTransaction` send, unlock, return the reply `Future`. Synchronous throws are converted to `Future.error` so the public contract ("always returns a `Future`") is preserved.
- Slow path: `_executeSlow` / `_executeBatchSlow` are `async` helpers that `await` the waiter, then mirror the original try/finally body bit for bit.

`Writer.locked` (transactions) is intentionally unchanged. Transactions hold the lock across BEGIN/body/COMMIT, where the microtask hop is dominated by the round-trips. Leaving this path on the async lock keeps `writer_pipelining`'s `transaction-guardrail` row a clean control.

See `experiments/168-uncontended-write-mutex-fast-path.md` for the full reasoning and prior-experiment context.

Results
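Focused `benchmark/experiments/writer_pipelining.dart`, three paired passes with stash/pop side-flipping (medians, ms):

| shape | baseline | candidate | delta |
|---|---|---|---|
| `sequential-awaited` (2000 writes) | 32.686 | 30.868 | -5.6% |
| `concurrent-burst` (10 × 200 writes) | 25.272 | 24.607 | -2.6% |
| `transaction-guardrail` (50 tx × 10) | 4.282 | 4.240 | -1.0% |

Per-pass deltas:

| shape | range across the three passes |
|---|---|
| `sequential-awaited` | -4% to -9% |
| `concurrent-burst` | -2.6% to -4.8% |
| `transaction-guardrail` | inside ±2% |

The transaction guardrail stays inside ±2% across all three passes, the expected null result for the untouched code path. Full per-round timings: `benchmark/profile/results/exp-168-uncontended-write-mutex-fast-path.md`.

Outcome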
In Review (accept-shaped). The change is a focused overhead removal: monotonic direction on the two paths it touches (`sequential-awaited` and `concurrent-burst`), neutral on the unchanged path (transactions), and no public API change. The `Mutex` contract is unchanged; `lockSync()` is additive. Reopen if exp 161's release rows (`Single Inserts (100 sequential)`, `Concurrent Single Inserts (100 concurrent)`) regress during the soak window, or if any future workload shows that the missing fast path on `Writer.locked` has become material.

One behavioral change worth calling out: writes submitted synchronously before `close()` (e.g. `db.execute(A); db.execute(B); db.close();`) now succeed instead of sometimes throwing `ResqliteConnectionException`. The old behavior was an artifact of the async lock wait; A's and B's `ExecuteRequest`s now reach the worker's port FIFO before `close()` enqueues `CloseRequest`. The original close-vs-queued-writer test is updated to force the queue with a long-running transaction (preserving the original assertion), and a new test pins the new fast-path semantics.

Test plan
- `dart pub get`
- `dart analyze lib/src/mutex.dart lib/src/writer/writer.dart` (no issues)
- `dart test test/transaction_test.dart test/database_test.dart test/profile_counters_test.dart test/stream_test.dart test/stream_invalidation_coalescing_test.dart test/stream_dependency_shapes_test.dart test/stream_cache_hit_reliability_test.dart test/stream_overflow_fallback_test.dart test/stream_trigger_cascade_test.dart test/reader_pool_test.dart test/query_decoder_test.dart test/diagnostics_test.dart` (all pass)
- `dart run benchmark/experiments/writer_pipelining.dart` (three paired passes)
- `dart run benchmark/finalize_experiment.dart --experiment=experiments/168-uncontended-write-mutex-fast-path.md` (green; `history.json` regenerated)
- `Single Inserts (100 sequential)` and `Concurrent Single Inserts (100 concurrent)` rows for the public-lane confirmation of this focused signal

🤖 Generated with Claude Code