exp 168: synchronous fast path for uncontended writer mutex #162

Closed
danReynolds wants to merge 4 commits into main from exp-168-uncontended-write-mutex-fast-path

Conversation

@danReynolds

Owner

Hypothesis

Exp 159 made the writer's request path synchronous through SendPort.send and pipelined concurrent standalone writes by releasing the write lock at send time. It still acquired the mutex with await _mutex.lock(), even when no transaction was holding it. Mutex.lock is async, so awaiting its Future costs one microtask hop before the caller resumes; every uncontended db.execute / db.executeBatch paid that hop on its way to SendPort.send.

Adding Mutex.lockSync() (returns null when the lock can be claimed without waiting) and rewriting Writer.execute / Writer.executeBatch as non-async should remove that hop. Code within a Dart isolate runs on a single thread and is never preempted between synchronous statements, so checking the completer and claiming the slot in one synchronous call is safe.
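
The hop itself is easy to observe in isolation. The sketch below is illustrative only: asyncLock and syncLock are hypothetical stand-ins, not the project's Mutex API. It queues a microtask and then checks whether that microtask gets a chance to run before the caller continues.

```dart
import 'dart:async';

// Hypothetical stand-ins, not the project's Mutex API.
Future<void> asyncLock() async {} // uncontended, but still returns a Future
void syncLock() {} // uncontended, returns immediately

Future<List<String>> observeOrder() async {
  final order = <String>[];
  // Queue a microtask first; it can only run once this function yields.
  scheduleMicrotask(() => order.add('microtask'));

  syncLock();
  order.add('after sync call'); // no yield: the microtask has not run yet

  await asyncLock();
  order.add('after await'); // the await yielded, so the microtask ran first
  return order;
}

Future<void> main() async {
  print(await observeOrder());
  // prints [after sync call, microtask, after await]
}
```

The synchronous call resumes before anything else on the microtask queue, while the awaited call resumes only after previously queued microtasks have run. That yield is the per-write cost the hypothesis targets.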

Approach

  • Mutex.lockSync() — additive, opt-in. Returns null (lock acquired sync) or the Future<void> from lock() (caller should await it). Existing lock() / unlock() / run() are unchanged.
  • Writer.execute / Writer.executeBatch — no longer async. Fast path: synchronous lock, _closed check, executeInTransaction / executeBatchInTransaction send, unlock, return the reply Future. Synchronous throws are converted to Future.error so the public contract — "always returns a Future" — is preserved.
  • Slow path: new _executeSlow / _executeBatchSlow async helpers that await the waiter, then mirror the original try/finally body bit for bit.
  • Writer.locked (transactions) is intentionally unchanged. Transactions hold the lock across BEGIN/body/COMMIT, where the microtask hop is dominated by the round-trips. Leaving this path on the async lock keeps writer_pipelining's transaction-guardrail as a clean control.
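
For readers without the source open, the lockSync() contract described in the first bullet can be sketched as follows. This is a minimal model under stated assumptions: SketchMutex, its _held flag, and its waiter queue are hypothetical, and the project's Mutex internals may differ. Only the return contract (null when claimed synchronously, otherwise the Future to await) comes from the description above.

```dart
import 'dart:async';
import 'dart:collection';

/// Illustrative FIFO mutex; the real Mutex internals are not shown in this PR.
class SketchMutex {
  bool _held = false;
  final Queue<Completer<void>> _waiters = Queue<Completer<void>>();

  /// Async acquire: callers always await the returned Future.
  Future<void> lock() {
    if (!_held) {
      _held = true;
      return Future<void>.value();
    }
    final waiter = Completer<void>();
    _waiters.add(waiter);
    return waiter.future;
  }

  /// Returns null when the lock was claimed synchronously, otherwise the
  /// Future from lock() that the caller should await.
  Future<void>? lockSync() {
    if (!_held) {
      _held = true;
      return null;
    }
    return lock();
  }

  /// Hands the lock to the oldest waiter, or frees it when nobody waits.
  void unlock() {
    if (_waiters.isNotEmpty) {
      _waiters.removeFirst().complete(); // ownership transfers; stays held
    } else {
      _held = false;
    }
  }
}

Future<void> main() async {
  final mutex = SketchMutex();
  final first = mutex.lockSync(); // uncontended: null, nothing to await
  final second = mutex.lockSync(); // contended: a Future to await
  print(first == null); // true
  print(second == null); // false
  mutex.unlock(); // hands ownership to the waiter
  await second;
  mutex.unlock(); // no waiters left, so the lock is free again
  print(mutex.lockSync() == null); // true
}
```

Handing ownership directly to the oldest waiter in unlock() keeps the sketch FIFO: a fast-path caller arriving between unlock() and the waiter's resumption still sees the lock as held and queues behind it.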

See experiments/168-uncontended-write-mutex-fast-path.md for the full reasoning and prior-experiment context.

Results

Focused benchmark/experiments/writer_pipelining.dart, three paired passes with stash/pop side-flipping (medians, ms):

| shape | baseline (ms) | candidate (ms) | delta |
| --- | --- | --- | --- |
| sequential-awaited (2000 writes) | 32.686 | 30.868 | -5.6% |
| concurrent-burst (10 × 200 writes) | 25.272 | 24.607 | -2.6% |
| transaction-guardrail (50 tx × 10) | 4.282 | 4.240 | -1.0% |

Per-pass:

| shape | pass 1 | pass 2 | pass 3 |
| --- | --- | --- | --- |
| sequential-awaited | -4.0% | -9.0% | -5.0% |
| concurrent-burst | -4.4% | -2.6% | -4.8% |
| transaction-guardrail | +1.6% | 0.0% | -1.5% |

The transaction guardrail stays inside ±2% across all three passes — the expected null result for the untouched code path. Full per-round timings: benchmark/profile/results/exp-168-uncontended-write-mutex-fast-path.md.
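
The delta column of the medians table is consistent with a plain relative change between the two medians. The formula is not stated in this PR, so treat the helper below (a hypothetical delta function) as an assumption that happens to reproduce the reported figures.

```dart
// Relative change of the candidate median against the baseline median.
// Assumed formula; the PR does not state how the delta column is computed.
String delta(double baseline, double candidate) {
  final pct = (candidate - baseline) / baseline * 100;
  return '${pct.toStringAsFixed(1)}%';
}

void main() {
  print(delta(32.686, 30.868)); // sequential-awaited: -5.6%
  print(delta(25.272, 24.607)); // concurrent-burst: -2.6%
  print(delta(4.282, 4.240)); // transaction-guardrail: -1.0%
}
```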

Outcome

In Review (accept-shaped). The change is a focused overhead removal: monotonic direction on the two paths it touches (sequential-awaited and concurrent-burst), neutral on the unchanged path (transactions), no public API change. The Mutex contract is unchanged; lockSync() is additive. Reopen if exp 161's release rows (Single Inserts (100 sequential), Concurrent Single Inserts (100 concurrent)) regress during the soak window, or if any future workload shows the missing fast path on Writer.locked is now material.

One behavioral change worth calling out: writes submitted synchronously before close() (e.g. db.execute(A); db.execute(B); db.close();) now succeed instead of sometimes throwing ResqliteConnectionException. The old behavior was an artifact of the async lock wait; A and B's ExecuteRequests now reach the worker's port FIFO before close() enqueues CloseRequest. The original close-vs-queued-writer test is updated to force the queue with a long-running transaction (preserving the original assertion), and a new test pins the new fast-path semantics.
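
The ordering change can be modeled with a toy writer. Everything here is hypothetical (ToyWriter, its sent list standing in for the worker port's FIFO, and a synchronous close()); it only illustrates why a send deferred by one microtask loses the race against close(), while a synchronous send does not. The real old behavior was intermittent, whereas this model is deterministic.

```dart
import 'dart:async';

/// Toy model: `sent` stands in for the worker port's FIFO.
class ToyWriter {
  ToyWriter({required this.fastPath});
  final bool fastPath;
  final List<String> sent = [];
  bool _closed = false;

  Future<String> execute(String sql) {
    if (fastPath) return _send(sql); // enqueued before execute() returns
    // Old shape: the lock wait defers the send by at least one microtask.
    return Future<void>.value().then((_) => _send(sql));
  }

  Future<String> _send(String sql) {
    if (_closed) {
      return Future<String>.error(StateError('Database is closed.'));
    }
    sent.add(sql);
    return Future<String>.value('$sql ok');
  }

  void close() {
    _closed = true;
    sent.add('CLOSE');
  }
}

Future<List<String>> submitThenClose({required bool fastPath}) {
  final writer = ToyWriter(fastPath: fastPath);
  String rejected(Object _) => 'rejected';
  final a = writer.execute('A').catchError(rejected);
  final b = writer.execute('B').catchError(rejected);
  writer.close(); // runs before any deferred send gets its microtask
  return Future.wait([a, b]);
}

Future<void> main() async {
  print(await submitThenClose(fastPath: true)); // [A ok, B ok]
  print(await submitThenClose(fastPath: false)); // [rejected, rejected]
}
```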

Test plan

  • dart pub get
  • dart analyze lib/src/mutex.dart lib/src/writer/writer.dart (no issues)
  • dart test test/transaction_test.dart test/database_test.dart test/profile_counters_test.dart test/stream_test.dart test/stream_invalidation_coalescing_test.dart test/stream_dependency_shapes_test.dart test/stream_cache_hit_reliability_test.dart test/stream_overflow_fallback_test.dart test/stream_trigger_cascade_test.dart test/reader_pool_test.dart test/query_decoder_test.dart test/diagnostics_test.dart (all pass)
  • Focused A/B: dart run benchmark/experiments/writer_pipelining.dart — three paired passes
  • dart run benchmark/finalize_experiment.dart --experiment=experiments/168-uncontended-write-mutex-fast-path.md (green; history.json regenerated)
  • CI / release-suite soak — watch Single Inserts (100 sequential) and Concurrent Single Inserts (100 concurrent) rows for the public-lane confirmation of this focused signal

🤖 Generated with Claude Code

danReynolds and others added 3 commits June 13, 2026 07:28
`await _mutex.lock()` always paid a microtask hop even uncontended,
because Dart's async wrapper around `lock()` yields once before the
caller resumes. Add `Mutex.lockSync()` that returns `null` when the
lock is acquired without waiting, and rewrite `Writer.execute` /
`Writer.executeBatch` as non-async around it. Slow path defers to a
new `_executeSlow` / `_executeBatchSlow` helper that mirrors the
existing await-then-try/finally body, so the contended semantics are
preserved bit for bit. `Writer.locked` (transactions) is intentionally
left unchanged so `writer_pipelining`'s `transaction-guardrail` row
acts as a clean control.

Focused `benchmark/experiments/writer_pipelining.dart`, three paired
passes with stash/pop side-flipping (medians, ms):

  shape                                 baseline   candidate   delta
  sequential-awaited (2000 writes)       32.686     30.868    -5.6%
  concurrent-burst (10 x 200 writes)     25.272     24.607    -2.6%
  transaction-guardrail (50 tx x 10)      4.282      4.240    -1.0%

Per-pass deltas were monotonic in direction on the two changed paths
(-4% to -9% sequential; -2.6% to -4.8% concurrent); the transaction
guardrail stayed inside +-2% across all three passes.

Test update: the `close()` during contention case used to rely on
back-to-back `db.execute(...)` calls all parking on the lock for one
microtask while `close()` set `_closed`. With the fast path, those
sends all reach the worker's port FIFO before `close()` runs, so the
test now uses a long-running transaction to force the queue. A new
test pins the new fast-path semantics: writes submitted synchronously
before `close()` succeed.

Aggregate: benchmark/profile/results/exp-168-uncontended-write-mutex-fast-path.md

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adopt the conventional spelling used by Java
`Lock.tryLock()`, Rust `Mutex::try_lock`, Python
`Lock.acquire(blocking=False)`, and Go 1.18+
`sync.Mutex.TryLock`. The new API returns `bool` instead of
`Future<void>?`, which reads more naturally at call sites:

  if (!_mutex.tryLock()) {
    return _executeSlow(...);
  }
  // hold lock, send, unlock

The slow-path helpers no longer take a waiter parameter — they
just `await _mutex.lock()` directly. Semantics, tests, and
focused benchmark numbers are unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`_executeSlow` / `_executeBatchSlow` named the helpers for the
dispatch reason (vs. the fast path) rather than what they do.
The actual job is: await the writer mutex, then run an execute
that mirrors the fast-path body. Rename to
`_executeAwaitingLock` / `_executeBatchAwaitingLock` so the
name describes the work, and update the matching doc and test
references.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
danReynolds added a commit that referenced this pull request Jun 15, 2026
Collided with two other exp-168 PRs (the lockSync win #162 keeps 168, the
Database runtime-cache rejection becomes 171). Pure renumber: file, README
row, JOURNAL link, and signals.json key/archive/narrative.
danReynolds added a commit that referenced this pull request Jun 15, 2026

* exp 168: synchronous uncontended writer mutex fast-path (rejected)

Tested Mutex.tryLock + non-`async` Writer.execute / executeBatch as the
request-side counterpart to exp 151's rejected response-side
Completer<T>.sync() attempt. Hypothesis: dropping the uncontended
`await _mutex.lock()` microtask hop plus the async wrapper's implicit
reply await would move the Single Inserts (100 sequential) lane.

Paired focused + release runs (3 alternating pairs each):

- Single Inserts (100 sequential): 3.099 → 3.153 ms median (+1.7%)
- writer_pipelining sequential-awaited (2000 writes):
  34.047 → 34.723 ms median (+2.0%)
- Concurrent Single Inserts (100 concurrent): 1.231 → 1.135 ms (-7.8%)
- writer_pipelining concurrent-burst (10×200): 27.524 → 26.981 ms (-2.0%)
- transaction-guardrail (50 tx × 10): 4.611 → 4.770 ms (+3.4%)

The primary sequential lanes moved within ±2% in the wrong direction.
The only positive signal (Concurrent Single Inserts) is owned by
exp 159 at -58% to -61% already.

Rejected as request-scheduling change below current signal. No runtime
code kept; doc + signals carry the durable evidence so the same idea is
not retried without a Dart-runtime change or a workload where main-isolate
scheduling dominates writer-side execution.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* exp 168: add JOURNAL entry on rejecting symmetric-side scheduling tweaks

Records the lesson that exp 151 (response-side) and exp 168 (request-side)
produced the same verdict under the same gate; the candidate that follows
a scheduling-shape rejection should change the mechanism, not the side.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Regenerate experiment history after exp 168 rebase

* Renumber exp 168 (tryLock request-side mutex) -> exp 170

Collided with two other exp-168 PRs (the lockSync win #162 keeps 168, the
Database runtime-cache rejection becomes 171). Pure renumber: file, README
row, JOURNAL link, and signals.json key/archive/narrative.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
danReynolds added a commit that referenced this pull request Jun 15, 2026
Third of the three-way exp-168 collision (lockSync win #162 keeps 168,
tryLock rejection became 170). Renames the experiment + profile-aggregate
files and updates the README row and signals.json key/archive/narrative.
danReynolds added a commit that referenced this pull request Jun 15, 2026
* exp 168: reject Database resolved-runtime cache

Mirror exp 159's `_sendPort` cache one layer up: a sync-readable
`_resolvedRuntime` field on `Database` so post-open `select` /
`selectBytes` / `execute` / `executeBatch` / `transaction` skip the
`await _runtime` microtask hop.

Two order-flipped passes on writer_pipelining.dart produced
alternating-sign deltas inside per-round variance — sequential-awaited
-2.3% / +2.5%, transaction-guardrail -7.5% / +6.1%, concurrent-burst
+4-5% both passes — so the ~1-2 us per-call hop sits at or below the
focused-harness floor.

No runtime code kept. Database-layer microtask hop trimming above the
writer no longer moves exp 159's sequential-write residual floor; the
next reduction candidate must reduce round-trip count (group commit)
or change transport, not chase hops.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Renumber exp 168 (Database resolved-runtime cache) -> exp 171

Third of the three-way exp-168 collision (lockSync win #162 keeps 168,
tryLock rejection became 170). Renames the experiment + profile-aggregate
files and updates the README row and signals.json key/archive/narrative.

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
exp 168 review cleanup: the fast/slow x execute/batch split wrote the
lock/check/send/unlock body four times, with unlock in two sites per
non-async path. Factor it into one generic _sendUnderLock<T>: a single
try/catch/finally where finally runs after send() is evaluated, so the
lock still releases at send time (not reply time) from one place, and a
sync failure becomes Future.error so the public contract never throws
synchronously. execute/executeBatch are now ~5 lines each: tryLock fast
path returns _sendUnderLock(send) directly (win preserved — still
non-async, no microtask hop); the contended path is one _mutex.lock().then.
No public API or behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@danReynolds

Owner Author

Review cleanup pushed (459433b)

Collapsed the fast/slow × execute/batch structure (4 functions, the lock/check/send/unlock body written 4×, unlock in 2 sites per non-async path) into one generic helper:

Future<ExecuteResponse> execute(String sql, [params, id]) {
  Future<ExecuteResponse> send() => executeInTransaction(sql, params, id);
  if (_mutex.tryLock()) return _sendUnderLock(send);          // fast: non-async, no hop
  return _mutex.lock().then((_) => _sendUnderLock(send));     // contended: one line
}

Future<T> _sendUnderLock<T>(Future<T> Function() send) {
  try {
    if (_closed) throw ResqliteConnectionException('Database is closed.');
    return send();                       // finally runs AFTER this → unlock at send time
  } catch (e, st) {
    return Future.error(e, st);          // preserve 'never throws synchronously'
  } finally {
    _mutex.unlock();                     // one unlock site for both paths
  }
}

Why the shape is irreducible, not arbitrary: the win requires execute be non-async (an async wrapper re-adds the exact microtask hop being removed), and a non-async function can't await, so the contended branch must hand back _mutex.lock().then(...). That branch stays — but as one line, not a second function. The try/catch/finally unifies what was a hand-rolled catch→Future.error (fast) + try/finally (slow): finally runs after return send() evaluates, so early-unlock-at-send-time is preserved with a single unlock site.
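
The ordering claim about finally can be checked with a standalone sketch. It keeps the same try/catch/finally shape but replaces the mutex with log entries, so sendUnderLock, log, and demo here are illustrative names rather than the project's code.

```dart
import 'dart:async';

final log = <String>[];

// Same try/catch/finally shape as _sendUnderLock, with the mutex replaced by
// log entries so the ordering is visible.
Future<T> sendUnderLock<T>(Future<T> Function() send) {
  try {
    return send();
  } catch (e, st) {
    return Future<T>.error(e, st);
  } finally {
    log.add('unlock');
  }
}

Future<List<String>> demo() async {
  log.clear();
  final reply = Completer<String>();
  final pending = sendUnderLock<String>(() {
    log.add('send');
    return reply.future; // the reply has not arrived yet
  });
  log.add('returned to caller');
  reply.complete('ok');
  log.add('reply ${await pending}');

  // A synchronous throw inside send() surfaces as a failed Future.
  final failed = sendUnderLock<String>(() => throw StateError('closed'));
  try {
    await failed;
  } on StateError {
    log.add('async error');
  }
  return List.of(log);
}

Future<void> main() async {
  print(await demo());
  // prints [send, unlock, returned to caller, reply ok, unlock, async error]
}
```

The unlock entry lands between send and the reply, which is the "release at send time, not reply time" property, and the second call shows the unlock still running when send() throws synchronously.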

Verification: dart analyze clean; 109 tests pass (incl. the close() contention guards); fast path is structurally identical (tryLock → send → unlock), the only addition being one send closure/call (~0.3% of the 2000-write lane, below the win and below noise). Net −45 lines. Focused A/B vs the prior version is neutral; the public-lane win confirmation is the release-suite soak as before.

@danReynolds danReynolds deleted the exp-168-uncontended-write-mutex-fast-path branch June 16, 2026 15:14