Fix diagnostics reading live NOMUTEX reader connections #156

Merged: danReynolds merged 1 commit into main from fix-diagnostics-reader-race on Jun 10, 2026
@danReynolds

Owner

Problem

resqlite_db_status_total skips readers marked in_use, but that flag has been dead code since exp 030 gave workers dedicated readers (exp 051 documented the acquire path as dead). Database.diagnostics() was therefore calling sqlite3_db_status on live NOMUTEX reader connections from the main isolate. SCHEMA_USED measures memory by toggling the connection's pnBytesFreed dry-run mechanism; doing that under a mid-query reader corrupts the reader's allocation accounting. This is a latent heap-corruption race on main. It surfaced as a flaky reader-isolate SEGV (sqlite3VdbeDelete → allocation-size read at null) in ~1 in 30 stream_test runs once exp 160's detached admission reads made readers reliably busy at diagnostics-poll time.

Fix

Read workers now bracket each request with resqlite_reader_set_busy (an atomic store; two leaf FFI calls per request, each costing on the order of nanoseconds), which makes the existing busy guard real. Diagnostics reports busy readers as a partial snapshot, exactly as the in_use contract always intended. The sacrifice path clears the bracket before Isolate.exit, because exit skips finally.
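
For readers unfamiliar with the pattern, here is a minimal C11 sketch of a busy bracket plus a diagnostics guard that skips busy readers. It is an illustration under stated assumptions, not the code in this PR: `reader_slot`, `reader_init`, `reader_set_busy`, `db_status_total`, and the `schema_bytes` field are hypothetical names, and `schema_bytes` stands in for the value the real code would obtain from sqlite3_db_status.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reader slot. A real slot would hold the sqlite3* handle;
   schema_bytes is a stand-in for a per-connection status reading. */
typedef struct {
    atomic_bool busy;   /* true while the worker is inside a request */
    long schema_bytes;  /* placeholder for the measured value */
} reader_slot;

/* Partial snapshot: the sum over idle readers plus a count of skipped ones. */
typedef struct {
    long total_bytes;
    size_t skipped;
} status_total;

static void reader_init(reader_slot *r, long schema_bytes) {
    atomic_init(&r->busy, false);
    r->schema_bytes = schema_bytes;
}

/* Worker side: call with true before a request and false after it. */
static void reader_set_busy(reader_slot *r, bool busy) {
    atomic_store_explicit(&r->busy, busy, memory_order_release);
}

/* Diagnostics side: never touch a reader that reports busy. */
static status_total db_status_total(reader_slot *readers, size_t n) {
    assert(readers != NULL || n == 0);
    status_total out = {0, 0};
    for (size_t i = 0; i < n; i++) {
        if (atomic_load_explicit(&readers[i].busy, memory_order_acquire)) {
            out.skipped++;  /* reported as missing from the snapshot */
            continue;
        }
        out.total_bytes += readers[i].schema_bytes;
    }
    return out;
}
```

With three readers holding 100, 200, and 300 bytes and the second one marked busy, the sketch returns a total of 400 with one reader skipped. Note that the sketch only models the check itself; it does not address a reader that becomes busy after the check, and nothing here should be read as a statement about how the real implementation handles that window.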

Split out of #155 so the crash fix lands on main independently of the IVM review; the investigation record (bisection, crash trace, stress validation) lives in that PR's experiment doc.

Test plan

  • Crash reproduced locally pre-fix (bisection: gone with detached reads disabled, 60/60)
  • Post-fix on the exp 160 branch: 100/100 stream_test stress iterations + 8/8 full-suite runs clean
  • This branch: dart analyze clean; stream/diagnostics/database tests pass

🤖 Generated with Claude Code

resqlite_db_status_total skips readers marked in_use, but that flag has
been dead code since exp 030 moved workers to dedicated reader
assignment — Database.diagnostics() was calling sqlite3_db_status on
reader connections actively executing queries on their worker threads.
Connections are NOMUTEX, and SQLITE_DBSTATUS_SCHEMA_USED measures memory
via the connection's pnBytesFreed dry-run mechanism; toggling it under a
mid-query reader corrupts that reader's allocation accounting (observed
as a flaky reader-isolate SEGV in sqlite3VdbeDelete once exp 160's
detached admission reads made readers reliably busy at diagnostics-poll
time, ~1-in-30 stream_test runs).

Read workers now bracket each request with resqlite_reader_set_busy
(atomic store, two leaf FFI calls per request), making the existing busy
guard real; diagnostics reports those readers as a partial snapshot
exactly as the existing in_use contract intended. The sacrifice path
clears the bracket before Isolate.exit since exit skips finally.
Validated on the exp 160 branch: crash reproduced pre-fix, 100/100
clean stress iterations and 8/8 full-suite runs post-fix.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@danReynolds danReynolds merged commit 19256e7 into main Jun 10, 2026
5 checks passed
danReynolds added a commit that referenced this pull request Jun 10, 2026
The rebase onto main (which landed the diagnostics race fix via #156)
re-applied the function on top of main's copy.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@danReynolds danReynolds deleted the fix-diagnostics-reader-race branch June 16, 2026 15:14
@danReynolds danReynolds mentioned this pull request Jun 16, 2026