Adopt SQLite project unit test harness#692

Open
brandonpayton wants to merge 15 commits into main from integration/kad-wtb-sqlite-testing

@brandonpayton brandonpayton commented Jun 13, 2026

Member

Summary

Adopts the SQLite project-unit harness work from PR #5 into Kandelo and records
both-host validation status against SQLite's official full permutation.

This PR adds scripts/run-sqlite-project-unit-tests.sh, documents the harness in
the porting guide, disables accidental default browser syscall tracing for the
SQLite demo runner, improves browser artifact snapshotting, fixes browser VFS
open-unlink lifetime behavior, and stabilizes the browser threaded-sorter path
used by sort4.test.

Validation Status

Current completion target: SQLite official full permutation on both Node and
browser. The larger all permutation is tracked separately as kad-29m.

Full hard-count report: test-runs/gastown-sqlite-epic-synthesis/final-hard-counts.md.

Node full snapshot:

  • Command: /bin/bash scripts/dev-shell.sh /bin/bash scripts/run-sqlite-project-unit-tests.sh --host node --permutation full --jobs 2 --timeout-ms 21600000 --results-root test-runs/gastown-sqlite-node-full-pr5
  • Artifacts: test-runs/gastown-sqlite-node-full-pr5/{command.log,host-status.tsv,node/summary.txt,node/failures.tsv,node/testrunner.db,node/testrunner.log}
  • Runner status: node 143, command exit_status=1 after summary write.
  • Hard counts: 1394 jobs total, 0 done, 0 failed, 0 omit/skip, 1 running, 1393 ready, 0 cases, 0 case errors.
  • kad-36g fixed the Mach-O exec-resolution wedge that caused this snapshot. No later full-suite Node DB is present in the final artifact set.

Browser full snapshot:

  • Command: bash scripts/run-sqlite-project-unit-tests.sh --host browser --permutation full --jobs 2 --timeout-ms 21600000 --results-root test-runs/gastown-sqlite-browser-full-pr5-snapshot
  • Artifacts: test-runs/gastown-sqlite-browser-full-pr5-snapshot/{run.log,host-status.tsv,combined-summary.md,browser/summary.txt,browser/failures.tsv,browser/testrunner.db,browser/testrunner.log}
  • Runner status: browser 1, page navigation/reload while Playwright was waiting in page.evaluate().
  • Hard counts: 1393 jobs total, 58 done, 4 failed, 0 omit/skip, 2 running, 1329 ready, 20066 cases, 1004 case errors.
  • The SQLite testrunner records done, failed, omit, running, and ready; it does not record XFAIL/XPASS/flaky fields.

Focused Superseding Results

The browser full snapshot's failed/running rows were followed by focused reruns:

| Host | Test | Focused result | Follow-up |
| --- | --- | --- | --- |
| browser | sysfault.test | PASS, 1365 cases / 0 errors | Original full-snapshot failures did not reproduce after rebuild. |
| browser | writecrash.test | FAIL, 158 cases / 1 error | kad-wtb.19: browser executable visibility/materialization after repeated crash-child iterations. |
| node | writecrash.test | PASS, 995 cases / 0 errors | Node comparison passes. |
| browser | walfault.test | FAIL, 1 recorded case / 1 error | kad-wtb.20: browser Tcl abort plus kernel munmap trap. |
| node | walfault.test | TIME/RUNNING, 0 cases / 0 errors | Did not hit the browser abort path before timeout. |
| node | like.test | PASS, 159 cases / 0 errors | Original browser like-14.2 concern is timing-threshold behavior. |
| browser | like.test default cap | PASS twice, 159 cases / 0 errors each | Diagnostic 16384-page comparison fails adjacent timing case like-14.1, not like-14.2. |
| browser | savepoint6.test | PASS, 8007 cases / 0 errors | Fixed by SharedFS open-unlink/rename-over lifetime handling. |
| browser | sort4.test | PASS, 11 cases / 0 errors | Browser threaded-sorter crash/stall fixed by kad-wtb.9. |
| node | sort4.test | FAIL, 11 cases / 5 errors | kad-wtb.21: Node temp database open failures in sort4-2.3/2.4/2.5/2.6/2.8. |

No SQLite test was skipped or xfailed as a substitute for runtime/platform work.

Artifacts

  • Node full snapshot: test-runs/gastown-sqlite-node-full-pr5/
  • Browser full snapshot: test-runs/gastown-sqlite-browser-full-pr5-snapshot/
  • Epic synthesis: test-runs/gastown-sqlite-epic-synthesis/summary.md
  • Final hard counts: test-runs/gastown-sqlite-epic-synthesis/final-hard-counts.md
  • LIKE focused artifacts: test-runs/kad-wtb13-like-*
  • Fault/crash focused report: test-runs/gastown-sqlite-browser-full-pr5-snapshot/browser-fault-recheck.md

Test Verification

Latest child branches recorded the full Kandelo gate suite before merge into
integration/kad-wtb-sqlite-testing: cargo test -p kandelo --target aarch64-apple-darwin --lib, cd host && npx vitest run,
scripts/run-libc-tests.sh, scripts/run-posix-tests.sh, and
scripts/dev-shell.sh bash scripts/check-abi-version.sh.

@brandonpayton force-pushed the integration/kad-wtb-sqlite-testing branch from 93ef2b4 to 92cd505 on June 13, 2026 12:47
@github-actions

github-actions Bot commented Jun 13, 2026


Phase B-1 matrix build status — pr-692-staging

ABI v15. 66 built, 0 failed, 66 total.

Package Arch Status Sha
libcurl wasm32 built ce011fa0
libcxx wasm32 built cb86af2e
libcxx wasm64 built 6a6ad11d
libpng wasm32 built 72185039
libxml2 wasm32 built df935363
libxml2 wasm64 built 98da11bb
ncurses wasm32 built 7e3c0e90
openssl wasm32 built a500b5dc
openssl wasm64 built e4db922d
sqlite wasm32 built 621a54bc
sqlite wasm64 built 5bcb5605
zlib wasm32 built 40932d64
zlib wasm64 built 4ebfa8c2
bash wasm32 built 2f184714
bc wasm32 built 679b5b57
bzip2 wasm32 built d25eab2e
coreutils wasm32 built 2f577527
curl wasm32 built 8958e33c
dash wasm32 built 5ae935a3
diffutils wasm32 built 6283becf
dinit wasm32 built c007ac0a
fbdoom wasm32 built a07b454d
file wasm32 built 02b42883
findutils wasm32 built 2264e878
gawk wasm32 built 263d59ae
git wasm32 built 0398f88b
grep wasm32 built 06cfacd3
gzip wasm32 built f1ea005c
kandelo-sdk wasm32 built be92ea52
kernel wasm32 built 98e21a05
less wasm32 built a7db2115
lsof wasm32 built 23cde9ae
m4 wasm32 built a8e81cc0
make wasm32 built 3ade5be8
mariadb wasm32 built b98ae7c6
mariadb wasm64 built ed07a0b9
msmtpd wasm32 built 263d33ef
nano wasm32 built 844e9bbe
netcat wasm32 built 72fd3e91
nethack wasm32 built 93ed633b
nginx wasm32 built 8ef1852a
php wasm32 built 31f3844d
posix-utils-lite wasm32 built fa6d1074
sed wasm32 built 498dd764
spidermonkey wasm32 built fc097aa8
tar wasm32 built 20922362
tcl wasm32 built cb2a2699
unzip wasm32 built a27a6076
userspace wasm32 built 9eb2c94c
vim wasm32 built b8621f5b
wget wasm32 built d8367779
xz wasm32 built 8f63492f
zip wasm32 built 0e37cb2a
zstd wasm32 built ae08e187
lamp wasm32 built 4abe20e8
mariadb-test wasm32 built 3a1dce84
mariadb-vfs wasm32 built e655616c
mariadb-vfs wasm64 built 255bfde1
nethack-browser-bundle wasm32 built b33d5459
node wasm32 built 288ce1f6
rootfs wasm32 built 04a457ae
spidermonkey-node wasm32 built 2ac80e33
vim-browser-bundle wasm32 built 6c746258
wordpress wasm32 built 7255dced
shell wasm32 built 2451b31b
node-vfs wasm32 built f799a4df

Auto-generated; replaced on each push. Raw data in the publish-status workflow artifact.

Return the active thread id for gettid/set_tid_address and keep clone helper state consistent in host tests.

Add a wasm32posix pthread_create overlay that preserves 16-byte shadow-stack alignment after reserving startup arguments, so variadic calls in worker threads read 64-bit arguments correctly.

Cover the SQLite sorter-temp failure mode with a pthread regression that exercises va_arg(uint64_t) and snprintf(%llx%c) from a worker thread.
@brandonpayton

Member Author

SQLite project-unit harness interim status, 2026-06-18

Tracking: kd-8ei / SQLite PR #692 audit, active child kd-nbh.

This work is still active, but the PR had not been updated while the run was blocked by Kandelo platform crashes. This comment records the current state; it is not a pass claim and does not replace the final PR description update.

Current scope

  • Target currently being driven: SQLite official all permutation on Kandelo.
  • Fresh --explain runs after the latest local kernel refresh discovered 10,523 testrunner jobs/scripts on each host:
    • Node: 10,523 total, 0 done, 0 failed, 0 omitted, 10,523 ready.
    • Browser: 10,523 total, 0 done, 0 failed, 0 omitted, 10,523 ready.
  • Artifact: test-runs/sqlite-project-unit-all/kd-nbh-explain-after-pathfix-20260618-1805/combined-summary.md.

Platform blockers hit before the current retry

These were Kandelo platform crashes that left the runs incomplete; they are not SQLite pass/fail results:

| Retry | First blocker | Status |
| --- | --- | --- |
| Node all, jobs=32 | fcntl lock/kernel crash cascade | Quarantined with DB/log artifacts. |
| After pipe fix | eager exec-state allocation trap | Fixed locally; focused tests passed. |
| After exec-state fix | munmap metadata allocation trap | Fixed locally; focused munmap tests passed. |
| After munmap fix | stat/path normalization allocation trap | Fixed locally; full cargo test -p kandelo --target aarch64-apple-darwin --lib passed, 964 tests. |

The platform fixes above still need to be reviewed as commits/PR updates before this PR can be considered publication-complete.

Current live run

Command policy:

timeout 86400s scripts/dev-shell.sh bash -lc 'scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pathfix-20260618-1810 --workdir /Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pathfix-20260618-1810/workdir --keep-workdir'
  • Artifact root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pathfix-20260618-1810/.
  • Status around 2026-06-18 18:24 EDT: the Node process is CPU-active, and command.log contains no RuntimeError, kernel threw, UNCAUGHT, memory access out of bounds, or unreachable signatures.
  • Visible progress: around tcl(104/10523), with SQLite job failures being recorded, but the runner has not completed.
  • Timeout or crash remains classified as incomplete, not pass/fail.
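
The signature check on command.log described above can be sketched as follows. This is a minimal illustration, not the harness's own tooling: the function name `scan_for_traps` and the plain substring matching are assumptions, and the sample lines are hypothetical.

```python
import re

# Signatures named in the status note above. Treating them as plain
# substrings is an assumption; the real check may be stricter.
TRAP_SIGNATURES = (
    "RuntimeError",
    "kernel threw",
    "UNCAUGHT",
    "memory access out of bounds",
    "unreachable",
)
_TRAP_RE = re.compile("|".join(re.escape(s) for s in TRAP_SIGNATURES))


def scan_for_traps(lines):
    """Return (first_matching_line_number, matching_line_count).

    Line numbers are 1-based. The first element is None when no line
    contains any signature.
    """
    first = None
    count = 0
    for number, line in enumerate(lines, start=1):
        if _TRAP_RE.search(line):
            count += 1
            if first is None:
                first = number
    return first, count


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "tcl(104/10523) some progress line",
        "syscall trap: RuntimeError: unreachable",
        "worker: memory access out of bounds",
    ]
    print(scan_for_traps(sample))
```

A clean log yields `(None, 0)`, which matches the "no signatures" status reported here; any other result marks the run as a platform blocker rather than a pass/fail outcome.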

Next update expected

The next PR update should happen when one of these occurs:

  • the current Node all run completes and produces a runner summary;
  • it hits another Kandelo platform blocker and artifacts are quarantined;
  • it reaches the hard timeout; or
  • the local platform fixes are split/committed/pushed for review.

@brandonpayton

Member Author

SQLite PR #692 audit update: Node all retry hit a new Kandelo platform crash/quarantine, so this is still incomplete and not a SQLite pass/fail suite result.

Scope and policy:

  • Host: node
  • Upstream public SQLite permutation: all
  • Jobs: 32
  • Runner timeout: 86400000 ms
  • Shell timeout: 86400s
  • Timeout/crash policy: incomplete, not pass/fail
  • Command:
    timeout 86400s scripts/dev-shell.sh bash -lc "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pathfix-20260618-1810 --workdir /Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pathfix-20260618-1810/workdir --keep-workdir"

Run outcome:

  • Started: 2026-06-18 18:06:01 EDT
  • Stopped: 2026-06-18 18:45:02 EDT, status 143 after SIGTERM
  • Last healthy progress before first trap: about 36:25 tcl(148/10523) f34 r32
  • First trap: syscall=46 (Mmap) args [0,65536,3,34,-1,0], RuntimeError: unreachable
  • Cascade: repeated CentralizedKernelWorker.handleFcntlLock RuntimeError entries, then broader memory access out of bounds traps in syscalls including writev, read, munmap, and process-exit paths
  • Process state before stop: Node process alive, sleeping, 0 CPU
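
For readers decoding the first-trap arguments, here is a small sketch that names the `prot` and `flags` bits in `[0,65536,3,34,-1,0]`. It assumes Kandelo's mmap ABI uses the Linux-style numeric values shown in the tables below; that mapping is an assumption, not something this comment confirms.

```python
# ASSUMPTION: Linux-style numeric values for mmap prot/flags bits.
PROT_BITS = ((0x1, "PROT_READ"), (0x2, "PROT_WRITE"), (0x4, "PROT_EXEC"))
MAP_BITS = (
    (0x01, "MAP_SHARED"),
    (0x02, "MAP_PRIVATE"),
    (0x10, "MAP_FIXED"),
    (0x20, "MAP_ANONYMOUS"),
)


def decode_bits(value, table):
    """Name every known bit set in value; report unknown bits as hex."""
    names = []
    remaining = value
    for bit, name in table:
        if value & bit:
            names.append(name)
            remaining &= ~bit
    if remaining:
        names.append(hex(remaining))
    return names


def decode_mmap_args(args):
    """Decode [addr, length, prot, flags, fd, offset] into a dict."""
    addr, length, prot, flags, fd, offset = args
    return {
        "addr": addr,
        "length": length,
        "prot": decode_bits(prot, PROT_BITS),
        "flags": decode_bits(flags, MAP_BITS),
        "fd": fd,
        "offset": offset,
    }


if __name__ == "__main__":
    print(decode_mmap_args([0, 65536, 3, 34, -1, 0]))
```

Under that assumption the trapping call is a 64 KiB private anonymous read/write mapping with no address hint, which is the ordinary shape of an allocator request.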

Stopped copied DB counts, diagnostic only:

  • total jobs: 10523
  • done rows: 115
  • failed rows: 34
  • omitted rows: 0
  • running rows: 32
  • ready rows: 10342
  • SQLite cases recorded: 10660
  • case errors recorded: 454
  • copied DB PRAGMA integrity_check: ok
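
A quick sanity check on counts like these is that the per-state rows add up to the job total. The sketch below shows that arithmetic; `check_counts` is a hypothetical helper, not part of the harness.

```python
def check_counts(total, done, failed, omitted, running, ready):
    """Return (consistent, unaccounted) for one set of job-state counts.

    unaccounted is total minus the sum of the state rows, so a
    consistent snapshot yields (True, 0).
    """
    accounted = done + failed + omitted + running + ready
    return accounted == total, total - accounted


if __name__ == "__main__":
    # Counts copied from the stopped-DB list above.
    print(check_counts(total=10523, done=115, failed=34,
                       omitted=0, running=32, ready=10342))
```

A nonzero second value would suggest rows in a state the query did not count, or a DB copied mid-write.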

Diagnostic failed rows, not final suite failures:

test/sysfault.test
test/writecrash.test
ext/fts5/test/fts5prefix.test
ext/fts5/test/fts5interrupt.test
test/manydb.test
test/vtabI.test
test/gcfault.test
test/like.test
test/delete.test
test/exists.test
test/bigfile2.test
test/sort4.test
ext/fts5/test/fts5optimize2.test
config=memsubsys1 ext/fts5/test/fts5optimize3.test
config=memsubsys1 ext/fts5/test/fts5optimize2.test
config=memsubsys2 ext/fts5/test/fts5optimize3.test
config=memsubsys2 ext/fts5/test/fts5optimize2.test
config=multithread test/sort4.test
config=no_mutex_try ext/fts5/test/fts5optimize3.test
config=no_mutex_try ext/fts5/test/fts5optimize2.test
config=journaltest ext/fts5/test/fts5optimize3.test
config=journaltest ext/fts5/test/fts5optimize2.test
config=inmemory_journal ext/fts5/test/fts5optimize3.test
config=inmemory_journal ext/fts5/test/fts5optimize2.test
config=prepare ext/fts5/test/fts5optimize3.test
config=prepare test/busy2.test
config=prepare ext/fts5/test/fts5optimize2.test
config=prepare test/walsetlk.test
config=prepare test/wal3.test
config=mmap ext/fts5/test/fts5optimize3.test
config=mmap test/busy2.test
config=mmap ext/fts5/test/fts5optimize2.test
config=mmap test/walsetlk.test
config=mmap test/wal3.test

Diagnostic running rows at stop:

test/boundary2.test
test/fkey_malloc.test
test/savepoint6.test
ext/fts5/test/fts5ac.test
test/walfault.test
ext/fts5/test/fts5corrupt2.test
ext/fts5/test/fts5securefault.test
test/walcrash2.test
test/fts3defer.test
test/quota.test
test/types2.test
test/capi2.test
test/avtrans.test
test/fuzz-oss1.test
test/createtab.test
test/insert.test
config=prepare test/temptable2.test
config=prepare ext/fts5/test/fts5secure7.test
config=prepare ext/fts5/test/fts5ah.test
config=prepare test/round1.test
config=prepare test/joinD.test
config=prepare test/memjournal2.test
config=prepare test/vacuum6.test
config=prepare ext/fts5/test/fts5secure3.test
config=prepare test/vacuummem.test
config=mmap test/temptable2.test
config=mmap ext/fts5/test/fts5ah.test
config=mmap test/round1.test
config=mmap test/memjournal2.test
config=mmap test/vacuum6.test
config=mmap ext/fts5/test/fts5secure3.test
config=mmap test/vacuummem.test

Artifacts:

  • run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pathfix-20260618-1810
  • quarantine: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pathfix-20260618-1810/quarantine-kernel-crash-20260618-1845/
  • key files: README.txt, command.log.normalized, first-trap-excerpt.txt, trap-snippets-before-stop.txt, diagnostic-counts.txt, diagnostic-failed-jobs.txt, diagnostic-running-jobs.txt, copied queryable DB files, and raw-stopped-db/

Publication status:

  • This comment publishes the new blocker/quarantine milestone to PR #692.
  • No new commits are pushed for this milestone yet.
  • Next action remains in kd-nbh: diagnose/fix the Mmap/low-memory crash root cause, rerun fresh Node/browser all --explain, then retry the selected public Tcl suite.

@brandonpayton

Member Author

SQLite PR #692 audit update: local mmap metadata-allocation fix is applied and the required fresh Node/browser all --explain completed after the source/kernel refresh.

Local fix summary:

  • Root cause addressed: anonymous mmap metadata management could allocate through infallible Vec paths under low-memory SQLite workers, aborting the kernel with RuntimeError: unreachable instead of returning ENOMEM.
  • Changed crates/kernel/src/memory.rs to add a fallible try_mmap_anonymous, make mmap gap search allocation-free, and make host-reserved region insertion use try_reserve.
  • Changed syscall mmap/mremap callers in crates/kernel/src/syscalls.rs to propagate the fallible mmap result.
  • Added focused memory-manager tests for fallible mmap/capacity-sensitive paths.
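
The shape of the fix, reporting ENOMEM to the caller instead of aborting, can be modelled in a few lines. This is a toy Python model under stated assumptions, not the kernel's Rust code: the region representation, the first-fit policy, and the `(addr, errno)` return convention are all hypothetical, and only the name `try_mmap_anonymous` comes from the summary above.

```python
import bisect
import errno


def find_gap(regions, length, lo, hi):
    """First-fit search for a free gap of `length` bytes in [lo, hi).

    `regions` is a list of half-open (start, end) ranges, sorted by
    start, non-overlapping, and all inside [lo, hi). The search walks
    the list with a single cursor and builds no intermediate list.
    """
    cursor = lo
    for start, end in regions:
        if start - cursor >= length:
            return cursor
        cursor = max(cursor, end)
    return cursor if hi - cursor >= length else None


def try_mmap_anonymous(regions, length, lo, hi):
    """Return (addr, 0) on success or (None, errno_code) on failure.

    Failure is an ordinary return value here, mirroring the idea of
    surfacing ENOMEM rather than trapping.
    """
    if length <= 0:
        return None, errno.EINVAL
    addr = find_gap(regions, length, lo, hi)
    if addr is None:
        return None, errno.ENOMEM
    bisect.insort(regions, (addr, addr + length))
    return addr, 0


if __name__ == "__main__":
    mapped = [(0x10000, 0x20000), (0x30000, 0x40000)]
    print(try_mmap_anonymous(mapped, 0x10000, 0x10000, 0x50000))
    print(mapped)
```

The point of the model is the control flow: every failure path ends in an error code the syscall layer can hand back to the SQLite worker.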

Validation run locally:

  • cargo fmt -p kandelo
  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin try_mmap --lib' => 2 passed
  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin test_mmap_zero_length_fails --lib' => 1 passed
  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin mmap --lib' => 30 passed
  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin memory::tests --lib' => 35 passed
  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin --lib' => 966 passed
  • Release Wasm rebuild succeeded with existing wasm_api warnings.
  • Rebuilt kernel copied to:
    • target/wasm32-unknown-unknown/release/kandelo_kernel.wasm
    • local-binaries/kernel.wasm
    • host/wasm/kandelo-kernel.wasm
    • size: 673096 bytes

Fresh all --explain after rebuild:

  • Command:
    scripts/dev-shell.sh bash -lc 'scripts/run-sqlite-project-unit-tests.sh --host both --permutation all --jobs 1 --timeout-ms 600000 --explain --results-root test-runs/sqlite-project-unit-all/kd-nbh-explain-after-mmapfix-20260618-1852'
  • Combined artifact:
    test-runs/sqlite-project-unit-all/kd-nbh-explain-after-mmapfix-20260618-1852/combined-summary.md

Explain counts:

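| Host | Runner exit | Total jobs | Done | Failed | Omitted | Running | Ready | SQLite cases | Case errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| node | 0 | 10523 | 0 | 0 | 0 | 0 | 10523 | 0 | 0 |
| browser | 0 | 10523 | 0 | 0 | 0 | 0 | 10523 | 0 | 0 |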

Publication status:

  • This comment publishes the completed post-fix explain milestone to PR #692.
  • No commits have been pushed yet.
  • Next action: retry the selected public Tcl all suite on Node under the documented hard-timeout policy (--timeout-ms 86400000, shell timeout 86400s; timeout/crash remains incomplete, not pass/fail).
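
The two timeouts in that policy are given in different units, so a small check that they describe the same budget can catch a mistyped digit. The helper names below are hypothetical.

```python
def timeouts_agree(runner_timeout_ms, shell_timeout_s):
    """True when the runner (ms) and shell (s) timeouts are the same length."""
    return runner_timeout_ms == shell_timeout_s * 1000


def ms_to_hours(timeout_ms):
    """Convert a millisecond timeout to hours."""
    return timeout_ms / 3_600_000


if __name__ == "__main__":
    print(timeouts_agree(86_400_000, 86_400))  # policy used for the all runs
    print(ms_to_hours(86_400_000))             # all permutation budget
    print(ms_to_hours(21_600_000))             # full permutation budget
```

With equal budgets, whichever timer fires first is a matter of startup skew, which is one reason a timeout is classified as incomplete rather than as a result.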

@brandonpayton

Member Author

kd-nbh SQLite PR #692 audit update: Node all/jobs32 crashed/wedged after mmap fix

This is an incomplete platform-blocker run, not a SQLite pass/fail result. The run passed the previous mmap trap point, then hit kernel/runtime traps and wedged with the Node process alive at 0% CPU. I preserved artifacts, sent SIGINT, and the wrapper exited status 130.

Command/policy:

  • Host/permutation/jobs: node all jobs=32
  • Runner timeout: 86400000 ms
  • Shell timeout: 86400s
  • Stop outcome: manual SIGINT after crash/wedge; wrapper status 130
  • Timeout/crash policy: incomplete, not pass/fail
  • Command: timeout 86400s scripts/dev-shell.sh bash -lc "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-mmapfix-20260618-1857 --workdir /Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-mmapfix-20260618-1857/workdir --keep-workdir"
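
Wrapper statuses such as 130 follow the common shell convention of 128 plus the signal number. A minimal decoder, assuming that convention applies to the wrapper (the function name is hypothetical):

```python
import signal


def describe_exit_status(status):
    """Describe a shell-style exit status.

    Values above 128 are read as 128 + signal number, the convention
    POSIX shells use for a child that was killed by a signal.
    """
    if status > 128:
        signum = status - 128
        try:
            return f"terminated by {signal.Signals(signum).name}"
        except ValueError:
            return f"terminated by unknown signal {signum}"
    return f"exited with status {status}"


if __name__ == "__main__":
    for status in (130, 143, 1):
        print(status, describe_exit_status(status))
```

Note that GNU `timeout` itself reports 124 when it expires the command, so a hard-timeout stop would look different from the manual SIGINT recorded here.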

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-mmapfix-20260618-1857
  • Quarantine: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-mmapfix-20260618-1857/quarantine-kernel-crash-20260618-1912
  • Raw stopped DB copy: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-mmapfix-20260618-1857/quarantine-kernel-crash-20260618-1912/raw-stopped-db/testrunner.db*
  • Query DB copy: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-mmapfix-20260618-1857/quarantine-kernel-crash-20260618-1912/query-db/testrunner.db
  • Counts: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-mmapfix-20260618-1857/quarantine-kernel-crash-20260618-1912/db-counts.txt and test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-mmapfix-20260618-1857/quarantine-kernel-crash-20260618-1912/db-state-counts.txt
  • Complete stopped-state failed jobs: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-mmapfix-20260618-1857/quarantine-kernel-crash-20260618-1912/failed-jobs.txt
  • Complete stopped-state running jobs: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-mmapfix-20260618-1857/quarantine-kernel-crash-20260618-1912/running-jobs.txt
  • Trap signatures: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-mmapfix-20260618-1857/quarantine-kernel-crash-20260618-1912/trap-signatures-post-stop.txt (2671 matching lines)
  • First trap excerpt: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-mmapfix-20260618-1857/quarantine-kernel-crash-20260618-1912/first-trap-signatures.txt
  • Command logs: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-mmapfix-20260618-1857/quarantine-kernel-crash-20260618-1912/command.pre-stop.log and test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-mmapfix-20260618-1857/quarantine-kernel-crash-20260618-1912/command.post-stop.log

Stopped-state DB counts from copied DB (integrity_check=ok):

total_jobs  done  failed  omitted  running  ready  total_cases  case_errors
----------  ----  ------  -------  -------  -----  -----------  -----------
10523       157   32      0        32       10302  14639        461        

State counts:

state    jobs   cases  errors
-------  -----  -----  ------
done     157    12967  0     
failed   32     1672   461   
ready    10302  0      0     
running  32     0      0     
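
A per-state rollup like the one above can be produced with a single GROUP BY. The sketch below runs against a stand-in table: the `jobs` table and its `state`, `ntest`, and `nerr` columns are assumptions made for illustration, and the real testrunner.db schema may differ.

```python
import sqlite3

# ASSUMPTION: minimal stand-in schema, not the real testrunner.db layout.
SCHEMA = """
CREATE TABLE jobs(
    jobid INTEGER PRIMARY KEY,
    displayname TEXT,
    state TEXT,
    ntest INT,
    nerr INT
)
"""

STATE_COUNTS_SQL = """
SELECT state,
       count(*) AS jobs,
       coalesce(sum(ntest), 0) AS cases,
       coalesce(sum(nerr), 0) AS errors
FROM jobs
GROUP BY state
ORDER BY state
"""


def make_demo_db():
    """Build an in-memory DB with a few hypothetical job rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO jobs(displayname, state, ntest, nerr) VALUES (?, ?, ?, ?)",
        [
            ("test/a.test", "done", 10, 0),
            ("test/b.test", "done", 20, 0),
            ("test/c.test", "failed", 4, 2),
            ("test/d.test", "ready", None, None),  # not yet run
        ],
    )
    return conn


def state_counts(conn):
    """Return (state, jobs, cases, errors) rows ordered by state."""
    return conn.execute(STATE_COUNTS_SQL).fetchall()


if __name__ == "__main__":
    for row in state_counts(make_demo_db()):
        print(row)
```

The `coalesce` calls keep jobs that never ran from showing NULL case counts, so ready and running rows report zeros as in the table above.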

Crash signature summary: first recorded trap was syscall=1 RuntimeError: unreachable at normalized command log line 176; repeated handleFcntlLock RuntimeError lines follow; the final cascade includes memory access out of bounds in syscalls 47/40/64/124. This is still a kernel/runtime blocker, not a SQLite failure classification.

Decision: keeping kd-nbh open and investigating/fixing this blocker in-place before another full retry; no separate bead yet because it directly blocks the assigned audit.

Complete stopped-state failed jobs (diagnostic only)
jobid  displaytype  displayname                                               state   cases  errors  span_ms
-----  -----------  --------------------------------------------------------  ------  -----  ------  -------
3      tcl          test/sysfault.test                                        failed  1      1       367501 
5      tcl          test/writecrash.test                                      failed  35     3       100901 
14     tcl          ext/fts5/test/fts5interrupt.test                          failed  311    4       911820 
19     tcl          test/manydb.test                                          failed  901    348     504356 
26     tcl          test/vtabI.test                                           failed  16     3       59096  
27     tcl          test/gcfault.test                                         failed  1      1       150034 
53     tcl          test/delete.test                                          failed  68     6       29673  
83     tcl          test/exists.test                                          failed  73     9       42818  
104    tcl          test/bigfile2.test                                        failed  4      2       8003   
748    tcl          test/sort4.test                                           failed  11     5       145098 
801    tcl          ext/fts5/test/fts5optimize2.test                          failed  4      2       39037  
1752   tcl          config=memsubsys1 ext/fts5/test/fts5optimize3.test        failed  4      2       50204  
2209   tcl          config=memsubsys1 ext/fts5/test/fts5optimize2.test        failed  4      2       51789  
3081   tcl          config=memsubsys2 ext/fts5/test/fts5optimize3.test        failed  4      2       53605  
3539   tcl          config=memsubsys2 ext/fts5/test/fts5optimize2.test        failed  4      2       43020  
4137   tcl          config=multithread test/sort4.test                        failed  51     35      389403 
4548   tcl          config=no_mutex_try ext/fts5/test/fts5optimize3.test      failed  4      2       57271  
5008   tcl          config=no_mutex_try ext/fts5/test/fts5optimize2.test      failed  4      2       59147  
5862   tcl          config=journaltest ext/fts5/test/fts5optimize3.test       failed  4      2       60998  
6262   tcl          config=journaltest ext/fts5/test/fts5optimize2.test       failed  4      2       63266  
7038   tcl          config=inmemory_journal ext/fts5/test/fts5optimize3.test  failed  4      2       48548  
7472   tcl          config=inmemory_journal ext/fts5/test/fts5optimize2.test  failed  4      2       50774  
8357   tcl          config=prepare ext/fts5/test/fts5optimize3.test           failed  4      2       52435  
8585   tcl          config=prepare test/busy2.test                            failed  29     5       67926  
8778   tcl          config=prepare ext/fts5/test/fts5optimize2.test           failed  4      2       68969  
8858   tcl          config=prepare test/walsetlk.test                         failed  40     1       58434  
9035   tcl          config=prepare test/wal3.test                             failed  1      1       196763 
9579   tcl          config=mmap ext/fts5/test/fts5optimize3.test              failed  4      2       70334  
9807   tcl          config=mmap test/busy2.test                               failed  29     5       64499  
10000  tcl          config=mmap ext/fts5/test/fts5optimize2.test              failed  4      2       72008  
10082  tcl          config=mmap test/walsetlk.test                            failed  40     1       69961  
10259  tcl          config=mmap test/wal3.test                                failed  1      1       189425 

Complete stopped-state running jobs at stop time
jobid  displaytype  displayname                                    state    cases  errors  span_ms
-----  -----------  ---------------------------------------------  -------  -----  ------  -------
37     tcl          test/boundary2.test                            running  0      0       0      
47     tcl          test/fkey_malloc.test                          running  0      0       0      
49     tcl          test/savepoint6.test                           running  0      0       0      
59     tcl          test/walfault.test                             running  0      0       0      
85     tcl          ext/fts5/test/fts5synonym2.test                running  0      0       0      
96     tcl          test/walcrash2.test                            running  0      0       0      
110    tcl          test/fts3defer.test                            running  0      0       0      
114    tcl          test/quota.test                                running  0      0       0      
126    tcl          test/avtrans.test                              running  0      0       0      
139    tcl          ext/fts5/test/fts5bigid.test                   running  0      0       0      
149    tcl          test/wal5.test                                 running  0      0       0      
156    tcl          test/wal2.test                                 running  0      0       0      
164    tcl          test/json101.test                              running  0      0       0      
167    tcl          ext/fts5/test/fts5faultH.test                  running  0      0       0      
169    tcl          test/crash7.test                               running  0      0       0      
172    tcl          test/trigger6.test                             running  0      0       0      
311    tcl          ext/fts5/test/fts5optimize3.test               running  0      0       0      
8156   tcl          config=prepare ext/fts5/test/fts5ah.test       running  0      0       0      
8377   tcl          config=prepare test/temptable2.test            running  0      0       0      
8701   tcl          config=prepare ext/fts5/test/fts5secure7.test  running  0      0       0      
8732   tcl          config=prepare test/round1.test                running  0      0       0      
8775   tcl          config=prepare test/memjournal2.test           running  0      0       0      
8906   tcl          config=prepare test/vacuum6.test               running  0      0       0      
8962   tcl          config=prepare test/vacuummem.test             running  0      0       0      
9117   tcl          config=prepare ext/fts5/test/fts5secure3.test  running  0      0       0      
9378   tcl          config=mmap ext/fts5/test/fts5ah.test          running  0      0       0      
9599   tcl          config=mmap test/temptable2.test               running  0      0       0      
9954   tcl          config=mmap test/round1.test                   running  0      0       0      
9997   tcl          config=mmap test/memjournal2.test              running  0      0       0      
10130  tcl          config=mmap test/vacuum6.test                  running  0      0       0      
10186  tcl          config=mmap test/vacuummem.test                running  0      0       0      
10341  tcl          config=mmap ext/fts5/test/fts5secure3.test     running  0      0       0      

@brandonpayton

Member Author

kd-nbh SQLite PR #692 audit update: open-path kernel refresh and fresh all --explain

This is a milestone update, not a final SQLite pass/fail report. After the post-mmapfix run crashed first in SYS_OPEN, I fixed the remaining runtime syscall paths that still used infallible path normalization/resolution under low-memory pressure. Those paths now use try_resolve_path / try_normalize_path and return ENOMEM instead of trapping the kernel.

Validation/build commands:

  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin path::tests --lib && cargo test -p kandelo --target aarch64-apple-darwin sys_open --lib && cargo test -p kandelo --target aarch64-apple-darwin realpath --lib && cargo test -p kandelo --target aarch64-apple-darwin unix --lib' => path::tests 17 passed; sys_open filter matched 0; realpath 9 passed; unix 19 passed.
  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin open --lib && cargo test -p kandelo --target aarch64-apple-darwin statfs --lib && cargo test -p kandelo --target aarch64-apple-darwin --lib' => open 30 passed; statfs 1 passed; full kernel lib 966 passed.
  • Initial release rebuild attempt with login shell failed because Homebrew stable cargo was ahead of the Nix nightly in PATH: scripts/dev-shell.sh bash -lc 'cargo build -p kandelo --target wasm32-unknown-unknown --release -Z build-std=core,alloc' => cargo rejected -Z.
  • Successful release rebuild used the non-login dev shell PATH: scripts/dev-shell.sh bash -c 'cargo build -p kandelo --target wasm32-unknown-unknown --release -Z build-std=core,alloc' => exit 0 with Nix cargo 1.97.0-nightly.
  • Copied rebuilt kernel to local-binaries/kernel.wasm and host/wasm/kandelo-kernel.wasm; size 674032 bytes, timestamp Jun 18 19:22 EDT.

Required fresh explain after source/kernel refresh:

  • Command: scripts/dev-shell.sh bash -c 'scripts/run-sqlite-project-unit-tests.sh --host both --permutation all --jobs 1 --timeout-ms 600000 --explain --results-root test-runs/sqlite-project-unit-all/kd-nbh-explain-after-openpathfix-20260618-1923'
  • Results root: test-runs/sqlite-project-unit-all/kd-nbh-explain-after-openpathfix-20260618-1923
  • Combined artifact: test-runs/sqlite-project-unit-all/kd-nbh-explain-after-openpathfix-20260618-1923/combined-summary.md
  • Node: runner exit 0; total_jobs 10523; done 0; failed 0; omitted 0; running 0; ready 10523; SQLite cases 0; case errors 0.
  • Browser: runner exit 0; total_jobs 10523; done 0; failed 0; omitted 0; running 0; ready 10523; SQLite cases 0; case errors 0.

Scope/status:

  • This explain run only enumerates the public upstream SQLite Tcl all job set; it executes zero cases, so it is not a pass/fail result.
  • TH3/private SQLite suites and non-public/out-of-scope suites remain skipped/out of scope for this audit.
  • Next action: rerun the selected public Tcl all suite on Node with jobs=32, runner timeout 86400000 ms and shell timeout 86400s; any timeout/crash remains incomplete rather than pass/fail.

@brandonpayton

Member Author

kd-nbh SQLite PR #692 audit update: Node all/jobs32 crashed after open-path fix

This is an incomplete Kandelo platform-blocker run, not a SQLite pass/fail result. The run passed the prior SYS_OPEN crash point, then hit a new kernel trap in SYS_GETDENTS64 and cascaded through fcntl/memory traps. I preserved artifacts, stopped the runner, copied the stopped DB/WAL/SHM, and queried only the copied database.

Command/policy:

  • Host/permutation/jobs: node all jobs=32
  • Runner timeout: 86400000 ms
  • Shell timeout: 86400s
  • Stop outcome: manual SIGINT after platform crash; wrapper/session status 120
  • Timeout/crash policy: incomplete, not pass/fail
  • Command: timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-openpathfix-20260618-1925 --workdir /Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-openpathfix-20260618-1925/workdir --keep-workdir"

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-openpathfix-20260618-1925
  • Quarantine: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-openpathfix-20260618-1925/quarantine-kernel-crash-20260618-1939
  • Authoritative raw stopped DB copy: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-openpathfix-20260618-1925/quarantine-kernel-crash-20260618-1939/raw-stopped-db/testrunner.db*
  • Authoritative query DB copy: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-openpathfix-20260618-1925/quarantine-kernel-crash-20260618-1939/query-db/testrunner.db
  • Counts: db-counts.txt and db-state-counts.txt in the quarantine directory
  • Complete stopped-state failed/running jobs: failed-jobs.txt and running-jobs.txt in the quarantine directory
  • Logs/signatures: command.pre-stop.log, command.post-stop.log, command.post-stop.normalized.log, trap-signatures-before-stop.txt (3218 matching lines), first-trap-signatures.txt, command-tail-before-stop.txt

Copied DB integrity: ok.

Stopped-state diagnostic counts from copied DB:

total_jobs  done  failed  omitted  running  ready  total_cases  case_errors
----------  ----  ------  -------  -------  -----  -----------  -----------
10523       135   33      0        32       10323  12343        506        

State counts:

state    jobs   cases  errors
-------  -----  -----  ------
done     135    10760  0     
failed   33     1583   506   
ready    10323  0      0     
running  32     0      0     
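As a cross-check on the two tables above, the per-state rows should sum to the totals row. The sketch below is only an illustration of that arithmetic, using the numbers copied from this comment; the `StateRow` type and `totals` helper are hypothetical names, not part of the harness.

```rust
/// One row of the per-state summary: (state, jobs, cases, errors).
type StateRow = (&'static str, u64, u64, u64);

/// State counts copied from the "State counts" table in this comment.
const STATE_ROWS: [StateRow; 4] = [
    ("done", 135, 10760, 0),
    ("failed", 33, 1583, 506),
    ("ready", 10323, 0, 0),
    ("running", 32, 0, 0),
];

/// Sum the per-state rows into (jobs, cases, errors).
fn totals(rows: &[StateRow]) -> (u64, u64, u64) {
    rows.iter()
        .fold((0, 0, 0), |(j, c, e), &(_, jobs, cases, errors)| {
            (j + jobs, c + cases, e + errors)
        })
}
```

The sums come to 10523 jobs, 12343 cases, and 506 case errors, matching the totals row.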

Crash signature summary:

  • Last progress before first trap: about 12:27 tcl(166/10523) f33 r32.
  • First recorded trap: [handleSyscall] kernel threw for pid=216 syscall=122 args=[3,2289096,2048,0,0,0]: RuntimeError: unreachable.
  • syscall=122 is SYS_GETDENTS64.
  • Stack maps through kernel_getdents64; immediately after, a process worker reported Fork failed: errno=12, then repeated handleFcntlLock RuntimeErrors and later memory-access-out-of-bounds traps. I am treating the fcntl/bounds lines as secondary until the getdents64 trap is explained.
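For triage, the first-trap line can be reduced to its pid and syscall number before looking at the cascade. The following is a minimal sketch that assumes the log line format quoted above; `TrapRecord`, `field`, `parse_trap`, and `syscall_name` are hypothetical helpers, and only the one syscall mapping stated in this comment is included.

```rust
/// Fields pulled from a "[handleSyscall] kernel threw ..." log line.
#[derive(Debug, PartialEq)]
struct TrapRecord {
    pid: u32,
    syscall: u32,
}

/// Return the run of ASCII digits that follows `key` (for example "pid=").
fn field(line: &str, key: &str) -> Option<u32> {
    let start = line.find(key)? + key.len();
    let digits: String = line[start..]
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}

/// Parse a trap line; lines without the "kernel threw" marker yield None.
fn parse_trap(line: &str) -> Option<TrapRecord> {
    if !line.contains("kernel threw") {
        return None;
    }
    Some(TrapRecord {
        pid: field(line, "pid=")?,
        syscall: field(line, "syscall=")?,
    })
}

/// Only the mapping stated in this thread is listed; other numbers stay unnamed.
fn syscall_name(nr: u32) -> Option<&'static str> {
    match nr {
        122 => Some("SYS_GETDENTS64"),
        _ => None,
    }
}
```

Applied to the first recorded trap above, this yields pid 216 and syscall 122, which `syscall_name` resolves to SYS_GETDENTS64. Secondary lines such as the fork failure are ignored because they lack the marker.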

The rows below are diagnostic stopped-state rows only because the platform crashed before suite completion.

Complete stopped-state failed jobs
jobid  displaytype  displayname                                               state   cases  errors  span_ms
-----  -----------  --------------------------------------------------------  ------  -----  ------  -------
3      tcl          test/sysfault.test                                        failed  1      1       306816 
5      tcl          test/writecrash.test                                      failed  44     3       68859  
9      tcl          ext/rtree/rtree4.test                                     failed  1      1       308115 
15     tcl          ext/rtree/rtree1.test                                     failed  1      1       160820 
21     tcl          test/manydb.test                                          failed  901    348     325224 
29     tcl          test/badutf.test                                          failed  37     25      49762  
48     tcl          test/like.test                                            failed  159    1       115565 
55     tcl          test/delete.test                                          failed  68     6       16706  
86     tcl          test/exists.test                                          failed  73     9       21526  
102    tcl          test/bigfile2.test                                        failed  4      2       10181  
747    tcl          test/sort4.test                                           failed  11     5       182078 
1056   tcl          ext/fts5/test/fts5optimize2.test                          failed  4      2       49754  
1998   tcl          config=memsubsys1 ext/fts5/test/fts5optimize3.test        failed  4      2       50801  
2440   tcl          config=memsubsys1 ext/fts5/test/fts5optimize2.test        failed  4      2       52025  
3328   tcl          config=memsubsys2 ext/fts5/test/fts5optimize3.test        failed  4      2       53047  
3770   tcl          config=memsubsys2 ext/fts5/test/fts5optimize2.test        failed  4      2       54327  
4137   tcl          config=multithread test/sort4.test                        failed  83     62      391287 
4796   tcl          config=no_mutex_try ext/fts5/test/fts5optimize3.test      failed  4      2       55816  
5239   tcl          config=no_mutex_try ext/fts5/test/fts5optimize2.test      failed  4      2       56972  
6072   tcl          config=journaltest ext/fts5/test/fts5optimize3.test       failed  4      2       58134  
6459   tcl          config=journaltest ext/fts5/test/fts5optimize2.test       failed  4      2       59485  
7274   tcl          config=inmemory_journal ext/fts5/test/fts5optimize3.test  failed  4      2       60514  
7690   tcl          config=inmemory_journal ext/fts5/test/fts5optimize2.test  failed  4      2       61662  
8578   tcl          config=prepare test/busy2.test                            failed  29     4       19726  
8580   tcl          config=prepare ext/fts5/test/fts5optimize3.test           failed  4      2       62550  
8852   tcl          config=prepare test/walsetlk.test                         failed  40     1       22393  
8988   tcl          config=prepare ext/fts5/test/fts5optimize2.test           failed  4      2       63315  
9035   tcl          config=prepare test/wal3.test                             failed  1      1       208208 
9800   tcl          config=mmap test/busy2.test                               failed  29     4       30055  
9802   tcl          config=mmap ext/fts5/test/fts5optimize3.test              failed  4      2       63800  
10076  tcl          config=mmap test/walsetlk.test                            failed  40     1       35974  
10212  tcl          config=mmap ext/fts5/test/fts5optimize2.test              failed  4      2       64360  
10259  tcl          config=mmap test/wal3.test                                failed  1      1       224671 

Complete stopped-state running jobs at stop time
jobid  displaytype  displayname                                    state    cases  errors  span_ms
-----  -----------  ---------------------------------------------  -------  -----  ------  -------
17     tcl          ext/fts5/test/fts5fault9.test                  running  0      0       0      
23     tcl          ext/fts5/test/fts5fault6.test                  running  0      0       0      
27     tcl          ext/fts5/test/fts5fault3.test                  running  0      0       0      
40     tcl          test/boundary2.test                            running  0      0       0      
49     tcl          test/fkey_malloc.test                          running  0      0       0      
51     tcl          test/savepoint6.test                           running  0      0       0      
62     tcl          test/walfault.test                             running  0      0       0      
74     tcl          ext/fts5/test/fts5porter.test                  running  0      0       0      
95     tcl          test/walcrash2.test                            running  0      0       0      
108    tcl          test/fts3defer.test                            running  0      0       0      
113    tcl          test/quota.test                                running  0      0       0      
123    tcl          ext/fts5/test/fts5update.test                  running  0      0       0      
128    tcl          test/avtrans.test                              running  0      0       0      
143    tcl          ext/intck/intck1.test                          running  0      0       0      
147    tcl          test/triggerB.test                             running  0      0       0      
148    tcl          test/upsert4.test                              running  0      0       0      
582    tcl          ext/fts5/test/fts5optimize3.test               running  0      0       0      
8297   tcl          config=prepare ext/fts5/test/fts5ah.test       running  0      0       0      
8384   tcl          config=prepare test/temptable2.test            running  0      0       0      
8464   tcl          config=prepare ext/fts5/test/fts5secure7.test  running  0      0       0      
8731   tcl          config=prepare test/round1.test                running  0      0       0      
8775   tcl          config=prepare test/memjournal2.test           running  0      0       0      
8879   tcl          config=prepare ext/fts5/test/fts5secure3.test  running  0      0       0      
8901   tcl          config=prepare test/vacuum6.test               running  0      0       0      
8962   tcl          config=prepare test/vacuummem.test             running  0      0       0      
9519   tcl          config=mmap ext/fts5/test/fts5ah.test          running  0      0       0      
9606   tcl          config=mmap test/temptable2.test               running  0      0       0      
9953   tcl          config=mmap test/round1.test                   running  0      0       0      
9997   tcl          config=mmap test/memjournal2.test              running  0      0       0      
10103  tcl          config=mmap ext/fts5/test/fts5secure3.test     running  0      0       0      
10125  tcl          config=mmap test/vacuum6.test                  running  0      0       0      
10186  tcl          config=mmap test/vacuummem.test                running  0      0       0      

Decision: keep kd-nbh open and investigate and fix this blocker in place before another full retry. If the root cause extends beyond the assigned audit, I will create and route a focused blocker bead and keep this PR updated.

@brandonpayton

Member Author

SQLite PR #692 audit: post-getdents kernel refresh + fresh all --explain

This is a milestone update for kd-nbh under convoy kd-8ei; it is enumeration only, not a SQLite pass/fail result.

Root cause addressed since the previous crash/quarantine: the first post-open-pathfix platform trap was SYS_GETDENTS64 (syscall=122) during the Node all run. I changed sys_getdents64 to avoid infallible path/name cloning in the low-memory path: it now copies pending entry/path bytes fallibly and returns ENOMEM instead of aborting the kernel on allocation failure.
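The fallible-copy pattern described above can be sketched as follows. This is not the actual Kandelo kernel code: `try_alloc`, `try_copy_bytes`, `PendingEntry`, and `clone_pending_name` are hypothetical names, and the ENOMEM value of 12 is an assumption based on Linux-style errno numbering (consistent with the errno=12 fork failure in the earlier log). It relies on `Vec::try_reserve_exact`, which reports allocation failure as an `Err` instead of aborting.

```rust
/// Errno for "out of memory"; 12 is assumed here (Linux-style numbering).
const ENOMEM: i32 = 12;

/// Allocate an empty buffer with room for `len` bytes, or return ENOMEM.
fn try_alloc(len: usize) -> Result<Vec<u8>, i32> {
    let mut buf: Vec<u8> = Vec::new();
    // try_reserve_exact returns Err on allocation failure or capacity
    // overflow rather than aborting the process.
    buf.try_reserve_exact(len).map_err(|_| ENOMEM)?;
    Ok(buf)
}

/// Copy `src` into a new buffer without an infallible allocation.
fn try_copy_bytes(src: &[u8]) -> Result<Vec<u8>, i32> {
    let mut buf = try_alloc(src.len())?;
    // Capacity was reserved above, so this append does not reallocate.
    buf.extend_from_slice(src);
    Ok(buf)
}

/// Hypothetical stand-in for a pending directory entry.
struct PendingEntry {
    name: Vec<u8>,
}

/// Clone an entry name fallibly instead of calling `.clone()`.
fn clone_pending_name(entry: &PendingEntry) -> Result<Vec<u8>, i32> {
    try_copy_bytes(&entry.name)
}
```

With this shape, a caller can propagate the error to the syscall return path with `?`, so an exhausted allocator surfaces as an errno rather than a kernel trap.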

Validation before this explain:

  • cargo fmt -p kandelo
  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin getdents64 --lib && cargo test -p kandelo --target aarch64-apple-darwin dir_stream --lib && cargo test -p kandelo --target aarch64-apple-darwin readdir --lib' => getdents64 5 passed, dir_stream 1 passed, readdir 3 passed
  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin --lib' => 966 passed
  • scripts/dev-shell.sh bash -c 'cargo build -p kandelo --target wasm32-unknown-unknown --release -Z build-std=core,alloc' => succeeded
  • Rebuilt kernel copied to local-binaries/kernel.wasm and host/wasm/kandelo-kernel.wasm, size 674332 bytes, timestamp Jun 18 19:45 EDT

Required fresh explain after source/kernel refresh:

  • Command: scripts/dev-shell.sh bash -c 'scripts/run-sqlite-project-unit-tests.sh --host both --permutation all --jobs 1 --timeout-ms 600000 --explain --results-root test-runs/sqlite-project-unit-all/kd-nbh-explain-after-getdentsfix-20260618-1945'
  • Combined artifact: test-runs/sqlite-project-unit-all/kd-nbh-explain-after-getdentsfix-20260618-1945/combined-summary.md

Explain counts:

Host     Runner exit  Total jobs  Done  Failed  Omitted  Running  Ready  SQLite cases  Case errors
-------  -----------  ----------  ----  ------  -------  -------  -----  ------------  -----------
node     0            10523       0     0       0        0        10523  0             0
browser  0            10523       0     0       0        0        10523  0             0

Scope/skips remain unchanged: selected public SQLite test/testrunner.tcl all permutation only; TH3/private and other non-public/out-of-scope suites are skipped as not available/not applicable to this Kandelo package audit.

Next action: retry the selected public Tcl all suite on Node with jobs=32, runner timeout 86400000 ms and shell timeout 86400s; any timeout/crash remains incomplete, not pass/fail.

@brandonpayton

Member Author

SQLite PR #692 audit: Node all/jobs32 after getdents fix hit new platform blocker

This run is incomplete due to a Kandelo platform crash, not a SQLite pass/fail result.

Command/policy:

  • Host/permutation/jobs: node, all, jobs=32
  • Runner timeout: 86400000 ms
  • Shell timeout: 86400s
  • Command: timeout 86400s scripts/dev-shell.sh bash -c 'scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-getdentsfix-20260618-1951 --workdir /Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-getdentsfix-20260618-1951/workdir --keep-workdir'
  • Stop outcome: manual SIGINT after platform trap cascade; wrapper exit 130
  • Timeout/crash policy: incomplete, not pass/fail

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-getdentsfix-20260618-1951
  • Quarantine: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-getdentsfix-20260618-1951/quarantine-kernel-crash-20260618-1959
  • Raw stopped DB: .../quarantine-kernel-crash-20260618-1959/raw-stopped-db/testrunner.db*
  • Query DB copy: .../quarantine-kernel-crash-20260618-1959/query-db/testrunner.db
  • Command logs: command.pre-stop.log, command.post-stop.log, command.post-stop.normalized.log
  • Trap signatures: trap-signatures-before-stop.txt, trap-signatures-post-stop.txt (3534 matching lines)
  • First-trap excerpt: first-trap-signatures.normalized.txt
  • Counts/lists: db-counts.txt, db-state-counts.txt, failed-jobs.txt, running-jobs.txt, omitted-jobs.txt
  • Copied DB integrity: ok

Stopped-state diagnostic counts from copied DB:

total_jobs  done  failed_rows  omitted  running  ready  total_cases  total_case_errors
----------  ----  -----------  -------  -------  -----  -----------  -----------------
10523       115   38           0        32       10338  10057        586

State counts:

state    jobs   cases  errors
-------  -----  -----  ------
done     115    8454   0
failed   38     1603   586
ready    10338  0      0
running  32     0      0

Crash signature summary: the run progressed to about 08:49 tcl(153/10523) f38 r32. The first normalized trap is UNCAUGHT ERROR pid=287: RuntimeError: unreachable through CentralizedKernelWorker.handleFcntlLock; shortly after, the log shows kernel threw entries for syscalls 4 and 2, then a memory-access-out-of-bounds cascade in write/exit paths. I am treating handleFcntlLock as the current platform blocker unless source mapping shows an earlier kernel-layer cause.

Scope/skips remain unchanged: selected public SQLite test/testrunner.tcl all permutation only; TH3/private and other non-public/out-of-scope SQLite suites are skipped as unavailable/not applicable to this package audit.

Complete stopped-state failed rows, diagnostic only
jobid  displaytype  displayname                                               state   cases  errors  span_ms
-----  -----------  --------------------------------------------------------  ------  -----  ------  -------
4      tcl          test/sysfault.test                                        failed  1      1       194152
6      tcl          test/writecrash.test                                      failed  47     2       87114
18     tcl          ext/fts5/test/fts5contentless2.test                       failed  1      1       92840
25     tcl          test/manydb.test                                          failed  901    471     278532
32     tcl          ext/rtree/rtreeA.test                                     failed  1      1       57113
33     tcl          ext/fts5/test/fts5tokenizer3.test                         failed  1      1       31211
34     tcl          test/badutf.test                                          failed  37     2       43877
35     tcl          test/vtabI.test                                           failed  16     2       35588
36     tcl          test/gcfault.test                                         failed  1      1       78423
53     tcl          test/like.test                                            failed  159    1       40778
62     tcl          test/delete.test                                          failed  68     6       13111
92     tcl          test/exists.test                                          failed  73     9       24605
98     tcl          ext/fts5/test/fts5optimize2.test                          failed  4      3       27853
115    tcl          test/bigfile2.test                                        failed  4      2       1803
746    tcl          test/sort4.test                                           failed  11     8       34513
1040   tcl          ext/fts5/test/fts5optimize3.test                          failed  4      2       64167
1544   tcl          config=memsubsys1 ext/fts5/test/fts5optimize2.test        failed  4      2       65581
2428   tcl          config=memsubsys1 ext/fts5/test/fts5optimize3.test        failed  4      2       68072
2873   tcl          config=memsubsys2 ext/fts5/test/fts5optimize2.test        failed  4      2       69882
3758   tcl          config=memsubsys2 ext/fts5/test/fts5optimize3.test        failed  4      2       72007
4137   tcl          config=multithread test/sort4.test                        failed  43     25      456283
4341   tcl          config=no_mutex_try ext/fts5/test/fts5optimize2.test      failed  4      2       74834
5227   tcl          config=no_mutex_try ext/fts5/test/fts5optimize3.test      failed  4      2       76288
5674   tcl          config=journaltest ext/fts5/test/fts5optimize2.test       failed  4      2       77167
6451   tcl          config=journaltest ext/fts5/test/fts5optimize3.test       failed  4      2       78737
6844   tcl          config=inmemory_journal ext/fts5/test/fts5optimize2.test  failed  4      2       65229
7678   tcl          config=inmemory_journal ext/fts5/test/fts5optimize3.test  failed  4      2       81478
8162   tcl          config=prepare ext/fts5/test/fts5optimize2.test           failed  4      2       66668
8587   tcl          config=prepare test/busy2.test                            failed  29     4       23426
8845   tcl          config=prepare test/walsetlk.test                         failed  40     1       28558
8980   tcl          config=prepare ext/fts5/test/fts5optimize3.test           failed  4      2       82400
9041   tcl          config=prepare test/wal3.test                             failed  1      1       152707
9384   tcl          config=mmap ext/fts5/test/fts5optimize2.test              failed  4      2       67514
9809   tcl          config=mmap test/busy2.test                               failed  29     4       25863
10069  tcl          config=mmap test/walsetlk.test                            failed  40     1       34642
10204  tcl          config=mmap ext/fts5/test/fts5optimize3.test              failed  4      2       68119
10265  tcl          config=mmap test/wal3.test                                failed  1      1       154340
10505  tcl          config=mmap test/walvfs.test                              failed  35     8       46155

Complete stopped-state running rows at stop time
jobid  displaytype  displayname                                        state    cases  errors  span_ms
-----  -----------  -------------------------------------------------  -------  -----  ------  -------
44     tcl          test/boundary2.test                                running  0      0       0
54     tcl          test/fkey_malloc.test                              running  0      0       0
58     tcl          test/savepoint6.test                               running  0      0       0
63     tcl          ext/rtree/rtree2.test                              running  0      0       0
66     tcl          ext/intck/intckfault.test                          running  0      0       0
70     tcl          test/walfault.test                                 running  0      0       0
91     tcl          test/fts4langid.test                               running  0      0       0
93     tcl          ext/recover/recovercorrupt.test                    running  0      0       0
97     tcl          ext/fts5/test/fts5faultG.test                      running  0      0       0
106    tcl          test/walcrash2.test                                running  0      0       0
109    tcl          ext/fts5/test/fts5faultD.test                      running  0      0       0
114    tcl          ext/fts5/test/fts5aj.test                          running  0      0       0
121    tcl          ext/fts5/test/fts5faultA.test                      running  0      0       0
122    tcl          test/fts3defer.test                                running  0      0       0
127    tcl          test/quota.test                                    running  0      0       0
132    tcl          test/ioerr3.test                                   running  0      0       0
7694   tcl          config=inmemory_journal ext/fts5/test/fts5ah.test  running  0      0       0
8127   tcl          config=prepare ext/fts5/test/fts5secure7.test      running  0      0       0
8384   tcl          config=prepare test/temptable2.test                running  0      0       0
8553   tcl          config=prepare ext/fts5/test/fts5secure3.test      running  0      0       0
8728   tcl          config=prepare test/round1.test                    running  0      0       0
8774   tcl          config=prepare test/memjournal2.test               running  0      0       0
8892   tcl          config=prepare test/vacuum6.test                   running  0      0       0
8964   tcl          config=prepare test/vacuummem.test                 running  0      0       0
8993   tcl          config=prepare ext/fts5/test/fts5ah.test           running  0      0       0
9606   tcl          config=mmap test/temptable2.test                   running  0      0       0
9775   tcl          config=mmap ext/fts5/test/fts5secure3.test         running  0      0       0
9950   tcl          config=mmap test/round1.test                       running  0      0       0
9996   tcl          config=mmap test/memjournal2.test                  running  0      0       0
10116  tcl          config=mmap test/vacuum6.test                      running  0      0       0
10188  tcl          config=mmap test/vacuummem.test                    running  0      0       0
10217  tcl          config=mmap ext/fts5/test/fts5ah.test              running  0      0       0

Next action: diagnose and fix the crash in the handleFcntlLock/kernel lock path in place before retrying the full suite; if it proves broader than this bead, I will split off a focused blocker bead and link it.

@brandonpayton

Member Author

kd-nbh SQLite PR #692 audit update: fcntl-lock crash fix + fresh explain

Published milestone for convoy kd-8ei, bead kd-nbh.

Root cause addressed since the last quarantine: the first post-getdentsfix trap was through CentralizedKernelWorker.handleFcntlLock / kernel_fcntl_lock. The kernel fcntl lock path still cloned ofd.path with an infallible allocation under low-memory SQLite workers. That is now converted to the existing fallible copy path so allocation failure returns ENOMEM instead of aborting the kernel.

Validation after source/kernel refresh:

  • cargo fmt -p kandelo
  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin fcntl_lock --lib && cargo test -p kandelo --target aarch64-apple-darwin fcntl_setlk --lib && cargo test -p kandelo --target aarch64-apple-darwin close_releases_fcntl_locks --lib && cargo test -p kandelo --target aarch64-apple-darwin exit_releases_fcntl_locks --lib'
    • fcntl_lock: 3 passed
    • fcntl_setlk: 4 passed
    • close_releases_fcntl_locks: 1 passed
    • exit_releases_fcntl_locks: 1 passed
  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin fcntl --lib && cargo test -p kandelo --target aarch64-apple-darwin --lib'
    • fcntl filter: 18 passed
    • full kernel lib: 966 passed
  • scripts/dev-shell.sh bash -c 'cargo build -p kandelo --target wasm32-unknown-unknown --release -Z build-std=core,alloc' succeeded
  • Rebuilt kernel copied to local-binaries/kernel.wasm and host/wasm/kandelo-kernel.wasm, size 674399 bytes, timestamp Jun 18 20:02 EDT

Required fresh all --explain after source/kernel refresh:

  • Command: scripts/dev-shell.sh bash -c 'scripts/run-sqlite-project-unit-tests.sh --host both --permutation all --jobs 1 --timeout-ms 600000 --explain --results-root test-runs/sqlite-project-unit-all/kd-nbh-explain-after-fcntllockfix-20260618-2003'
  • Artifact: test-runs/sqlite-project-unit-all/kd-nbh-explain-after-fcntllockfix-20260618-2003/combined-summary.md

Explain counts:

Host     Runner exit  Total jobs  Done  Failed  Omitted  Running  Ready  SQLite cases  Case errors
-------  -----------  ----------  ----  ------  -------  -------  -----  ------------  -----------
node     0            10523       0     0       0        0        10523  0             0
browser  0            10523       0     0       0        0        10523  0             0

This explain run is enumeration only, not a pass/fail result. Scope/skips unchanged: selected public SQLite Tcl all permutation is in scope; TH3/private and other non-public/out-of-scope SQLite suites remain skipped/out of scope.

Next action: retry selected public Tcl all suite on Node with jobs=32, runner timeout 86400000 ms and shell timeout 86400s. Any timeout/crash remains incomplete, not pass/fail. Publication status: this milestone is published to PR #692 and will also be recorded on bead kd-nbh.

@brandonpayton

Member Author

kd-nbh SQLite PR #692 audit update: post-fcntl-lockfix Node run quarantined

This is an incomplete Kandelo platform-blocker run, not a SQLite pass/fail result. The run passed the previous fcntl-lock crash window, then hit a new kernel/runtime trap cascade and was stopped for quarantine.

Command/policy:

  • Host/permutation/jobs: node, all, jobs=32
  • Runner timeout: 86400000 ms
  • Shell timeout: 86400s
  • Stop outcome: manual SIGINT after platform trap cascade; wrapper/session status 120
  • Timeout/crash policy: incomplete, not pass/fail
  • Command: timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-fcntllockfix-20260618-2005 --workdir /Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-fcntllockfix-20260618-2005/workdir --keep-workdir"

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-fcntllockfix-20260618-2005
  • Quarantine: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-fcntllockfix-20260618-2005/quarantine-kernel-crash-20260618-2017
  • Raw stopped DB copy: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-fcntllockfix-20260618-2005/quarantine-kernel-crash-20260618-2017/raw-stopped-db/testrunner.db*
  • Query DB copy: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-fcntllockfix-20260618-2005/quarantine-kernel-crash-20260618-2017/query-db/testrunner.db*
  • Counts: db-counts.txt, db-state-counts.txt, db-integrity.txt
  • Complete stopped-state failed jobs: failed-jobs.txt
  • Complete stopped-state running jobs: running-jobs.txt
  • Omitted jobs: omitted-jobs.txt (0 rows)
  • Trap signatures: trap-signatures-post-stop.txt (51 matching lines)
  • First trap excerpt: first-trap-signatures.normalized.txt
  • Logs: command.pre-stop.log, command.post-stop.log, command.post-stop.normalized.log, testrunner.log

Copied DB integrity: ok.

Stopped-state diagnostic counts only:

total_jobs  done  failed_rows  omitted  running  ready  total_cases  total_case_errors
----------  ----  -----------  -------  -------  -----  -----------  -----------------
10523       173   36           0        32       10282  16070        592              

State counts:

state    jobs   cases  errors
-------  -----  -----  ------
done     173    14525  0     
failed   36     1545   592   
ready    10282  0      0     
running  32     0      0     

Crash signature summary: latest visible progress before the first trap was 10:57 tcl(208/10523) f36 r32. The first recorded trap is [handleSyscall] kernel threw for pid=263 syscall=2 args=[4,0,0,0,0,0]: RuntimeError: unreachable; shortly after, the log includes exec error, additional syscall 2/4 RuntimeError: unreachable lines, and then memory access out of bounds. Treating the later traps as cascade until source mapping proves otherwise.

This stopped-state failure list is diagnostic only because the platform crashed before the suite completed.

Complete stopped-state failed jobs (diagnostic only)
jobid  displaytype  displayname                                               state   cases  errors  span_ms
-----  -----------  --------------------------------------------------------  ------  -----  ------  -------
2      tcl          test/sysfault.test                                        failed  1      1       231781 
4      tcl          test/writecrash.test                                      failed  50     2       86283  
11     tcl          ext/fts5/test/fts5integrity.test                          failed  1      1       121413 
23     tcl          test/manydb.test                                          failed  901    475     475076 
26     tcl          ext/fts5/test/fts5contentless2.test                       failed  1      1       69627  
30     tcl          test/vtabI.test                                           failed  16     2       25007  
31     tcl          test/gcfault.test                                         failed  1      1       122382 
32     tcl          test/tkt2854.test                                         failed  22     20      36232  
51     tcl          test/like.test                                            failed  159    1       59796  
60     tcl          test/delete.test                                          failed  68     6       25108  
91     tcl          test/exists.test                                          failed  73     9       70732  
107    tcl          test/bigfile2.test                                        failed  4      2       5861   
356    tcl          ext/fts5/test/fts5optimize2.test                          failed  4      3       24322  
749    tcl          test/sort4.test                                           failed  1      1       25034  
1297   tcl          ext/fts5/test/fts5optimize3.test                          failed  4      2       62145  
1794   tcl          config=memsubsys1 ext/fts5/test/fts5optimize2.test        failed  4      2       67351  
2672   tcl          config=memsubsys1 ext/fts5/test/fts5optimize3.test        failed  4      2       68664  
3123   tcl          config=memsubsys2 ext/fts5/test/fts5optimize2.test        failed  4      2       69811  
4002   tcl          config=memsubsys2 ext/fts5/test/fts5optimize3.test        failed  4      2       70899  
4137   tcl          config=multithread test/sort4.test                        failed  43     25      419639 
4590   tcl          config=no_mutex_try ext/fts5/test/fts5optimize2.test      failed  4      2       72542  
5471   tcl          config=no_mutex_try ext/fts5/test/fts5optimize3.test      failed  4      2       73419  
5901   tcl          config=journaltest ext/fts5/test/fts5optimize2.test       failed  4      2       74560  
6664   tcl          config=journaltest ext/fts5/test/fts5optimize3.test       failed  4      2       70384  
7079   tcl          config=inmemory_journal ext/fts5/test/fts5optimize2.test  failed  4      2       77634  
7908   tcl          config=inmemory_journal ext/fts5/test/fts5optimize3.test  failed  4      2       79237  
8396   tcl          config=prepare ext/fts5/test/fts5optimize2.test           failed  4      2       80868  
8589   tcl          config=prepare test/busy2.test                            failed  29     4       23810  
8855   tcl          config=prepare test/walsetlk.test                         failed  40     1       26766  
9039   tcl          config=prepare test/wal3.test                             failed  1      1       196328 
9203   tcl          config=prepare ext/fts5/test/fts5optimize3.test           failed  4      2       81955  
9618   tcl          config=mmap ext/fts5/test/fts5optimize2.test              failed  4      2       82742  
9811   tcl          config=mmap test/busy2.test                               failed  29     4       17088  
10079  tcl          config=mmap test/walsetlk.test                            failed  40     1       27774  
10263  tcl          config=mmap test/wal3.test                                failed  1      1       146105 
10427  tcl          config=mmap ext/fts5/test/fts5optimize3.test              failed  4      2       83694  

Complete stopped-state running jobs at stop time
jobid  displaytype  displayname                                             state    cases  errors  span_ms
-----  -----------  ------------------------------------------------------  -------  -----  ------  -------
41     tcl          test/boundary2.test                                     running                        
52     tcl          test/fkey_malloc.test                                   running                        
54     tcl          test/savepoint6.test                                    running                        
68     tcl          test/walfault.test                                      running                        
100    tcl          test/walcrash2.test                                     running                        
113    tcl          test/fts3defer.test                                     running                        
117    tcl          test/quota.test                                         running                        
129    tcl          test/avtrans.test                                       running                        
155    tcl          test/wal5.test                                          running                        
162    tcl          test/wal2.test                                          running                        
166    tcl          ext/fts5/test/fts5query.test                            running                        
175    tcl          test/crash7.test                                        running                        
179    tcl          ext/rtree/rtree8.test                                   running                        
181    tcl          ext/fts5/test/fts5securefault.test                      running                        
183    tcl          test/crash4.test                                        running                        
186    tcl          ext/fts5/test/fts5corrupt7.test                         running                        
7991   tcl          config=inmemory_journal ext/fts5/test/fts5secure3.test  running                        
8005   tcl          config=inmemory_journal ext/fts5/test/fts5ah.test       running                        
8388   tcl          config=prepare test/temptable2.test                     running                        
8730   tcl          config=prepare test/round1.test                         running                        
8780   tcl          config=prepare test/memjournal2.test                    running                        
8905   tcl          config=prepare test/vacuum6.test                        running                        
8970   tcl          config=prepare test/vacuummem.test                      running                        
9279   tcl          config=prepare ext/fts5/test/fts5secure3.test           running                        
9292   tcl          config=prepare ext/fts5/test/fts5ah.test                running                        
9610   tcl          config=mmap test/temptable2.test                        running                        
9952   tcl          config=mmap test/round1.test                            running                        
10002  tcl          config=mmap test/memjournal2.test                       running                        
10129  tcl          config=mmap test/vacuum6.test                           running                        
10194  tcl          config=mmap test/vacuummem.test                         running                        
10503  tcl          config=mmap ext/fts5/test/fts5secure3.test              running                        
10516  tcl          config=mmap ext/fts5/test/fts5ah.test                   running                        

Skips/out-of-scope unchanged: selected public SQLite Tcl all permutation is in scope; TH3/private and other non-public/out-of-scope SQLite suites remain skipped/out of scope.

Next action: diagnose the first trap from this quarantine, fix the platform root cause, rebuild the kernel if needed, rerun fresh Node/browser all --explain, and retry the selected public Tcl suite under the same hard-timeout policy. Publication status: this blocker/quarantine is being published to PR #692 and will also be recorded on bead kd-nbh.

@brandonpayton

Member Author

kd-nbh SQLite PR #692 audit update: close-path crash fix + fresh explain

Published milestone for convoy kd-8ei, bead kd-nbh.

Root cause addressed since the last quarantine: the first trap after the fcntl lock fix was Syscall::Close (syscall=2), following progress line 10:57 tcl(208/10523) f36 r32. sys_close still cloned ofd.path with an infallible allocation before close-time host lock cleanup, which could abort the kernel under SQLite low-memory worker pressure. The close path now decides whether host lock cleanup needs the path while the fd is still open, copies the path with the fallible helper only when needed, and only then frees the fd. If the copy cannot be allocated, close returns ENOMEM instead of aborting.
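
The ordering described above can be sketched roughly as follows. This is a minimal illustration, not the Kandelo source: `OpenFile`, `try_copy`, `close_fd`, the `Errno` variants, and the `BTreeMap` fd table are all hypothetical stand-ins, and only the `try_reserve_exact` call is standard-library behavior.

```rust
use std::collections::BTreeMap;

/// Hypothetical errno subset for this sketch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Errno {
    Ebadf,
    Enomem,
}

/// Hypothetical open-file description; only the fields the sketch needs.
pub struct OpenFile {
    pub path: Vec<u8>,
    pub holds_host_locks: bool,
}

/// Copy bytes without aborting when the allocation fails.
pub fn try_copy(src: &[u8]) -> Result<Vec<u8>, Errno> {
    let mut out = Vec::new();
    out.try_reserve_exact(src.len()).map_err(|_| Errno::Enomem)?;
    out.extend_from_slice(src); // capacity is already reserved
    Ok(out)
}

/// Close `fd` and return the path that host lock cleanup should use, if any.
pub fn close_fd(
    table: &mut BTreeMap<i32, OpenFile>,
    fd: i32,
) -> Result<Option<Vec<u8>>, Errno> {
    // 1. Look at the fd while it is still open.
    let file = table.get(&fd).ok_or(Errno::Ebadf)?;
    // 2. Copy the path only when cleanup needs it; failure leaves the fd open.
    let cleanup_path = if file.holds_host_locks {
        Some(try_copy(&file.path)?)
    } else {
        None
    };
    // 3. Only now free the fd.
    table.remove(&fd);
    Ok(cleanup_path)
}
```

The point of the ordering is that an ENOMEM return happens before any state is torn down, so the caller still holds a valid fd after a failed close.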

Validation after source/kernel refresh:

  • cargo fmt -p kandelo
  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin close --lib && cargo test -p kandelo --target aarch64-apple-darwin fcntl --lib && cargo test -p kandelo --target aarch64-apple-darwin close_releases_fcntl_locks --lib && cargo test -p kandelo --target aarch64-apple-darwin exit_releases_fcntl_locks --lib'
    • close filter: 29 passed
    • fcntl filter: 18 passed
    • close_releases_fcntl_locks: 1 passed
    • exit_releases_fcntl_locks: 1 passed
  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin --lib'
    • full kernel lib: 966 passed
  • scripts/dev-shell.sh bash -c 'cargo build -p kandelo --target wasm32-unknown-unknown --release -Z build-std=core,alloc' succeeded
  • Rebuilt kernel copied to local-binaries/kernel.wasm and host/wasm/kandelo-kernel.wasm, size 674534 bytes, timestamp Jun 18 20:21 EDT

Required fresh all --explain after source/kernel refresh:

  • Command: scripts/dev-shell.sh bash -c 'scripts/run-sqlite-project-unit-tests.sh --host both --permutation all --jobs 1 --timeout-ms 600000 --explain --results-root test-runs/sqlite-project-unit-all/kd-nbh-explain-after-closefix-20260618-2022'
  • Artifact: test-runs/sqlite-project-unit-all/kd-nbh-explain-after-closefix-20260618-2022/combined-summary.md

Explain counts:

Host     Runner exit  Total jobs  Done  Failed  Omitted  Running  Ready  SQLite cases  Case errors
-------  -----------  ----------  ----  ------  -------  -------  -----  ------------  -----------
node     0            10523       0     0       0        0        10523  0             0
browser  0            10523       0     0       0        0        10523  0             0

This explain run is enumeration only, not a pass/fail result. Scope/skips unchanged: selected public SQLite Tcl all permutation is in scope; TH3/private and other non-public/out-of-scope SQLite suites remain skipped/out of scope.

Next action: retry selected public Tcl all suite on Node with jobs=32, runner timeout 86400000 ms and shell timeout 86400s. Any timeout/crash remains incomplete, not pass/fail. Publication status: this milestone is published to PR #692 and will also be recorded on bead kd-nbh.

@brandonpayton

Member Author

kd-nbh SQLite PR #692 audit update: post-closefix Node run quarantined

This is an incomplete Kandelo platform-blocker run, not a SQLite pass/fail result. The run passed the previous fcntl-lock crash window and reached the neighborhood of the previous close-path trap, then hit a new write-path kernel trap cascade and was stopped for quarantine.

Command/policy:

  • Host/permutation/jobs: node, all, jobs=32
  • Runner timeout: 86400000 ms
  • Shell timeout: 86400s
  • Stop outcome: manual SIGINT after platform trap cascade; wrapper/session status 120
  • Timeout/crash policy: incomplete, not pass/fail
  • Command: timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-closefix-20260618-2023 --workdir /Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-closefix-20260618-2023/workdir --keep-workdir"

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-closefix-20260618-2023
  • Quarantine: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-closefix-20260618-2023/quarantine-kernel-crash-20260618-2033
  • Raw stopped DB copy: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-closefix-20260618-2023/quarantine-kernel-crash-20260618-2033/raw-stopped-db/testrunner.db*
  • Query DB copy: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-closefix-20260618-2023/quarantine-kernel-crash-20260618-2033/query-db/testrunner.db*
  • Counts: db-counts.txt, db-state-counts.txt, db-integrity.txt
  • Complete stopped-state failed jobs: failed-jobs.txt
  • Complete stopped-state running jobs: running-jobs.txt
  • Omitted jobs: omitted-jobs.txt (0 rows)
  • Trap signatures: trap-signatures-post-stop.txt (32 matching lines)
  • First trap excerpt: first-trap-signatures.normalized.txt
  • Logs: command.pre-stop.log, command.post-stop.log, command.post-stop.normalized.log, testrunner.log

Copied DB integrity: ok.

Stopped-state diagnostic counts only:

total_jobs  done  failed_rows  omitted  running  ready  total_cases  total_case_errors
----------  ----  -----------  -------  -------  -----  -----------  -----------------
10523       153   35           0        32       10303  14437        449              

State counts:

state    jobs   cases  errors
-------  -----  -----  ------
done     153    13082  0     
failed   35     1355   449   
ready    10303  0      0     
running  32     0      0     
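
As a cross-check on the two tables above, the per-state rows should sum to the totals row. A minimal sketch of that check follows; `StateRow` and `sum_rows` are hypothetical names, and the numbers are copied from the state counts table.

```rust
/// One row of the state counts table (hypothetical type for this sketch).
pub struct StateRow {
    pub state: &'static str,
    pub jobs: u64,
    pub cases: u64,
    pub errors: u64,
}

/// Sum jobs, cases, and errors across all states.
pub fn sum_rows(rows: &[StateRow]) -> (u64, u64, u64) {
    rows.iter().fold((0, 0, 0), |(j, c, e), r| {
        (j + r.jobs, c + r.cases, e + r.errors)
    })
}

/// Rows copied from the stopped-state table for the post-closefix Node run.
pub fn post_closefix_rows() -> Vec<StateRow> {
    vec![
        StateRow { state: "done", jobs: 153, cases: 13082, errors: 0 },
        StateRow { state: "failed", jobs: 35, cases: 1355, errors: 449 },
        StateRow { state: "ready", jobs: 10303, cases: 0, errors: 0 },
        StateRow { state: "running", jobs: 32, cases: 0, errors: 0 },
    ]
}
```

Summing these rows gives 10523 jobs, 14437 cases, and 449 case errors, which matches the totals row, so the copied DB is at least internally consistent.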

Crash signature summary: the latest visible progress before the first trap was 08:33 tcl(187/10523) f35 r32. The first recorded trap is [handleSyscall] kernel threw for pid=1363 syscall=4 args=[2,1710164,7,0,0,0]: RuntimeError: unreachable. Subsequent lines are additional syscall=4 traps, an exec error, and more write-path UNCAUGHT ERROR entries. Those later lines are treated as cascade until source mapping proves otherwise.

This stopped-state failure list is diagnostic only because the platform crashed before the suite completed.

Complete stopped-state failed jobs (diagnostic only)
jobid  displaytype  displayname                                               state   cases  errors  span_ms
-----  -----------  --------------------------------------------------------  ------  -----  ------  -------
2      tcl          ext/fts5/test/fts5fault7.test                             failed  1      1       80403  
3      tcl          test/sysfault.test                                        failed  1      1       200319 
5      tcl          test/writecrash.test                                      failed  47     2       63342  
10     tcl          ext/fts5/test/fts5fault4.test                             failed  1      1       69808  
23     tcl          test/manydb.test                                          failed  901    348     290905 
28     tcl          ext/fts5/test/fts5synonym2.test                           failed  1      1       40295  
29     tcl          ext/intck/intck2.test                                     failed  1      1       14174  
32     tcl          ext/recover/recoverpgsz.test                              failed  1      1       18050  
61     tcl          test/delete.test                                          failed  68     6       22513  
90     tcl          test/exists.test                                          failed  73     9       15427  
105    tcl          test/bigfile2.test                                        failed  4      2       4801   
166    tcl          ext/fts5/test/fts5optimize2.test                          failed  4      3       10925  
734    tcl          test/sort4.test                                           failed  1      1       12146  
1106   tcl          ext/fts5/test/fts5optimize3.test                          failed  4      2       50604  
1612   tcl          config=memsubsys1 ext/fts5/test/fts5optimize2.test        failed  4      2       54950  
2488   tcl          config=memsubsys1 ext/fts5/test/fts5optimize3.test        failed  4      2       53453  
2941   tcl          config=memsubsys2 ext/fts5/test/fts5optimize2.test        failed  4      2       50389  
3818   tcl          config=memsubsys2 ext/fts5/test/fts5optimize3.test        failed  4      2       59340  
4137   tcl          config=multithread test/sort4.test                        failed  51     30      412316 
4408   tcl          config=no_mutex_try ext/fts5/test/fts5optimize2.test      failed  4      2       52972  
5287   tcl          config=no_mutex_try ext/fts5/test/fts5optimize3.test      failed  4      2       55852  
5732   tcl          config=journaltest ext/fts5/test/fts5optimize2.test       failed  4      2       60462  
6502   tcl          config=journaltest ext/fts5/test/fts5optimize3.test       failed  4      2       62934  
6904   tcl          config=inmemory_journal ext/fts5/test/fts5optimize2.test  failed  4      2       55381  
7732   tcl          config=inmemory_journal ext/fts5/test/fts5optimize3.test  failed  4      2       58967  
8223   tcl          config=prepare ext/fts5/test/fts5optimize2.test           failed  4      2       62572  
8567   tcl          config=prepare test/busy2.test                            failed  29     4       12849  
8846   tcl          config=prepare test/walsetlk.test                         failed  40     1       31498  
9030   tcl          config=prepare test/wal3.test                             failed  1      1       125994 
9035   tcl          config=prepare ext/fts5/test/fts5optimize3.test           failed  4      2       59044  
9445   tcl          config=mmap ext/fts5/test/fts5optimize2.test              failed  4      2       61020  
9789   tcl          config=mmap test/busy2.test                               failed  29     4       42253  
10070  tcl          config=mmap test/walsetlk.test                            failed  40     1       34259  
10254  tcl          config=mmap test/wal3.test                                failed  1      1       126542 
10259  tcl          config=mmap ext/fts5/test/fts5optimize3.test              failed  4      2       64228  

Complete stopped-state running jobs at stop time
jobid  displaytype  displayname                                    state    cases  errors  span_ms
-----  -----------  ---------------------------------------------  -------  -----  ------  -------
17     tcl          ext/fts5/test/fts5fault1.test                  running                        
27     tcl          ext/recover/recoverfault2.test                 running                        
44     tcl          test/boundary2.test                            running                        
55     tcl          test/fkey_malloc.test                          running                        
57     tcl          test/savepoint6.test                           running                        
66     tcl          test/walfault.test                             running                        
99     tcl          test/walcrash2.test                            running                        
110    tcl          test/fts3defer.test                            running                        
115    tcl          test/quota.test                                running                        
128    tcl          test/avtrans.test                              running                        
154    tcl          test/wal5.test                                 running                        
160    tcl          test/wal2.test                                 running                        
161    tcl          test/skipscan2.test                            running                        
162    tcl          test/corrupt8.test                             running                        
163    tcl          test/json104.test                              running                        
164    tcl          test/unhex.test                                running                        
7654   tcl          config=inmemory_journal test/vacuummem.test    running                        
8280   tcl          config=prepare ext/fts5/test/fts5secure7.test  running                        
8382   tcl          config=prepare test/temptable2.test            running                        
8646   tcl          config=prepare ext/fts5/test/fts5ah.test       running                        
8692   tcl          config=prepare ext/fts5/test/fts5secure3.test  running                        
8719   tcl          config=prepare test/round1.test                running                        
8766   tcl          config=prepare test/memjournal2.test           running                        
8892   tcl          config=prepare test/vacuum6.test               running                        
8954   tcl          config=prepare test/vacuummem.test             running                        
9604   tcl          config=mmap test/temptable2.test               running                        
9868   tcl          config=mmap ext/fts5/test/fts5ah.test          running                        
9914   tcl          config=mmap ext/fts5/test/fts5secure3.test     running                        
9941   tcl          config=mmap test/round1.test                   running                        
9988   tcl          config=mmap test/memjournal2.test              running                        
10116  tcl          config=mmap test/vacuum6.test                  running                        
10178  tcl          config=mmap test/vacuummem.test                running                        

Skips/out-of-scope unchanged: selected public SQLite Tcl all permutation is in scope; TH3/private and other non-public/out-of-scope SQLite suites remain skipped/out of scope.

Next action: diagnose the first write-path trap from this quarantine, fix the platform root cause, rebuild the kernel if needed, rerun fresh Node/browser all --explain, and retry the selected public Tcl suite under the same hard-timeout policy. Publication status: this blocker/quarantine is being published to PR #692 and will also be recorded on bead kd-nbh.

@brandonpayton

Member Author

kd-nbh update after pipe-write fix/explain

Milestone: fixed the next platform crash from the post-closefix Node run and completed the required fresh all --explain on both hosts.

Root cause: the first trap in test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-closefix-20260618-2023/quarantine-kernel-crash-20260618-2033 was Syscall::Write (syscall=4) on fd 2 with a 7-byte buffer. Wasm frame mapping showed fd 2 was a kernel pipe and the abort came from PipeBuffer::write via VecDeque::extend, which allocated infallibly under kernel-memory pressure.
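
For triage, a trap line such as `[handleSyscall] kernel threw for pid=1363 syscall=4 args=[2,1710164,7,0,0,0]: RuntimeError: unreachable` can be split into fields mechanically. The parser below is a hypothetical helper, not part of the harness. Reading `args[0]` as the fd and `args[2]` as the byte count is an assumption based on the usual write(fd, buf, count) argument order, which agrees with the fd 2 / 7-byte description above.

```rust
/// Parsed fields of one trap line (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub struct Trap {
    pub pid: u32,
    pub syscall: u32,
    pub args: Vec<i64>,
}

/// Parse a "[handleSyscall] kernel threw for ..." line; None if it does not match.
pub fn parse_trap(line: &str) -> Option<Trap> {
    // Tolerate any prefix (timestamps, bullets) before the marker.
    let (_, rest) = line.split_once("[handleSyscall] kernel threw for ")?;
    // Drop the trailing ": RuntimeError: ..." part.
    let (head, _) = rest.split_once("]:")?;
    let (fields, args) = head.split_once(" args=[")?;

    let mut pid = None;
    let mut syscall = None;
    for field in fields.split_whitespace() {
        match field.split_once('=')? {
            ("pid", v) => pid = v.parse().ok(),
            ("syscall", v) => syscall = v.parse().ok(),
            _ => {}
        }
    }
    // Every argument must parse, otherwise the line is rejected.
    let args = args
        .split(',')
        .map(|a| a.trim().parse().ok())
        .collect::<Option<Vec<i64>>>()?;

    Some(Trap { pid: pid?, syscall: syscall?, args })
}
```

Applied to the first trap line, this yields pid 1363, syscall 4, and args starting with 2, 1710164, 7.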

Change: PipeBuffer::write now reserves the wakeup event and pipe storage fallibly before mutating the pipe and returns ENOMEM instead of aborting. sys_write, socket pipe sends, and kernel_pipe_write propagate the errno. This is a kernel-only change; no host runtime behavior changed.
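
The reserve-then-mutate pattern described here can be sketched as below. This is an illustration only: the field layout, the `reader_id` parameter, and the `Errno` enum are hypothetical, and a real pipe would also enforce a capacity limit that this sketch leaves out. `Vec::try_reserve` and `VecDeque::try_reserve` are the standard-library calls it relies on.

```rust
use std::collections::VecDeque;

/// Hypothetical errno subset for this sketch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Errno {
    Enomem,
}

/// Hypothetical pipe buffer: byte storage plus a queue of wakeup events.
pub struct PipeBuffer {
    data: VecDeque<u8>,
    wakeups: Vec<u32>,
}

impl PipeBuffer {
    pub fn new() -> Self {
        PipeBuffer { data: VecDeque::new(), wakeups: Vec::new() }
    }

    /// Append `buf`, or return ENOMEM without touching the pipe.
    pub fn write(&mut self, buf: &[u8], reader_id: u32) -> Result<usize, Errno> {
        // Reserve everything first; either failure leaves the pipe unchanged.
        self.wakeups.try_reserve(1).map_err(|_| Errno::Enomem)?;
        self.data.try_reserve(buf.len()).map_err(|_| Errno::Enomem)?;
        // From here on, nothing allocates.
        self.wakeups.push(reader_id);
        self.data.extend(buf.iter().copied());
        Ok(buf.len())
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }

    pub fn pending_wakeups(&self) -> usize {
        self.wakeups.len()
    }
}
```

Because both reservations happen before the first mutation, a caller that sees ENOMEM can assume no partial write and no stray wakeup were recorded.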

Validation commands and counts:

  • scripts/dev-shell.sh bash -lc "cargo fmt -p kandelo && cargo test -p kandelo --target aarch64-apple-darwin pipe --lib && cargo test -p kandelo --target aarch64-apple-darwin write --lib && cargo test -p kandelo --target aarch64-apple-darwin socket --lib && cargo test -p kandelo --target aarch64-apple-darwin --lib"
    • pipe filter: 52 passed, 0 failed
    • write filter: 37 passed, 0 failed
    • socket filter: 34 passed, 0 failed
    • full kernel lib: 966 passed, 0 failed
  • scripts/dev-shell.sh bash -c "cargo build -p kandelo --target wasm32-unknown-unknown --release -Z build-std=core,alloc"
    • copied target/wasm32-unknown-unknown/release/kandelo_kernel.wasm to local-binaries/kernel.wasm and host/wasm/kandelo-kernel.wasm
    • artifact size: 675156 bytes; timestamp: 2026-06-18 20:39 EDT
  • Fresh explain command: scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host both --permutation all --jobs 1 --timeout-ms 600000 --explain --results-root test-runs/sqlite-project-unit-all/kd-nbh-explain-after-pipewritefix-20260618-2040"
    • node: runner exit 0; total_jobs=10523, done=0, failed=0, omitted=0, running=0, ready=10523, cases=0, case_errors=0
    • browser: runner exit 0; total_jobs=10523, done=0, failed=0, omitted=0, running=0, ready=10523, cases=0, case_errors=0

Artifacts:

  • Combined explain summary: test-runs/sqlite-project-unit-all/kd-nbh-explain-after-pipewritefix-20260618-2040/combined-summary.md
  • Node explain: test-runs/sqlite-project-unit-all/kd-nbh-explain-after-pipewritefix-20260618-2040/node
  • Browser explain: test-runs/sqlite-project-unit-all/kd-nbh-explain-after-pipewritefix-20260618-2040/browser

Skipped/out of scope remains unchanged: TH3/private SQLite suites and non-public SQLite test assets are out of scope for this public package audit. The selected public Tcl suite remains all with 10523 discovered jobs.

Next: retry Node public all with --jobs 32, runner timeout 86400000 ms, outer hard timeout 86400s. A timeout/crash/wedge will be reported as incomplete, not as a SQLite pass/fail result.

@brandonpayton

Member Author

kd-nbh quarantine after post-pipewritefix Node all/jobs32 retry

Milestone: new platform blocker/quarantine. The run is incomplete and must not be counted as a SQLite pass/fail result.

Run command/policy:

  • Active run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipewritefix-20260618-2044
  • Command: timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipewritefix-20260618-2044 --workdir /Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipewritefix-20260618-2044/workdir --keep-workdir"
  • Outer hard timeout: 86400s
  • Runner timeout: 86400000 ms
  • Stop reason: kernel RuntimeError: unreachable, not hard timeout.

First trap/signature:

  • Latest progress before trap: 08:12 tcl(171/10523) f37 r32 ETC 08:15:52
  • [kernel] exec error for pid 1278: RuntimeError: unreachable
  • [handleSyscall] kernel threw for pid=258 syscall=9 args=[64872,0,0,0,0,0]: RuntimeError: unreachable
  • Stack includes CentralizedKernelWorker.kernelExecSetup, so this is being treated as an execve/exec-state platform crash.

Quarantine artifacts:

  • Quarantine root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipewritefix-20260618-2044/quarantine-kernel-crash-20260618-2051
  • Logs: command.pre-stop.log, command.post-stop.log, command.post-stop.normalized.log, command-tail-before-stop.txt
  • Trap scan: trap-signatures-before-stop.txt
  • Process state: ps-before-stop.txt, ps-before-direct-kill.txt, ps-after-term.txt, ps-final-after-kill.txt
  • Stopped DB copies: raw-stopped-db/testrunner.db*, query-db/testrunner.db*
  • DB summaries: db-integrity.txt, db-counts.txt, db-state-counts.txt, db-config-counts.txt, failed-running-omitted-jobs.txt, failures.tsv

Stopped copied DB status:

  • PRAGMA integrity_check: ok
  • total_jobs=10523
  • done_jobs=134
  • failed_jobs=37
  • omitted_jobs=0
  • running_jobs=32
  • ready_jobs=10320
  • total_cases=15087
  • total_case_errors=860
  • State counts: done 134 jobs / 13245 cases / 0 errors; failed 37 jobs / 1842 cases / 860 errors; running 32 jobs / 0 cases / 0 errors; ready 10320 jobs / 0 cases / 0 errors.

Partial failed/running rows are captured in failed-running-omitted-jobs.txt and failures.tsv. They are not a complete SQLite failure catalog because the platform trap stopped the suite. Skips/out-of-scope remain unchanged: TH3/private SQLite suites and non-public assets are out of scope; selected public Tcl suite remains all with 10523 discovered jobs.

Next: diagnose and fix the exec-state/kernel crash before another full retry. Fresh Node/browser all --explain will be rerun after any source/kernel refresh.

@brandonpayton

Member Author

kd-nbh SQLite PR #692 audit update: post-exec/pipefix Node all/jobs32 stopped during testrunner control DB initialization

This run is incomplete and is not a SQLite pass/fail result. It failed before any jobs launched because SQLite's testrunner control database became malformed while building the all testset.

Command/policy:

  • Command: timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execpipeallocfix-20260618-2104 --workdir /Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execpipeallocfix-20260618-2104/workdir --keep-workdir"
  • Runner timeout: 86400000 ms
  • Outer hard timeout: 86400s
  • Outcome: runner exit 1 before job execution; no hard timeout
  • Policy: incomplete, not pass/fail

Observed failure:

  • built testset in 35636ms..
  • /Users/brandon/src/kandelo/packages/registry/sqlite/bin/testfixture.wasm: database disk image is malformed
  • trdb one { SELECT value FROM config WHERE name='njob' }
  • Summary query also failed: malformed database schema (jobs) - incomplete input

Counts:

  • Total jobs/done/failed/omitted/running/ready: unavailable because the jobs table schema is malformed.
  • SQLite cases/case errors: unavailable for the same reason.
  • host-status.tsv: node exit 1.
  • failures.tsv: empty; no SQLite job failure list exists because no usable jobs table was created.

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execpipeallocfix-20260618-2104
  • Combined summary: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execpipeallocfix-20260618-2104/combined-summary.md
  • Quarantine: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execpipeallocfix-20260618-2104/quarantine-control-db-malformed-20260618-2110
  • Quarantined DBs: raw-workdir-db/testrunner.db and node-db/testrunner.db, both 5,463,040 bytes
  • Integrity/schema output: workdir-db-integrity.txt, workdir-db-schema.txt, node-db-integrity.txt, node-db-schema.txt
  • Logs: command.log, workdir-testrunner.log, node-testrunner.log

Stopped-copy diagnostics:

  • pragma integrity_check on both DB copies fails immediately: Error: in prepare, malformed database schema (jobs) - incomplete input (11)
  • .schema on both DB copies fails: Error: malformed database schema (jobs) - incomplete input
  • No RuntimeError, kernel threw, memory access out of bounds, or unreachable signatures were found in the command log.
  • No leftover runner/testfixture/node processes remain.

Next action: diagnose the testrunner-control-DB corruption path as the current platform blocker. This is separate from the earlier kernel trap signatures and still blocks completing the public Tcl all audit.

@brandonpayton

Member Author

2026-06-18 21:10 EDT diagnostic update for the post-exec/pipefix control-DB blocker.

After the malformed testrunner.db quarantine at test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execpipeallocfix-20260618-2104/quarantine-control-db-malformed-20260618-2110, I ran three diagnostics to see whether control-DB initialization corruption was stable:

  1. Node all --explain --jobs 32 using the runner temp workdir:

    • Command: scripts/dev-shell.sh bash -c 'scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 600000 --explain --results-root test-runs/sqlite-project-unit-all/kd-nbh-diagnostic-node-explain-jobs32-20260618-2118'
    • Result: exit 0; total_jobs=10523 done=0 failed=0 omitted=0 running=0 ready=10523 cases=0 case_errors=0
    • Artifact: test-runs/sqlite-project-unit-all/kd-nbh-diagnostic-node-explain-jobs32-20260618-2118/combined-summary.md
  2. Node all --explain --jobs 32 with explicit preserved workdir:

    • Command: scripts/dev-shell.sh bash -c 'scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 600000 --explain --results-root test-runs/sqlite-project-unit-all/kd-nbh-diagnostic-node-explain-jobs32-explicitwd-20260618-2121 --workdir /Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-diagnostic-node-explain-jobs32-explicitwd-20260618-2121/workdir --keep-workdir'
    • Result: exit 0; total_jobs=10523 done=0 failed=0 omitted=0 running=0 ready=10523 cases=0 case_errors=0
    • Artifact: test-runs/sqlite-project-unit-all/kd-nbh-diagnostic-node-explain-jobs32-explicitwd-20260618-2121/combined-summary.md
  3. Short Node non-explain diagnostic with syscall logging:

    • Command: timeout 300s scripts/dev-shell.sh bash -c "KERNEL_SYSCALL_LOG=1 scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 120000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-node-jobs32-syscalllog-after-pipeallocfix-20260618-2128 --workdir /Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-node-jobs32-syscalllog-after-pipeallocfix-20260618-2128/workdir --keep-workdir"
    • Result: runner exit 1 after the intentional 120000 ms per-host timeout. This is incomplete, not pass/fail. It reached job execution and did not reproduce the malformed control DB.
    • Stopped-state diagnostic counts only: total_jobs=10523 done=23 failed=20 omitted=0 running=32 ready=10448 total_cases=487 total_case_errors=58
    • Artifacts: test-runs/sqlite-project-unit-all/kd-nbh-node-jobs32-syscalllog-after-pipeallocfix-20260618-2128/command.log, node/summary.txt, node/failures.tsv, node/testrunner.db, workdir/testrunner.db*

Interpretation: the malformed jobs schema from the previous full retry has not reproduced in setup-only or short execution diagnostics. I am treating that incident as quarantined and still incomplete, and will retry the selected public SQLite Tcl all suite on Node with the normal full-run policy: jobs=32, runner timeout 86400000 ms, outer hard timeout 86400s. Any crash, wedge, or timeout remains incomplete rather than a SQLite pass/fail result.

Skip/out-of-scope remains unchanged: TH3/private and other non-public SQLite suites are out of scope; selected public Tcl suite is all with 10,523 discovered jobs.

@brandonpayton

Member Author

2026-06-18 21:20 EDT additional kernel refresh before full retry: pipe-read allocation/wakeup path.

Before restarting the full Node run, I found the current kernel artifact had advanced again after the previous diagnostics. The change is kernel-only: PipeBuffer::read now returns Result<usize, Errno>, reserves the wakeup event fallibly before consuming bytes, and propagates ENOMEM through pipe/socket read paths plus kernel_pipe_read instead of allowing another low-memory kernel abort in the pipe read path. This is the read-side companion to the earlier pipe-write/pipe-allocation fixes.

Validation for this source/kernel state:

  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin pipe --lib && cargo test -p kandelo --target aarch64-apple-darwin read --lib && cargo test -p kandelo --target aarch64-apple-darwin --lib'
    • pipe filter: 52 passed, 0 failed
    • read filter: 84 passed, 0 failed
    • full kernel lib: 966 passed, 0 failed
  • Rebuilt kernel artifacts already present before this note:
    • target/wasm32-unknown-unknown/release/kandelo_kernel.wasm
    • local-binaries/kernel.wasm
    • host/wasm/kandelo-kernel.wasm
    • size 665,476 bytes, timestamp Jun 18 21:12 EDT
  • Required fresh all --explain after this source/kernel refresh:
    • Command: scripts/dev-shell.sh bash -c 'scripts/run-sqlite-project-unit-tests.sh --host both --permutation all --jobs 1 --timeout-ms 600000 --explain --results-root test-runs/sqlite-project-unit-all/kd-nbh-explain-after-pipereadfix-20260618-2112'
    • Node: exit 0; total_jobs=10523 done=0 failed=0 omitted=0 running=0 ready=10523 cases=0 case_errors=0
    • Browser: exit 0; total_jobs=10523 done=0 failed=0 omitted=0 running=0 ready=10523 cases=0 case_errors=0
    • Summary: test-runs/sqlite-project-unit-all/kd-nbh-explain-after-pipereadfix-20260618-2112/combined-summary.md

Sequencing note: the full retry announced in the previous comment was stopped before producing audit evidence after this newer refresh became visible. That abandoned start is test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execpipeallocfix-retry2-20260618-2112; wrapper status 130, no pass/fail value.

Next run policy remains unchanged: selected public SQLite Tcl suite all, Node first, jobs=32, runner timeout 86400000 ms, outer hard timeout 86400s; timeout/crash/wedge remains incomplete rather than a SQLite pass/fail result. TH3/private and non-public suites remain out of scope.

@brandonpayton

Member Author

2026-06-18 21:27 EDT run-state update after pipe-read refresh.

One full Node retry was interrupted before completion and is not a SQLite pass/fail result:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipereadfix-20260618-2121
  • Command: timeout 86400s scripts/dev-shell.sh bash -c 'scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipereadfix-20260618-2121 --workdir /Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipereadfix-20260618-2121/workdir --keep-workdir'
  • Outcome: exec session status 143/SIGTERM, no wrapper summary, not a hard timeout, not pass/fail.
  • Quarantine: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipereadfix-20260618-2121/quarantine-interrupted-20260618-2126
  • Copied DB integrity: ok
  • Stopped-state diagnostic counts only: total_jobs=10523 done=48 failed=21 omitted=0 running=32 ready=10422 total_cases=1083 total_case_errors=61
  • Failed/running/omitted diagnostic list: quarantine-interrupted-20260618-2126/failed-running-omitted-jobs.txt
  • No RuntimeError, kernel threw, memory access out of bounds, unreachable, malformed DB, or handleFcntlLock signatures were found in the interrupted run log.
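
The exit statuses quoted in these updates (143, 130, 141, 120) mix signal-style and plain exit codes. The sketch below decodes them using the common shell convention that a status above 128 means "terminated by signal (status - 128)". The helper name `describe_exit_status` is hypothetical and is not part of the harness. The 124 branch reflects GNU `timeout(1)` behaviour, which is an assumption about which `timeout` is in use here.

```shell
# Hypothetical helper: decode a recorded exit status.
# Assumes the usual shell convention: status > 128 means signal (status - 128).
describe_exit_status() {
  status=$1
  if [ "$status" -eq 0 ]; then
    echo "exit 0 (success)"
  elif [ "$status" -eq 124 ]; then
    # GNU timeout(1) exits 124 when its time limit expires
    echo "exit 124 (timeout(1) limit reached)"
  elif [ "$status" -gt 128 ]; then
    sig=$(( status - 128 ))
    # kill -l N prints the signal name without the SIG prefix
    name=$(kill -l "$sig" 2>/dev/null) || name="?"
    echo "exit $status (signal $sig, SIG$name)"
  else
    echo "exit $status (plain exit code, not a signal encoding)"
  fi
}
```

For example, `describe_exit_status 143` reports signal 15 (SIGTERM), matching the 143/SIGTERM outcome recorded above. A value such as 120 falls through to the plain-exit-code branch, so its meaning has to come from the wrapper itself.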

There is now one active full retry, which I am treating as the current Node full run:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipereadfix-retry3-20260618-212257
  • Command: timeout 86400s scripts/dev-shell.sh bash -c 'KERNEL_IO_DIAG=1 scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipereadfix-retry3-20260618-212257 --workdir /Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipereadfix-retry3-20260618-212257/workdir --keep-workdir'
  • Policy: selected public SQLite Tcl all, Node, jobs=32, runner timeout 86400000 ms, outer hard timeout 86400s; timeout/crash/wedge remains incomplete, not pass/fail.
  • Current log-only state: passed control-DB setup (built testset in 35221ms) and is executing jobs; around tcl(46/10523) f20 r32; no kernel/runtime trap signatures so far.
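
The log-only progress markers (for example `tcl(46/10523) f20 r32`) can be extracted from `command.log` lines without touching the live `testrunner.db`. This is a minimal sketch, assuming the field meanings inferred from this thread: the first number is jobs finished so far, the second is total jobs, `f` is failed jobs, and `r` is running jobs. The helper name `parse_progress` is hypothetical.

```shell
# Hypothetical helper: pull the counters out of one testrunner progress line.
# Field meanings (finished/total/failed/running) are inferred, not documented here.
parse_progress() {
  printf '%s\n' "$1" | sed -n \
    's/.*tcl(\([0-9][0-9]*\)\/\([0-9][0-9]*\)) f\([0-9][0-9]*\) r\([0-9][0-9]*\).*/finished=\1 total=\2 failed=\3 running=\4/p'
}

# Usage: parse the most recent progress line from a log file
# parse_progress "$(grep 'tcl(' command.log | tail -n 1)"
```

Lines that do not contain a progress marker produce no output, so the helper can be applied to arbitrary log lines.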

Skipped/out of scope unchanged: TH3/private and non-public SQLite suites remain out of scope; selected public suite remains all with 10,523 discovered jobs per Node/browser explain.

@brandonpayton

Member Author

kd-nbh SQLite PR #692 audit update: exec/CLOEXEC refresh + active retry

Publication status: posting this milestone to PR #692 and mirroring it to bead kd-nbh. The SQLite public Tcl all suite remains incomplete; this is not a pass/fail report.

Previous full retry quarantined as platform crash/incomplete:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipereadfix-retry3-20260618-212257
  • Command: timeout 86400s scripts/dev-shell.sh bash -c 'KERNEL_IO_DIAG=1 scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipereadfix-retry3-20260618-212257 --workdir /Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipereadfix-retry3-20260618-212257/workdir --keep-workdir'
  • Timeout policy: runner timeout 86400000 ms, outer hard timeout 86400s; timeout/crash/wedge is incomplete, not pass/fail.
  • Outcome: kernel/runtime crash around 10:14 tcl(226/10523) f37 r32; first signature [kernel] exec error for pid 1864: RuntimeError: unreachable.
  • Exit evidence: wrapper exit-code.txt = 120; host-status.tsv shows node 141.
  • Quarantine: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-pipereadfix-retry3-20260618-212257/quarantine-kernel-crash-20260618-2137
  • Quarantine artifacts include command.log, command-tail.txt, trap-signatures.txt, first-trap-signatures.txt, host-status.tsv, exit-code.txt, testrunner.log, raw-stopped-db/testrunner.db*, query-db/testrunner.db*, db-integrity.txt, db-counts.txt, failed-running-omitted-jobs.txt, and failures.tsv.
  • Copied stopped DB integrity: ok.
  • Diagnostic stopped counts only: total_jobs 10523, done 191, failed_rows 37, omitted 0, running 32, ready 10263, total_cases 17121, total_case_errors 484. These rows are diagnostic only because the platform crashed before suite completion.

Root cause addressed in the latest kernel refresh:

  • kernel_exec_setup now closes FD_CLOEXEC descriptors before exec-state serialization so pipe and host-handle refcounts are decremented instead of silently dropping descriptors from the serialized fd table. This is a kernel-side exec/CLOEXEC lifecycle fix; no host-runtime behavior divergence is intended.
  • Rebuilt kernel artifacts: target/wasm32-unknown-unknown/release/kandelo_kernel.wasm, local-binaries/kernel.wasm, host/wasm/kandelo-kernel.wasm.
  • Artifact size/hash: all three are 664577 bytes, timestamp Jun 18 21:38:08 2026, sha256 c9493012adf18409af4a11164fe19024362fc6ebf03bdde23dc4d6ca823774f5.

Validation after refresh:

  • scripts/dev-shell.sh bash -lc 'cargo test -p kandelo --target aarch64-apple-darwin exec_state --lib && cargo test -p kandelo --target aarch64-apple-darwin cloexec --lib && cargo test -p kandelo --target aarch64-apple-darwin --lib'
    • exec_state: 4 passed, 0 failed
    • cloexec: 12 passed, 0 failed
    • full kernel lib: 966 passed, 0 failed
  • Fresh required explain after source/kernel refresh: scripts/dev-shell.sh bash -c 'scripts/run-sqlite-project-unit-tests.sh --host both --permutation all --jobs 1 --timeout-ms 600000 --explain --results-root test-runs/sqlite-project-unit-all/kd-nbh-explain-after-execcloexecfix-20260618-213815'
    • node: exit 0; total_jobs 10523, done 0, failed 0, omitted 0, running 0, ready 10523, cases 0, case_errors 0
    • browser: exit 0; total_jobs 10523, done 0, failed 0, omitted 0, running 0, ready 10523, cases 0, case_errors 0
    • combined summary: test-runs/sqlite-project-unit-all/kd-nbh-explain-after-execcloexecfix-20260618-213815/combined-summary.md

Active retry:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execcloexecfix-20260618-213916
  • Command: timeout 86400s scripts/dev-shell.sh bash -c "KERNEL_IO_DIAG=1 scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execcloexecfix-20260618-213916' --workdir '/Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execcloexecfix-20260618-213916/workdir' --keep-workdir"
  • Current log-only progress at publication time: about 01:16 tcl(56/10523) f19 r32; no RuntimeError, kernel threw, memory access out of bounds, unreachable, handleFcntlLock, or malformed DB signatures in command.log so far.
  • I am not inspecting the live testrunner.db.

Skipped/out-of-scope unchanged: TH3/private SQLite suites and non-public assets are out of scope for this Kandelo package audit. Selected public suite remains upstream test/testrunner.tcl all with 10,523 discovered jobs per host.

@brandonpayton

Member Author

kd-nbh SQLite PR #692 audit update: Node retry quarantined on Wasm memory-growth hang

Publication status: posting this milestone to PR #692 and mirroring it to bead kd-nbh. The selected public SQLite Tcl all suite is still incomplete; this is not a pass/fail result and not the final failure catalog.

Run/quarantine:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execcloexecfix-20260618-213916
  • Quarantine: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execcloexecfix-20260618-213916/quarantine-wasm-memorygrow-hang-20260618-220008
  • Command: timeout 86400s scripts/dev-shell.sh bash -c "KERNEL_IO_DIAG=1 scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execcloexecfix-20260618-213916' --workdir '/Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execcloexecfix-20260618-213916/workdir' --keep-workdir"
  • Timeout policy: runner timeout 86400000 ms, outer hard timeout 86400s; timeout/crash/hang is incomplete, not pass/fail.
  • Stop time: pre-stop snapshot 2026-06-18 22:00:08 EDT; actual stop 2026-06-18 22:00:22 EDT.
  • Exit evidence: root exit-code.txt = 120; host-status.tsv shows node 141.
  • Last command-log progress before the long quiet section: 09:16 tcl(186/10523) f30 r32 ETC 08:34:37.
  • No RuntimeError, kernel threw, memory access out of bounds, unreachable, handleFcntlLock, or malformed-DB signature was found in command.log.

Hang evidence:

  • Text log was still inside csv01-style case output near the stop (testrunner.pre-stop.log/testrunner.post-stop.log).
  • Samples: node-sample-20260618-215429.txt and node-sample-20260618-215539.txt both show the active Node worker spending sampled time in V8 Wasm memory growth paths: Builtins_WasmMemoryGrow, v8::internal::Runtime_WasmMemoryGrow, and v8::internal::WasmMemoryObject::Grow.
  • This is being treated as the current Kandelo/host/runtime platform blocker for completing the full suite, not as a SQLite test failure item.

Stopped diagnostic counts only:

  • Source: node/summary.txt and quarantine db-state-before-stop.txt/copied artifacts.
  • total_jobs 10523, done 158, failed_rows 30, omitted 0, running 32, ready 10303, total_cases 11874, total_case_errors 446.
  • State breakdown: done 158 jobs / 10557 cases / 0 errors; failed 30 jobs / 1317 cases / 446 errors; running 32; ready 10303.
  • Copied quarantine DB pragma integrity_check returned ok; the copied DB still has stale locking state for normal queries, so counts above come from the preserved runner summary/state artifact.
  • Provisional failed/running rows are in node/failures.tsv, quarantine-wasm-memorygrow-hang-20260618-220008/node-failures.tsv, and quarantine-wasm-memorygrow-hang-20260618-220008/running-jobs-before-stop.csv; they are diagnostic only because the run was interrupted before completion.

Key artifacts:

  • command.pre-stop.log, command.post-stop.log
  • testrunner.pre-stop.log, testrunner.post-stop.log
  • node-summary.txt, node-failures.tsv
  • db-state-before-stop.txt, testrunner.db
  • running-jobs-before-stop.csv
  • node-sample-20260618-215429.txt, node-sample-20260618-215539.txt
  • ps-before-stop.txt, ps-before-actual-stop.txt, ps-after-actual-int.txt, kill-target.txt, exit-code.txt

Skipped/out-of-scope unchanged: TH3/private SQLite suites and non-public assets are out of scope. Selected public suite remains upstream test/testrunner.tcl all, discovered as 10,523 jobs per host.

Next action: diagnose the focused Wasm memory-growth hang path before retrying the full Node all/jobs32 suite. Any fix will be followed by the required fresh Node/browser all --explain before another full retry.

@brandonpayton

Member Author

kd-nbh active retry update, 2026-06-18 22:06 EDT

A Node public SQLite testrunner retry is currently running after the exec/CLOEXEC refresh and prior Wasm-memory-growth quarantine. This start was not yet published, so recording it now.

Command/policy:

  • Command: timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-cleanpatch-20260618-220351'"
  • Host/permutation/jobs: node / all / 32
  • Runner timeout: 86400000 ms
  • Outer hard timeout: 86400s
  • Timeout/crash/wedge policy: incomplete, not pass/fail
  • Public suite scope remains upstream SQLite testrunner.tcl all, 10523 discovered jobs from the last required Node/browser all --explain after the source/kernel refresh. TH3/private/non-public suites remain out of scope.

Artifacts/current state:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-cleanpatch-20260618-220351
  • Live command log: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-cleanpatch-20260618-220351/command.log
  • Live temp workdir from runner log: /tmp/kandelo-sqlite-official.vL0qZf
  • Caveat: this start did not pass explicit --workdir/--keep-workdir. I have identified the temp workdir and will preserve it before any stop/quarantine; I am not reading the live testrunner.db.
  • Current log-only progress at publication: at least 81/10523 jobs, f24, r32, with no RuntimeError/kernel threw/memory access out of bounds/unreachable/handleFcntlLock/malformed DB signatures in command.log. These visible FAILED lines are diagnostic only until the suite completes.

Next milestone will be either normal runner completion with exact counts/failure catalog, or a new quarantine/blocker comment with preserved artifacts and stopped-state counts.

@brandonpayton

Member Author

kd-nbh progress milestone, 2026-06-18 22:12 EDT

The active Node public SQLite all/jobs32 retry has passed the previous exec/CLOEXEC-refresh quarantine point. Prior run test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-execcloexecfix-20260618-213916 was stopped around 186/10523 with CPU-active samples in V8 Wasm memory growth. Current run test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-cleanpatch-20260618-220351 has reached log-only progress 205/10523, f35, r32.

Still not a pass/fail result:

  • Command/log root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-cleanpatch-20260618-220351
  • Temp workdir: /tmp/kandelo-sqlite-official.vL0qZf
  • Policy: runner_timeout_ms=86400000, outer timeout=86400s; timeout/crash/wedge remains incomplete, not pass/fail
  • I am not reading the live testrunner.db.
  • No RuntimeError/kernel threw/memory access out of bounds/unreachable/handleFcntlLock/malformed DB signatures are present in command.log at this checkpoint.

Current interpretation: the earlier Wasm-memory-growth sample is not yet proven to be a hard platform wedge; this retry is being allowed to continue under the documented timeout policy until normal completion or a clear crash/wedge.

@brandonpayton

Member Author

kd-nbh interruption update, 2026-06-18 22:15 EDT

The active Node retry was interrupted before completion and before the wrapper produced host-status.tsv, node/summary.txt, node/failures.tsv, or combined-summary.md. This is incomplete, not a SQLite pass/fail result. I did not issue this stop from the current monitor path.

Interrupted run:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-cleanpatch-20260618-220351
  • Command: timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-cleanpatch-20260618-220351'"
  • Policy: runner_timeout_ms=86400000, outer timeout=86400s; timeout/crash/wedge/interruption incomplete, not pass/fail
  • Exit evidence: root exit-code.txt = 130; host-status.tsv empty
  • Stop artifact root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-cleanpatch-20260618-220351/partial-crossed-exec-crash-boundary-20260618-221349
  • Stop time recorded there: 2026-06-18 22:13:49 EDT
  • Kill target recorded there: pgid=73644
  • Last log-only progress before stop: 09:28 tcl(229/10523) f37 r32 ETC 07:05:50
  • Trap scan before stop: empty; no RuntimeError/kernel threw/memory access out of bounds/unreachable/handleFcntlLock/malformed DB signature in command.log
  • Because this background start did not use explicit --workdir --keep-workdir, the temp workdir /tmp/kandelo-sqlite-official.vL0qZf was removed by wrapper cleanup. There is no stopped DB to query for case counts/failure catalog. Visible FAILED lines remain diagnostic only.

Next action: restarting the selected public Tcl all suite on Node with the same 32 jobs and timeout policy, but with explicit preserved workdir: run root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-interrupt-20260618-221555, workdir .../workdir, --keep-workdir. No source/kernel refresh occurred, so the previous Node/browser all --explain after the exec/CLOEXEC refresh remains the current enumeration evidence: 10523 ready jobs per host.

@brandonpayton

Member Author

kd-nbh restart confirmation, 2026-06-18 22:17 EDT

Replacement Node public SQLite all/jobs32 run is live with explicit artifact preservation.

Command:
timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-interrupt-20260618-221555' --workdir '/Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-interrupt-20260618-221555/workdir' --keep-workdir"

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-interrupt-20260618-221555
  • Live log: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-interrupt-20260618-221555/command.log
  • Preserved workdir: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-interrupt-20260618-221555/workdir

Policy unchanged: selected public SQLite Tcl all, Node host, jobs=32, runner_timeout_ms=86400000, outer timeout=86400s. Timeout/crash/wedge/interruption remains incomplete, not pass/fail. I am not querying the live testrunner.db.

@brandonpayton

Member Author

kd-nbh platform blocker/quarantine, 2026-06-18 22:40 EDT

The preserved-workdir Node public SQLite all/jobs32 run stopped incomplete on a testrunner control-DB lock failure. This is not a SQLite pass/fail result and not a final failure catalog.

Command/policy:

  • Command: timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-interrupt-20260618-221555' --workdir '/Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-interrupt-20260618-221555/workdir' --keep-workdir"
  • Selected suite: upstream public SQLite test/testrunner.tcl all
  • Host/jobs: Node, jobs=32
  • Runner timeout: 86400000 ms
  • Outer hard timeout: 86400s
  • Stop reason: manual quarantine after the parent testrunner emitted repeated control-DB "database is locked" errors and no longer made trustworthy audit progress
  • Exit evidence: run root exit-code.txt = 120; host-status.tsv = node 141
  • Timeout/crash/wedge/control-DB failure policy: incomplete, not pass/fail

First/current blocker signature:

  • "database is locked" while the parent testrunner ran trdb eval { BEGIN EXCLUSIVE } in r_write_db, called from mark_job_as_finished / script_input_ready.
  • 32 "database is locked" signatures captured in signatures-before-stop.txt.
  • No RuntimeError/kernel threw/memory access out of bounds/unreachable/handleFcntlLock/malformed DB signature in command.log.
  • Process sample node-sample-lockburst-20260618-2237.txt shows runtime work mostly in V8 GC/worker paths, not a kernel trap.

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-interrupt-20260618-221555
  • Quarantine root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-interrupt-20260618-221555/quarantine-control-db-locked-20260618-223935
  • Logs: command.pre-stop.log, command.post-stop.log, testrunner.pre-stop.log, testrunner.post-stop.log
  • Process/sample: ps-before-stop.txt, ps-after-int.txt, node-sample-lockburst-20260618-2237.txt
  • DB copies: raw-stopped-db/testrunner.db*, query-db/testrunner.db*
  • Counts: db-counts.txt, db-state-counts.txt, db-integrity.txt
  • Diagnostic rows: failed-running-omitted-jobs.csv

Stopped copied DB status:

  • integrity_check: ok
  • total_jobs=10523
  • done_jobs=164
  • failed_rows=33
  • omitted_jobs=0
  • running_jobs=32
  • ready_jobs=10294
  • total_cases=14655
  • total_case_errors=452
  • State breakdown: done 164 / 13286 cases / 0 errors; failed 33 / 1369 cases / 452 errors; running 32; ready 10294.
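
As a sanity check on stopped-state counts like these, every job should be in exactly one state, so the per-state counts should sum to total_jobs. The following is a small sketch of that check. The helper name `check_job_counts` and its argument order are hypothetical, and it assumes done, failed, omitted, running, and ready are the only job states present.

```shell
# Hypothetical helper: verify that per-state job counts add up to the total.
# Arguments: total done failed omitted running ready
check_job_counts() {
  sum=$(( $2 + $3 + $4 + $5 + $6 ))
  if [ "$sum" -eq "$1" ]; then
    echo "ok: states sum to $1"
  else
    echo "mismatch: states sum to $sum, expected $1"
    return 1
  fi
}

# Counts from the stopped copied DB in this comment
check_job_counts 10523 164 33 0 32 10294
```

With the counts above the states sum to 10523. The same arithmetic applies to the case totals in the state breakdown (13286 + 1369 = 14655).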

The failed/running rows are diagnostic only because the parent control DB lock failure stopped the audit before suite completion. Next action: diagnose the testrunner control-DB locking path as the current Kandelo platform blocker before another full retry. Browser remains unvalidated for full execution; the last fresh Node/browser all --explain after the source/kernel refresh still enumerated 10523 ready jobs per host.

@brandonpayton

Member Author

kd-nbh SQLite PR #692 audit update: control-DB lock cleanup fix + fresh explain

Latest incomplete blocker addressed locally: the preserved-workdir Node all/jobs32 run stopped after repeated "database is locked" errors on the parent testrunner.db in trdb eval { BEGIN EXCLUSIVE } / r_write_db / mark_job_as_finished, with no runtime trap signatures. I treated that as a Kandelo host/kernel lock-lifecycle blocker, not a SQLite failure item.

Root-cause fix in this refresh:

  • crates/kernel/src/syscalls.rs: sys_close no longer clones the OFD path just to release host-backed POSIX/OFD advisory locks. It uses the existing OFD path before dec_ref, avoiding an ENOMEM-prone close-time allocation while preserving close/exit lock release semantics.
  • host/src/kernel-worker.ts: normal main-process SYS_EXIT now performs a shared host lock-table cleanup for POSIX process-associated locks held by the exiting pid. This uses a non-forcing cleanup path shared by Node and browser host runtime; crash/deactivation cleanup still uses the existing forced spinlock reset path.
  • Focused tests added/updated in crates/kernel/src/syscalls.rs and host/test/multi-worker.test.ts.

Validation run after the fix:

  • scripts/dev-shell.sh bash -c 'cargo test -p kandelo --target aarch64-apple-darwin fcntl_locks --lib' => 2 passed.
  • scripts/dev-shell.sh bash -c 'cargo test -p kandelo --target aarch64-apple-darwin close --lib && cargo test -p kandelo --target aarch64-apple-darwin fcntl --lib && cargo test -p kandelo --target aarch64-apple-darwin exit --lib' => 29 close-filter tests passed, 18 fcntl-filter tests passed, 5 exit-filter tests passed.
  • cd host && npx vitest run test/cross-process-lock.test.ts test/shared-lock-table.test.ts test/kernel-fcntl-lock.test.ts test/multi-worker.test.ts => 4 files passed, 38 tests passed.
  • scripts/dev-shell.sh bash -c 'cargo test -p kandelo --target aarch64-apple-darwin --lib' => 966 passed.
  • scripts/dev-shell.sh bash -c 'bash scripts/check-abi-version.sh' => snapshot/header/TS ABI bindings in sync; ABI_VERSION and snapshot consistent.
  • cd host && npm run build => host dist rebuilt.
  • cd host && npx vitest run was attempted and is not green in this dirty checkout: 74 files passed, 22 failed, 22 skipped; 712 tests passed, 91 failed, 59 skipped. Dominant failures are stale ABI 14 package/program artifacts against kernel ABI 15, missing wasm program binaries, missing /opt/homebrew/opt/llvm@21/bin/clang, and an existing mariadbd timeout. The lock-focused host tests passed both standalone and inside the full run.
  • cargo fmt -p kandelo could not be run because the dev shell currently has no cargo fmt command; I did not run host rustfmt over the already-dirty file.

Runtime artifacts refreshed:

  • Kernel wasm copied to target/wasm32-unknown-unknown/release/kandelo_kernel.wasm, local-binaries/kernel.wasm, and host/wasm/kandelo-kernel.wasm.
  • Kernel wasm size: 664486 bytes.
  • Kernel wasm sha256: 93e6f0ab2649505acb613e639267d9bd7188195e2fa1734810795c3b83f8f91e for all three copies.

Required fresh post-refresh explain:

  • Command: scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host both --permutation all --jobs 1 --timeout-ms 600000 --explain --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-explain-after-lockexitfix-20260618-224905'"
  • Artifact: test-runs/sqlite-project-unit-all/kd-nbh-explain-after-lockexitfix-20260618-224905/combined-summary.md
  • Node: runner exit 0; total_jobs=10523, done=0, failed=0, omitted=0, running=0, ready=10523, cases=0, case_errors=0.
  • Browser: runner exit 0; total_jobs=10523, done=0, failed=0, omitted=0, running=0, ready=10523, cases=0, case_errors=0.

Next action: retry selected public SQLite Tcl all on Node with jobs=32, explicit preserved workdir, runner timeout 86400000 ms, and outer hard timeout 86400s. Timeout/crash/wedge remains incomplete, not pass/fail. TH3/private/non-public suites remain out of scope.

@brandonpayton

Member Author

SQLite audit kd-nbh live milestone for the Node public all/jobs32 retry after the lock/exit cleanup.

Status: still running, incomplete, not a pass/fail result.

Command under audit:

timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockexitfix-20260618-225026' --workdir '/Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockexitfix-20260618-225026/workdir' --keep-workdir"

Timeout policy: runner timeout 86400000 ms, outer shell timeout 86400s. A timeout, wedge, crash, or interrupted run remains incomplete, not SQLite pass/fail.

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockexitfix-20260618-225026
  • Log: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockexitfix-20260618-225026/command.log
  • Command file: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockexitfix-20260618-225026/command.txt
  • Preserved workdir: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockexitfix-20260618-225026/workdir
  • Current pointer: test-runs/sqlite-project-unit-all/kd-nbh-current-node-run.txt

Log-only checkpoint, without inspecting the live testrunner.db: latest runner progress is 07:48 tcl(178/10523) f31 r32 ETC 07:32:56. This crosses the previous preserved-workdir control-DB-lock quarantine point (done=164, failed_rows=33, running=32) without reproducing that control DB lock signature.

Signature scan of command.log at this checkpoint found no matches for RuntimeError, KERNEL THROW, memory access out of bounds, unreachable, handleFcntlLock, database is locked, malformed, or UNCAUGHT.
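
A signature scan like this one can be approximated with a fixed-string grep over the log. The sketch below uses the pattern list from this checkpoint. The helper name `scan_trap_signatures` is hypothetical, and the actual scan used for the audit may differ (for example in case sensitivity).

```shell
# Hypothetical helper: list log lines matching any known trap/blocker signature.
# Exit status follows grep: 0 if at least one line matched, 1 if none did.
scan_trap_signatures() {
  grep -n -F \
    -e 'RuntimeError' \
    -e 'KERNEL THROW' \
    -e 'memory access out of bounds' \
    -e 'unreachable' \
    -e 'handleFcntlLock' \
    -e 'database is locked' \
    -e 'malformed' \
    -e 'UNCAUGHT' \
    -- "$1"
}

# Usage: a clean checkpoint prints nothing and returns non-zero
# scan_trap_signatures command.log || echo "no signatures found"
```

Matching is case-sensitive and each matching line is printed once with its line number, even if it contains several signatures.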

Process checkpoint: Node testfixture process 74206 was still alive and CPU-active (STAT R, elapsed about 08:24, CPU about 31.6%, memory about 2.7%).

@brandonpayton

Member Author

SQLite audit kd-nbh quarantine update for the Node public all/jobs32 retry after the lock/exit cleanup.

Status: incomplete / platform blocker, not a SQLite pass/fail result. I stopped the run after it wedged: command.log stopped changing at 2026-06-18 22:58:39 EDT, the latest runner progress remained 07:48 tcl(178/10523) f31 r32 ETC 07:32:56, and repeated process samples showed Node sleeping at 0.0% CPU. The wrapper exit artifact is 143 from the manual TERM.

Command under audit:

timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockexitfix-20260618-225026' --workdir '/Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockexitfix-20260618-225026/workdir' --keep-workdir"

Timeout policy: runner timeout 86400000 ms, outer shell timeout 86400s. A timeout, wedge, crash, or interrupted run is incomplete, not pass/fail. This run was manually quarantined before the hard timeout because the log and CPU were flat.

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockexitfix-20260618-225026
  • Main log: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockexitfix-20260618-225026/command.log
  • Exit code: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockexitfix-20260618-225026/exit-code.txt
  • Preserved workdir: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockexitfix-20260618-225026/workdir
  • Quarantine dir: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockexitfix-20260618-225026/quarantine-wedge-20260618-230203
  • Copied stopped DB/WAL/SHM: .../quarantine-wedge-20260618-230203/db-copy/
  • Counts: .../quarantine-wedge-20260618-230203/stopped-counts.txt, stopped-state-counts.txt
  • Failed jobs: .../quarantine-wedge-20260618-230203/failed-jobs.txt
  • Running jobs at stop: .../quarantine-wedge-20260618-230203/running-jobs-at-stop.txt
  • Process diagnostics: ps-node.txt, ps-tree.txt, lsof-node.txt, sample-node-5s.txt, sample-key-frames.txt

Copied-DB integrity: PRAGMA integrity_check returned ok. I did not inspect the live testrunner.db; counts below come from the stopped DB copy.

Stopped-state job counts:

  • Total public all jobs discovered: 10523
  • Done jobs: 147
  • Failed jobs: 31
  • Omitted/skipped jobs: 0
  • Running at stop: 32
  • Ready/not-started: 10313
  • Halt/blank: 0

Stopped-state case counts from completed/failed job rows:

  • Cases reached: 12194
  • Case errors: 452
  • Reached non-error cases: 11742
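
Tallies like the ones above could be derived from a stopped DB copy with one grouped query. This is a minimal sketch, assuming the runner DB has a jobs table with state, ntest, and nerr columns; that schema is an assumption here, so the example builds its own in-memory table instead of opening a real testrunner.db. The "done" demo row is made up; the two failed rows reuse values from the failed-job list.

```python
import sqlite3


def state_counts(conn: sqlite3.Connection) -> dict:
    """Group job rows by state and total their case and error counters."""
    rows = conn.execute(
        "SELECT state, count(*), coalesce(sum(ntest), 0), coalesce(sum(nerr), 0) "
        "FROM jobs GROUP BY state"
    ).fetchall()
    return {
        state: {"jobs": jobs, "cases": cases, "errors": errors}
        for state, jobs, cases, errors in rows
    }


def demo_db() -> sqlite3.Connection:
    """Build a tiny stand-in for a copied runner DB (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs(jobid INTEGER PRIMARY KEY, displayname TEXT, "
        "state TEXT, ntest INT, nerr INT)"
    )
    conn.executemany(
        "INSERT INTO jobs(displayname, state, ntest, nerr) VALUES (?, ?, ?, ?)",
        [
            ("test/sysfault.test", "failed", 1, 1),
            ("test/writecrash.test", "failed", 44, 5),
            ("demo/done.test", "done", 100, 0),  # illustrative row only
            ("test/boundary2.test", "running", None, None),
            ("test/wal2.test", "ready", None, None),
        ],
    )
    return conn


conn = demo_db()
# Check integrity before trusting any counts, as done for the copied DB.
assert conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
for state, totals in sorted(state_counts(conn).items()):
    print(state, totals)
```

Running and ready rows carry no counters yet, which is why the case totals above come only from completed and failed rows.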

Complete failed-job list observed before the wedge:

2      test/sysfault.test                                        ntest=1    nerr=1
4      test/writecrash.test                                      ntest=44   nerr=5
21     test/manydb.test                                          ntest=901  nerr=348
26     ext/rtree/rtreeA.test                                     ntest=1    nerr=1
34     ext/fts5/test/fts5ubsan.test                              ntest=8    nerr=7
59     test/delete.test                                          ntest=68   nerr=6
87     test/exists.test                                          ntest=73   nerr=9
108    test/bigfile2.test                                        ntest=4    nerr=2
746    test/sort4.test                                           ntest=11   nerr=5
974    ext/fts5/test/fts5optimize3.test                          ntest=4    nerr=2
1480   config=memsubsys1 ext/fts5/test/fts5optimize2.test        ntest=4    nerr=2
2365   config=memsubsys1 ext/fts5/test/fts5optimize3.test        ntest=4    nerr=2
2809   config=memsubsys2 ext/fts5/test/fts5optimize2.test        ntest=4    nerr=2
3695   config=memsubsys2 ext/fts5/test/fts5optimize3.test        ntest=4    nerr=2
4137   config=multithread test/sort4.test                        ntest=43   nerr=26
4277   config=no_mutex_try ext/fts5/test/fts5optimize2.test      ntest=4    nerr=2
5164   config=no_mutex_try ext/fts5/test/fts5optimize3.test      ntest=4    nerr=2
5617   config=journaltest ext/fts5/test/fts5optimize2.test       ntest=4    nerr=2
6394   config=journaltest ext/fts5/test/fts5optimize3.test       ntest=4    nerr=2
6782   config=inmemory_journal ext/fts5/test/fts5optimize2.test  ntest=4    nerr=2
7620   config=inmemory_journal ext/fts5/test/fts5optimize3.test  ntest=4    nerr=2
8103   config=prepare ext/fts5/test/fts5optimize2.test           ntest=4    nerr=2
8583   config=prepare test/busy2.test                            ntest=29   nerr=4
8855   config=prepare test/walsetlk.test                         ntest=40   nerr=1
8918   config=prepare ext/fts5/test/fts5optimize3.test           ntest=4    nerr=2
9032   config=prepare test/wal3.test                             ntest=1    nerr=1
9324   config=mmap ext/fts5/test/fts5optimize2.test              ntest=4    nerr=2
9805   config=mmap test/busy2.test                               ntest=29   nerr=4
10079  config=mmap test/walsetlk.test                            ntest=40   nerr=1
10142  config=mmap ext/fts5/test/fts5optimize3.test              ntest=4    nerr=2
10256  config=mmap test/wal3.test                                ntest=1    nerr=1

The 32 jobs still marked running are listed in running-jobs-at-stop.txt; they include base tests, prepare, and mmap jobs such as test/boundary2.test, test/fkey_malloc.test, test/wal2.test, config=prepare test/round1.test, config=mmap test/round1.test, and config=mmap ext/fts5/test/fts5ah.test.

Signature scan of command.log found no RuntimeError, KERNEL THROW, memory access out of bounds, unreachable, handleFcntlLock, database is locked, malformed, or UNCAUGHT. The process sample points at a host-runtime wait/deadlock shape: main thread in uv__io_poll/kevent, many worker threads in V8 FutexEmulation::WaitSync / __psynch_cvwait, with AtomicsWaitAsync/AtomicsNotify and a WasmMemoryGrow frame present. I am treating this as the next platform blocker to diagnose before retrying the full suite.
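
A signature scan of this kind can be approximated with a line-oriented substring count. This is a sketch, not the script used for this audit: the function name is hypothetical, matching is case-sensitive, and each line is counted at most once per signature (similar to grep -c).

```python
from collections import Counter
from typing import Dict, Iterable

# Signature strings taken from the scan described above.
SIGNATURES = (
    "RuntimeError",
    "KERNEL THROW",
    "memory access out of bounds",
    "unreachable",
    "handleFcntlLock",
    "database is locked",
    "malformed",
    "UNCAUGHT",
)


def scan_signatures(log_text: str, signatures: Iterable[str] = SIGNATURES) -> Dict[str, int]:
    """Count log lines containing each signature, keeping zero counts."""
    signatures = tuple(signatures)
    counts = Counter({sig: 0 for sig in signatures})
    for line in log_text.splitlines():
        for sig in signatures:
            if sig in line:
                counts[sig] += 1
    return dict(counts)


sample = "07:48 tcl(178/10523) f31 r32 ETC 07:32:56\n"
print(scan_signatures(sample))
```

Keeping zero counts in the result makes a "no hits" claim explicit rather than implied by a missing key.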

Skips/out-of-scope remain unchanged: TH3/private SQLite tests are skipped because they are not public package tests for this audit. Public all has not completed.

@brandonpayton

Member Author

kd-nbh SQLite audit update: raw wakeup host fix + fresh all explain

This milestone addresses the latest incomplete Node all/jobs32 quarantine (test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockexitfix-20260618-225026/quarantine-wedge-20260618-230203). That run wedged with no runtime trap signatures; the sampled process state showed an all-waiting host-runtime shape. The visible SQLite failed/running rows from that run remain diagnostic only, not a final package failure catalog.

Root-cause fix in shared host runtime:

  • host/src/kernel-worker.ts: raw completions after kernel-backed large/vector I/O now drain the kernel wakeup queue, so large writes/read-side drains cannot leave host-tracked pipe/socket readers or writers asleep indefinitely.
  • host/test/multi-worker.test.ts: added a regression that drives handleLargeWrite through raw kernel completions and proves a pending pipe reader is retried from the drained wake event.
  • This is shared CentralizedKernelWorker behavior, so it applies to both Node and browser hosts. No kernel ABI or wasm refresh was needed for this host-only change.

Validation:

  • cd host && npx vitest run test/multi-worker.test.ts => 12 tests passed.
  • git diff --check -- host/src/kernel-worker.ts host/test/multi-worker.test.ts => passed.
  • scripts/dev-shell.sh bash -c 'cd host && npx vitest run test/multi-worker.test.ts' => 12 tests passed.
  • scripts/dev-shell.sh bash -c 'cd host && npm run build' => passed; existing tsup CJS import.meta warnings only.

Fresh required SQLite enumeration after the shared host source refresh:

  • Command: scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host both --permutation all --jobs 1 --timeout-ms 600000 --explain --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-explain-after-rawwakeupfix-20260618-2317'"
  • Combined artifact: test-runs/sqlite-project-unit-all/kd-nbh-explain-after-rawwakeupfix-20260618-2317/combined-summary.md
  • Node: exit 0, total_jobs=10523, done=0, failed=0, omitted=0, running=0, ready=10523, SQLite cases=0, case_errors=0.
  • Browser: exit 0, total_jobs=10523, done=0, failed=0, omitted=0, running=0, ready=10523, SQLite cases=0, case_errors=0.

Next action: start a new preserved Node public SQLite Tcl all/jobs32 retry with runner timeout 86400000 ms and outer shell timeout 86400s. Timeout/crash/wedge/interruption remains incomplete, not pass/fail. TH3/private/non-public suites remain out of scope.

@brandonpayton

Member Author

kd-nbh SQLite audit update: Node all/jobs32 retry active after raw wakeup fix

A new preserved Node public SQLite Tcl all/jobs32 retry is running after the shared host raw-wakeup fix and fresh Node/browser all --explain refresh.

Command and timeout policy:

  • Command: timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320' --workdir '/Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320/workdir' --keep-workdir"
  • Runner timeout: 86400000 ms.
  • Outer shell timeout: 86400s.
  • Timeout/crash/wedge/interruption policy: incomplete, not pass/fail.

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320
  • Command log: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320/command.log
  • Command record: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320/command.txt
  • Preserved workdir: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320/workdir
  • Current pointer: test-runs/sqlite-project-unit-all/kd-nbh-current-node-run.txt

Live log-only checkpoint, no live DB inspection:

  • Testset built in 44386ms.
  • Latest progress at publication: 00:36 tcl(5/10523) r32 ETC 21:18:07.
  • Signature scan found no RuntimeError, KERNEL THROW, memory access out of bounds, unreachable, handleFcntlLock, database is locked, malformed, or UNCAUGHT entries.
  • Node process PID 4179 is CPU-active (STAT R, about 228.7% CPU at sample).
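
Log-only monitoring depends on reading the runner's progress lines. The parser below is a sketch under assumptions: the field meanings (finished/total, f = failed, r = running, ETC = estimated time to completion) are inferred from the lines and counts quoted in these comments, not taken from testrunner.tcl documentation, and the f and ETC fields are treated as optional because some quoted lines omit them.

```python
import re
from typing import Optional

# Example lines: "00:36 tcl(5/10523) r32 ETC 21:18:07"
#                "07:48 tcl(178/10523) f31 r32 ETC 07:32:56"
PROGRESS_RE = re.compile(
    r"^(?P<elapsed>\d+:\d{2}(?::\d{2})?)\s+tcl\((?P<finished>\d+)/(?P<total>\d+)\)"
    r"(?:\s+f(?P<failed>\d+))?"
    r"(?:\s+r(?P<running>\d+))?"
    r"(?:\s+ETC\s+(?P<etc>\S+))?$"
)


def parse_progress(line: str) -> Optional[dict]:
    """Parse one progress line; return None for any other log line."""
    match = PROGRESS_RE.match(line.strip())
    if match is None:
        return None
    return {
        "elapsed": match.group("elapsed"),
        "finished": int(match.group("finished")),
        "total": int(match.group("total")),
        "failed": int(match.group("failed") or 0),
        "running": int(match.group("running") or 0),
        "etc": match.group("etc"),
    }


print(parse_progress("00:36 tcl(5/10523) r32 ETC 21:18:07"))
```

In the stopped-DB counts quoted in this thread, the first number in tcl(...) equals done plus failed jobs (for example 147 + 31 = 178), which is what the "finished" name reflects.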

The selected public upstream scope remains test/testrunner.tcl all with 10,523 jobs per host from the latest explain artifact. TH3/private/non-public suites remain out of scope.

@brandonpayton

Member Author

SQLite PR #692 audit update for kd-nbh: the Node all/jobs32 retry after the raw-wakeup fix is quarantined as incomplete due to a runner DB lock signature, not counted as a SQLite suite pass/fail result.

Command/timeouts:

  • timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320' --workdir '/Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320/workdir' --keep-workdir"
  • Runner timeout: 86400000 ms; outer timeout: 86400s. Timeout/crash/wedge/interruption remains incomplete, not pass/fail.

What happened:

  • Latest normal runner counter before invalidation: 15:52 tcl(170/10523) f35 r32 ETC 16:05:46.
  • The log then emitted repeated database is locked exceptions inside upstream testrunner.tcl while mark_job_as_finished called trdb eval { BEGIN EXCLUSIVE }.
  • Signature scan artifact has 31 database is locked hits and no RuntimeError, KERNEL THROW, memory access out of bounds, unreachable, handleFcntlLock, malformed, or UNCAUGHT hits.
  • Preserved a process sample, sent SIGTERM to PGID 3286, then copied the stopped testrunner.db* before querying.
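
The copy-before-query step can be sketched as follows. The helper names and directory layout are hypothetical; the only idea carried over from the comment is that testrunner.db and any sibling files matching testrunner.db* are copied after the runner has stopped, and that queries run against the copy.

```python
import glob
import os
import shutil
import sqlite3
import tempfile
from typing import List


def copy_stopped_db(src_dir: str, dest_dir: str, stem: str = "testrunner.db") -> List[str]:
    """Copy the DB and any sibling files (for example -wal, -shm) into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    pattern = os.path.join(glob.escape(src_dir), stem + "*")
    return [shutil.copy2(path, dest_dir) for path in sorted(glob.glob(pattern))]


def integrity_check(db_path: str) -> str:
    """Run PRAGMA integrity_check and return the first result row."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        conn.close()


# Self-contained demo using throwaway directories instead of a real run root.
src = tempfile.mkdtemp()
dest = os.path.join(tempfile.mkdtemp(), "db-copy")
demo = sqlite3.connect(os.path.join(src, "testrunner.db"))
demo.execute("CREATE TABLE jobs(state TEXT)")
demo.commit()
demo.close()
copied = copy_stopped_db(src, dest)
print(copied, integrity_check(copied[0]))
```

Copying only after the process group has exited avoids reading a DB that is still being written.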

Stopped DB copy counts (integrity_check=ok):

  • total_jobs: 10523
  • states: done 135, failed 35, running 32, ready 10321, omitted 0, halt 0, blank 0
  • total_cases recorded: 11592
  • total_case_errors recorded: 479

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320
  • Command: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320/command.txt
  • Log: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320/command.log
  • Process sample: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320/diagnostics/node-sample-before-stop-20260618-2342.txt
  • Quarantine DB copy: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320/quarantine-db-lock-20260618-2343/testrunner.db*
  • Stopped counts/failure/running list: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320/quarantine-db-lock-20260618-2343/stopped-db-counts.txt
  • Signature lines: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-rawwakeupfix-20260618-2320/quarantine-db-lock-20260618-2343/signature-lines.txt

Publication status: PR #692 updated here; bead notes update follows. Next step is to isolate/fix the runner DB lock/platform locking blocker before another full all retry.

@brandonpayton

Member Author

SQLite PR #692 audit update for kd-nbh: applied and validated a host-runtime fix for the runner DB lock blocker.

Change made:

  • host/src/shared-lock-table.ts: raised the default shared advisory lock table capacity from 256 entries to 8192 entries. This is about 256 KiB of SharedArrayBuffer (SAB) capacity and avoids false EAGAIN from lock-table saturation during high-concurrency SQLite byte-range locking.
  • host/test/shared-lock-table.test.ts: added the test "default capacity handles SQLite-style lock fan-out", which inserts 1024 independent SQLite-style whole-file locks and would fail at the previous 256-entry default.

Why this targets the blocker:

  • The quarantined run emitted 31 database is locked traces from upstream testrunner.tcl while marking jobs finished.
  • The most plausible platform-side cause is lock-table saturation: SharedLockTable.setLock() returned false both for real conflicts and capacity exhaustion, which maps to EAGAIN and then SQLite database is locked.
  • This is a host-runtime change shared by Node and browser; no kernel ABI/snapshot change is involved.
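
The ambiguity described above, where a boolean result hides whether a lock was refused for a real conflict or for lack of table space, can be illustrated with a toy model. This is not the SharedLockTable implementation: the class, its exclusive-only locks, and the three-way result enum are all assumptions made for illustration.

```python
from enum import Enum
from typing import List, Tuple


class LockResult(Enum):
    OK = "ok"
    CONFLICT = "conflict"      # another owner holds an overlapping range
    TABLE_FULL = "table_full"  # no conflict, but no free entry


class ToyLockTable:
    """Fixed-capacity table of exclusive byte-range locks (toy model)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: List[Tuple[str, int, int, int]] = []  # (path, start, length, owner)

    def try_lock(self, path: str, start: int, length: int, owner: int) -> LockResult:
        for held_path, held_start, held_len, held_owner in self.entries:
            overlaps = (
                held_path == path
                and start < held_start + held_len
                and held_start < start + length
            )
            if overlaps and held_owner != owner:
                return LockResult.CONFLICT
        if len(self.entries) >= self.capacity:
            return LockResult.TABLE_FULL
        self.entries.append((path, start, length, owner))
        return LockResult.OK

    def set_lock(self, path: str, start: int, length: int, owner: int) -> bool:
        """Boolean API: collapses CONFLICT and TABLE_FULL into False."""
        return self.try_lock(path, start, length, owner) is LockResult.OK


table = ToyLockTable(capacity=2)
for owner, path in enumerate(["a.db", "b.db", "c.db"], start=1):
    print(path, table.try_lock(path, 0, 1, owner).name)
```

With a boolean return, the third lock in the demo is indistinguishable from a genuine conflict, so a caller that maps False to EAGAIN would report contention that does not exist.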

Validation run after the fix:

  • git diff --check -- host/src/shared-lock-table.ts host/test/shared-lock-table.test.ts => pass
  • scripts/dev-shell.sh bash -c 'cd host && npx vitest run test/shared-lock-table.test.ts test/kernel-fcntl-lock.test.ts test/multi-worker.test.ts' => 3 files passed, 32 tests passed
  • scripts/dev-shell.sh bash -c 'cd host && npm run build' => pass; existing tsup CJS import.meta warnings only
  • Fresh required explain: scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host both --permutation all --jobs 1 --timeout-ms 600000 --explain --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-explain-after-lockcapacityfix-20260618-2350'"
    • Node explain: exit 0; total_jobs=10523, done=0, failed=0, omitted=0, running=0, ready=10523, cases=0, case_errors=0
    • Browser explain: exit 0; same counts
    • Combined artifact: test-runs/sqlite-project-unit-all/kd-nbh-explain-after-lockcapacityfix-20260618-2350/combined-summary.md

Publication status: PR #692 updated here; bead notes update follows. Next step: retry Node all/jobs32 with the same hard-timeout policy and classify timeout/crash/wedge/interruption as incomplete, not pass/fail.

@brandonpayton

Member Author

SQLite PR #692 audit update for kd-nbh: started the Node all/jobs32 full retry after the shared-lock-table capacity fix.

Command/timeouts:

  • timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356' --workdir '/Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356/workdir' --keep-workdir"
  • Runner timeout: 86400000 ms; outer timeout: 86400s. Timeout/crash/wedge/interruption remains incomplete, not pass/fail.

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356
  • Command record: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356/command.txt
  • Live command log: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356/command.log
  • Current-run pointer: test-runs/sqlite-project-unit-all/kd-nbh-current-node-run.txt

Publication status: PR #692 updated here; bead notes update follows. I will not inspect the live runner DB; monitoring uses text log/process state only until the DB is stopped/copied.

@brandonpayton

Member Author

SQLite PR #692 audit update for kd-nbh: Node all/jobs32 after the shared-lock-table capacity change reproduced the same runner DB-lock blocker and is quarantined as incomplete. This is not a SQLite suite pass/fail result, and the capacity hypothesis is not sufficient for the full-run blocker.

Command/timeouts:

  • timeout 86400s scripts/dev-shell.sh bash -c "scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356' --workdir '/Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356/workdir' --keep-workdir"
  • Runner timeout: 86400000 ms; outer timeout: 86400s. Manual SIGTERM after platform blocker; wrapper exit code 143.

What happened:

  • The run reached the early job-completion burst after the last visible progress counter 06:26 tcl(108/10523) f4 r32 ETC 10:20:13.
  • The log then emitted repeated database is locked exceptions inside upstream testrunner.tcl while mark_job_as_finished and one progress_report call attempted trdb eval { BEGIN EXCLUSIVE }.
  • Signature artifact has 31 database is locked hits and no RuntimeError, KERNEL THROW, memory access out of bounds, unreachable, handleFcntlLock, malformed, or UNCAUGHT hits.
  • Preserved a process sample, sent SIGTERM to PGID 77593, then copied the stopped testrunner.db* before querying.

Stopped DB copy counts (integrity_check=ok):

  • total_jobs: 10523
  • states: done 104, failed 4, running 32, ready 10383, omitted 0, halt 0, blank 0
  • total_cases recorded: 6645
  • total_case_errors recorded: 18
  • failed jobs recorded before invalidation: test/sysfault.test, test/delete.test, test/exists.test, test/bigfile2.test

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356
  • Command: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356/command.txt
  • Exit code: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356/exit-code.txt
  • Log: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356/command.log
  • Process sample: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356/diagnostics/node-sample-before-stop-20260619-0004.txt
  • Quarantine DB copy: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356/quarantine-db-lock-20260619-0004/testrunner.db*
  • Stopped counts/failure/running list: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356/quarantine-db-lock-20260619-0004/stopped-db-counts.txt
  • Signature lines: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-after-lockcapacityfix-20260618-2356/quarantine-db-lock-20260619-0004/signature-lines.txt

Publication status: PR #692 updated here; bead notes update follows. Next step: isolate why the runner control DB stays locked beyond the 10s upstream busy timeout. I will not retry full all again until there is a new focused diagnosis/fix and fresh Node/browser all --explain.

@brandonpayton

Member Author

Diagnostic update for the SQLite project-unit all audit on Kandelo PR #692:

  • Added env-gated fcntl lock diagnostics (KERNEL_FCNTL_LOCK_LOG / KERNEL_FCNTL_LOCK_FILTER) and ran a bounded Node diagnostic. This was not a full completion attempt.
  • Command: timeout 1800s scripts/dev-shell.sh bash -c "KERNEL_FCNTL_LOCK_LOG=blocked KERNEL_FCNTL_LOCK_FILTER=testrunner.db scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 1800000 --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-fcntl-diag-node-jobs32-20260619-0012' --workdir '/Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-fcntl-diag-node-jobs32-20260619-0012/workdir' --keep-workdir"
  • Outcome: incomplete. It did not reproduce the testrunner.db lock signature: 0 [fcntl-lock], 0 database is locked, 0 RuntimeError, 0 KERNEL THROW, 0 memory access out of bounds, 0 unreachable, 0 handleFcntlLock, 0 malformed, 0 UNCAUGHT.
  • Instead it reproduced the no-signature host-runtime wait/deadlock wedge: command.log stopped at 2026-06-19 00:15:23 EDT, latest progress 04:14 tcl(138/10523) f4 r32; Node was sleeping near 0 CPU. I manually TERM'd the process group; wrapper exit-code.txt=143.
  • Artifacts: test-runs/sqlite-project-unit-all/kd-nbh-fcntl-diag-node-jobs32-20260619-0012/; quarantine .../quarantine-wedge-20260619-0018/ with command logs, ps/lsof/sample diagnostics, signature-summary.txt, stopped-db-counts.txt, and stopped db-copy/testrunner.db*.
  • Stopped copied DB integrity_check=ok. Diagnostic counts only: total jobs 10,523; done 134; failed rows 4; running 32; ready 10,353; omitted 0; total cases recorded 8,730; case errors recorded 18.

These rows are not final SQLite pass/fail results because the suite did not complete. Current blocker is again the host-runtime wait/deadlock wedge shape, not a final SQLite failure list.

@brandonpayton

Member Author

SQLite audit update: broad fcntl diagnostic hit process-worker OOB

This was a bounded diagnostic run, not a completed SQLite pass/fail result.

Command and timeout policy:

  • timeout 1200s scripts/dev-shell.sh bash -c "KERNEL_FCNTL_LOCK_LOG=blocked scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 1200000 --results-root 'test-runs/sqlite-project-unit-all/kd-nbh-fcntl-diag-alllocks-node-jobs32-20260619-0029' --workdir '/Users/brandon/src/kandelo/test-runs/sqlite-project-unit-all/kd-nbh-fcntl-diag-alllocks-node-jobs32-20260619-0029/workdir' --keep-workdir"
  • Runner timeout: 1,200,000 ms
  • Outer timeout: 1200s
  • Stop outcome: manual TERM after process-worker memory access out of bounds; wrapper exit-code.txt=143
  • Timeout/crash/wedge/interruption remains incomplete, not pass/fail.

Observed blocker:

  • Last visible progress before stop: 08:30 tcl(130/10523) f7 r32 ETC 11:19:11
  • First failure signature: [process-worker] Kernel worker failed: memory access out of bounds
  • Wasm stack top: dash.printf_core -> dash.vfprintf -> dash.vsnprintf -> dash.xvsnprintf -> dash.xvasprintf -> dash.doformat -> dash.outfmt -> dash.exvwarning2 -> dash.exverror -> dash.sh_error
  • Signature counts in command.post-stop.log: fcntl_lock_lines=11, database_is_locked=0, runtime_error=1, kernel_throw=0, memory_oob=2, unreachable=0, handle_fcntl_lock=0, malformed=0, uncaught=0.
  • The 11 fcntl diagnostics were job-local SQLite locks under workdir/testdir*/testdir/test.db using the expected SQLite lock-byte ranges (65536, 65537, 65538..+510), not parent testrunner.db control-DB locks.
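
The lock-byte ranges mentioned above follow SQLite's layout: one pending byte, one reserved byte right after it, then a 510-byte shared range. A small classifier makes the mapping explicit. It is a sketch with a hypothetical function name; the default base of 65536 (0x10000) is taken from the ranges reported in this comment, while stock SQLite builds place the pending byte at 0x40000000.

```python
SHARED_SIZE = 510  # width of SQLite's shared lock range, in bytes


def classify_lock_range(start: int, length: int, pending_byte: int = 0x10000) -> str:
    """Name the SQLite lock region that a byte range falls in.

    Returns 'PENDING', 'RESERVED', 'SHARED', or 'other'.
    """
    if length <= 0:
        return "other"
    reserved_byte = pending_byte + 1
    shared_first = pending_byte + 2
    end = start + length  # exclusive end offset
    if start == pending_byte and length == 1:
        return "PENDING"
    if start == reserved_byte and length == 1:
        return "RESERVED"
    if start >= shared_first and end <= shared_first + SHARED_SIZE:
        return "SHARED"
    return "other"


for start, length in [(65536, 1), (65537, 1), (65538, 510), (0, 1)]:
    print(start, length, classify_lock_range(start, length))
```

Filtering diagnostics through a classifier like this helps separate ordinary job-local SQLite locking from unexpected ranges worth a closer look.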

Stopped-copy counts from copied testrunner.db only:

  • Integrity: ok
  • Total jobs: 10,523
  • Done: 124
  • Failed rows: 7
  • Running: 32
  • Ready: 10,360
  • Omitted/skipped in runner DB: 0
  • Halt: 0
  • Blank: 0
  • Cases recorded: 7,355
  • Case errors recorded: 39

Diagnostic failed rows before invalidation, not final SQLite failures:

  • test/sysfault.test: ntest=1, nerr=1, span=501217
  • test/like.test: ntest=159, nerr=1, span=24901
  • test/delete.test: ntest=68, nerr=6, span=12247
  • test/exists.test: ntest=73, nerr=9, span=72121
  • test/bigfile2.test: ntest=4, nerr=2, span=13971
  • test/sort4.test: ntest=11, nerr=5, span=142288
  • config=multithread test/sort4.test: ntest=27, nerr=15, span=347974

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-fcntl-diag-alllocks-node-jobs32-20260619-0029
  • Quarantine: test-runs/sqlite-project-unit-all/kd-nbh-fcntl-diag-alllocks-node-jobs32-20260619-0029/quarantine-process-oob-20260619-0030
  • Logs: command.pre-stop.log, command.post-stop.log, testrunner.log
  • Diagnostics: ps-run-before-stop.txt, lsof-node-before-stop.txt, sample-node-5s-before-stop.txt, fcntl-lock-lines.txt, signature-summary.txt, signature-lines-pre-stop.txt
  • Stopped DB copy: db-copy/testrunner.db*
  • Count/failure artifacts: stopped-counts.txt, stopped-state-counts.txt, failed-jobs.txt, running-jobs-at-stop.txt, failed-running-omitted-jobs.csv

Scope/skips status:

  • Public SQLite Tcl all remains the selected suite: 10,523 discovered jobs per host from the latest fresh Node/browser all --explain after the last source refresh.
  • TH3/private/non-public SQLite suites remain out of scope because they are not public upstream test assets.
  • Browser full execution has not been completed; the latest browser evidence is still enumeration/explain only.

Publication status: PR milestone comment published here; no new commits pushed for this diagnostic yet. Next step is to isolate whether the dash OOB is a standalone process crash path or a downstream symptom of one of the running SQLite jobs, then rerun fresh Node/browser all --explain after any source/host/kernel refresh before another public all retry.

@brandonpayton

Member Author

kd-nbh resumed the full SQLite project-unit audit from the post-OOB-fix PR head.

State:

  • Worktree: /Users/brandon/src/kandelo-gascity/worktrees/kandelo/kd-nbh-pr692-full-run
  • Commit: 0f3ce766680ac98592ac860f42c5b7f9e0167bd5
  • Target: public SQLite upstream test/testrunner.tcl all on both Kandelo Node and browser hosts.
  • Prior pre-OOB-fix SQLite rows remain diagnostic only; I am starting fresh from this commit.

Next visible milestones:

  1. Materialize/build Tcl, SQLite testfixture.wasm, and local runtime/browser artifacts in the clean worktree.
  2. Run fresh --explain for Node+browser to record selected job counts.
  3. Start the full Node run and then the full browser run with preserved workdirs/logs.

I will preserve exact commands, timeout policy, pass/fail/omit/running/job/case counts, incomplete reasons, and artifacts. I will not claim tests pass unless a selected target actually completes.

@brandonpayton

Member Author

kd-nbh PR #692 progress checkpoint (2026-06-19T07:15Z): resumed from post-OOB-fix head 0f3ce766680ac98592ac860f42c5b7f9e0167bd5 in dedicated worktree /Users/brandon/src/kandelo-gascity/worktrees/kandelo/kd-nbh-pr692-full-run.

Setup is complete enough to start the full SQLite project-unit run: repo build, zlib, SQLite CLI, Tcl, and testfixture.wasm are built. Setup logs are under test-runs/sqlite-project-unit-all/kd-nbh-setup/.

Fresh --explain for scripts/run-sqlite-project-unit-tests.sh --host both --permutation all --jobs 1 --timeout-ms 600000 --explain completed with exit 0 for both hosts. Artifacts: test-runs/sqlite-project-unit-all/kd-nbh-explain-post-oobfix-20260619T071236Z/.

Explain counts:

  • Node: total_jobs=10353, done=0, failed=0, omitted=0, running=0, ready=10353, SQLite cases=0, case_errors=0.
  • Browser: total_jobs=10366, done=0, failed=0, omitted=0, running=0, ready=10366, SQLite cases=0, case_errors=0.
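
The two hosts report different job totals here, so a field-by-field comparison of the explain summaries is useful. The sketch below parses the key=value form used in these comments; the function names and the key normalization (lowercase, spaces to underscores) are assumptions, and it is not the code that writes combined-summary.md.

```python
from typing import Dict, Optional, Tuple


def parse_counts(summary: str) -> Dict[str, int]:
    """Parse 'total_jobs=10353, done=0, ...' into a dict of integers."""
    counts: Dict[str, int] = {}
    for field in summary.strip().rstrip(".").split(","):
        key, sep, value = field.strip().partition("=")
        if sep and value.isdigit():
            counts[key.strip().lower().replace(" ", "_")] = int(value)
    return counts


def diff_counts(
    a: Dict[str, int], b: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return only the keys whose values differ between the two hosts."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


node = parse_counts(
    "total_jobs=10353, done=0, failed=0, omitted=0, running=0, "
    "ready=10353, SQLite cases=0, case_errors=0."
)
browser = parse_counts(
    "total_jobs=10366, done=0, failed=0, omitted=0, running=0, "
    "ready=10366, SQLite cases=0, case_errors=0."
)
print(diff_counts(node, browser))
```

For these two summaries the only differing fields are total_jobs and ready, a gap of 13 jobs between the hosts.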

No pass/fail claim for the full suite yet. Starting the full Node execution next, then browser after Node reaches a terminal result.

@brandonpayton

Member Author

kd-nbh full Node run started (2026-06-19T07:17Z).

Command/artifacts: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z/
Selected command: scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z --workdir <worktree>/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z/workdir --keep-workdir, wrapped with timeout 86400s inside scripts/dev-shell.sh.

I am not claiming pass/fail yet; this comment is just the visible start marker for the full Node target.

@brandonpayton

Member Author

PR #692 kd-nbh Node full run checkpoint - incomplete

Status: incomplete platform/runtime crash-wedge, not a SQLite pass/fail result and not a test-pass claim. The runner was stopped after command.log stopped changing and the Node process was sleeping at 0% CPU. Wrapper exit code after SIGTERM: 143.
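
The stop criterion used here (a flat log plus an idle process) can be written as a simple predicate. This is a hypothetical heuristic: the function name and both thresholds are assumptions chosen for illustration, not values used in this audit.

```python
from typing import Sequence


def looks_wedged(
    seconds_since_log_change: float,
    cpu_percent_samples: Sequence[float],
    stall_seconds: float = 180.0,   # assumed threshold
    idle_cpu_percent: float = 1.0,  # assumed threshold
) -> bool:
    """True when the log is flat AND every CPU sample is idle.

    Requiring both signals avoids flagging a run that is quiet in the log
    but still busy, or briefly idle while the log keeps moving.
    """
    if not cpu_percent_samples:
        return False  # no process evidence, so make no claim
    log_flat = seconds_since_log_change >= stall_seconds
    cpu_flat = all(sample <= idle_cpu_percent for sample in cpu_percent_samples)
    return log_flat and cpu_flat


print(looks_wedged(300.0, [0.0, 0.0, 0.0]))
print(looks_wedged(300.0, [228.7]))
```

A positive result is only a prompt to capture process samples and quarantine the run; it says nothing about pass/fail.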

Worktree: /Users/brandon/src/kandelo-gascity/worktrees/kandelo/kd-nbh-pr692-full-run

Commit: 0f3ce766680ac98592ac860f42c5b7f9e0167bd5

Command:

timeout 86400s scripts/run-sqlite-project-unit-tests.sh --host node --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z --workdir /Users/brandon/src/kandelo-gascity/worktrees/kandelo/kd-nbh-pr692-full-run/test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z/workdir --keep-workdir

Environment highlights:

  • started_utc: 2026-06-19T07:15:55Z
  • rustc: rustc 1.97.0-nightly (ca9a134e0 2026-04-26)
  • cargo: cargo 1.97.0-nightly (eb9b60f1f 2026-04-24)
  • node: /nix/store/b06wg8qk5bcvq5xcqrgq9nah23h71kj2-nodejs-24.15.0/bin/node v24.15.0
  • npm: /nix/store/b06wg8qk5bcvq5xcqrgq9nah23h71kj2-nodejs-24.15.0/bin/npm 11.12.1
  • timeout: /opt/homebrew/bin/timeout timeout (GNU coreutils) 9.11
  • bzip2_lib: /nix/store/6q4672lfdg9z419sv34q1hsnr2hlf1nq-bzip2-1.0.8/lib

Counts from copied stopped testrunner.db:

State     Jobs    SQLite cases   Case errors
done      60      2536           0
failed    250     457            264
ready     10011   0              0
running   32      0              0
omit      0       0              0

Total jobs: 10,353. Recorded cases before stop: 2,993. Recorded case errors before stop: 264. Remaining unrun jobs: 10,011 ready plus 32 running at termination.

Omissions/skips: harness omit count is 0. Completed job output includes SQLite upstream warning Multi-threaded tests skipped: Linked against a non-threadsafe Tcl build in 59 done jobs and 7 failed jobs; the harness does not expose those as omit rows or subtest skip counts.

Failure cause buckets from the copied DB:

Likely cause                    Failed jobs   SQLite cases   Case errors
cannot-determine-platform       241           241            241
database-locked                 4             93             11
expected-got-mismatch           2             53             4
allocation-abort                1             1              1
other-sqlite-error              1             1              1
readonly/permissions-mismatch   1             68             6

Command-log crash signatures before termination:

Signature                     Count
RuntimeError                  1784
memory access out of bounds   8
unreachable                   1780
handleFcntlLock               849
database is locked            0
cannot determine platform     0
UNCAUGHT                      854
kernel threw                  926

First trap lines:

259:[handleSyscall] UNCAUGHT ERROR pid=888: RuntimeError: unreachable
272:[handleSyscall] kernel threw for pid=528 syscall=11 args=[3408488,62936,0,0,0,0]: RuntimeError: unreachable
283:[handleSyscall] kernel threw for pid=887 syscall=19 args=[52864,56960,3962,0,0,0]: RuntimeError: unreachable
294:[handleSyscall] UNCAUGHT ERROR pid=528: RuntimeError: unreachable
300:    at CentralizedKernelWorker.handleFcntlLock (file:///Users/brandon/src/kandelo-gascity/worktrees/kandelo/kd-nbh-pr692-full-run/host/dist/node-kernel-worker-entry.js:6967:7)
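
Trap lines like these can be turned into structured records for grouping by pid or syscall number. The parser below is a sketch, assuming the leading number is a log line number (as grep -n would print) and that only the two [handleSyscall] shapes quoted above occur; other lines, such as stack frames, return None.

```python
import re
from typing import Optional

TRAP_RE = re.compile(
    r"^(?:(?P<lineno>\d+):)?\[handleSyscall\] "
    r"(?P<kind>UNCAUGHT ERROR|kernel threw for) pid=(?P<pid>\d+)"
    r"(?: syscall=(?P<syscall>\d+) args=\[(?P<args>[^\]]*)\])?"
    r": (?P<error>.+)$"
)


def parse_trap_line(line: str) -> Optional[dict]:
    """Parse one [handleSyscall] trap line into a dict, or return None."""
    match = TRAP_RE.match(line.strip())
    if match is None:
        return None
    raw_args = match.group("args")
    return {
        "lineno": int(match.group("lineno")) if match.group("lineno") else None,
        "kind": "uncaught" if match.group("kind") == "UNCAUGHT ERROR" else "kernel_threw",
        "pid": int(match.group("pid")),
        "syscall": int(match.group("syscall")) if match.group("syscall") else None,
        "args": [int(arg) for arg in raw_args.split(",")] if raw_args else [],
        "error": match.group("error"),
    }


print(parse_trap_line(
    "272:[handleSyscall] kernel threw for pid=528 syscall=11 "
    "args=[3408488,62936,0,0,0,0]: RuntimeError: unreachable"
))
```

Grouping the parsed records by syscall number is one way to see whether the traps cluster around a single kernel path.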

Last visible runner progress lines:

05:26 tcl(236/10353) f190 r32 ETC 03:52:41                                     
05:43 tcl(237/10353) f190 r32 ETC 04:04:18                                     
FAILED: config=mmap test/tkt2927.test (22)                                     
05:59 tcl(243/10353) f196 r32 ETC 04:08:52                                     
FAILED: config=mmap test/hook.test (22)                                        
06:09 tcl(249/10353) f202 r32 ETC 04:09:14                                     
FAILED: config=mmap test/main.test (22)                                        
06:20 tcl(257/10353) f208 r32 ETC 04:09:02                                     
FAILED: config=mmap test/tkt3791.test (22)                                     
06:47 tcl(264/10353) f214 r32 ETC 04:19:17                                     
FAILED: config=mmap test/shmlock.test (22)                                     
06:56 tcl(271/10353) f220 r32 ETC 04:18:06                                     
FAILED: config=mmap test/join4.test (22)                                       
07:04 tcl(279/10353) f226 r32 ETC 04:15:12                                     
FAILED: config=mmap ext/rbu/rbuvacuum3.test (22)                               
07:13 tcl(286/10353) f232 r32 ETC 04:13:51                                     
FAILED: config=mmap test/subquery.test (22)                                    
07:28 tcl(294/10353) f238 r32 ETC 04:15:30                                     
07:52 tcl(296/10353) f238 r32 ETC 04:27:11                                     
FAILED: config=mmap test/fts4noti.test (22)                                    
08:05 tcl(302/10353) f244 r32 ETC 04:28:52                                     
FAILED: config=mmap ext/rtree/rtree1.test (22)                                 

Artifacts:

  • Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z
  • Quarantine: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z/quarantine-20260619T073039Z
  • Command log before termination: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z/quarantine-20260619T073039Z/command.log.before-terminate
  • Copied DB/WAL/SHM: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z/quarantine-20260619T073039Z/testrunner.db*
  • Failure catalog with excerpts: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z/quarantine-20260619T073039Z/node-failures.tsv
  • Failure catalog grouped by cause: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z/quarantine-20260619T073039Z/node-failure-catalog-by-cause.tsv
  • Running jobs at termination: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z/quarantine-20260619T073039Z/node-running.tsv
  • Process diagnostics: test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z/quarantine-20260619T073039Z/ps-before-terminate.txt, test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z/quarantine-20260619T073039Z/sample-node-22078.txt, test-runs/sqlite-project-unit-all/kd-nbh-all-node-jobs32-post-oobfix-20260619T071553Z/quarantine-20260619T073039Z/lsof-node-22078.txt

Browser status: a fresh post-OOB-fix explain run completed earlier with browser total_jobs=10,366, done=0, failed=0, omitted=0, running=0, ready=10,366, cases=0, case_errors=0. Full browser execution has not yet started in this resumed session; it starts next unless this Node platform crash-wedge is declared blocking for browser.

Complete failed-job catalog grouped by likely cause (output excerpts are in node-failure-catalog-by-cause.tsv):

allocation-abort

  • job 1082 test/wal3.test cases=1 errors=1 span_ms=131942

cannot-determine-platform

  • job 1611 config=memsubsys1 ext/fts5/test/fts5secure7.test cases=1 errors=1 span_ms=13536
  • job 1745 config=memsubsys1 test/temptable2.test cases=1 errors=1 span_ms=13132
  • job 1809 config=memsubsys1 ext/fts5/test/fts5ah.test cases=1 errors=1 span_ms=13135
  • job 1844 config=memsubsys1 ext/rbu/rbumulti.test cases=1 errors=1 span_ms=13124
  • job 1890 config=memsubsys1 ext/rbu/rburesume.test cases=1 errors=1 span_ms=13217
  • job 1894 config=memsubsys1 ext/fts5/test/fts5optimize3.test cases=1 errors=1 span_ms=21196
  • job 2052 config=memsubsys1 ext/fts5/test/fts5secure3.test cases=1 errors=1 span_ms=13291
  • job 2108 config=memsubsys1 test/round1.test cases=1 errors=1 span_ms=13236
  • job 2136 config=memsubsys1 test/joinD.test cases=1 errors=1 span_ms=13237
  • job 2144 config=memsubsys1 ext/rbu/rbutemplimit.test cases=1 errors=1 span_ms=13061
  • job 2161 config=memsubsys1 test/memjournal2.test cases=1 errors=1 span_ms=28182
  • job 2255 config=memsubsys1 test/walsetlk.test cases=1 errors=1 span_ms=28660
  • job 2307 config=memsubsys1 test/vacuum6.test cases=1 errors=1 span_ms=28950
  • job 2330 config=memsubsys1 ext/fts5/test/fts5optimize2.test cases=1 errors=1 span_ms=21608
  • job 2372 config=memsubsys1 test/vacuummem.test cases=1 errors=1 span_ms=28986
  • job 2447 config=memsubsys1 test/wal3.test cases=1 errors=1 span_ms=29398
  • job 2717 config=memsubsys1 test/walvfs.test cases=1 errors=1 span_ms=29734
  • job 2718 config=memsubsys1 test/win32lock.test cases=1 errors=1 span_ms=29931
  • job 2918 config=memsubsys2 ext/fts5/test/fts5secure7.test cases=1 errors=1 span_ms=30060
  • job 3052 config=memsubsys2 test/temptable2.test cases=1 errors=1 span_ms=30159
  • job 3117 config=memsubsys2 ext/fts5/test/fts5ah.test cases=1 errors=1 span_ms=30216
  • job 3152 config=memsubsys2 ext/rbu/rbumulti.test cases=1 errors=1 span_ms=30114
  • job 3198 config=memsubsys2 ext/rbu/rburesume.test cases=1 errors=1 span_ms=30447
  • job 3202 config=memsubsys2 ext/fts5/test/fts5optimize3.test cases=1 errors=1 span_ms=21853
  • job 3360 config=memsubsys2 ext/fts5/test/fts5secure3.test cases=1 errors=1 span_ms=30682
  • job 3416 config=memsubsys2 test/round1.test cases=1 errors=1 span_ms=30655
  • job 3444 config=memsubsys2 test/joinD.test cases=1 errors=1 span_ms=31192
  • job 3452 config=memsubsys2 ext/rbu/rbutemplimit.test cases=1 errors=1 span_ms=10503
  • job 3469 config=memsubsys2 test/memjournal2.test cases=1 errors=1 span_ms=10129
  • job 3563 config=memsubsys2 test/walsetlk.test cases=1 errors=1 span_ms=9815
  • job 3615 config=memsubsys2 test/vacuum6.test cases=1 errors=1 span_ms=9719
  • job 3638 config=memsubsys2 ext/fts5/test/fts5optimize2.test cases=1 errors=1 span_ms=22126
  • job 3680 config=memsubsys2 test/vacuummem.test cases=1 errors=1 span_ms=9602
  • job 3755 config=memsubsys2 test/wal3.test cases=1 errors=1 span_ms=9379
  • job 4025 config=memsubsys2 test/walvfs.test cases=1 errors=1 span_ms=9193
  • job 4026 config=memsubsys2 test/win32lock.test cases=1 errors=1 span_ms=9097
  • job 4070 config=multithread test/sort4.test cases=1 errors=1 span_ms=22161
  • job 4363 config=no_mutex_try ext/fts5/test/fts5secure7.test cases=1 errors=1 span_ms=8938
  • job 4497 config=no_mutex_try test/temptable2.test cases=1 errors=1 span_ms=8972
  • job 4562 config=no_mutex_try ext/fts5/test/fts5ah.test cases=1 errors=1 span_ms=8823
  • job 4598 config=no_mutex_try ext/rbu/rbumulti.test cases=1 errors=1 span_ms=8271
  • job 4644 config=no_mutex_try ext/rbu/rburesume.test cases=1 errors=1 span_ms=8025
  • job 4648 config=no_mutex_try ext/fts5/test/fts5optimize3.test cases=1 errors=1 span_ms=22332
  • job 4807 config=no_mutex_try ext/fts5/test/fts5secure3.test cases=1 errors=1 span_ms=8402
  • job 4863 config=no_mutex_try test/round1.test cases=1 errors=1 span_ms=7810
  • job 4891 config=no_mutex_try test/joinD.test cases=1 errors=1 span_ms=7429
  • job 4899 config=no_mutex_try ext/rbu/rbutemplimit.test cases=1 errors=1 span_ms=7425
  • job 4916 config=no_mutex_try test/memjournal2.test cases=1 errors=1 span_ms=7530
  • job 5010 config=no_mutex_try test/walsetlk.test cases=1 errors=1 span_ms=7255
  • job 5062 config=no_mutex_try test/vacuum6.test cases=1 errors=1 span_ms=7361
  • job 5085 config=no_mutex_try ext/fts5/test/fts5optimize2.test cases=1 errors=1 span_ms=22498
  • job 5127 config=no_mutex_try test/vacuummem.test cases=1 errors=1 span_ms=7455
  • job 5202 config=no_mutex_try test/wal3.test cases=1 errors=1 span_ms=7691
  • job 5472 config=no_mutex_try test/walvfs.test cases=1 errors=1 span_ms=7706
  • job 5473 config=no_mutex_try test/win32lock.test cases=1 errors=1 span_ms=7728
  • job 5663 config=journaltest ext/fts5/test/fts5secure7.test cases=1 errors=1 span_ms=7688
  • job 5790 config=journaltest test/temptable2.test cases=1 errors=1 span_ms=7860
  • job 5843 config=journaltest ext/fts5/test/fts5ah.test cases=1 errors=1 span_ms=7892
  • job 5870 config=journaltest ext/rbu/rbumulti.test cases=1 errors=1 span_ms=7827
  • job 5908 config=journaltest ext/rbu/rburesume.test cases=1 errors=1 span_ms=13826
  • job 5912 config=journaltest ext/fts5/test/fts5optimize3.test cases=1 errors=1 span_ms=22492
  • job 6056 config=journaltest ext/fts5/test/fts5secure3.test cases=1 errors=1 span_ms=14017
  • job 6103 config=journaltest test/round1.test cases=1 errors=1 span_ms=17020
  • job 6131 config=journaltest test/joinD.test cases=1 errors=1 span_ms=17015
  • job 6139 config=journaltest ext/rbu/rbutemplimit.test cases=1 errors=1 span_ms=17274
  • job 6152 config=journaltest test/memjournal2.test cases=1 errors=1 span_ms=17629
  • job 6275 config=journaltest test/vacuum6.test cases=1 errors=1 span_ms=17694
  • job 6293 config=journaltest ext/fts5/test/fts5optimize2.test cases=1 errors=1 span_ms=22642
  • job 6330 config=journaltest test/vacuummem.test cases=1 errors=1 span_ms=17719
  • job 6635 config=journaltest test/win32lock.test cases=1 errors=1 span_ms=17549
  • job 6816 config=inmemory_journal ext/fts5/test/fts5secure7.test cases=1 errors=1 span_ms=17525
  • job 6947 config=inmemory_journal test/temptable2.test cases=1 errors=1 span_ms=17458
  • job 7009 config=inmemory_journal ext/fts5/test/fts5ah.test cases=1 errors=1 span_ms=17436
  • job 7042 config=inmemory_journal ext/rbu/rbumulti.test cases=1 errors=1 span_ms=17362
  • job 7088 config=inmemory_journal ext/rbu/rburesume.test cases=1 errors=1 span_ms=17503
  • job 7092 config=inmemory_journal ext/fts5/test/fts5optimize3.test cases=1 errors=1 span_ms=22760
  • job 7238 config=inmemory_journal ext/fts5/test/fts5secure3.test cases=1 errors=1 span_ms=17524
  • job 7289 config=inmemory_journal test/round1.test cases=1 errors=1 span_ms=9214
  • job 7317 config=inmemory_journal test/joinD.test cases=1 errors=1 span_ms=9161
  • job 7325 config=inmemory_journal ext/rbu/rbutemplimit.test cases=1 errors=1 span_ms=8940
  • job 7341 config=inmemory_journal test/memjournal2.test cases=1 errors=1 span_ms=8588
  • job 7480 config=inmemory_journal test/vacuum6.test cases=1 errors=1 span_ms=8458
  • job 7501 config=inmemory_journal ext/fts5/test/fts5optimize2.test cases=1 errors=1 span_ms=23003
  • job 7541 config=inmemory_journal test/vacuummem.test cases=1 errors=1 span_ms=8390
  • job 7868 config=inmemory_journal test/win32lock.test cases=1 errors=1 span_ms=8358
  • job 8119 config=prepare ext/fts5/test/fts5secure7.test cases=1 errors=1 span_ms=8348
  • job 8248 config=prepare test/temptable2.test cases=1 errors=1 span_ms=8525
  • job 8305 config=prepare ext/fts5/test/fts5ah.test cases=1 errors=1 span_ms=8540
  • job 8333 config=prepare ext/rbu/rbumulti.test cases=1 errors=1 span_ms=8457
  • job 8372 config=prepare ext/rbu/rburesume.test cases=1 errors=1 span_ms=8514
  • job 8376 config=prepare ext/fts5/test/fts5optimize3.test cases=1 errors=1 span_ms=22843
  • job 8528 config=prepare ext/fts5/test/fts5secure3.test cases=1 errors=1 span_ms=8549
  • job 8579 config=prepare test/round1.test cases=1 errors=1 span_ms=18332
  • job 8607 config=prepare test/joinD.test cases=1 errors=1 span_ms=18252
  • job 8615 config=prepare ext/rbu/rbutemplimit.test cases=1 errors=1 span_ms=18169
  • job 8629 config=prepare test/memjournal2.test cases=1 errors=1 span_ms=18127
  • job 8710 config=prepare test/walsetlk.test cases=1 errors=1 span_ms=18172
  • job 8757 config=prepare test/vacuum6.test cases=1 errors=1 span_ms=18312
  • job 8778 config=prepare ext/fts5/test/fts5optimize2.test cases=1 errors=1 span_ms=22945
  • job 8818 config=prepare test/vacuummem.test cases=1 errors=1 span_ms=18533
  • job 8891 config=prepare test/wal3.test cases=1 errors=1 span_ms=18704
  • job 9135 config=prepare test/walvfs.test cases=1 errors=1 span_ms=18808
  • job 9136 config=prepare test/win32lock.test cases=1 errors=1 span_ms=18866
  • job 9319 config=mmap ext/fts5/test/fts5secure7.test cases=1 errors=1 span_ms=18896
  • job 9448 config=mmap test/temptable2.test cases=1 errors=1 span_ms=18975
  • job 9505 config=mmap ext/fts5/test/fts5ah.test cases=1 errors=1 span_ms=19015
  • job 9533 config=mmap ext/rbu/rbumulti.test cases=1 errors=1 span_ms=9993
  • job 9572 config=mmap ext/rbu/rburesume.test cases=1 errors=1 span_ms=10198
  • job 9576 config=mmap ext/fts5/test/fts5optimize3.test cases=1 errors=1 span_ms=22834
  • job 9728 config=mmap ext/fts5/test/fts5secure3.test cases=1 errors=1 span_ms=10327
  • job 9779 config=mmap test/round1.test cases=1 errors=1 span_ms=10427
  • job 9807 config=mmap test/joinD.test cases=1 errors=1 span_ms=10772
  • job 9815 config=mmap ext/rbu/rbutemplimit.test cases=1 errors=1 span_ms=10739
  • job 9829 config=mmap test/memjournal2.test cases=1 errors=1 span_ms=10749
  • job 9912 config=mmap test/walsetlk.test cases=1 errors=1 span_ms=10799
  • job 9959 config=mmap test/vacuum6.test cases=1 errors=1 span_ms=10911
  • job 9980 config=mmap ext/fts5/test/fts5optimize2.test cases=1 errors=1 span_ms=22805
  • job 10020 config=mmap test/vacuummem.test cases=1 errors=1 span_ms=11109
  • job 10093 config=mmap test/wal3.test cases=1 errors=1 span_ms=11089
  • job 10232 config=mmap test/pragma3.test cases=1 errors=1 span_ms=17799
  • job 10233 config=mmap test/tkt3824.test cases=1 errors=1 span_ms=18532
  • job 10234 config=mmap test/fts3ac.test cases=1 errors=1 span_ms=18870
  • job 10235 config=mmap test/capi3d.test cases=1 errors=1 span_ms=18701
  • job 10236 config=mmap test/in6.test cases=1 errors=1 span_ms=18501
  • job 10237 config=mmap ext/rtree/rtree1.test cases=1 errors=1 span_ms=18296
  • job 10238 config=mmap ext/rbu/rbu13.test cases=1 errors=1 span_ms=36916
  • job 10239 config=mmap test/enc2.test cases=1 errors=1 span_ms=35871
  • job 10240 config=mmap test/indexA.test cases=1 errors=1 span_ms=35730
  • job 10241 config=mmap test/coveridxscan.test cases=1 errors=1 span_ms=35973
  • job 10242 config=mmap test/in3.test cases=1 errors=1 span_ms=36062
  • job 10243 config=mmap test/fts4noti.test cases=1 errors=1 span_ms=36200
  • job 10244 config=mmap test/e_uri.test cases=1 errors=1 span_ms=15176
  • job 10245 config=mmap ext/rbu/rbu10.test cases=1 errors=1 span_ms=15202
  • job 10246 config=mmap test/bigmmap.test cases=1 errors=1 span_ms=14989
  • job 10247 config=mmap test/tkt2450.test cases=1 errors=1 span_ms=14793
  • job 10248 config=mmap test/superlock.test cases=1 errors=1 span_ms=14765
  • job 10249 config=mmap test/subquery.test cases=1 errors=1 span_ms=14752
  • job 10250 config=mmap test/subquery2.test cases=1 errors=1 span_ms=8173
  • job 10251 config=mmap ext/session/sessionat.test cases=1 errors=1 span_ms=7980
  • job 10252 config=mmap ext/rbu/rbuC.test cases=1 errors=1 span_ms=7887
  • job 10253 config=mmap test/join7.test cases=1 errors=1 span_ms=7830
  • job 10254 config=mmap test/fts3aux1.test cases=1 errors=1 span_ms=7970
  • job 10255 config=mmap ext/rbu/rbuvacuum3.test cases=1 errors=1 span_ms=7940
  • job 10256 config=mmap test/index8.test cases=1 errors=1 span_ms=7854
  • job 10257 config=mmap test/temptable3.test cases=1 errors=1 span_ms=7905
  • job 10258 config=mmap test/bestindexD.test cases=1 errors=1 span_ms=7990
  • job 10259 config=mmap test/tkt-8454a207b9.test cases=1 errors=1 span_ms=8131
  • job 10260 config=mmap test/alter3.test cases=1 errors=1 span_ms=7894
  • job 10261 config=mmap test/join4.test cases=1 errors=1 span_ms=7970
  • job 10262 config=mmap ext/fts5/test/fts5umlaut.test cases=1 errors=1 span_ms=9156
  • job 10263 config=mmap test/index5.test cases=1 errors=1 span_ms=9628
  • job 10264 config=mmap test/keyword1.test cases=1 errors=1 span_ms=9879
  • job 10265 config=mmap test/collateB.test cases=1 errors=1 span_ms=10052
  • job 10266 config=mmap test/bestindexA.test cases=1 errors=1 span_ms=10256
  • job 10267 config=mmap test/shmlock.test cases=1 errors=1 span_ms=10262
  • job 10268 config=mmap ext/recover/recoverpgsz.test cases=1 errors=1 span_ms=26718
  • job 10269 config=mmap test/index2.test cases=1 errors=1 span_ms=26291
  • job 10270 config=mmap test/tkt-f777251dc7a.test cases=1 errors=1 span_ms=26060
  • job 10271 config=mmap test/tempdb.test cases=1 errors=1 span_ms=25897
  • job 10272 config=mmap test/interrupt2.test cases=1 errors=1 span_ms=25658
  • job 10273 config=mmap test/tkt3791.test cases=1 errors=1 span_ms=25592
  • job 10274 config=mmap ext/rtree/rtreeconnect.test cases=1 errors=1 span_ms=11593
  • job 10275 config=mmap ext/fts5/test/fts5rowid.test cases=1 errors=1 span_ms=11593
  • job 10276 config=mmap test/tableapi.test cases=1 errors=1 span_ms=11510
  • job 10277 config=mmap ext/rbu/rbu7.test cases=1 errors=1 span_ms=11799
  • job 10278 config=mmap ext/rbu/rburename.test cases=1 errors=1 span_ms=12297
  • job 10279 config=mmap test/main.test cases=1 errors=1 span_ms=12320
  • job 10280 config=mmap test/collate9.test cases=1 errors=1 span_ms=9590
  • job 10281 config=mmap ext/fts5/test/fts5integrity2.test cases=1 errors=1 span_ms=9777
  • job 10282 config=mmap test/bestindex8.test cases=1 errors=1 span_ms=9754
  • job 10283 config=mmap test/ptrchng.test cases=1 errors=1 span_ms=9618
  • job 10284 config=mmap ext/fts5/test/fts5doclist.test cases=1 errors=1 span_ms=9211
  • job 10285 config=mmap test/hook.test cases=1 errors=1 span_ms=9287
  • job 10286 config=mmap test/fordelete.test cases=1 errors=1 span_ms=33772
  • job 10287 config=mmap test/bestindex5.test cases=1 errors=1 span_ms=34032
  • job 10288 config=mmap ext/rbu/rbu1.test cases=1 errors=1 span_ms=34181
  • job 10289 config=mmap ext/rbu/rbucrash2.test cases=1 errors=1 span_ms=34371
  • job 10290 config=mmap test/collate3.test cases=1 errors=1 span_ms=34595
  • job 10291 config=mmap test/tkt2927.test cases=1 errors=1 span_ms=34882
  • job 10292 config=mmap test/bestindex2.test cases=1 errors=1 span_ms=45270
  • job 10293 config=mmap test/sqldiff1.test cases=1 errors=1 span_ms=44839
  • job 10294 config=mmap test/mmap2.test cases=1 errors=1 span_ms=44687
  • job 10295 config=mmap test/close.test cases=1 errors=1 span_ms=44512
  • job 10296 config=mmap test/eqp.test cases=1 errors=1 span_ms=44250
  • job 10297 config=mmap test/expr.test cases=1 errors=1 span_ms=43832
  • job 10298 config=mmap test/date2.test cases=1 errors=1 span_ms=8300
  • job 10299 config=mmap test/icu.test cases=1 errors=1 span_ms=8214
  • job 10300 config=mmap ext/fts5/test/fts5al.test cases=1 errors=1 span_ms=8321
  • job 10301 config=mmap test/init.test cases=1 errors=1 span_ms=8201
  • job 10302 config=mmap test/view2.test cases=1 errors=1 span_ms=8125
  • job 10303 config=mmap test/memdb.test cases=1 errors=1 span_ms=8338
  • job 10304 config=mmap test/selectF.test cases=1 errors=1 span_ms=8323
  • job 10305 config=mmap ext/fts5/test/fts5ai.test cases=1 errors=1 span_ms=8315
  • job 10306 config=mmap test/tkt3522.test cases=1 errors=1 span_ms=8260
  • job 10307 config=mmap test/tkt2409.test cases=1 errors=1 span_ms=8226
  • job 10308 config=mmap test/fts-9fd058691.test cases=1 errors=1 span_ms=8212
  • job 10309 config=mmap test/selectC.test cases=1 errors=1 span_ms=8580
  • job 10310 config=mmap ext/fts5/test/fts5af.test cases=1 errors=1 span_ms=10354
  • job 10311 config=mmap test/sorterref.test cases=1 errors=1 span_ms=10403
  • job 10312 config=mmap test/without_rowid5.test cases=1 errors=1 span_ms=10478
  • job 10313 config=mmap test/misuse.test cases=1 errors=1 span_ms=10559
  • job 10314 config=mmap test/shell8.test cases=1 errors=1 span_ms=11130
  • job 10315 config=mmap ext/fts5/test/fts5tokenizer2.test cases=1 errors=1 span_ms=8483
  • job 10316 config=mmap test/without_rowid2.test cases=1 errors=1 span_ms=8504
  • job 10317 config=mmap test/printf.test cases=1 errors=1 span_ms=8454
  • job 10318 config=mmap test/tkt3761.test cases=1 errors=1 span_ms=7998
  • job 10319 config=mmap test/shell5.test cases=1 errors=1 span_ms=7889
  • job 10320 config=mmap test/multiplex2.test cases=1 errors=1 span_ms=8877
  • job 10321 config=mmap test/pushdown.test cases=1 errors=1 span_ms=8793
  • job 10322 config=mmap test/schema.test cases=1 errors=1 span_ms=8659
  • job 10323 config=mmap test/tkt3757.test cases=1 errors=1 span_ms=8660
  • job 10324 config=mmap test/windowerr.test cases=1 errors=1 span_ms=8690
  • job 10325 config=mmap test/shell2.test cases=1 errors=1 span_ms=9257
  • job 10326 config=mmap test/tkt1536.test cases=1 errors=1 span_ms=9670
  • job 10327 config=mmap test/tkt3508.test cases=1 errors=1 span_ms=10139
  • job 10328 config=mmap test/tkt2141.test cases=1 errors=1 span_ms=10482
  • job 10329 config=mmap test/subtype1.test cases=1 errors=1 span_ms=10829
  • job 10330 config=mmap test/select7.test cases=1 errors=1 span_ms=10554
  • job 10331 config=mmap test/exclusive.test cases=1 errors=1 span_ms=10179
  • job 10332 config=mmap test/offset1.test cases=1 errors=1 span_ms=10013
  • job 10333 config=mmap ext/fts5/test/fts5hash.test cases=1 errors=1 span_ms=10257
  • job 10334 config=mmap test/tkt-d82e3f3721.test cases=1 errors=1 span_ms=9821
  • job 10335 config=mmap test/sharedB.test cases=1 errors=1 span_ms=15311
  • job 10336 config=mmap test/dbstatus.test cases=1 errors=1 span_ms=15514
  • job 10337 config=mmap test/walvfs.test cases=1 errors=1 span_ms=11259
  • job 10338 config=mmap test/win32lock.test cases=1 errors=1 span_ms=11252
  • job 10339 config=mmap test/select4.test cases=1 errors=1 span_ms=15309
  • job 10340 config=mmap ext/intck/intck1.test cases=1 errors=1 span_ms=23111
  • job 10341 config=mmap test/jrnlmode2.test cases=1 errors=1 span_ms=23642
  • job 10342 config=mmap test/autoanalyze1.test cases=1 errors=1 span_ms=23187
  • job 10343 config=mmap test/tkt-4c86b126f2.test cases=1 errors=1 span_ms=23324
  • job 10344 config=mmap test/select1.test cases=1 errors=1 span_ms=23359
  • job 10345 config=mmap test/tkt-99378177930f87bd.test cases=1 errors=1 span_ms=14830
  • job 10346 config=mmap test/shared9.test cases=1 errors=1 span_ms=23962
  • job 10347 config=mmap ext/fts5/test/fts5simple3.test cases=1 errors=1 span_ms=19020
  • job 10348 config=mmap test/transitive1.test cases=1 errors=1 span_ms=19070
  • job 10349 config=mmap test/alterdropcol.test cases=1 errors=1 span_ms=19068
  • job 10350 config=mmap test/windowpushd.test cases=1 errors=1 span_ms=19075
  • job 10351 config=mmap test/shared6.test cases=1 errors=1 span_ms=6367
  • job 10352 config=mmap ext/rtree/tkt3363.test cases=1 errors=1 span_ms=6748
  • job 10353 config=mmap test/tkt-3998683a16.test cases=1 errors=1 span_ms=7021

database-locked

  • job 5 test/writecrash.test cases=74 errors=4 span_ms=85767
  • job 15 ext/fts5/test/fts5bigid.test cases=11 errors=3 span_ms=66611
  • job 501 ext/fts5/test/fts5optimize3.test cases=4 errors=2 span_ms=49455
  • job 963 ext/fts5/test/fts5optimize2.test cases=4 errors=2 span_ms=64957

expected-got-mismatch

  • job 24 test/oserror.test cases=13 errors=3 span_ms=43042
  • job 879 test/walsetlk.test cases=40 errors=1 span_ms=35707

other-sqlite-error

  • job 8 ext/fts5/test/fts5fault4.test cases=1 errors=1 span_ms=153550

readonly/permissions-mismatch

  • job 54 test/delete.test cases=68 errors=6 span_ms=55236
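
To summarize a catalog of this size, the entries can be tallied per cause and per config. This is a minimal sketch that assumes the plain-text layout used in this comment (a cause heading on its own line, followed by "job N [config=X] path cases=.. errors=.. span_ms=.." entries). The function name and the "(no config)" label are hypothetical, and the TSV artifacts may use a different layout.

```python
import re
from collections import Counter

# Matches one catalog entry; the "config=" field is optional.
ENTRY_RE = re.compile(
    r"job\s+(?P<job>\d+)\s+"
    r"(?:config=(?P<config>\S+)\s+)?"
    r"(?P<test>\S+)\s+"
    r"cases=(?P<cases>\d+)\s+errors=(?P<errors>\d+)\s+span_ms=(?P<span_ms>\d+)"
)


def tally_by_cause_and_config(catalog_text: str) -> Counter:
    """Count catalog entries keyed by (cause heading, config)."""
    counts: Counter = Counter()
    cause = None
    for raw in catalog_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = ENTRY_RE.search(line)
        if m is None:
            # Any non-empty line that is not an entry is treated as a heading.
            cause = line
            continue
        counts[(cause, m["config"] or "(no config)")] += 1
    return counts
```

Calling counts.most_common() on the result lists the largest (cause, config) groups first, which makes it easy to see how the cannot-determine-platform entries are distributed across configs.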

Running jobs at termination:

  • job 2 test/sysfault.test
  • job 6 ext/recover/recoverfault2.test
  • job 12 ext/fts5/test/fts5fault1.test
  • job 22 test/manydb.test
  • job 27 ext/recover/recovercorrupt.test
  • job 38 test/boundary2.test
  • job 42 test/vtabC.test
  • job 48 test/fkey_malloc.test
  • job 51 test/savepoint6.test
  • job 58 test/walfault.test
  • job 61 test/e_walckpt.test
  • job 72 test/savepoint.test
  • job 73 test/vtab4.test
  • job 74 test/autoinc.test
  • job 75 test/rowvalue7.test
  • job 76 ext/fts5/test/fts5corrupt.test
  • job 191 ext/fts5/test/fts5secure7.test
  • job 334 test/temptable2.test
  • job 406 ext/fts5/test/fts5ah.test
  • job 666 ext/fts5/test/fts5secure3.test
  • job 721 test/sort4.test
  • job 725 test/round1.test
  • job 755 test/joinD.test
  • job 780 test/memjournal2.test
  • job 938 test/vacuum6.test
  • job 1007 test/vacuummem.test
  • job 10226 config=mmap test/sidedelete.test
  • job 10227 config=mmap test/rdonly.test
  • job 10228 config=mmap test/altertab2.test
  • job 10229 config=mmap test/temptrigger.test
  • job 10230 config=mmap test/softheap1.test
  • job 10231 config=mmap test/joinC.test

@brandonpayton

Member Author

PR #692 kd-nbh browser full run started

After the Node run was published as incomplete at #692 (comment), I started the full browser target from the dedicated worktree.

Command: timeout 86400s scripts/run-sqlite-project-unit-tests.sh --host browser --permutation all --jobs 32 --timeout-ms 86400000 --results-root test-runs/sqlite-project-unit-all/kd-nbh-all-browser-jobs32-post-oobfix-20260619T073638Z --workdir /Users/brandon/src/kandelo-gascity/worktrees/kandelo/kd-nbh-pr692-full-run/test-runs/sqlite-project-unit-all/kd-nbh-all-browser-jobs32-post-oobfix-20260619T073638Z/workdir --keep-workdir

Environment: PR head 0f3ce766680ac98592ac860f42c5b7f9e0167bd5, Node v24.15.0 from Nix, Rust nightly 2026-04-27, same bzip2 link workaround as setup.

Artifacts: test-runs/sqlite-project-unit-all/kd-nbh-all-browser-jobs32-post-oobfix-20260619T073638Z/command.log, environment.txt, command.txt. Initial log reached Vite at http://127.0.0.1:5200/pages/sqlite-test/. No browser pass/fail claim yet.

@brandonpayton

Member Author

PR #692 kd-nbh browser active-run checkpoint

The full browser target is still running. It is not complete, and this is not a pass claim.

Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-browser-jobs32-post-oobfix-20260619T073638Z
Command log: test-runs/sqlite-project-unit-all/kd-nbh-all-browser-jobs32-post-oobfix-20260619T073638Z/command.log

Latest log-only progress sample: 09:12 tcl(134/10366) f102 r32 ETC 11:41:45. The log has grown through Jun 19 03:46:54 local time; renderer process is still CPU-active. Signature scan so far found no RuntimeError, memory access out of bounds, unreachable, handleFcntlLock, database is locked, cannot determine platform, UNCAUGHT, kernel threw, malformed, pageerror, or console.error lines in the browser command log.
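
For reproducibility, a signature scan like the one described above can be sketched as a case-sensitive substring search over the command log. The matching rule is an assumption, since the exact scan command is not recorded here, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Sequence

# Signature strings taken from the checkpoint text above.
SIGNATURES = [
    "RuntimeError",
    "memory access out of bounds",
    "unreachable",
    "handleFcntlLock",
    "database is locked",
    "cannot determine platform",
    "UNCAUGHT",
    "kernel threw",
    "malformed",
    "pageerror",
    "console.error",
]


def scan_signatures(
    lines: Iterable[str], signatures: Sequence[str] = SIGNATURES
) -> Dict[str, List[int]]:
    """Map each signature that occurs to the 1-based line numbers where it appears."""
    hits: Dict[str, List[int]] = {}
    for lineno, line in enumerate(lines, start=1):
        for sig in signatures:
            if sig in line:  # assumed rule: plain case-sensitive substring match
                hits.setdefault(sig, []).append(lineno)
    return hits
```

An empty result corresponds to the "no signature lines found" statement. To run it against a log file, pass the open file object, for example scan_signatures(open("command.log", errors="replace")).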

I am not reading live testrunner.db; counts above are log-only until the runner exits or is quarantined.

@brandonpayton

Member Author

PR #692 kd-nbh browser 30-minute checkpoint

Browser full target is still running. This is not complete and not a pass claim.

Run root: test-runs/sqlite-project-unit-all/kd-nbh-all-browser-jobs32-post-oobfix-20260619T073638Z
Latest log-only progress: 27:31 tcl(182/10366) f134 r32 ETC 25:39:59
Log size/mtime: 9568 bytes, Jun 19 04:05:39 2026 local.
Process state: browser runner and Chromium renderer are alive; renderer sampled at nonzero CPU.
Signature scan remains empty for RuntimeError, memory access out of bounds, unreachable, handleFcntlLock, database is locked, cannot determine platform, UNCAUGHT, kernel threw, malformed, pageerror, and console.error.

Still not reading live testrunner.db; final counts will come from the harness result or from a copied stopped DB only if the run has to be quarantined.
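
Once a stopped DB has been copied, the hard counts could be derived from the copy along these lines. This is a sketch only: the table name jobs and the column name state are assumptions about the testrunner.db schema that are not confirmed in this thread, and job_state_counts is a hypothetical helper.

```python
import sqlite3
from pathlib import Path
from typing import Dict


def job_state_counts(db_copy: str) -> Dict[str, int]:
    """Count jobs per state in a copied, stopped testrunner DB.

    Assumes a table named "jobs" with a "state" column (unverified schema).
    """
    # Open the copy read-only so the query cannot modify the artifact.
    uri = Path(db_copy).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT state, count(*) FROM jobs GROUP BY state"
        ).fetchall()
    finally:
        conn.close()
    return {state: n for state, n in rows}
```

Pointing this at the quarantined copy rather than the live file keeps the runner's own DB untouched, in line with the policy stated above.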
