Summary
A live Fusion project root is both the engine's data directory (.fusion/, with fusion.db held open in WAL mode and written continuously) and a git work tree that Fusion itself drives with heavy automated git activity (per-task fusion/fn-* branches, auto-merges into main, and per-task worktrees under .worktrees/).
If .fusion/ (or just fusion.db) ever ends up tracked in any branch, then any git checkout / merge / rebase that touches that tracked blob makes git overwrite fusion.db byte-for-byte on disk while SQLite still has it open. The on-disk file instantly desyncs from the live WAL/page cache, producing genuine page-level corruption:
Error: database disk image is malformed
[runtime] Failed to start InProcessRuntime: database disk image is malformed
This is distinct from the internal write-pattern corruption in #24 (unbatched INSERTs / WAL churn). Here the engine's writes are fine — an external process (git, invoked by Fusion's own workflow) truncates and rewrites the file under the open handle.
Environment
- Fusion version: 0.48.0
- Node.js: v25.5.0
- SQLite driver: node:sqlite
- OS / FS: Linux (Ubuntu), local ext4 on LVM, noatime (not a network/FUSE filesystem, so this is not an NFS-locking issue)
- DB: fusion.db, WAL mode, synchronous=normal
Observed impact
In one project (an OCR pipeline with very high automated-merge volume — ~64 merges/checkouts in a week), fusion.db corrupted 4 times in 5 days. Each time the engine crash-looped at store.init and every /api/* call returned HTTP 500, while fusion.service itself stayed active (only the child engine dies). Recovery each time required manual sqlite3 fusion.db '.recover'.
Tell-tale signs that pointed at git rather than internal corruption:
- The corruption backups were named after the task in flight (fusion.db.corrupted-FN070, -FN088), i.e. corruption coincided with task/branch git operations.
- PRAGMA integrity_check fails outright (page-level), and FTS rebuild / WAL replay do not help; only .recover does.
- Inspecting the branches: .fusion/ was tracked, and not just fusion.db. The entire runtime dir was committed (agent-memory/, agents/*.jsonl runlogs, tasks/*, archive.db, fusion-central.db, plus *-wal/*-shm sidecars).
Why the obvious mitigation is insufficient
The project had already tried the natural fix: a commit adding .fusion/ to .gitignore and untracking it on main. That stopped corruption originating from operations on main, but corruption recurred because:
- .gitignore does not untrack already-committed files. Several unmerged task branches created before the ignore commit still carried the full .fusion/ tree.
- Per-task branches/worktrees re-materialize the tracked blob. Fusion checks out task branches into .worktrees/<name>. Creating/refreshing such a worktree for a branch that tracks .fusion/ writes a stale fusion.db into it; merging/rebasing that branch toward main reintroduces and rewrites the tracked DB. I directly observed an active per-task worktree showing M .fusion/fusion.db, i.e. a tracked, modified live DB inside a worktree.
So as long as any ref anywhere tracks .fusion/, the live DB remains a landmine that the next automated merge can detonate.
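To see which refs still carry the runtime directory, something along these lines can be run from inside the repository. This is a minimal sketch: the function name refs_tracking_dir is hypothetical (not part of Fusion), and it only inspects local and remote-tracking branches, not tags or stashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every local and remote-tracking branch whose
# tree still contains anything under the given directory (default: .fusion/).
# Usage: refs_tracking_dir [dir]   (run anywhere inside the repository)
refs_tracking_dir() {
  local dir="${1:-.fusion/}" ref
  git for-each-ref --format='%(refname)' refs/heads refs/remotes |
  while IFS= read -r ref; do
    # ls-tree prints nothing when the path is absent from that ref's tree.
    # --full-tree makes the path relative to the repository root.
    if [ -n "$(git ls-tree -r --name-only --full-tree "$ref" -- "$dir" | head -n 1)" ]; then
      printf '%s\n' "$ref"
    fi
  done
}
```

An empty result means no branch tracks .fusion/; any ref it prints is one that can still rewrite the live DB when checked out or merged.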
Root cause (design flaw)
Fusion stores the live engine state inside the git work tree that it also performs automated git operations on, and does nothing to guarantee .fusion/ stays untracked. Git is fundamentally allowed to overwrite any tracked path during checkout/merge/rebase, with no awareness that a path is an open SQLite database. The combination is inherently unsafe.
Suggested fixes (in rough priority order)
- Keep the live DB out of the work tree entirely. Store the engine's mutable state (fusion.db, archive.db, fusion-central.db, WAL/SHM, runlogs) under a per-project state dir outside the versioned tree (e.g. keyed by project id under the Fusion home), so no git operation can ever touch it. The work tree should hold only source + intentionally-versioned task artifacts.
- If .fusion/ must live in the tree, hard-enforce that it is never tracked. On project init and on engine start, write the ignore rule (and .git/info/exclude), and scan all refs: if .fusion/ is tracked in HEAD or any branch, refuse to start (or loudly warn) and offer to auto-run git rm -r --cached .fusion/ + commit on the offending refs. A .gitignore entry alone is not sufficient and should not be treated as the fix.
- Make the auto-merge / per-task-worktree machinery .fusion-aware. When creating worktrees or performing internal merges, explicitly exclude/strip .fusion/ so a tracked blob can never be materialized over a live engine dir.
- Add a pre-operation guard. Before any engine-initiated checkout/merge/rebase, verify the live DB path is not a tracked path in the target tree; abort with a clear error if it is.
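The local-exclude rule and the pre-operation guard from the list above could look roughly like the following. This is a sketch, not Fusion's implementation: the function names fusion_ensure_excluded and fusion_guard_ref are hypothetical, and a real engine would run the equivalent checks from its own git wrapper rather than from shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure .fusion/ is ignored locally even if
# .gitignore lacks the rule. info/exclude is per-repository and never committed.
fusion_ensure_excluded() {
  local exclude
  exclude="$(git rev-parse --git-path info/exclude)" || return 1
  mkdir -p "$(dirname "$exclude")"
  # Append the rule only if that exact line is not already present.
  grep -qxF '.fusion/' "$exclude" 2>/dev/null || printf '%s\n' '.fusion/' >> "$exclude"
}

# Hypothetical guard: return 0 if <ref> is safe to check out or merge,
# 1 if its tree tracks anything under .fusion/, 2 if <ref> does not resolve.
fusion_guard_ref() {
  local ref="$1"
  if ! git rev-parse --verify --quiet "${ref}^{tree}" >/dev/null; then
    echo "cannot resolve ref: $ref" >&2
    return 2
  fi
  if [ -n "$(git ls-tree -r --name-only --full-tree "$ref" -- .fusion/ | head -n 1)" ]; then
    echo "refusing git operation: $ref tracks .fusion/" >&2
    return 1
  fi
  return 0
}
```

A caller would run fusion_guard_ref on the target branch before every automated checkout, merge, or rebase and abort on any non-zero status. Note that the exclude rule only prevents new accidental adds; it does nothing for refs that already track the directory, which is why the guard is still needed.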
Workaround (for anyone hitting this now)
Per incident: stop the engine, run sqlite3 fusion.db '.recover' | sqlite3 fusion.db.recovered, verify the recovered file with PRAGMA integrity_check, swap it in, and restart. To stop recurrence, untrack .fusion/ on every branch, not just main:
# for each branch that tracks .fusion/ (check: git ls-tree -r <branch> --name-only -- .fusion/)
git rm -r --cached .fusion/ && echo '.fusion/' >> .gitignore && git add .gitignore && git commit -m "untrack .fusion runtime state"
Do this via a throwaway git worktree (never git checkout the branch in the live root — that itself clobbers the live DB).
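The throwaway-worktree approach can be sketched as below. The function name untrack_fusion_on_branch is hypothetical. It assumes the branch is not currently checked out in any worktree, because git worktree add refuses a branch that is already checked out elsewhere; such branches have to be handled in their own worktree with the engine stopped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: untrack .fusion/ on one branch using a temporary
# worktree, so the live project root is never checked out or rewritten.
# Usage: untrack_fusion_on_branch <branch>   (run inside the repository)
untrack_fusion_on_branch() {
  local branch="$1" wt status
  wt="$(mktemp -d)" || return 1
  # Fails if <branch> is already checked out in another worktree.
  git worktree add -q "$wt" "$branch" || { rmdir "$wt"; return 1; }
  (
    cd "$wt" || exit 1
    # Remove from the index only; --ignore-unmatch tolerates clean branches.
    git rm -r -q --cached --ignore-unmatch .fusion/ || exit 1
    grep -qxF '.fusion/' .gitignore 2>/dev/null || echo '.fusion/' >> .gitignore
    git add .gitignore || exit 1
    # Commit only if something was actually staged.
    git diff --cached --quiet || git commit -q -m "untrack .fusion runtime state"
  )
  status=$?
  # --force: the temp worktree still holds the now-untracked stale .fusion/ copy.
  git worktree remove --force "$wt"
  return "$status"
}
```

It would be called once per ref reported by the ls-tree check above, for example untrack_fusion_on_branch fusion/fn-070 (an illustrative branch name). The stale DB copy is only ever materialized in the temporary directory, never over the live one.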
Related
.fusion/ .fusion/ in worktrees (same ".fusion ends up in worktrees" theme)