
Git operations on the live project work tree corrupt fusion.db when .fusion/ is tracked on any branch #1816

Description

@abeperl

Summary

A live Fusion project root is both the engine's data directory (.fusion/, with fusion.db held open in WAL mode and written continuously) and a git work tree that Fusion itself drives with heavy automated git activity (per-task fusion/fn-* branches, auto-merges into main, and per-task worktrees under .worktrees/).

If .fusion/ (or just fusion.db) ever ends up tracked in any branch, then any git checkout / merge / rebase that touches that tracked blob makes git overwrite fusion.db byte-for-byte on disk while SQLite still has it open. The on-disk file instantly desyncs from the live WAL/page cache, producing genuine page-level corruption:

Error: database disk image is malformed
[runtime] Failed to start InProcessRuntime: database disk image is malformed

This is distinct from the internal write-pattern corruption in #24 (unbatched INSERTs / WAL churn). Here the engine's writes are fine — an external process (git, invoked by Fusion's own workflow) truncates and rewrites the file under the open handle.

Environment

  • Fusion version: 0.48.0
  • Node.js: v25.5.0
  • SQLite driver: node:sqlite
  • OS / FS: Linux (Ubuntu), local ext4 on LVM, noatime; not a network/FUSE filesystem, so this is not an NFS-locking issue
  • DB: fusion.db, WAL mode, synchronous=normal

Observed impact

In one project (an OCR pipeline with very high automated-merge volume, roughly 64 merges/checkouts in a week), fusion.db corrupted 4 times in 5 days. Each time the engine crash-looped at store.init and every /api/* call returned HTTP 500, while fusion.service itself stayed active (only the child engine died). Recovery each time required a manual sqlite3 fusion.db '.recover'.

Tell-tale signs that pointed at git rather than internal corruption:

  • The corruption backups were named after the task in flight (fusion.db.corrupted-FN070, -FN088) — i.e. corruption coincided with task/branch git operations.
  • PRAGMA integrity_check fails outright (page-level), and FTS rebuild / WAL replay do not help — only .recover does.
  • Inspecting the branches: .fusion/ was tracked, and not just fusion.db — the entire runtime dir was committed (agent-memory/, agents/*.jsonl runlogs, tasks/*, archive.db, fusion-central.db, plus *-wal/*-shm sidecars).

Why the obvious mitigation is insufficient

The project had already tried the natural fix: a commit adding .fusion/ to .gitignore and untracking it on main. That stopped corruption originating from operations on main, but corruption recurred because:

  1. .gitignore does not untrack already-committed files. Several unmerged task branches created before the ignore commit still carried the full .fusion/ tree.
  2. Per-task branches/worktrees re-materialize the tracked blob. Fusion checks out task branches into .worktrees/<name>. Creating/refreshing such a worktree for a branch that tracks .fusion/ writes a stale fusion.db into it; merging/rebasing that branch toward main reintroduces and rewrites the tracked DB. I directly observed an active per-task worktree showing M .fusion/fusion.db — a tracked, modified live DB inside a worktree.

So as long as any ref anywhere tracks .fusion/, the live DB remains a landmine that the next automated merge can detonate.
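
To see which refs still carry the runtime directory, something like the following could be used. It is a minimal sketch, assuming it is run from the repository root and that only local and remote-tracking branches matter; the function name refs_tracking_fusion is made up for illustration and is not part of Fusion. It only reads tree objects, so it never touches the work tree or the live DB.

```shell
#!/bin/sh
# Hypothetical helper: print every local or remote-tracking branch whose
# tree contains at least one path under .fusion/.
# Assumption: the current directory is the repository root.
refs_tracking_fusion() {
    git for-each-ref --format='%(refname)' refs/heads refs/remotes |
    while IFS= read -r ref; do
        # ls-tree inspects the ref's tree object directly; nothing is
        # checked out, so the live fusion.db is never rewritten.
        if [ -n "$(git ls-tree -r --name-only "$ref" .fusion/ | head -n 1)" ]; then
            printf '%s\n' "$ref"
        fi
    done
}
```

Calling refs_tracking_fusion with no arguments prints one full ref name per line; empty output means no branch tracks .fusion/.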

Root cause (design flaw)

Fusion stores the live engine state inside the git work tree that it also performs automated git operations on, and does nothing to guarantee .fusion/ stays untracked. Git is fundamentally allowed to overwrite any tracked path during checkout/merge/rebase, with no awareness that a path is an open SQLite database. The combination is inherently unsafe.

Suggested fixes (in rough priority order)

  1. Keep the live DB out of the work tree entirely. Store the engine's mutable state (fusion.db, archive.db, fusion-central.db, WAL/SHM, runlogs) under a per-project state dir outside the versioned tree (e.g. keyed by project id under the Fusion home), so no git operation can ever touch it. The work tree should hold only source + intentionally-versioned task artifacts.
  2. If .fusion/ must live in the tree, hard-enforce that it is never tracked. On project init and on engine start, write the ignore rule to both .gitignore and .git/info/exclude, then scan all refs. If .fusion/ is tracked in HEAD or any branch, refuse to start (or warn loudly) and offer to run git rm -r --cached .fusion/ and commit on the offending refs. A .gitignore entry alone is not sufficient and should not be treated as the fix.
  3. Make the auto-merge / per-task-worktree machinery .fusion-aware. When creating worktrees or performing internal merges, explicitly exclude/strip .fusion/ so a tracked blob can never be materialized over a live engine dir.
  4. Add a pre-operation guard. Before any engine-initiated checkout/merge/rebase, verify the live DB path is not a tracked path in the target tree; abort with a clear error if it is.
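
Parts of fixes 2 and 4 could look roughly like the sketch below. It is not Fusion's code: assert_db_untracked and ensure_local_exclude are hypothetical names, the default DB path .fusion/fusion.db is assumed to be relative to the repository root, and both functions are assumed to run from that root.

```shell
#!/bin/sh
# Hypothetical pre-operation guard (fix 4).
# Returns 0 if DB_PATH is absent from TARGET's tree, 1 if it is tracked
# there, and 2 if TARGET cannot be resolved to a tree at all.
assert_db_untracked() {
    target=$1
    db_path=${2:-.fusion/fusion.db}   # assumed default location
    if ! git rev-parse --verify --quiet "${target}^{tree}" >/dev/null; then
        printf 'cannot resolve %s to a tree\n' "$target" >&2
        return 2
    fi
    if [ -n "$(git ls-tree -r --name-only "$target" "$db_path")" ]; then
        printf 'refusing git operation: %s is tracked in %s\n' "$db_path" "$target" >&2
        return 1
    fi
    return 0
}

# Hypothetical init step (part of fix 2): add a repo-local ignore rule that
# does not depend on any branch's .gitignore. Safe to call repeatedly.
ensure_local_exclude() {
    exclude=$(git rev-parse --git-path info/exclude) || return 1
    mkdir -p "$(dirname "$exclude")" || return 1
    grep -qxF '.fusion/' "$exclude" 2>/dev/null || printf '.fusion/\n' >> "$exclude"
}
```

An engine wrapper might call assert_db_untracked with the branch it is about to check out or merge, and skip the operation on a non-zero status. The exclude rule only keeps new files from being added; it does nothing for refs that already track .fusion/, which is why the guard is still needed.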

Workaround (for anyone hitting this now)

Per incident: stop the engine, sqlite3 fusion.db '.recover' | sqlite3 fusion.db.recovered, verify PRAGMA integrity_check, swap it in, restart. To stop recurrence, untrack .fusion/ on every branch, not just main:

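# for each branch that tracks .fusion/ (check: git ls-tree -r <branch> --name-only -- .fusion/)
# note: .gitignore must be staged too, otherwise the commit only records the untracking
git rm -r --cached .fusion/ && echo '.fusion/' >> .gitignore && git add .gitignore && git commit -m "untrack .fusion runtime state"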

Do this via a throwaway git worktree (never git checkout the branch in the live root — that itself clobbers the live DB).
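
The per-branch cleanup can be scripted along these lines. This is a rough sketch, assuming the branch is not currently checked out in any other worktree (git worktree add refuses in that case) and that a commit identity is configured; untrack_fusion_on_branch is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: untrack .fusion/ on BRANCH using a temporary worktree,
# so the live project root is never checked out or rewritten.
untrack_fusion_on_branch() {
    branch=$1
    wt=$(mktemp -d) || return 1
    # git worktree add accepts an existing empty directory as the target.
    git worktree add --quiet "$wt" "$branch" || { rmdir "$wt"; return 1; }
    rc=0
    (
        cd "$wt" || exit 1
        # Drop .fusion/ from the index only; --ignore-unmatch keeps this a
        # no-op on branches that never tracked it.
        git rm -r -q --cached --ignore-unmatch .fusion/ || exit 1
        # Make sure an existing .gitignore ends with a newline before appending.
        if [ -n "$(tail -c 1 .gitignore 2>/dev/null)" ]; then
            printf '\n' >> .gitignore
        fi
        grep -qxF '.fusion/' .gitignore 2>/dev/null || printf '.fusion/\n' >> .gitignore
        git add .gitignore || exit 1
        # Nothing staged means the branch was already clean.
        git diff --cached --quiet && exit 0
        git commit -q -m "untrack .fusion runtime state"
    ) || rc=$?
    # --force is needed because the stale, now-ignored .fusion/ files remain
    # in the temporary worktree.
    git worktree remove --force "$wt"
    return "$rc"
}
```

It would be called once per affected branch, as in untrack_fusion_on_branch <branch>. Branches that Fusion currently has checked out under .worktrees/ would need the engine stopped and those worktrees handled separately.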
