JSON channel binding/instance stores flush after releasing the lock (write reorder data loss) and don't roll back on flush failure #52

@TYRMars

Summary

The JSON-file channel binding/instance stores mutate in-memory state under the write lock, then release the lock before flushing to disk. Two concurrent writers can have their atomic_write calls reordered (last write wins on disk), durably losing a change even though in-memory state is correct. Separately, a failed flush() leaves the in-memory mutation in place with no rollback, so memory diverges from disk.

Details

  • crates/harness-store/src/json_file.rs: JsonFileChannelBindingStore::upsert (~lines 1921-1933), delete (~1955-1967), delete_for_conversation (~1969-1981); same pattern in JsonFileChannelInstanceStore::upsert/delete (~2034-2071).

Each method does: mutate the in-memory Vec under the write lock → clone a snapshot → drop(guard) → flush()/atomic_write(snapshot). Because the lock is dropped before the disk write:

  1. Durability race: writer A (snapshot {A}) and writer B (snapshot {A,B}) both release the lock; if B's atomic_write lands before A's, the file ends up {A} — B's change is lost on disk (in-memory still shows {A,B}, so it only surfaces after restart).
  2. No rollback on flush error: the in-memory Vec is mutated first; if flush() returns Err, the error propagates but the mutation is not undone. lookup/list then report a binding that was never persisted.
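The durability race in item 1 can be replayed deterministically without threads. The sketch below is a simplified stand-in, not the code in json_file.rs: SketchStore, mutate_and_snapshot, write_snapshot and replay_reordered_flush are hypothetical names, rows are plain Strings, std::sync::RwLock is assumed as the lock type, and the file is modelled as an in-memory Vec instead of a real atomic_write.

```rust
use std::sync::{Mutex, RwLock};

/// Hypothetical stand-in for a whole-file JSON store.
/// `disk` models the file contents; the real store writes a file.
struct SketchStore {
    rows: RwLock<Vec<String>>,
    disk: Mutex<Vec<String>>,
}

impl SketchStore {
    fn new() -> Self {
        Self {
            rows: RwLock::new(Vec::new()),
            disk: Mutex::new(Vec::new()),
        }
    }

    /// First half of the reported pattern: mutate under the write lock,
    /// clone a snapshot, and release the lock before anything reaches disk.
    fn mutate_and_snapshot(&self, row: &str) -> Vec<String> {
        let mut guard = self.rows.write().unwrap();
        guard.push(row.to_string());
        let snapshot = guard.clone();
        drop(guard);
        snapshot
    }

    /// Second half: a whole-file replace, modelled as overwriting `disk`.
    fn write_snapshot(&self, snapshot: Vec<String>) {
        *self.disk.lock().unwrap() = snapshot;
    }
}

/// Replays the bad interleaving and returns (memory, disk).
fn replay_reordered_flush() -> (Vec<String>, Vec<String>) {
    let store = SketchStore::new();
    let snap_a = store.mutate_and_snapshot("A"); // snapshot {A}
    let snap_b = store.mutate_and_snapshot("B"); // snapshot {A, B}
    store.write_snapshot(snap_b); // B's write lands first
    store.write_snapshot(snap_a); // A's stale snapshot overwrites it
    let memory = store.rows.read().unwrap().clone();
    let disk = store.disk.lock().unwrap().clone();
    (memory, disk)
}
```

After the replay, memory holds both rows while the modelled file holds only A, which is the divergence that surfaces after a restart.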

These are the only two whole-file JSON stores; the per-record JSON stores avoid this by writing one atomic_write per id (true last-write-wins per record).

Impact

Operator-configured channel bindings/instances are low-volume (typically <20 rows), so the race window is narrow, but it is real durable data loss + state divergence. Severity: medium.

Suggested fix

Hold the lock across the flush (flush the snapshot before drop(guard)), or flush first and only commit the in-memory mutation on success (rollback on Err).
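One way to combine both suggestions is sketched below, under assumptions: LockedStore and upsert_with are hypothetical names, rows are plain Strings, std::sync::RwLock is assumed (the real store may use a different lock type), and the flush is passed in as a closure standing in for flush()/atomic_write. The next state is built on a copy, flushed while the write lock is still held, and committed to memory only if the flush succeeds.

```rust
use std::io;
use std::sync::RwLock;

/// Hypothetical store that keeps memory and disk in step.
struct LockedStore {
    rows: RwLock<Vec<String>>,
}

impl LockedStore {
    fn new() -> Self {
        Self { rows: RwLock::new(Vec::new()) }
    }

    /// Inserts `row` if it is not already present.
    /// `flush` stands in for the real disk write and runs under the lock,
    /// so concurrent writers reach disk in the same order they mutate.
    fn upsert_with<F>(&self, row: &str, flush: F) -> io::Result<()>
    where
        F: FnOnce(&[String]) -> io::Result<()>,
    {
        let mut guard = self.rows.write().unwrap();

        // Build the candidate state on a copy; memory is untouched so far.
        let mut next = guard.clone();
        if !next.iter().any(|r| r == row) {
            next.push(row.to_string());
        }

        // Flush while still holding the write lock. On Err we return early
        // and the in-memory Vec keeps its previous contents.
        flush(&next)?;

        // Commit to memory only after the flush succeeded.
        *guard = next;
        Ok(())
    }

    fn list(&self) -> Vec<String> {
        self.rows.read().unwrap().clone()
    }
}
```

The trade-off is that readers block for the duration of the disk write. With the low row counts described under Impact this is probably acceptable, but that is a judgment call for the maintainers.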


Labels: bug
