Summary
The JSON-file channel binding/instance stores mutate in-memory state under the write lock, then release the lock before flushing to disk. Two concurrent writers can have their atomic_write calls reordered (last write wins on disk), durably losing a change even though in-memory state is correct. Separately, a failed flush() leaves the in-memory mutation in place with no rollback, so memory diverges from disk.
Details
crates/harness-store/src/json_file.rs — JsonFileChannelBindingStore::upsert (~lines 1921-1933), delete (~1955-1967), delete_for_conversation (~1969-1981); same pattern in JsonFileChannelInstanceStore::upsert/delete (~2034-2071).
Each method does: mutate the in-memory Vec under the write lock → clone a snapshot → drop(guard) → flush()/atomic_write(snapshot). Because the lock is dropped before the disk write:
- Durability race: writer A (snapshot {A}) and writer B (snapshot {A,B}) both release the lock; if B's atomic_write lands before A's, the file ends up {A}, so B's change is lost on disk (in-memory still shows {A,B}, so it only surfaces after restart).
- No rollback on flush error: the in-memory Vec is mutated first; if flush() returns Err, the error propagates but the mutation is not undone. lookup/list then report a binding that was never persisted.
These are the only two whole-file JSON stores; the per-record JSON stores avoid this by writing one atomic_write per id (true last-write-wins per record).
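The durability race can be replayed deterministically without threads. The sketch below is not the code in json_file.rs: RacyStore, mutate_and_snapshot, write_file and replay_reordered_writes are hypothetical names, and a Mutex<Vec<String>> stands in for the JSON file. It only shows that once the snapshot is taken and the lock is released, nothing ties the order of the disk writes to the order of the mutations.

```rust
use std::sync::{Mutex, RwLock};

/// Hypothetical stand-in for a whole-file store: `rows` is the in-memory
/// state, `disk` plays the role of the JSON file.
struct RacyStore {
    rows: RwLock<Vec<String>>,
    disk: Mutex<Vec<String>>,
}

impl RacyStore {
    fn new() -> Self {
        RacyStore {
            rows: RwLock::new(Vec::new()),
            disk: Mutex::new(Vec::new()),
        }
    }

    /// Mutate under the write lock, clone a snapshot, then release the lock
    /// before anything is written (the pattern described in this issue).
    fn mutate_and_snapshot(&self, row: &str) -> Vec<String> {
        let mut guard = self.rows.write().unwrap();
        guard.push(row.to_string());
        let snapshot = guard.clone();
        drop(guard);
        snapshot
    }

    /// Stand-in for atomic_write: replaces the whole "file" with the snapshot.
    fn write_file(&self, snapshot: Vec<String>) {
        *self.disk.lock().unwrap() = snapshot;
    }
}

/// Replays the unlucky interleaving and returns (memory, disk).
fn replay_reordered_writes() -> (Vec<String>, Vec<String>) {
    let store = RacyStore::new();
    let snap_a = store.mutate_and_snapshot("A"); // writer A sees {A}
    let snap_b = store.mutate_and_snapshot("B"); // writer B sees {A, B}
    store.write_file(snap_b); // B's write lands first
    store.write_file(snap_a); // A's stale snapshot overwrites it
    let memory = store.rows.read().unwrap().clone();
    let disk = store.disk.lock().unwrap().clone();
    (memory, disk)
}
```

Calling replay_reordered_writes() yields memory holding both rows while the simulated file holds only A's, which is the divergence that would surface after a restart.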
Impact
Operator-configured channel bindings/instances are low-volume (typically <20 rows), so the race window is narrow, but it is real durable data loss + state divergence. Severity: medium.
Suggested fix
Hold the lock across the flush (flush the snapshot before drop(guard)), or flush first and only commit the in-memory mutation on success (rollback on Err).
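A minimal sketch of how the two options can be combined, assuming a synchronous std::sync::RwLock (the real store's lock type is not shown in this issue). SerializedStore, flush_ok and flush_fail are hypothetical names, and the injected flush function pointer stands in for flush()/atomic_write so the example does not touch the filesystem. The next state is built on a copy, flushed while the write lock is still held, and committed to memory only if the flush succeeds.

```rust
use std::sync::RwLock;

/// Hypothetical whole-file store. `flush` is an injected stand-in for
/// flush()/atomic_write.
struct SerializedStore {
    rows: RwLock<Vec<String>>,
    flush: fn(&[String]) -> Result<(), String>,
}

impl SerializedStore {
    fn new(flush: fn(&[String]) -> Result<(), String>) -> Self {
        SerializedStore {
            rows: RwLock::new(Vec::new()),
            flush,
        }
    }

    fn upsert(&self, row: &str) -> Result<(), String> {
        let mut guard = self.rows.write().unwrap();
        // Build the next state on a copy so the live Vec stays untouched
        // until the disk write is known to have succeeded.
        let mut next = guard.clone();
        if !next.iter().any(|r| r == row) {
            next.push(row.to_string());
        }
        // Flush while still holding the write lock, so disk writes happen
        // in the same order as the mutations.
        (self.flush)(next.as_slice())?;
        // Commit to memory only after the flush returned Ok.
        *guard = next;
        Ok(())
    }

    fn list(&self) -> Vec<String> {
        self.rows.read().unwrap().clone()
    }
}

/// Hypothetical flush stubs used to exercise both outcomes.
fn flush_ok(_rows: &[String]) -> Result<(), String> {
    Ok(())
}

fn flush_fail(_rows: &[String]) -> Result<(), String> {
    Err("simulated disk error".to_string())
}
```

With flush_fail, upsert returns Err and list() still reports the previous state, so memory never gets ahead of disk. The cost is that readers wait for the disk write; with the row counts described under Impact that is likely acceptable, but it is a trade-off to confirm against the real lock type.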