Skip to content

Fix JSON channel store flush race and missing rollback (#52)#55

Draft
TYRMars wants to merge 1 commit into
mainfrom
claude/vibrant-dijkstra-87LF1
Draft

Fix JSON channel store flush race and missing rollback (#52)#55
TYRMars wants to merge 1 commit into
mainfrom
claude/vibrant-dijkstra-87LF1

Conversation

@TYRMars
Copy link
Copy Markdown
Owner

@TYRMars TYRMars commented May 30, 2026

Fixes #52.

Problem

JsonFileChannelBindingStore and JsonFileChannelInstanceStore mutated in-memory state under the write lock, then released the lock before flushing to disk. Two failure modes:

  1. Durability race — two concurrent writers each dropped the lock before their atomic_write, so the disk writes could reorder (last write wins on disk). A change correct in memory could be durably lost, surfacing only after restart.
  2. No rollback on flush error — the in-memory Vec was mutated first; a failing flush() propagated the error but left the mutation in place, so lookup/list reported a row that was never persisted.

Fix

Each write method now:

  • builds the next state without touching the guard,
  • flushes it while still holding the write lock (serialising disk writes so they cannot reorder), and
  • commits the in-memory mutation only after the flush succeeds.

A failed flush leaves both memory and disk unchanged. Applied to upsert / delete / delete_for_conversation on the binding store and upsert / delete on the instance store.

Tests

Adds channel_binding_rolls_back_on_flush_failure, which forces a flush failure (occupies the atomic_write temp path with a directory) and asserts the in-memory state rolls back, then confirms a later write commits cleanly. cargo test -p harness-store and cargo clippy -p harness-store --all-targets -- -D warnings both pass.

https://claude.ai/code/session_01FhsDL2qEDL8MQnS7gmSdim


Generated by Claude Code

JsonFileChannelBindingStore / JsonFileChannelInstanceStore mutated
in-memory state under the write lock, then released the lock before
flushing the snapshot to disk. This allowed two concurrent writers'
atomic_write calls to reorder (last write wins on disk, durably losing
a change while memory still showed it), and a failed flush left the
in-memory mutation in place with no rollback, so memory diverged from
disk.

Build the next state without touching the guard, flush it while still
holding the write lock (serialising disk writes so they cannot
reorder), and commit the in-memory mutation only after the flush
succeeds. A failed flush now leaves both memory and disk unchanged.

Applied to upsert / delete / delete_for_conversation on the binding
store and upsert / delete on the instance store. Adds a regression
test that forces a flush failure and asserts the in-memory state rolls
back.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

JSON channel binding/instance stores flush after releasing the lock (write reorder data loss) and don't roll back on flush failure

2 participants