refactor+reliability: extract atomic helpers + lock-protect decide.py#17
Open
CryptoJones wants to merge 1 commit into
Open
refactor+reliability: extract atomic helpers + lock-protect decide.py#17CryptoJones wants to merge 1 commit into
CryptoJones wants to merge 1 commit into
Conversation
Pulls the read/write atomicity pattern into one place and uses it to fix two new failure modes in `socrates decide`. New module `_atomic.py`: - `atomic_write_text(path, text)` — tempfile + os.replace. Same pattern as the in-flight interview-atomic-save and patterns-cache- atomic-save fixes; centralized here so a third caller (decide) and any future ones don't reinvent it. - `locked_read_modify_write(path, mutate)` — exclusive read-modify- write guarded by a POSIX file lock. Holds the lock through the entire read → mutate → atomic-write cycle, so concurrent callers serialize. Two subtleties worth flagging for review: 1. The lock is taken on a SIBLING `.<name>.lock` file, NOT on `path` itself. We initially tried locking `path`, but `atomic_write_text` renames a tempfile onto it — which orphans the old inode and the flock with it. A second worker would then acquire the (stale) old-inode lock and read pre-rename content, producing a silent lost update. The sibling lock file's inode is stable across renames, so serialization is preserved. There's a regression test that proves this (timing assertion: two 300ms holds must take ≥500ms in serial). 2. On non-POSIX (no `fcntl`) the lock silently no-ops, matching the pre-fix unlocked behavior. The atomic-write half still applies, so a mid-write SIGINT doesn't corrupt the file. No regression on Windows; just no new concurrency protection. `socrates decide` now uses `locked_read_modify_write`. This fixes: - Data loss under concurrent writes. Pre-fix: two `socrates decide` invocations both read the same pre-state, both compute new content, and the second writer silently clobbers the first. An operator scripting batch decisions (or a CI hook) loses decisions with no error to detect them. - Corruption under SIGINT. Pre-fix: Ctrl-C during the `decisions_path.write_text(new_body)` call leaves DECISIONS.md truncated — `socrates timeline` then crashes parsing the file and the project audit reports it broken. Also folds in the multi-line whitespace collapse from the bugfix/decide-multiline-collapse branch so this PR is self-contained (`" ".join(text.split())`) — a multi-line `socrates decide $'foo\\nbar'` no longer produces a broken bullet whose closing `**` lands on a different line. Tests (new tests/test_atomic.py — 8 tests + 2 in test_decide.py): `atomic_write_text`: - writes full content with UTF-8 by default (em-dashes, smart quotes, 漢字 round-trip) - overwrites existing files - leaves no .tmp on success - mid-rename failure: pre-existing data untouched, .tmp cleaned `locked_read_modify_write`: - single-process: mutate sees current text, write lands - non-existent file raises FileNotFoundError up-front - TWO subprocesses each holding the lock for 300ms must both land their updates (no lost-update) AND total elapsed time is ≥500ms (proving lock-blocking, not just luck) `decide` integration: - two concurrent `record_decision` calls keep BOTH decisions (subprocess test; skipped on Windows) 156 tests pass total (was 149); ruff + mypy strict clean. Future work: interview.py and patterns.py have their own copies of the atomic-write logic from in-flight PRs. Once those merge, a follow-on can dedupe both call sites onto `_atomic.atomic_write_text`.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pulls the read/write atomicity pattern into one place and uses it to
fix two new failure modes in
socrates decide.New module
_atomic.py:atomic_write_text(path, text)— tempfile + os.replace. Samepattern as the in-flight interview-atomic-save and patterns-cache-
atomic-save fixes; centralized here so a third caller (decide) and
any future ones don't reinvent it.
locked_read_modify_write(path, mutate)— exclusive read-modify-write guarded by a POSIX file lock. Holds the lock through the
entire read → mutate → atomic-write cycle, so concurrent callers
serialize.
Two subtleties worth flagging for review:
The lock is taken on a SIBLING
.<name>.lockfile, NOT onpathitself. We initially tried locking
path, butatomic_write_textrenames a tempfile onto it — which orphans the old inode and the
flock with it. A second worker would then acquire the (stale)
old-inode lock and read pre-rename content, producing a silent
lost update. The sibling lock file's inode is stable across
renames, so serialization is preserved. There's a regression test
that proves this (timing assertion: two 300ms holds must take
≥500ms in serial).
On non-POSIX (no
fcntl) the lock silently no-ops, matching thepre-fix unlocked behavior. The atomic-write half still applies,
so a mid-write SIGINT doesn't corrupt the file. No regression on
Windows; just no new concurrency protection.
socrates decidenow useslocked_read_modify_write. This fixes:socrates decideinvocations both read the same pre-state, both compute new content,
and the second writer silently clobbers the first. An operator
scripting batch decisions (or a CI hook) loses decisions with no
error to detect them.
decisions_path.write_text(new_body)call leaves DECISIONS.mdtruncated —
socrates timelinethen crashes parsing the file andthe project audit reports it broken.
Also folds in the multi-line whitespace collapse from the
bugfix/decide-multiline-collapse branch so this PR is self-contained
(
" ".join(text.split())) — a multi-linesocrates decide $'foo\\nbar'no longer produces a broken bullet whose closing
**lands on adifferent line.
Tests (new tests/test_atomic.py — 8 tests + 2 in test_decide.py):
atomic_write_text:quotes, 漢字 round-trip)
locked_read_modify_write:their updates (no lost-update) AND total elapsed time is ≥500ms
(proving lock-blocking, not just luck)
decideintegration:record_decisioncalls keep BOTH decisions(subprocess test; skipped on Windows)
156 tests pass total (was 149); ruff + mypy strict clean.
Future work: interview.py and patterns.py have their own copies of
the atomic-write logic from in-flight PRs. Once those merge, a
follow-on can dedupe both call sites onto
_atomic.atomic_write_text.Merge-order note (important): this PR overlaps with two in-flight ones:
Cleanest sequence:
from socrates120x._atomic import atomic_write_textand drop their private copies.Functionally each PR is correct on its own; the order only matters for diff cleanliness.
Also folds in the multi-line whitespace collapse from #4 (bugfix/decide-multiline-collapse) so this PR is self-contained on the decide.py side. If you merge this first, #4 becomes a no-op.