fix: add data directory lockfile to prevent concurrent access #6
Open
slvDev wants to merge 1 commit into vicnaum:master from
Summary
Add an exclusive filesystem lock (`{data_dir}/.lock`) that prevents two shinode processes from opening the same data directory simultaneously. A second instance now fails immediately with a clear error instead of silently corrupting storage.

Problem
SHiNode is designed as a single-instance-per-machine node, but nothing enforced this: running two processes with the same `--data-dir` (even accidentally) would cause silent corruption of the WAL, shard metadata, bitsets, and peer cache. All synchronization in `Storage` is in-memory (`parking_lot::Mutex`), which only protects threads within one process, not across processes.

Approach
Uses the `fs2` crate for cross-platform advisory file locking (`flock()` on Unix, `LockFileEx` on Windows). OS-level file locking of this kind is what SQLite, LMDB, and many other databases rely on.

Reth uses a different approach: writing PID + start-time to a file, then checking `sysinfo` on next startup. That's ~100 lines and needs the heavier `sysinfo` crate. The `fs2`/`flock()` approach is simpler and more reliable for our use case: the kernel releases the lock when the process exits, crashes, or is killed, and no lock survives a power loss, so a leftover `.lock` file can never hold a stale lock.

Changes
- `node/Cargo.toml`: add `fs2 = "0.4"` (thin wrapper around `libc`, no transitive deps)
- `node/src/storage/sharded/mod.rs`:
  - Add a `_lock_file: fs::File` field to the `Storage` struct (held for the struct's lifetime, auto-released on drop)
  - Acquire the lock in `open_with_progress()` right after `create_dir_all`, before any metadata or shard I/O
  - Acquire the lock in `repair()` independently (standalone static method, doesn't construct `Storage`)
  - Covers `db compact`, `db rebuild-cache`, and `--repair`

Verification
Second instance with same data dir:
After first instance exits: lock released by kernel, second instance starts normally.
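
For reviewers who want to see the locking pattern in isolation, here is a minimal sketch of the idea described under Approach and Changes. It is not the code in this PR: to stay dependency-free it declares `flock()` by hand instead of going through `fs2`, so it is Unix-only, and the `LOCK_EX`/`LOCK_NB` values are the ones used on Linux and macOS. The names `lock_data_dir` and `Storage::open` are hypothetical stand-ins for illustration.

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;
use std::path::Path;

// flock(2) from libc, declared by hand so the sketch needs no external crates.
// (Assumption: the PR itself goes through the fs2 crate instead.)
extern "C" {
    fn flock(fd: c_int, operation: c_int) -> c_int;
}

const LOCK_EX: c_int = 2; // exclusive lock (value on Linux and macOS)
const LOCK_NB: c_int = 4; // return an error instead of blocking

/// Hypothetical helper: take an exclusive advisory lock on `{data_dir}/.lock`.
/// The returned `File` must be kept alive; dropping it closes the descriptor,
/// which releases the lock.
pub fn lock_data_dir(data_dir: &Path) -> io::Result<File> {
    fs::create_dir_all(data_dir)?;
    let file = OpenOptions::new()
        .create(true)
        .write(true)
        .open(data_dir.join(".lock"))?;
    // SAFETY: `file` owns a valid open descriptor for the duration of the call.
    let rc = unsafe { flock(file.as_raw_fd(), LOCK_EX | LOCK_NB) };
    if rc != 0 {
        // Typically EWOULDBLOCK when another open handle already holds the lock.
        return Err(io::Error::last_os_error());
    }
    Ok(file)
}

/// Stand-in for the real `Storage` struct: it only carries the lock handle.
pub struct Storage {
    _lock_file: File,
}

impl Storage {
    /// Take the lock first, before any metadata or shard I/O would happen.
    pub fn open(data_dir: &Path) -> io::Result<Self> {
        let _lock_file = lock_data_dir(data_dir)?;
        Ok(Self { _lock_file })
    }
}
```

Because `flock()` locks belong to the open file description, a second `Storage::open` on the same directory fails while the first `Storage` is alive, and succeeds again once it has been dropped or its process has exited.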