fix: add data directory lockfile to prevent concurrent access#6

Open
slvDev wants to merge 1 commit into vicnaum:master from slvDev:fix/data-dir-lockfile

Conversation

@slvDev (Contributor) commented Mar 16, 2026

Summary

Add an exclusive filesystem lock ({data_dir}/.lock) that prevents two shinode processes from opening the same data directory simultaneously. A second instance now fails immediately with a clear error instead of silently corrupting storage.

Problem

SHiNode is designed as a single-instance-per-machine node. But nothing enforced this — running two processes with the same --data-dir (even accidentally) would cause silent corruption of the WAL, shard metadata, bitsets, and peer cache. All synchronization in Storage is in-memory (parking_lot::Mutex), which only protects threads within one process, not across processes.

Approach

This change uses the fs2 crate for cross-platform advisory file locking (flock() on Unix, LockFileEx on Windows). This is the same primitive that SQLite, LMDB, and most databases use.

Reth uses a different approach — writing PID + start-time to a file, then checking sysinfo on next startup. That's ~100 lines and needs the heavier sysinfo crate. The fs2 / flock() approach is simpler and more reliable for our use case: the kernel guarantees lock release on crash, kill, or power loss — no stale lock files possible.

Changes

  • node/Cargo.toml — add fs2 = "0.4" (thin wrapper around libc, no transitive deps)
  • node/src/storage/sharded/mod.rs:
    • Add _lock_file: fs::File field to Storage struct (held for struct lifetime, auto-released on drop)
    • Acquire exclusive lock in open_with_progress() right after create_dir_all — before any metadata or shard I/O
    • Acquire exclusive lock in repair() independently (standalone static method, doesn't construct Storage)
    • All entry points covered: main sync, db compact, db rebuild-cache, and --repair

Verification

Second instance with same data dir:

Error: data directory is already in use by another shinode process

After first instance exits: lock released by kernel, second instance starts normally.

