Version 0.1
Scope: Single-process • Single-writer • Embedded library
No concurrency, no compaction, no mmap yet.
Deliver a minimal embedded key–value store that:
- Persists data to disk using an append-only log
- Supports
put,get, anddelete - Recovers correctly after process restart
- Maintains an in-memory index for fast reads
This phase prioritizes correctness, crash recovery, and clear invariants over performance.
A record is an immutable entry appended to the log.
Each record represents one logical operation:
- Insert / update
- Delete (tombstone)
- A single append-only file
- Never modified in place
- Only appended to
- Replayed in full at startup
Invariant:
The log is the source of truth.
The in-memory index is always reconstructible from the log.
- In-memory
HashMap<Key, LogOffset> - Points to the latest record for a given key
- Built by replaying the log on startup
- Deletes are represented by tombstone records
- Tombstones override previous values
- Tombstones are retained indefinitely in Phase 1
API
pub fn open(path: impl AsRef<Path>) -> Result<KvStore>;Behavior
- If file does not exist:
- Create it
- Initialize empty index
- If file exists:
- Open for read + append
- Replay log from byte 0
- Rebuild index
- Truncated tail record:
- Ignore partial record
- Log warning (not fatal)
API
pub fn put(&mut self, key: &[u8], value: &[u8]) -> Result<()>;Behavior
- Append record to log
- Flush write buffer
- Update index to new record offset
Constraints
key.len() <= 1024value.len() <= 1 MB- Reject empty key
Durability
- Record must reach OS buffers before returning
fsyncis not required in Phase 1
API
pub fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>>;Behavior
- If key not in index →
Ok(None) - Read record from log at stored offset
- If tombstone →
Ok(None) - Otherwise return value bytes
API
pub fn delete(&mut self, key: &[u8]) -> Result<()>;Behavior
- Append tombstone record
- Update index to point to tombstone offset
Idempotency
- Deleting a non-existent key is allowed
- Subsequent
getmust returnNone
All values stored in little-endian.
| magic (u32) | key_len (u32) | val_len (u32) | key bytes | value bytes |
| Field | Description |
|---|---|
| magic | Constant 0xBEEFCAFE for record validation |
| key_len | Length of key in bytes |
| val_len | Length of value in bytes; 0 indicates tombstone |
| key bytes | Raw key |
| value bytes | Raw value |
key_len > 0key_len ≤ 1024val_len ≤ 1 MB- Record must be fully readable
- Partial trailing record is ignored at startup
pub enum KvError {
Io(std::io::Error),
InvalidKey,
ValueTooLarge,
CorruptRecord { offset: u64 },
}- Corrupt record in middle of file → fatal
- Truncated record at end → ignored
- No panics in public API
| Aspect | Requirement |
|---|---|
| Durability | Survives process restart |
| Performance | ≥ 10k ops/sec on local SSD |
| Memory | Index entirely in memory |
| Threading | Single-threaded |
| Dependencies | Std only (no serde, no mmap, no async) |
| Logging | Optional debug logging (feature-gated) |
- One open writer at a time
- Store directory contains:
data.logonly
- Caller controls lifecycle (library, not daemon)
- No background threads
-
Basic Persistence
- Put key
- Restart process
- Get key → value matches
-
Overwrite
- Put key A = 1
- Put key A = 2
- Get A →
2
-
Delete
- Put key
- Delete key
- Restart
- Get key →
None
-
Truncated File Recovery
- Kill process mid-write
- Restart
- All previous keys intact
-
Corruption Detection
- Modify middle of file
- Restart → error returned
- Compaction
- Multiple log segments
- Concurrency
- Memory-mapped IO
- Checksums
- Compression
- Snapshots
- Structured checksums
- Multiple segments
- Background compaction
- Stronger durability guarantees (
fsyncmodes)
Phase 1 is complete when:
- All acceptance tests pass
- Log replay is deterministic
- Index can be deleted and rebuilt with no data loss
- You can explain every invariant without notes