A fully working log-structured filesystem (LFS) implemented in C and mounted via FUSE3. Built stage by stage from raw block I/O up to crash recovery and indirect block pointers.
In a traditional filesystem, updates overwrite data in place. In an LFS, every write goes to the end of a log — old data is never overwritten. This makes writes fast and sequential, but requires garbage collection to reclaim space occupied by old (dead) versions of blocks.
This is the same principle used in SSDs and flash storage (which cannot overwrite in place), database write-ahead logs (PostgreSQL, SQLite), and journaling filesystems (ext4, APFS).
lfs-fuse/
├── .gitignore
├── README.md
├── lfs # FUSE binary (built by make, gitignored)
├── mkfs_lfs # Format tool (built by make, gitignored)
├── lfs.img # 4MB disk image (created by mkfs_lfs, gitignored)
├── mount/ # FUSE mount point (gitignored)
└── src/
├── Makefile
├── lfs.h # Shared structs, constants, API declarations
├── lfs.c # FUSE frontend — all filesystem operations
├── disk.c # Block-level read/write (pread/pwrite)
├── log.c # Log append, checkpoint, crash recovery
├── inode.c # Inode read/write/alloc
├── gc.c # Garbage collector
└── mkfs_lfs.c # Disk formatter
Block 0 — Superblock (magic, total_blocks, log_tail, commit_seq)
Block 1 — Inode map (128 × uint32_t: inode_no → block number)
Block 2 — Commit block (crash recovery seal: magic, seq, crc)
Block 3 — Root inode (inode 0, created by mkfs)
Block 4 — Root dir data (hello.txt dirent, created by mkfs)
Block 5 — hello.txt data (created by mkfs)
Block 6 — hello.txt inode (inode 1, created by mkfs)
Block 7+ — Log (all writes go here, log_tail advances forward)
Each segment is 32 blocks (128 KB). The first block of every segment is a segment summary recording which inode owns each block — used by the garbage collector to distinguish live from dead blocks.
Two terminals are recommended — one for the filesystem process, one for testing.
# Terminal 1 — build and mount
cd ~/lfs-fuse/src
make clean && make all # compile everything
make format # write a fresh lfs.img
../lfs -f ../mount # mount (stays in foreground, prints debug output)
# Terminal 2 — unmount cleanly when done
fusermount3 -u ~/lfs-fuse/mountTo remount existing data without reformatting (keeps your files):
cd ~/lfs-fuse/src && ../lfs -f ../mountOnce mounted, the filesystem works like any normal Linux directory:
M=~/lfs-fuse/mount # shorthand — M is just a variable
# Files
echo "hello" > $M/file.txt
cat $M/file.txt
ls $M
rm $M/file.txt
# Directories
mkdir $M/docs
echo "note" > $M/docs/note.txt
cat $M/docs/note.txt
rmdir $M/docs # only works if empty
# Deep nesting
mkdir -p $M/a/b/c
echo "deep" > $M/a/b/c/deep.txt
# Copy files in/out
cp ~/somefile.txt $M/
cp $M/file.txt ~/
# Large files (up to ~4MB)
dd if=/dev/urandom of=$M/big.bin bs=1048576 count=1
wc -c $M/big.bin # 1048576Raw read/write of 4096-byte blocks to lfs.img using pread/pwrite.
Every higher layer goes through disk_read() / disk_write().
Every write appends to the end of the log instead of overwriting old data.
The superblock (block 0) records log_tail — where the next write goes.
Survives remount: log_tail is reloaded from the superblock on mount.
Files have inodes (metadata: type, size, block pointers).
An inode map (block 1) maps inode numbers to their current block in the log.
A root directory (inode 0) maps filenames to inode numbers.
Supports: create, read, write, getattr, readdir.
Because writes never overwrite, old versions of blocks accumulate as dead blocks.
GC scans the inode map to find live blocks, compacts them to the front of the log,
fixes all inode and indirect pointers, and rewinds log_tail — reclaiming hundreds
of blocks at once. Triggered automatically when free blocks fall below GC_THRESHOLD
(100 blocks).
Each inode has 10 direct block pointers (direct[0] through direct[9]),
supporting files up to 40 KB. Read and write work correctly across block boundaries.
rm file calls lfs_unlink which:
- Zeros the directory entry (slot marked free,
inode_no = 0) - Sets
inode_map[ino] = 0— inode and all data blocks become dead for GC - Checkpoints to disk
Works at any directory depth.
path_to_inode walks any depth by splitting the path on / and calling
lookup_in_dir at each component. All operations (create, read, write, unlink)
automatically work inside subdirectories.
mkdir allocates a new directory inode with a fresh empty data block.
rmdir refuses to remove non-empty directories (-ENOTEMPTY).
Deleted dirent slots are reused by new entries.
If power cuts out mid-write, the filesystem recovers cleanly on remount.
Every log_checkpoint writes in this strict order:
- Inode map → block 1
- Superblock (with incremented
commit_seq) → block 0 - Commit block (magic + seq + log_tail + imap XOR-CRC) → block 2 ← written last
On mount, log_recover reads the commit block and checks all four fields against
the superblock. If all match → clean mount. If any mismatch → incomplete checkpoint:
- Scan backwards from
log_tailto find the last non-zero block → rewindlog_tail - Scan all log blocks for valid inodes → rebuild inode map from scratch
- Write a fresh checkpoint to seal the recovered state
Adds a single indirect pointer to each inode. The indirect block holds 1024
uint32_t block pointers, extending the maximum file size from 40 KB to ~4 MB.
block_idx < 10→ resolved viadirect[](unchanged)block_idx >= 10→ resolved viaindirect[block_idx - 10]
The indirect block is copy-on-write: any write to the indirect region reads the existing indirect block, updates the relevant pointer, and appends a new copy to the log. GC marks the indirect block and all blocks it points to as live.