Ultra-high-performance file search for Windows using direct NTFS MFT reading and Polars DataFrames.
This is the Rust rewrite of UFFS, replacing the original C++ version with modern, safe, and blazing-fast code.
Traditional file search tools (including os.walk, FindFirstFile, etc.) work like this:
- Ask the OS to find a file
- OS reads the entire MFT (Master File Table) - the "phonebook" of all files
- Returns info for one file
- Throws away the MFT
- Repeat for the next file
UFFS reads the MFT directly - once - and queries it in memory using Polars DataFrames. This is like reading the entire phonebook once instead of looking up each name individually.
| Drive Type | Records | Time | Throughput |
|---|---|---|---|
| SSD | 1.77M | 3.1s | 1,472 MB/s |
| SSD | 1.53M | 2.5s | 1,839 MB/s |
| HDD | 3.81M | 23.3s | 206 MB/s |
| HDD | 7.18M | 45.9s | 250 MB/s |
| All 7 drives | 18.7M | 142s | - |

| Comparison | Records | Time | Notes |
|---|---|---|---|
| UFFS v0.2.5 | 18.7 Million | ~142 seconds | All disks, fast mode |
| UFFS v0.1.30 | 18.7 Million | ~315 seconds | Baseline |
| Everything | 19 Million | 178 seconds | All disks |
| WizFile | 6.5 Million | 299 seconds | Single HDD |
UFFS v0.2.5 is ~55% faster than the v0.1.30 baseline and achieves roughly a 4x improvement in SSD throughput.
```bash
# Build from source (requires Rust 1.85+)
cargo build --release

# The binary will be at:
# Windows:     target/release/uffs.exe
# Linux/macOS: target/release/uffs
```

```bash
# Search for all .rs files on C: drive
uffs "*.rs" --drive C
# Search across multiple drives
uffs "*.txt" --drives C,D,E
# Search all drives (default)
uffs "project*"
# Use a pre-built index for instant searches
uffs index --drive C --output c_drive.parquet
uffs search "*.rs" --index c_drive.parquet| Command | Result |
|---|---|
uffs "c:/pro*" |
Files & folders starting with "pro" on C: |
uffs "*.txt" |
All .txt files on ALL drives |
uffs "*.txt" --drives C,D,M |
All .txt files on C:, D:, and M: |
uffs "project*" --ext rs,toml |
Rust project files |
```bash
# Files only (no directories)
uffs "*.log" --files-only
# Directories only
uffs "node_modules" --dirs-only
# Size filters
uffs "*.mp4" --min-size 100MB --max-size 4GB
# Limit results
uffs "*.tmp" --limit 100
# Case-sensitive search
uffs "README" --case# Output to CSV file
uffs "*.rs" --out results.csv
# Custom columns
uffs "*" --columns path,size,created --out files.csv
# Custom separator and quotes
uffs "*" --sep ";" --quotes "'" --out data.csv
# Include/exclude header
uffs "*" --header false --out raw.csv
# JSON output
uffs "*.rs" --format json| Column | Description |
|---|---|
| `path` | Full path including filename |
| `name` | Filename only |
| `pathonly` | Directory path only |
| `size` | File size in bytes |
| `sizeondisk` | Actual disk space used |
| `created` | Creation timestamp |
| `written` | Last modified timestamp |
| `accessed` | Last accessed timestamp |
| `type` | File type |
| `directory` | Is a directory |
| `compressed` | Is compressed |
| `encrypted` | Is encrypted |
| `hidden` | Hidden attribute |
| `system` | System attribute |
| `readonly` | Read-only attribute |
| `all` | All available columns |
Search for files matching a pattern.
uffs search "*.rs" --drive C --files-only --limit 100Build a persistent index for instant future searches.
```bash
# Index a single drive
uffs index --drive C --output c_drive.parquet

# Index multiple drives
uffs index --drives C,D,E --output all_drives.parquet
```

Display information about an index file.
```bash
uffs info c_drive.parquet
```

Show statistics about indexed files.
```bash
uffs stats --index c_drive.parquet --top 20
```

Save raw MFT bytes for offline analysis.
```bash
uffs save-raw --drive C --output c_mft.raw --compress
```

Load and parse a saved raw MFT file.
```bash
uffs load-raw c_mft.raw --output parsed.parquet
```

UFFS is built as a modular Rust workspace:

| Crate | Description | Documentation |
|---|---|---|
| `uffs-polars` | Polars facade (compilation isolation) | - |
| `uffs-mft` | Direct MFT reading → Polars DataFrame | README |
| `uffs-core` | Query engine using Polars lazy API | - |
| `uffs-cli` | Command-line interface (`uffs`) | - |
| `uffs-tui` | Terminal UI (`uffs_tui`) | - |
| `uffs-gui` | Graphical UI (`uffs_gui`) | - |
- Direct MFT Access: Bypasses Windows file enumeration APIs
- Polars DataFrames: Powerful, memory-efficient data manipulation
- Async I/O: High-throughput disk reading with Tokio
- Parquet Persistence: Compressed, columnar index storage
- Multi-drive Parallel Search: Query all drives concurrently
- SIMD-accelerated Pattern Matching: Fast glob and regex support
The uffs_mft binary provides direct MFT access for advanced users:
```bash
# Quick MFT info (~10ms)
uffs_mft info --drive C
# Full MFT analysis with file statistics (~10-30s)
uffs_mft info --drive C --deep
# Export MFT to Parquet (fast mode - default)
uffs_mft read --drive C --output mft.parquet
# Export with complete extension data (slower)
uffs_mft read --drive C --output mft.parquet --full
# Benchmark single drive
uffs_mft bench --drive C --runs 3
# Benchmark all drives
uffs_mft bench-all --output benchmark.json
# List NTFS drives
uffs_mft drives
```

Read Modes:
- `--mode auto` (default): SSD → parallel, HDD → prefetch
- `--mode parallel`: Best for SSDs (8MB chunks)
- `--mode prefetch`: Best for HDDs (double-buffered 4MB chunks)
- `--mode streaming`: Low memory usage
Fast vs Full:
- Default (fast): Skips extension records (~1% of files), ~15-25% faster
- `--full`: Merges extension records for complete hard link/ADS data
See uffs-mft README for detailed documentation.
UFFS employs multiple layers of optimization to achieve maximum performance when reading the NTFS Master File Table:
Instead of using Windows file enumeration APIs, UFFS opens the raw volume and reads the MFT directly using unbuffered I/O. This bypasses the Windows file system cache and gives us full control over read patterns.
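For illustration, here is a minimal sketch of opening a raw volume handle with unbuffered, sequential-scan flags using only the standard library. This is not the actual uffs-mft code; the flag constants come from the Win32 headers, and reads against such a handle must be sector-aligned:

```rust
use std::fs::{File, OpenOptions};
use std::os::windows::fs::OpenOptionsExt;

// Win32 constants (not re-exported by std).
const FILE_SHARE_READ: u32 = 0x0000_0001;
const FILE_SHARE_WRITE: u32 = 0x0000_0002;
const FILE_FLAG_NO_BUFFERING: u32 = 0x2000_0000;
const FILE_FLAG_SEQUENTIAL_SCAN: u32 = 0x0800_0000;

fn open_raw_volume(letter: char) -> std::io::Result<File> {
    // \\.\C: addresses the volume itself rather than a file on it.
    let path = format!(r"\\.\{}:", letter);
    OpenOptions::new()
        .read(true)
        // The volume is in use, so share with other readers and writers.
        .share_mode(FILE_SHARE_READ | FILE_SHARE_WRITE)
        // Bypass the file-system cache; all reads must then be sector-aligned.
        .custom_flags(FILE_FLAG_NO_BUFFERING | FILE_FLAG_SEQUENTIAL_SCAN)
        .open(path)
}
```

Opening a volume handle like this requires Administrator privileges, which is why UFFS triggers a UAC prompt.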
UFFS automatically detects whether a drive is an SSD or HDD using Windows storage APIs (IOCTL_STORAGE_QUERY_PROPERTY) and tunes I/O parameters accordingly:
| Drive Type | Chunk Size | Rationale |
|---|---|---|
| SSD | 8 MB | Large sequential reads, no seek penalty |
| HDD | 4 MB | Balance between syscall overhead and seek time |
By using large chunk sizes (4-8 MB instead of the typical 1 MB), UFFS reduces the number of ReadFile system calls by 4-8x. For a 4.5 GB MFT, this means ~500-1000 syscalls instead of ~4,500.
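As a rough back-of-the-envelope check of that syscall math (chunk sizes as in the table above; the names here are illustrative, not the real configuration code):

```rust
/// Illustrative only: chunk-size policy and resulting ReadFile counts for a ~4.5 GB MFT.
#[derive(Clone, Copy)]
enum DriveKind { Ssd, Hdd }

fn chunk_size(kind: DriveKind) -> u64 {
    match kind {
        DriveKind::Ssd => 8 * 1024 * 1024, // large sequential reads, no seek penalty
        DriveKind::Hdd => 4 * 1024 * 1024, // balance syscall overhead vs. seek time
    }
}

fn main() {
    let mft_bytes: u64 = 4_500 * 1024 * 1024; // ~4.5 GB MFT
    for (kind, name) in [(DriveKind::Ssd, "SSD"), (DriveKind::Hdd, "HDD")] {
        let reads = mft_bytes.div_ceil(chunk_size(kind));
        // ~563 reads on SSD, ~1125 on HDD, versus ~4500 with 1 MB chunks
        println!("{name}: ~{reads} ReadFile calls");
    }
}
```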
Each thread uses a thread-local buffer for record parsing, eliminating per-record heap allocations. This is critical when processing millions of MFT records:
```rust
// Instead of allocating per record:
let mut record_buf = record_data.to_vec(); // allocates on every record

// We use thread-local buffers:
parse_record_zero_alloc(record_data, frs); // reuses a per-thread buffer
```
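A minimal sketch of the thread-local pattern (an assumed shape, not the actual uffs-mft parser):

```rust
use std::cell::RefCell;

thread_local! {
    // One scratch buffer per worker thread, reused for every record it parses.
    static RECORD_BUF: RefCell<Vec<u8>> = RefCell::new(Vec::with_capacity(1024));
}

fn parse_record_reusing_buffer(record_data: &[u8]) {
    RECORD_BUF.with(|buf| {
        let mut buf = buf.borrow_mut();
        buf.clear();
        buf.extend_from_slice(record_data); // copy into the reused buffer
        // ... apply NTFS fixups and parse attributes from `buf` ...
    });
}
```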
The PrefetchMftReader uses two alternating buffers to overlap I/O with processing (see the sketch after this list):
- Read into buffer A while processing buffer B
- Swap buffers and repeat
- CPU never waits for disk I/O
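Below is a simplified sketch of the double-buffering idea using two buffers cycled over channels. This is only an assumed shape; the real PrefetchMftReader also deals with sector alignment and unbuffered volume handles, which this omits:

```rust
use std::io::Read;
use std::sync::mpsc;
use std::thread;

/// Simplified double-buffering: a reader thread fills one buffer
/// while the caller processes the other. Exactly two buffers circulate.
fn read_with_prefetch<R: Read + Send + 'static>(
    mut src: R,
    chunk: usize,
    mut process: impl FnMut(&[u8]),
) {
    let (full_tx, full_rx) = mpsc::sync_channel::<Vec<u8>>(1);
    let (empty_tx, empty_rx) = mpsc::channel::<Vec<u8>>();
    empty_tx.send(vec![0u8; chunk]).unwrap();
    empty_tx.send(vec![0u8; chunk]).unwrap();

    thread::spawn(move || {
        // Producer: refill each recycled buffer from disk and hand it over.
        while let Ok(mut buf) = empty_rx.recv() {
            buf.resize(chunk, 0);
            let n = src.read(&mut buf).unwrap_or(0);
            if n == 0 {
                break; // EOF: dropping full_tx ends the consumer loop
            }
            buf.truncate(n);
            if full_tx.send(buf).is_err() {
                break;
            }
        }
    });

    // Consumer: process a filled buffer while the producer reads the next one.
    while let Ok(buf) = full_rx.recv() {
        process(&buf[..]);
        let _ = empty_tx.send(buf); // recycle the buffer
    }
}
```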
After reading chunks from disk, UFFS uses Rayon's parallel iterators to parse records across all CPU cores. Each core processes a portion of the chunk simultaneously.
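Parsing one chunk might look roughly like this with Rayon's parallel slice iterators (the record layout and parse function are simplified stand-ins, not the real uffs-mft code):

```rust
use rayon::prelude::*;

const MFT_RECORD_SIZE: usize = 1024; // typical FILE record size

/// Stand-in for the real record parser: here it just pulls the
/// 4-byte MFT record number out of the FILE record header.
fn parse_record(raw: &[u8]) -> Option<u64> {
    raw.get(44..48)
        .map(|b| u32::from_le_bytes(b.try_into().unwrap()) as u64)
}

/// Split a chunk into fixed-size records and parse them on all cores.
fn parse_chunk(chunk: &[u8]) -> Vec<u64> {
    chunk
        .par_chunks_exact(MFT_RECORD_SIZE)
        .filter_map(parse_record)
        .collect()
}
```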
The MFT can be scattered across multiple non-contiguous extents on disk. UFFS handles this by:
- Getting the extent map via `FSCTL_GET_RETRIEVAL_POINTERS`
- Mapping Virtual Cluster Numbers (VCN) to Logical Cluster Numbers (LCN)
- Reading from the correct physical locations (the VCN-to-LCN translation is sketched below)
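A sketch of the address translation step (types and names are assumed for illustration; the actual extent list comes from the `FSCTL_GET_RETRIEVAL_POINTERS` ioctl):

```rust
/// One contiguous run of the MFT on disk: VCNs [start_vcn, start_vcn + cluster_count)
/// live at physical clusters starting at `start_lcn`.
struct Extent {
    start_vcn: u64,
    start_lcn: u64,
    cluster_count: u64,
}

/// Translate a virtual cluster number into a physical byte offset on the volume.
/// Illustrative only; the real reader walks the extents sequentially instead.
fn vcn_to_byte_offset(extents: &[Extent], vcn: u64, bytes_per_cluster: u64) -> Option<u64> {
    for e in extents {
        if vcn >= e.start_vcn && vcn < e.start_vcn + e.cluster_count {
            let lcn = e.start_lcn + (vcn - e.start_vcn);
            return Some(lcn * bytes_per_cluster);
        }
    }
    None // sparse or out of range
}
```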
Query operations use Polars' lazy API, which optimizes the query plan before execution. Filters are pushed down, columns are pruned, and operations are parallelized automatically.
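A minimal sketch of what such a query can look like with the Rust Polars lazy API (column names follow the table above; assumes the `lazy` and `strings` features, and is not the actual uffs-core query engine):

```rust
use polars::prelude::*;

/// Find .rs files over ~1 MB in an in-memory MFT DataFrame.
fn large_rust_files(files: DataFrame) -> PolarsResult<DataFrame> {
    files
        .lazy()
        // Both predicates are pushed down and unused columns are pruned
        // before anything is materialized.
        .filter(col("name").str().ends_with(lit(".rs")))
        .filter(col("size").gt(lit(1_000_000i64)))
        .select([col("path"), col("size")])
        .collect()
}
```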
Instead of parsing into `Vec<ParsedRecord>` (Array-of-Structs) and then converting to DataFrame columns, UFFS parses directly into column vectors (Struct-of-Arrays). This eliminates the expensive AoS→SoA transpose and reduces `df_build` time by 90%.
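A toy illustration of the Struct-of-Arrays layout (field set reduced for brevity; not the real column schema):

```rust
/// Parse results accumulated directly as columns (Struct-of-Arrays),
/// so building the DataFrame just wraps these vectors; no transpose is needed.
#[derive(Default)]
struct ColumnBuilders {
    name: Vec<String>,
    size: Vec<u64>,
    is_dir: Vec<bool>,
}

impl ColumnBuilders {
    fn push_record(&mut self, name: &str, size: u64, is_dir: bool) {
        // Each field goes straight into its own column vector.
        self.name.push(name.to_owned());
        self.size.push(size);
        self.is_dir.push(is_dir);
    }
}
```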
Extension records (~1% of files) contain overflow attributes for files with many hard links or ADS. The fast path skips these for maximum speed, while --full mode merges them for complete data.
| Optimization | Impact |
|---|---|
| Direct MFT access | Bypasses slow Windows APIs |
| Large chunk sizes | 4-8x fewer syscalls |
| SSD/HDD detection | Optimal I/O parameters per drive |
| Thread-local buffers | ~0 allocations during parsing |
| Double-buffering | Overlapped I/O with processing |
| Rayon parallelism | All CPU cores utilized |
| Polars lazy eval | Optimized query execution |
| SoA layout | 90% faster df_build |
| Fast path | 15-25% faster on SSD |
- Windows only for MFT reading (the core functionality)
- Cross-platform for working with saved indexes
- Administrator privileges required for direct MFT access
- Windows will show a UAC prompt when running UFFS
- Rust 1.85+ (Edition 2024)
- Windows SDK (for MFT reading)
Pretty-printed table output for terminal viewing.
uffs "*.rs" --out results.csv --sep "," --header trueuffs "*.rs" --format json --out results.jsonIndexes are stored in Parquet format for efficient storage and fast loading.
Export to CSV or Parquet and load in your data analysis tools:
```python
import polars as pl
# Load UFFS index
df = pl.read_parquet("c_drive.parquet")
# Analyze file distribution
df.group_by("extension").agg(
pl.count().alias("count"),
pl.col("size").sum().alias("total_size")
).sort("total_size", descending=True)# Find large log files and process with grep
uffs "*.log" --min-size 100MB --out console | grep "error"
# Export for further processing
uffs "*" --columns path,size --out - | sort -t, -k2 -n -r | head -100This project is licensed under the Mozilla Public License 2.0 (MPL-2.0).
See LICENSE for details.
This Rust implementation is inspired by the original C++ UFFS, which was based on SwiftSearch by wfunction.
- Author: Robert Nio
- Repository: github.com/githubrobbi/UltraFastFileSearch