mzip

Detection-based compression that finds patterns zstd, brotli, and bzip2 miss.

Store the formula, not the data.

Why mzip?

mzip detects mathematical structure in your data and encodes that structure directly instead of the raw bytes. Where other compressors see bytes, mzip sees patterns.

Pattern Detected	mzip	Best Alternative	Advantage
Sequential IDs (`1, 2, 3, ...`)	32 bytes	bzip2: 3.4KB	106x better
Repeating templates (JSON APIs)	10KB	brotli: 49KB	4.9x better
Audio PCM waveforms	2.1KB	bzip2: 4KB	2x better
Image gradients	124 bytes	brotli: 397B	3.2x better

Result: 75% win rate across 250 tests (50 data types × 5 sizes) against 8 compressors including brotli, bzip2, xz, 7z, and rar.

Data Category	Win Rate	Why
Numeric sequences	100%	Formula compression: `v[i] = a + b*i` beats any LZ77
Structured JSON/XML	90%	Template extraction captures repeating structure
Audio/sensor data	100%	Delta encoding exploits temporal correlation
Log files	80%	Columnar separation + BWT on each column
Large files (>256KB)	86%	More data = more patterns to detect
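As an illustration of the delta-encoding row above, here is a minimal sketch of the general technique, not mzip's actual implementation. It assumes 16-bit signed PCM samples; the function names `delta_encode` and `delta_decode` are hypothetical. On smooth signals the deltas cluster near zero, which is what makes them cheaper to entropy-code than the raw samples.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: keeps the first sample as-is (delta from 0) and
// stores every later sample as the difference from its predecessor.
// Deltas are widened to 32 bits so the subtraction cannot overflow.
std::vector<std::int32_t> delta_encode(const std::vector<std::int16_t>& samples) {
    std::vector<std::int32_t> deltas;
    deltas.reserve(samples.size());
    std::int32_t prev = 0;
    for (std::int16_t s : samples) {
        deltas.push_back(static_cast<std::int32_t>(s) - prev);
        prev = s;
    }
    return deltas;
}

// Inverse transform: a running sum of the deltas restores the samples.
// Assumes the deltas came from delta_encode, so every sum fits in 16 bits.
std::vector<std::int16_t> delta_decode(const std::vector<std::int32_t>& deltas) {
    std::vector<std::int16_t> samples;
    samples.reserve(deltas.size());
    std::int32_t acc = 0;
    for (std::int32_t d : deltas) {
        acc += d;
        samples.push_back(static_cast<std::int16_t>(acc));
    }
    return samples;
}
```

For example, the samples `1000, 1004, 1009, 1011, 1008` become `1000, 4, 5, 2, -3`: after the first value, every entry is a small number.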

Scenario	Winner	Why
Text/code (small margins)	bzip2	BWT tuning differences (typically <100 bytes)
Random/encrypted data	zstd	No patterns to detect, just raw entropy coding

Most general-purpose compressors treat their input as an undifferentiated byte stream. But real data has structure:

Pattern	Example	What mzip does
Sequential values	`1, 2, 3, 4, ...`	Store formula `v[i] = start + i × step`
Repeating templates	Same function 100x with different IDs	Store template once + variable list
Columnar data	Log files with fixed columns	Separate columns, compress each optimally
Audio samples	Smooth waveforms	Delta encoding exploits sample-to-sample correlation
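The "template once + variable list" idea from the table can be sketched as follows. This is a simplified illustration, not mzip's detector: it assumes the only varying fields are runs of decimal digits and that the input never contains the byte `0x01`, which is used here as a placeholder. The names `kSlot`, `split_record`, and `fill_template` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Placeholder byte marking where a variable was removed (assumed absent from input).
constexpr char kSlot = '\x01';

// Splits a record into a template (digit runs replaced by kSlot)
// and the list of digit runs that were removed, in order.
std::pair<std::string, std::vector<std::string>> split_record(const std::string& record) {
    std::string tmpl;
    std::vector<std::string> vars;
    std::size_t i = 0;
    while (i < record.size()) {
        if (std::isdigit(static_cast<unsigned char>(record[i]))) {
            std::size_t j = i;
            while (j < record.size() &&
                   std::isdigit(static_cast<unsigned char>(record[j]))) {
                ++j;
            }
            vars.push_back(record.substr(i, j - i));
            tmpl.push_back(kSlot);
            i = j;
        } else {
            tmpl.push_back(record[i]);
            ++i;
        }
    }
    return {tmpl, vars};
}

// Rebuilds the original record by substituting variables back into the slots.
std::string fill_template(const std::string& tmpl, const std::vector<std::string>& vars) {
    std::string out;
    std::size_t next = 0;
    for (char c : tmpl) {
        if (c == kSlot && next < vars.size()) {
            out += vars[next++];
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

Two records such as `{"id": 17, "ok": true}` and `{"id": 42, "ok": true}` yield the same template, so a compressor following this approach would store the template once and keep only `17` and `42` per record.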

zstd-19 compresses 1MB of sequential IDs to 8KB. mzip compresses it to 32 bytes.
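To make the formula idea concrete, the sketch below checks whether a sequence fits `v[i] = start + i × step` and, if so, reduces it to three numbers. It is a minimal illustration under stated assumptions, not mzip's on-disk format: the `LinearFormula` struct and the functions `detect_linear` and `expand` are hypothetical, values are assumed to be 64-bit signed integers, and the arithmetic is assumed not to overflow.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical record for a detected sequence: v[i] = start + i * step.
struct LinearFormula {
    std::int64_t start;
    std::int64_t step;
    std::size_t count;
};

// Returns the formula if every consecutive difference equals the first one,
// otherwise std::nullopt. Sequences shorter than two values are rejected.
std::optional<LinearFormula> detect_linear(const std::vector<std::int64_t>& values) {
    if (values.size() < 2) {
        return std::nullopt;
    }
    const std::int64_t step = values[1] - values[0];
    for (std::size_t i = 2; i < values.size(); ++i) {
        if (values[i] - values[i - 1] != step) {
            return std::nullopt;
        }
    }
    return LinearFormula{values[0], step, values.size()};
}

// Regenerates the original values from the formula (lossless round trip).
std::vector<std::int64_t> expand(const LinearFormula& f) {
    std::vector<std::int64_t> out;
    out.reserve(f.count);
    for (std::size_t i = 0; i < f.count; ++i) {
        out.push_back(f.start + static_cast<std::int64_t>(i) * f.step);
    }
    return out;
}
```

The stored size is independent of the sequence length, which is consistent with the constant 32-byte output reported above for sequential IDs at every input size.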

All benchmarks run on synthetic data generated by `generators.hpp`. Click sample links to download the exact input/output files.

Compressor	Avg Ratio	Range	MB/s	Wins	Win%	Score	Rank
mzip	8.16x	1.0-32768x	0.6	188	75.2%	153.6	1
bzip2:9	5.66x	1.0-1001x	0.6	63	25.2%	39.5	2
zstd:19	5.14x	1.0-2641x	1.4	30	12.0%	21.3	3
rar:m5	5.97x	1.0-1014x	2.6	0	0.0%	6.6	4
xz:9	5.89x	1.0-997x	2.3	0	0.0%	6.4	5
7z:mx9	5.88x	1.0-922x	2.3	0	0.0%	6.4	6
gzip:9	4.78x	1.0-240x	0.8	0	0.0%	4.7	7

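Score = ratio × speed^0.1 × (1 + 0.1 × wins), where ratio is the average ratio and speed is in MB/s. Total input across all 250 tests: 66.60 MB. lz4 and snappy are excluded because they target speed rather than ratio.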
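The score formula can be written directly as a function. This is a sketch for checking the table by hand; the name `score` and its parameter names are hypothetical, and because the table's inputs are rounded, recomputed scores may differ from the published ones in the last digit.

```cpp
#include <cmath>

// Score = ratio * speed^0.1 * (1 + 0.1 * wins), as stated in the text.
// avg_ratio:   average compression ratio (e.g. 8.16 for mzip)
// speed_mb_s:  throughput in MB/s from the same table row
// wins:        number of tests won
double score(double avg_ratio, double speed_mb_s, int wins) {
    return avg_ratio * std::pow(speed_mb_s, 0.1) * (1.0 + 0.1 * wins);
}
```

The exponent of 0.1 means speed barely moves the score, while each win adds 10% of the base value. That is why mzip ranks first despite being among the slowest entries in the table.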

Compressor	Decompression time (ms)	Decompression speed (MB/s)
zstd	96.4	690.8
mzip	3285.2	20.3

zstd decompresses 34.1x faster than mzip.

Size	Wins	Total	Win%
4KB	44	50	88.0%
16KB	27	50	54.0%
64KB	30	50	60.0%
256KB	38	50	76.0%
1MB	49	50	98.0%

Type	mzip	2nd Best	Advantage
Database IDs (1MB)	32B (32768x)	3.4KB	106.8x better
Timestamps (1MB)	32B (32768x)	2.7KB	84.2x better
Database IDs (256KB)	32B (8192x)	937B	29.3x better
Timestamps (256KB)	32B (8192x)	772B	24.1x better
Database IDs (64KB)	32B (2048x)	301B	9.4x better
Timestamps (64KB)	32B (2048x)	287B	9.0x better
Image gradient (256KB)	53B (4946x)	323B	6.1x better
Image gradient (64KB)	39B (1680x)	212B	5.4x better
Timestamps (16KB)	32B (512x)	160B	5.0x better
JSON API (1MB)	10KB (104x)	49KB	4.9x better
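The ratios in parentheses are the original size divided by the compressed size. The tabulated figures appear to use binary units (1MB = 1,048,576 bytes), which is an inference from the numbers rather than something the benchmark states. A small helper makes the arithmetic explicit; the name `compression_ratio` is hypothetical.

```cpp
#include <cstdint>

// Ratio of original size to compressed size, both in bytes.
// compressed_bytes is assumed to be nonzero.
double compression_ratio(std::uint64_t original_bytes, std::uint64_t compressed_bytes) {
    return static_cast<double>(original_bytes) / static_cast<double>(compressed_bytes);
}
```

With 1,048,576 input bytes and a 32-byte output, this gives exactly 32768, matching the 1MB rows for database IDs and timestamps.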

bzip2's BWT implementation occasionally beats mzip by small margins on text/code files.