This project provides a benchmarking tool to measure Docker Desktop performance on macOS, specifically comparing native ARM64 and AMD64 (via Rosetta 2) execution.
The outputs in this repo are from a November 2023 MacBook Pro with an M3 Pro CPU and 18 GB of memory, running macOS 26.0.1.
- Docker Desktop for Mac
- macOS (using Apple Silicon)
- Python 3 with matplotlib and numpy (for graph generation)
- jq (for JSON processing)
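Before a run, it can help to confirm the tools are actually available. The check below is illustrative only (it is not part of this repo's scripts):

```shell
#!/usr/bin/env bash
# Illustrative pre-flight check for the prerequisites listed above.
missing=0
for tool in docker jq python3; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
    missing=1
  fi
done

# matplotlib and numpy are only needed for graph generation
if python3 -c "import matplotlib, numpy" 2>/dev/null; then
  echo "plotting deps OK"
else
  echo "missing: matplotlib/numpy"
  missing=1
fi
```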
The benchmark performs 15 different tests, each run 3 times to calculate reliable averages and standard deviations:
- Integer Arithmetic - Pure integer math operations
- Floating Point Math - Mathematical functions (sin, cos, sqrt)
- Gzip Compression - Fast compression algorithm
- Bzip2 Compression - CPU-intensive compression
- XZ Compression - Very CPU-intensive compression
- Matrix Multiplication - Large matrix operations (5000x5000)
- Crypto Operations - AES-256 encryption with OpenSSL
- C Compilation - GCC compilation overhead
- Binary Execution - Native binary execution performance
- System Call Intensive - Frequent system calls (file create/read/delete)
- File I/O Operations - Large file operations
- Process Creation - Fork/exec overhead
- Context Switching - Multi-process context switching
- String Processing - Memory and string manipulation
- JSON Parsing - Mixed workload with jq
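As a sketch of how one such test can be timed, a single iteration wraps its workload in two timestamps (the actual benchmark.sh may structure this differently; gzip is used here as the example workload):

```shell
#!/usr/bin/env bash
# Sketch: time one iteration of a compression test.
# Uses python3 for sub-second timestamps (portable across macOS and Linux,
# where BSD `date` lacks %N).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

head -c 1000000 /dev/urandom > "$workdir/data.bin"   # 1 MB of test data

start=$(python3 -c 'import time; print(time.time())')
gzip -c "$workdir/data.bin" > "$workdir/data.bin.gz"
end=$(python3 -c 'import time; print(time.time())')

elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN{printf "%.3f", e - s}')
echo "gzip iteration: ${elapsed}s"
```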
Each benchmark reports:
- Average time across all iterations
- Minimum time (best case)
- Maximum time (worst case)
- Standard deviation (consistency metric)
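For illustration, these statistics can be derived from the list of iteration times with awk (a sketch; the repo's own aggregation code may differ):

```shell
#!/usr/bin/env bash
# Sketch: compute avg/min/max/stddev from three iteration times (seconds).
# Uses the population standard deviation (divide by n).
times="2.0 2.2 2.4"

stats=$(echo "$times" | awk '{
  n = NF; min = $1; max = $1; sum = 0
  for (i = 1; i <= n; i++) {
    sum += $i
    if ($i < min) min = $i
    if ($i > max) max = $i
  }
  mean = sum / n
  ss = 0
  for (i = 1; i <= n; i++) ss += ($i - mean) ^ 2
  printf "avg=%.3f min=%.3f max=%.3f stddev=%.3f", mean, min, max, sqrt(ss / n)
}')
echo "$stats"
```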
- Make the build script executable:
  chmod +x build_and_run.sh
- Run the benchmark:
  ./build_and_run.sh
The script will:
- Build the Docker image for both ARM64 and AMD64 platforms
- Run the benchmark on each platform
- Generate comparison graphs in the output directory
- Display the results for comparison
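Under the hood this relies on Docker's multi-platform support. A minimal sketch of the two steps (the image name `bench` is a placeholder, not the repo's actual tag):

```shell
# Build the same image for each target platform.
docker build --platform linux/arm64 -t bench:arm64 .
docker build --platform linux/amd64 -t bench:amd64 .

# Run natively on Apple Silicon, then under Rosetta 2 emulation.
docker run --rm --platform linux/arm64 bench:arm64
docker run --rm --platform linux/amd64 bench:amd64
```

On an Apple Silicon Mac with Rosetta enabled in Docker Desktop, the `linux/amd64` container is transparently translated, which is exactly the overhead the benchmark measures.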
The benchmark generates several comparison graphs in the output directory:
- Overall Performance Comparison
  - Shows all benchmarks with error bars (standard deviation)
  - Displays slowdown factors for each test
- Horizontal bar chart showing relative performance impact
  - Sorted by slowdown factor (worst to best)
- Compilation, compression, crypto, and math operations
  - Highlights emulation overhead
- File I/O, system calls, and process management
  - Shows translation overhead
All graphs include:
- Error bars showing standard deviation
- Performance ratio labels (e.g., "2.5x slower")
- Color coding (green for ARM64, red for AMD64/Rosetta 2)
- ARM64: Native execution on Apple Silicon (baseline)
- AMD64: Execution through Rosetta 2 translation
- Slowdown Factor: AMD64 time / ARM64 time (e.g., 2.0x = twice as slow)
- Error Bars: Show consistency (smaller = more reliable)
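The slowdown factor is a direct ratio of the two averages; as a small worked example (the times below are made up for illustration):

```shell
#!/usr/bin/env bash
# Slowdown factor = AMD64 time / ARM64 time.
arm64_time=1.5   # seconds, example value
amd64_time=3.0   # seconds, example value

factor=$(awk -v a="$amd64_time" -v b="$arm64_time" 'BEGIN{printf "%.1f", a / b}')
echo "${factor}x slower under Rosetta 2"
```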
- CPU-bound tests: Typically 1.3x - 2.5x slower under Rosetta 2
- System call heavy: May show higher overhead (2x - 3x)
- I/O bound: Usually less impact (1.1x - 1.5x)
- Compilation: Shows toolchain overhead
- Tests with high slowdown + low stddev: Consistent Rosetta 2 penalty
- Tests with high stddev: Inconsistent performance, may need investigation
- Average slowdown: Overall Rosetta 2 impact on your workloads
- Worst case tests: Identify workloads to optimize or avoid
- Each benchmark runs 3 iterations to calculate reliable averages
- Statistical analysis includes mean, min, max, and standard deviation
- All operations are performed in a temporary directory that is cleaned up after completion
- Results are saved in JSON format in output/results.json with full statistics
- The benchmark takes approximately 20-40 minutes to complete both architectures
- Tests are designed to stress different aspects of the emulation layer
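Once a run finishes, individual results can be pulled out of the JSON with jq. The field names and values below are hypothetical; check the actual output/results.json for the real schema:

```shell
#!/usr/bin/env bash
# Hypothetical results.json shape -- the repo's real schema may differ.
cat > /tmp/example_results.json <<'EOF'
{"benchmarks":[
  {"name":"gzip_compression","avg":2.2,"min":2.0,"max":2.4,"stddev":0.16},
  {"name":"matrix_multiplication","avg":41.7,"min":40.9,"max":42.8,"stddev":0.78}
]}
EOF

# Print a one-line summary per benchmark.
summary=$(jq -r '.benchmarks[] | "\(.name): \(.avg)s avg (stddev \(.stddev))"' \
  /tmp/example_results.json)
echo "$summary"
```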
You can adjust the number of iterations by editing benchmark.sh:
# Near the top of the file
ITERATIONS=3  # Change to 5 or 10 for more reliable statistics

For quicker testing (single iteration, less comprehensive):
ITERATIONS=1 # Faster but less statistically reliable