This project provides a benchmarking tool to measure Docker Desktop performance on macOS, specifically comparing native ARM64 and AMD64 (via Rosetta 2) execution.
The outputs in this repo are from a November 2023 MacBook Pro with an M3 Pro CPU and 18 GB of memory, running macOS 26.0.1.
- Docker Desktop for Mac
- macOS (using Apple Silicon)
- Python 3 with matplotlib and numpy (for graph generation)
- jq (for JSON processing)
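Before a run, it can help to confirm the tools are actually available. The check below is illustrative only (it is not part of this repo's scripts):

```shell
#!/usr/bin/env bash
# Illustrative pre-flight check for the prerequisites listed above.
missing=0
for tool in docker jq python3; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "missing: $tool"
    missing=1
  fi
done

# matplotlib and numpy are only needed for graph generation
if python3 -c "import matplotlib, numpy" 2>/dev/null; then
  echo "plotting deps OK"
else
  echo "missing: matplotlib/numpy"
  missing=1
fi
```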
The benchmark performs 15 different tests, each run 3 times to calculate reliable averages and standard deviations:
- Integer Arithmetic - Pure integer math operations
- Floating Point Math - Mathematical functions (sin, cos, sqrt)
- Gzip Compression - Fast compression algorithm
- Bzip2 Compression - CPU-intensive compression
- XZ Compression - Very CPU-intensive compression
- Matrix Multiplication - Large matrix operations (5000x5000)
- Crypto Operations - AES-256 encryption with OpenSSL
- C Compilation - GCC compilation overhead
- Binary Execution - Native binary execution performance
- System Call Intensive - Frequent system calls (file create/read/delete)
- File I/O Operations - Large file operations
- Process Creation - Fork/exec overhead
- Context Switching - Multi-process context switching
- String Processing - Memory and string manipulation
- JSON Parsing - Mixed workload with jq
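As a sketch of how one such test can be timed, a single iteration wraps its workload in two timestamps (the actual benchmark.sh may structure this differently; gzip is used here as the example workload):

```shell
#!/usr/bin/env bash
# Sketch: time one iteration of a compression test.
# Uses python3 for sub-second timestamps (portable across macOS and Linux,
# where BSD `date` lacks %N).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

head -c 1000000 /dev/urandom > "$workdir/data.bin"   # 1 MB of test data

start=$(python3 -c 'import time; print(time.time())')
gzip -c "$workdir/data.bin" > "$workdir/data.bin.gz"
end=$(python3 -c 'import time; print(time.time())')

elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN{printf "%.3f", e - s}')
echo "gzip iteration: ${elapsed}s"
```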
Each benchmark reports:
- Average time across all iterations
- Minimum time (best case)
- Maximum time (worst case)
- Standard deviation (consistency metric)
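For illustration, these statistics can be derived from the list of iteration times with awk (a sketch; the repo's own aggregation code may differ):

```shell
#!/usr/bin/env bash
# Sketch: compute avg/min/max/stddev from three iteration times (seconds).
# Uses the population standard deviation (divide by n).
times="2.0 2.2 2.4"

stats=$(echo "$times" | awk '{
  n = NF; min = $1; max = $1; sum = 0
  for (i = 1; i <= n; i++) {
    sum += $i
    if ($i < min) min = $i
    if ($i > max) max = $i
  }
  mean = sum / n
  ss = 0
  for (i = 1; i <= n; i++) ss += ($i - mean) ^ 2
  printf "avg=%.3f min=%.3f max=%.3f stddev=%.3f", mean, min, max, sqrt(ss / n)
}')
echo "$stats"
```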
- Make the build script executable:
  chmod +x build_and_run.sh
- Run the benchmark:
  ./build_and_run.sh
The script will:
- Build the Docker image for both ARM64 and AMD64 platforms
- Run the benchmark on each platform
- Generate comparison graphs in the output directory
- Display the results for comparison
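Under the hood this relies on Docker's multi-platform support. A minimal sketch of the two steps (the image name `bench` is a placeholder, not the repo's actual tag):

```shell
# Build the same image for each target platform.
docker build --platform linux/arm64 -t bench:arm64 .
docker build --platform linux/amd64 -t bench:amd64 .

# Run natively on Apple Silicon, then under Rosetta 2 emulation.
docker run --rm --platform linux/arm64 bench:arm64
docker run --rm --platform linux/amd64 bench:amd64
```

On an Apple Silicon Mac with Rosetta enabled in Docker Desktop, the `linux/amd64` container is transparently translated, which is exactly the overhead the benchmark measures.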
The benchmark generates several comparison graphs in the output directory:
- Overall Performance Comparison
  - Shows all benchmarks with error bars (standard deviation)
  - Displays slowdown factors for each test
- Horizontal bar chart showing relative performance impact
  - Sorted by slowdown factor (worst to best)
- Compilation, compression, crypto, and math operations
  - Highlights emulation overhead
- File I/O, system calls, and process management
  - Shows translation overhead
All graphs include:
- Error bars showing standard deviation
- Performance ratio labels (e.g., "2.5x slower")
- Color coding (green for ARM64, red for AMD64/Rosetta 2)
- ARM64: Native execution on Apple Silicon (baseline)
- AMD64: Execution through Rosetta 2 translation
- Slowdown Factor: AMD64 time / ARM64 time (e.g., 2.0x = twice as slow)
- Error Bars: Show consistency (smaller = more reliable)
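The slowdown factor is a direct ratio of the two averages; as a small worked example (the times below are made up for illustration):

```shell
#!/usr/bin/env bash
# Slowdown factor = AMD64 time / ARM64 time.
arm64_time=1.5   # seconds, example value
amd64_time=3.0   # seconds, example value

factor=$(awk -v a="$amd64_time" -v b="$arm64_time" 'BEGIN{printf "%.1f", a / b}')
echo "${factor}x slower under Rosetta 2"
```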
- CPU-bound tests: Typically 1.3x - 2.5x slower under Rosetta 2
- System call heavy: May show higher overhead (2x - 3x)
- I/O bound: Usually less impact (1.1x - 1.5x)
- Compilation: Shows toolchain overhead
- Tests with high slowdown + low stddev: Consistent Rosetta 2 penalty
- Tests with high stddev: Inconsistent performance, may need investigation
- Average slowdown: Overall Rosetta 2 impact on your workloads
- Worst case tests: Identify workloads to optimize or avoid
- Each benchmark runs 3 iterations to calculate reliable averages
- Statistical analysis includes mean, min, max, and standard deviation
- All operations are performed in a temporary directory that is cleaned up after completion
- Results are saved in JSON format in output/results.json with full statistics
- The benchmark takes approximately 20-40 minutes to complete both architectures
- Tests are designed to stress different aspects of the emulation layer
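Once a run finishes, individual results can be pulled out of the JSON with jq. The field names and values below are hypothetical; check the actual output/results.json for the real schema:

```shell
#!/usr/bin/env bash
# Hypothetical results.json shape -- the repo's real schema may differ.
cat > /tmp/example_results.json <<'EOF'
{"benchmarks":[
  {"name":"gzip_compression","avg":2.2,"min":2.0,"max":2.4,"stddev":0.16},
  {"name":"matrix_multiplication","avg":41.7,"min":40.9,"max":42.8,"stddev":0.78}
]}
EOF

# Print a one-line summary per benchmark.
summary=$(jq -r '.benchmarks[] | "\(.name): \(.avg)s avg (stddev \(.stddev))"' \
  /tmp/example_results.json)
echo "$summary"
```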
You can adjust the number of iterations by editing benchmark.sh:
# Near the top of the file
ITERATIONS=3  # Change to 5 or 10 for more reliable statistics

For quicker testing (single iteration, less comprehensive):
ITERATIONS=1 # Faster but less statistically reliable