Python Performance Benchmarking Suite

A comprehensive benchmarking suite for testing Python performance improvements across different workloads.

Overview

This benchmarking suite is designed to help analyze Python speed improvements in the latest versions, with a focus on:

Memory Management: How Python handles memory allocation, garbage collection, and object lifecycle
Process Startup/Teardown: Import times, module loading, subprocess overhead
Computational Performance: CPU-bound operations with PyTorch and NumPy
Async Performance: I/O-bound and concurrent operations

Directory Structure

benchmarks/
├── pytorch/        # PyTorch tensor operations, model training/inference
├── numpy/          # NumPy array operations, linear algebra
├── async/          # Asyncio operations, concurrent tasks
├── memory/         # Memory profiling and GC analysis
├── startup/        # Process startup, import time, module loading
├── utils/          # Shared utilities and helpers
└── README.md       # This file

Installation

Install the required dependencies:

pip install -e ".[dev]"
pip install torch numpy pytest-benchmark memory-profiler psutil

Running Benchmarks

Run all benchmarks

pytest benchmarks/ --benchmark-only

Run specific category

pytest benchmarks/numpy/ --benchmark-only
pytest benchmarks/pytorch/ --benchmark-only
pytest benchmarks/async/ --benchmark-only

Generate detailed reports

pytest benchmarks/ --benchmark-only --benchmark-json=results.json

Compare results

pytest benchmarks/ --benchmark-only --benchmark-compare=baseline

Benchmark Categories

1. NumPy Benchmarks (`numpy/`)

Array creation and manipulation
Linear algebra operations (matrix multiplication, eigenvalues)
Broadcasting and vectorization
Complex mathematical operations
Memory access patterns

2. PyTorch Benchmarks (`pytorch/`)

Tensor operations (creation, manipulation, slicing)
Neural network forward/backward passes
Model training loops
GPU vs CPU performance (if available)
Gradient computation

3. Async Benchmarks (`async/`)

Concurrent I/O operations
Task scheduling and event loop overhead
Mixed CPU/IO workloads
Context switching performance

4. Memory Benchmarks (`memory/`)

Object allocation patterns
Garbage collection timing
Reference counting overhead
Memory pooling effectiveness
Peak memory usage

5. Startup Benchmarks (`startup/`)

Import time for common modules
Module initialization overhead
Subprocess creation and communication
Dynamic loading performance

Analysis Guide

Memory Management Analysis

The benchmarks help identify improvements in:

Allocation Speed: How quickly Python can allocate new objects
GC Efficiency: Time spent in garbage collection cycles
Memory Overhead: Per-object memory overhead
Reference Counting: Performance of reference count operations

To focus on memory:

pytest benchmarks/memory/ --benchmark-only -v
python benchmarks/memory/profile_memory.py

Process Startup Analysis

The benchmarks help identify improvements in:

Import Time: How long it takes to import modules
Module Caching: Effectiveness of .pyc files and import caching
Subprocess Overhead: Cost of creating new Python processes
Lazy Loading: Benefits of deferred imports

To focus on startup:

pytest benchmarks/startup/ --benchmark-only -v
python -X importtime -c "import numpy" 2>&1 | grep numpy

Comparing Python Versions

Setup Multiple Python Versions

# Using pyenv
pyenv install 3.11.8
pyenv install 3.12.3
pyenv install 3.13.0

# Create virtual environments
python3.11 -m venv venv311
python3.12 -m venv venv312
python3.13 -m venv venv313

Run Benchmarks Across Versions

# Python 3.11
source venv311/bin/activate
pip install -e ".[dev]" torch numpy pytest-benchmark
pytest benchmarks/ --benchmark-only --benchmark-json=results_311.json
deactivate

# Python 3.12
source venv312/bin/activate
pip install -e ".[dev]" torch numpy pytest-benchmark
pytest benchmarks/ --benchmark-only --benchmark-json=results_312.json
deactivate

# Python 3.13
source venv313/bin/activate
pip install -e ".[dev]" torch numpy pytest-benchmark
pytest benchmarks/ --benchmark-only --benchmark-json=results_313.json
deactivate

Analyze Results

python benchmarks/utils/compare_results.py results_311.json results_312.json results_313.json

Performance Comparison Chart

Relative performance across Python versions (higher is better, 3.10 = baseline 1.00x):

                     Single-Thread Performance (vs 3.10 baseline)

Python 3.10  ████████████████████████████████████████  1.00x (baseline)
Python 3.11  ██████████████████████████████████████████████████  1.25x
Python 3.12  ████████████████████████████████████████████████████  1.30x
Python 3.13  ██████████████████████████████████████████████████████  1.35x
Python 3.14  ████████████████████████████████████████████████████████  1.40x
Python 3.14t ██████████████████████████████████████████████████████  1.35x*

                     Multi-Thread CPU-Bound (4 threads, vs 3.10 baseline)

Python 3.10  ████████████████████████████████████████  1.00x (GIL limited)
Python 3.11  ████████████████████████████████████████  1.00x (GIL limited)
Python 3.12  ████████████████████████████████████████  1.00x (GIL limited)
Python 3.13  ████████████████████████████████████████  1.00x (GIL limited)
Python 3.14  ████████████████████████████████████████  1.00x (GIL limited)
Python 3.14t ████████████████████████████████████████████████████████████████████████████████  ~2-4x**

* 3.14t has ~10-20% single-thread overhead due to atomic refcounting
** Multi-thread speedup depends on workload and core count; tested on 2-core system

                     Memory Efficiency (object creation, higher = better)

dict                   ████████████████████████████████████████████████  1.00x
namedtuple             ██████████████████████████████████  0.69x (creation overhead)
dataclass              ██████████████████████████████████████████████████████  1.09x
@dataclass(slots=True) ██████████████████████████████████████████████████████████  1.18x (recommended)

Key Findings from Benchmarks

Metric	3.14 (GIL)	3.14t (no-GIL)	Winner
Empty function call	378 μs	420 μs	3.14 (+10%)
Closure call	796 μs	795 μs	Tie
Function with args	933 μs	1169 μs	3.14 (+20%)
`args/*kwargs`	4301 μs	3992 μs	3.14t (+7%)
List creation (10k)	217 μs	185 μs	3.14t (+15%)
Parallel CPU (4 threads)	~5.4 ms	~5.9 ms	3.14t on multi-core

Note: Free-threaded Python (3.14t) trades single-thread performance for true parallelism. On multi-core systems with CPU-bound parallel workloads, 3.14t can achieve near-linear scaling.

Expected Improvements in Recent Python Versions

Python 3.11

10-60% faster than 3.10 (PEP 659 - Specializing Adaptive Interpreter)
Faster function calls
Better error messages
Optimized frame stack

Python 3.12

Per-interpreter GIL (experimental)
Improved error messages
f-string optimizations
Comprehension inlining

Python 3.13

JIT compilation (experimental)
Free-threaded Python (no-GIL mode, PEP 703)
Better immortal objects
Improved memory management

Python 3.14 / 3.14t

Continued performance improvements
3.14t: Production-ready free-threading (no GIL)
True parallel execution for CPU-bound threads
~10-20% single-thread overhead (atomic refcounting)

Reporting

After running benchmarks, you'll get:

Timing statistics: min, max, mean, stddev for each benchmark
Memory profiles: Peak memory usage and allocation patterns
Comparative analysis: Version-to-version improvements
Breakdown: Memory vs startup vs computation improvements

Contributing

When adding new benchmarks:

Use descriptive names: test_benchmark_<category>_<operation>
Add docstrings explaining what is being measured
Use appropriate benchmark rounds (more for fast operations)
Consider both warm and cold cache scenarios
Document expected performance characteristics

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python Performance Benchmarking Suite

Overview

Directory Structure

Installation

Running Benchmarks

Run all benchmarks

Run specific category

Generate detailed reports

Compare results

Benchmark Categories

1. NumPy Benchmarks (`numpy/`)

2. PyTorch Benchmarks (`pytorch/`)

3. Async Benchmarks (`async/`)

4. Memory Benchmarks (`memory/`)

5. Startup Benchmarks (`startup/`)

Analysis Guide

Memory Management Analysis

Process Startup Analysis

Comparing Python Versions

Setup Multiple Python Versions

Run Benchmarks Across Versions

Analyze Results

Performance Comparison Chart

Key Findings from Benchmarks

Expected Improvements in Recent Python Versions

Python 3.11

Python 3.12

Python 3.13

Python 3.14 / 3.14t

Reporting

Contributing

References

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Python Performance Benchmarking Suite

Overview

Directory Structure

Installation

Running Benchmarks

Run all benchmarks

Run specific category

Generate detailed reports

Compare results

Benchmark Categories

1. NumPy Benchmarks (numpy/)

2. PyTorch Benchmarks (pytorch/)

3. Async Benchmarks (async/)

4. Memory Benchmarks (memory/)

5. Startup Benchmarks (startup/)

Analysis Guide

Memory Management Analysis

Process Startup Analysis

Comparing Python Versions

Setup Multiple Python Versions

Run Benchmarks Across Versions

Analyze Results

Performance Comparison Chart

Key Findings from Benchmarks

Expected Improvements in Recent Python Versions

Python 3.11

Python 3.12

Python 3.13

Python 3.14 / 3.14t

Reporting

Contributing

References

1. NumPy Benchmarks (`numpy/`)

2. PyTorch Benchmarks (`pytorch/`)

3. Async Benchmarks (`async/`)

4. Memory Benchmarks (`memory/`)

5. Startup Benchmarks (`startup/`)