SLICOT C11 Benchmark Report
System Configuration
| Component | Value |
|---|---|
| CPU | Apple M1 Pro |
| RAM | 16 GB |
| OS | Darwin 25.2.0 arm64 |
| Compiler | Apple clang 17.0.0 |
| BLAS/LAPACK | Accelerate.framework |
| Build | debugoptimized (-O2 -g) |
Methodology
- Warmup: 3 iterations (cache priming)
- Timed runs: 10 iterations per benchmark
- Timer: mach_absolute_time() (results reported in nanoseconds; effective granularity is about 40 ns, see Timer resolution under Observations)
- Statistics: min, max, mean, stddev
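The warmup-then-measure loop described above can be sketched as follows. This is an illustration only, not the project's C harness: it uses Python's `time.perf_counter_ns()` in place of `mach_absolute_time()`, the function name `bench` is hypothetical, and the population form of the standard deviation is an assumption, since the report does not say which form it uses.

```python
import math
import time

def bench(fn, warmup=3, runs=10):
    """Time fn() and return min/max/mean/stddev in microseconds."""
    for _ in range(warmup):              # untimed cache-priming iterations
        fn()
    samples_us = []
    for _ in range(runs):                # timed iterations
        t0 = time.perf_counter_ns()
        fn()
        t1 = time.perf_counter_ns()
        samples_us.append((t1 - t0) / 1000.0)
    mean = sum(samples_us) / runs
    # Population standard deviation (assumed; the report does not specify).
    var = sum((s - mean) ** 2 for s in samples_us) / runs
    return {"min": min(samples_us), "max": max(samples_us),
            "mean": mean, "stddev": math.sqrt(var), "runs": runs}

# Example workload standing in for a routine under test.
stats = bench(lambda: sum(i * i for i in range(1000)))
print(stats)
```

Keeping every sample and reducing afterwards makes it easy to add a median or to discard outliers later without touching the timing loop.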
SB02MD — Continuous-time Algebraic Riccati Equation Solver
Solves Q + A'X + XA - XGX = 0 using Laub's Schur vector method.
| Dataset | N | Mean (μs) | Min (μs) | Max (μs) | σ (μs) | Info |
|---|---|---|---|---|---|---|
| BB01103 | 4 | 9.66 | 9.58 | 10.04 | 0.14 | ✓ |
| BB01104 | 8 | 32.73 | 32.17 | 36.21 | 1.24 | ✓ |
| BB01105 | 9 | 20.37 | 20.00 | 21.38 | 0.49 | ✓ |
| BB01404 | 21 | 164.11 | 162.67 | 166.46 | 1.49 | ✓ |
| BB01106 | 30 | 210.43 | 208.92 | 216.00 | 2.17 | ✓ |
| BB02107 | 4 | 4.68 | 4.62 | 4.79 | 0.05 | ✓ |
| BB02108 | 4 | 6.53 | 6.46 | 6.67 | 0.06 | ✓ |
| BB02110 | 4 | 10.17 | 10.08 | 10.33 | 0.08 | info=3 |
| BB02111 | 4 | 0.54 | 0.50 | 0.58 | 0.02 | info=4 |
| BB02113 | 4 | 4.65 | 4.58 | 4.79 | 0.07 | info=3 |
Notes:
- info=3: Schur reordering failed (ill-conditioned problem)
- info=4: Fewer than N stable eigenvalues (expected for some benchmark cases)
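To make the equation and the "stable eigenvalues" condition concrete, here is a scalar (n=1) worked example. It is not SB02MD and does not perform a Schur decomposition. It only shows the same idea in one dimension: the eigenvector of the Hamiltonian matrix H = [[a, -g], [-q, -a]] that belongs to the stable eigenvalue yields the stabilizing solution X. The function name `scalar_care` and the sample values are hypothetical.

```python
import math

def scalar_care(a, g, q):
    """Stabilizing root of q + 2*a*x - g*x**2 = 0 for scalars with g > 0, q >= 0."""
    # H = [[a, -g], [-q, -a]] has eigenvalues +s and -s.
    s = math.sqrt(a * a + g * q)
    if s == 0.0:
        # No eigenvalue with negative real part: the scalar analogue of info=4.
        raise ValueError("no stable eigenvalue")
    # Eigenvector [u1, u2] for -s: the first row gives a*u1 - g*u2 = -s*u1.
    u1 = 1.0
    u2 = (a + s) / g
    return u2 / u1                     # X = U2 * inv(U1)

a, g, q = 1.0, 2.0, 3.0
x = scalar_care(a, g, q)
residual = q + 2 * a * x - g * x * x   # should vanish up to rounding
closed_loop = a - g * x                # equals -s, so it is negative
print(x, residual, closed_loop)
```

In the matrix case the single eigenvector is replaced by an n-dimensional stable invariant subspace, which is why the solver needs N stable eigenvalues and a successful reordering of the Schur form.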
Scaling Analysis
n=4: ~10 μs
n=8: ~33 μs (3.3x for 2x n, expect 8x for O(n³))
n=9: ~20 μs
n=21: ~164 μs
n=30: ~210 μs (21x for 7.5x n, expect 422x for O(n³))
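Observed scaling is sub-cubic over this range, likely dominated by BLAS L3 efficiency on M1. The n=9 case (BB01105) runs faster than the n=8 case (BB01104), so run time also depends on the problem data, not only on n.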
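The ratios above can be checked, and summarized as a single exponent, with a log-log least-squares fit of the mean times from the SB02MD table. This is a rough sketch: five points with different problem data give only an indicative exponent, and the helper name `fit_exponent` is hypothetical.

```python
import math

# (n, mean time in microseconds) for BB01103, BB01104, BB01105, BB01404, BB01106
data = [(4, 9.66), (8, 32.73), (9, 20.37), (21, 164.11), (30, 210.43)]

def fit_exponent(points):
    """Least-squares slope of log(t) against log(n), i.e. p in t ~ c * n**p."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

p = fit_exponent(data)
n0, t0 = data[0]
n1, t1 = data[-1]
observed = t1 / t0          # measured slowdown from n=4 to n=30
cubic = (n1 / n0) ** 3      # slowdown a pure O(n^3) cost would give
print(f"fitted exponent p = {p:.2f}")
print(f"n={n0} -> n={n1}: observed {observed:.1f}x, cubic model {cubic:.0f}x")
```

A fitted exponent well below 3 at these sizes is consistent with the sub-cubic observation, although it cannot separate library efficiency from fixed per-call overhead.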
BB01AD — CAREX Benchmark Generator
Generates continuous-time algebraic Riccati equation test problems.
Group 1: Fixed-Size Examples (Literature Problems)
| Example | N | Mean (μs) | Description | Info |
|---|---|---|---|---|
| 1.1 | 2 | 0.15 | Laub 1979, Ex.1 | ✓ |
| 1.2 | 2 | 0.16 | Laub 1979, Ex.2 (uncontrollable) | ✓ |
| 1.3 | — | 0.05 | L-1011 aircraft model | needs data |
| 1.4 | — | 0.09 | Binary distillation column | needs data |
| 1.5 | — | 0.12 | Tubular ammonia reactor | needs data |
| 1.6 | — | 0.42 | J-100 jet engine | needs data |
Group 2: Parameter-Dependent Examples
| Example | N | Mean (μs) | Description |
|---|---|---|---|
| 2.1 | 2 | 0.17 | Arnold/Laub Ex.1 (stabilizability limit) |
| 2.2 | 2 | 0.35 | Arnold/Laub Ex.3 (singular R) |
| 2.3 | 2 | 0.17 | Kenney/Laub/Wette Ex.2 |
| 2.4 | 2 | 0.14 | Bai/Qian (ill-conditioned H) |
| 2.5 | 2 | 0.16 | H∞ problem |
| 2.6 | 3 | 0.77 | Petkov (badly scaled) |
| 2.7 | 4 | 0.30 | Magnetic tape control |
| 2.8 | 4 | 0.25 | Arnold/Laub Ex.2 |
| 2.9 | — | 1.21 | Boeing B-767 flutter |
Group 3: Scalable Examples
| Example | N | Mean (μs) | σ (μs) | Description |
|---|---|---|---|---|
| 3.1 | 39 | 18.51 | 0.36 | String of high-speed vehicles |
| 3.2 | 64 | 32.82 | 0.27 | Circulant matrices |
BD01AD — CTDSX Descriptor System Generator
Generates continuous-time dynamical system benchmark examples.
Group 1 & 2: Fixed/Parameter-Dependent
| Example | N | Mean (μs) | Info |
|---|---|---|---|
| 1.1 | 2 | 0.02 | ✓ |
| 1.2 | 2 | 0.02 | ✓ |
| 2.1 | 4 | 0.05 | ✓ |
| 2.2 | 4 | 0.05 | ✓ |
| 2.4 | 3 | 0.05 | ✓ |
Group 3: Scalable
| Example | N | Mean (μs) | σ (μs) | Throughput |
|---|---|---|---|---|
| 3.1 | 39 | 1.67 | 0.02 | 23.4 M elem/s |
| 3.2 | 100 | 9.10 | 0.03 | 11.0 M elem/s |
Performance Summary
| Routine | Best Case | Worst Case | Typical |
|---|---|---|---|
| SB02MD (Riccati) | 4.7 μs (n=4) | 210 μs (n=30) | ~33 μs (n=8) |
| BB01AD (CAREX gen) | 0.15 μs (n=2) | 33 μs (n=64) | ~0.3 μs |
| BD01AD (CTDSX gen) | 0.02 μs (n=2) | 9.1 μs (n=100) | ~0.05 μs |
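The SB02MD row is consistent with ranking only the runs that returned info=0. Otherwise BB02111 (0.54 μs, info=4) would be the fastest entry. The sketch below reproduces the best and worst cases under that assumption; whether the summary was actually produced this way is not stated in the report.

```python
# (dataset, n, mean_us, info) copied from the SB02MD results table; info=0 is success
rows = [
    ("BB01103", 4, 9.66, 0), ("BB01104", 8, 32.73, 0), ("BB01105", 9, 20.37, 0),
    ("BB01404", 21, 164.11, 0), ("BB01106", 30, 210.43, 0),
    ("BB02107", 4, 4.68, 0), ("BB02108", 4, 6.53, 0),
    ("BB02110", 4, 10.17, 3), ("BB02111", 4, 0.54, 4), ("BB02113", 4, 4.65, 3),
]

# Early exits on failure are not comparable with completed solves, so drop them.
ok = [r for r in rows if r[3] == 0]
best = min(ok, key=lambda r: r[2])
worst = max(ok, key=lambda r: r[2])
print(f"best:  {best[0]} n={best[1]} {best[2]:.1f} us")
print(f"worst: {worst[0]} n={worst[1]} {worst[2]:.1f} us")
```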
Throughput Estimates
For Riccati solver (2n × 2n Schur decomposition):
n=30: 210 μs → 4,762 solves/sec
Matrix ops: ~54,000 elements → 257 M elem/s
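The arithmetic behind these figures is a plain unit conversion, sketched below. The element count of 54,000 is taken from the report as given. The BD01AD throughput column appears to be N divided by the mean time, which is an inference from the table values rather than something the report states. The helper names are hypothetical.

```python
def per_second(mean_us):
    """Calls per second for a call that takes mean_us microseconds."""
    return 1e6 / mean_us

def melem_per_s(elements, mean_us):
    """Millions of elements per second (elements per microsecond)."""
    return elements / mean_us

solves = per_second(210.0)            # SB02MD, n=30
rate = melem_per_s(54_000, 210.0)     # SB02MD, element count as reported
bd_31 = melem_per_s(39, 1.67)         # BD01AD example 3.1
bd_32 = melem_per_s(100, 9.10)        # BD01AD example 3.2
print(f"{solves:.0f} solves/s, {rate:.0f} M elem/s")
print(f"BD01AD 3.1: {bd_31:.1f} M elem/s, 3.2: {bd_32:.1f} M elem/s")
```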
How to Run
# Build
meson setup build && ninja -C build
# Meson benchmark suite
meson test -C build --benchmark
# Individual runs
./build/benchmarks/bench_sb02md SLICOT-Reference/benchmark_data/BB01*.dat
./build/benchmarks/bench_bb01ad
./build/benchmarks/bench_bd01ad
# Python runner
python scripts/run_benchmarks.py
Observations
- Sub-cubic scaling: SB02MD scales better than O(n³) on M1 for n=4 to n=30, likely due to Accelerate's optimized BLAS L3 routines
- Generator routines are fast: BB01AD/BD01AD are dominated by setup cost; the actual matrix generation is memory-bound
- Timer resolution: the minimum measurable interval on macOS is ~40 ns, so some BD01AD results show quantization
- Ill-conditioned cases: BB02110, BB02111, and BB02113 correctly report numerical difficulties (info=3 or info=4)
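The quantization noted above can be illustrated with a short tick-to-time conversion. The timebase ratio of 125/3 ns per tick (a 24 MHz counter) is an assumption about Apple Silicon and is not stated in the report; the alternating sample pattern is invented purely for illustration.

```python
from fractions import Fraction

# Assumed mach_timebase_info ratio (numer=125, denom=3): nanoseconds per tick.
TIMEBASE = Fraction(125, 3)

def ticks_to_us(ticks):
    """Convert a raw tick count to microseconds using the assumed timebase."""
    return float(ticks * TIMEBASE) / 1000.0

tick_ns = float(TIMEBASE)        # smallest non-zero interval the timer can report
sb02md_min = ticks_to_us(230)    # compare with the BB01103 minimum in the table

# Hypothetical: ten timed runs of a very fast call, each reading 0 or 1 tick.
samples = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
mean_us = sum(ticks_to_us(t) for t in samples) / len(samples)
print(f"{tick_ns:.2f} ns/tick, 230 ticks = {sb02md_min:.2f} us, mean = {mean_us:.2f} us")
```

Under this assumption, sub-tick means such as the 0.02 μs BD01AD entries are averages of runs that individually measured either zero or one tick, so timing a batch of calls per sample would give more reliable numbers for those routines.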
Benchmark infrastructure added in commit dd608f3.