Benchmark Results: C11 vs Reference Performance #8

# SLICOT C11 Benchmark Report

## System Configuration

| Component | Value |
|-----------|-------|
| CPU | Apple M1 Pro |
| RAM | 16 GB |
| OS | Darwin 25.2.0 arm64 |
| Compiler | Apple clang 17.0.0 |
| BLAS/LAPACK | Accelerate.framework |
| Build | debugoptimized (-O2 -g) |

## Methodology

- **Warmup**: 3 iterations (cache priming)
- **Timed runs**: 10 iterations per benchmark
- **Timer**: `mach_absolute_time()` (reported in nanoseconds; effective resolution is about 40 ns on this machine)
- **Statistics**: min, max, mean, stddev
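The warmup-then-measure procedure above can be sketched in Python. This is an illustration only, not the project's C harness: `bench` and `summarize` are hypothetical names, `time.perf_counter_ns()` stands in for `mach_absolute_time()`, and the standard deviation is assumed to be the sample (n-1) form, which the report does not specify.

```python
import statistics
import time


def summarize(samples_us):
    """Reduce raw timings to the four statistics the report tabulates."""
    return {
        "min": min(samples_us),
        "max": max(samples_us),
        "mean": statistics.fmean(samples_us),
        "stddev": statistics.stdev(samples_us),  # assumption: sample stddev
    }


def bench(fn, warmup=3, runs=10):
    """Call fn() untimed `warmup` times, then time `runs` calls (in μs)."""
    for _ in range(warmup):  # untimed calls to prime caches
        fn()
    samples_us = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn()
        elapsed_ns = time.perf_counter_ns() - start
        samples_us.append(elapsed_ns / 1000.0)
    return summarize(samples_us)


if __name__ == "__main__":
    stats = bench(lambda: sum(range(1000)))
    print({name: round(value, 2) for name, value in stats.items()})
```

Keeping the statistics in a separate function makes them checkable against known samples independently of any timing noise.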

---

## SB02MD — Continuous-time Algebraic Riccati Equation Solver

Solves `Q + A'X + XA - XGX = 0` using Laub's Schur vector method.

| Dataset | N | Mean (μs) | Min (μs) | Max (μs) | σ (μs) | Info |
|---------|--:|----------:|----:|----:|--:|------|
| BB01103 | 4 | 9.66 | 9.58 | 10.04 | 0.14 | ✓ |
| BB01104 | 8 | 32.73 | 32.17 | 36.21 | 1.24 | ✓ |
| BB01105 | 9 | 20.37 | 20.00 | 21.38 | 0.49 | ✓ |
| BB01404 | 21 | 164.11 | 162.67 | 166.46 | 1.49 | ✓ |
| BB01106 | 30 | 210.43 | 208.92 | 216.00 | 2.17 | ✓ |
| BB02107 | 4 | 4.68 | 4.62 | 4.79 | 0.05 | ✓ |
| BB02108 | 4 | 6.53 | 6.46 | 6.67 | 0.06 | ✓ |
| BB02110 | 4 | 10.17 | 10.08 | 10.33 | 0.08 | info=3 |
| BB02111 | 4 | 0.54 | 0.50 | 0.58 | 0.02 | info=4 |
| BB02113 | 4 | 4.65 | 4.58 | 4.79 | 0.07 | info=3 |

**Notes**:
- `info=3`: Schur reordering failed (ill-conditioned problem)
- `info=4`: Fewer than N stable eigenvalues (expected for some benchmark cases)

### Scaling Analysis

```
n=4:   ~10 μs
n=8:   ~33 μs   (3.4x for 2x n, expect 8x for O(n³))
n=9:   ~20 μs
n=21: ~164 μs
n=30: ~210 μs   (22x for 7.5x n, expect 422x for O(n³))
```

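Observed scaling is sub-cubic over this range, likely because Accelerate's BLAS L3 routines run efficiently on the M1 at these sizes. The rows are different benchmark problems rather than one problem at increasing n (n=9 runs faster than n=8), so the ratios are indicative only.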
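The comparison above can be made quantitative by solving for the exponent p in t ∝ n^p between two table rows. A minimal sketch using the mean timings from the SB02MD table; `scaling_exponent` is a hypothetical helper name, and because the rows are different benchmark problems the result is only a rough indicator.

```python
import math


def scaling_exponent(n1, t1, n2, t2):
    """Exponent p such that t2 / t1 == (n2 / n1) ** p."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# Mean timings in microseconds from the SB02MD table (BB01103, BB01104, BB01106).
timings = {4: 9.66, 8: 32.73, 30: 210.43}

p_small = scaling_exponent(4, timings[4], 8, timings[8])
p_full = scaling_exponent(4, timings[4], 30, timings[30])
print(f"n=4 -> 8:  p = {p_small:.2f}")
print(f"n=4 -> 30: p = {p_full:.2f}")
```

Both exponents come out well below 3, which is the sub-cubic behaviour the table suggests.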

---

## BB01AD — CAREX Benchmark Generator

Generates continuous-time algebraic Riccati equation test problems.

### Group 1: Fixed-Size Examples (Literature Problems)

| Example | N | Mean (μs) | Description | Info |
|---------|--:|----------:|-------------|------|
| 1.1 | 2 | 0.15 | Laub 1979, Ex.1 | ✓ |
| 1.2 | 2 | 0.16 | Laub 1979, Ex.2 (uncontrollable) | ✓ |
| 1.3 | — | 0.05 | L-1011 aircraft model | needs data |
| 1.4 | — | 0.09 | Binary distillation column | needs data |
| 1.5 | — | 0.12 | Tubular ammonia reactor | needs data |
| 1.6 | — | 0.42 | J-100 jet engine | needs data |

### Group 2: Parameter-Dependent Examples

| Example | N | Mean (μs) | Description |
|---------|--:|----------:|-------------|
| 2.1 | 2 | 0.17 | Arnold/Laub Ex.1 (stabilizability limit) |
| 2.2 | 2 | 0.35 | Arnold/Laub Ex.3 (singular R) |
| 2.3 | 2 | 0.17 | Kenney/Laub/Wette Ex.2 |
| 2.4 | 2 | 0.14 | Bai/Qian (ill-conditioned H) |
| 2.5 | 2 | 0.16 | H∞ problem |
| 2.6 | 3 | 0.77 | Petkov (badly scaled) |
| 2.7 | 4 | 0.30 | Magnetic tape control |
| 2.8 | 4 | 0.25 | Arnold/Laub Ex.2 |
| 2.9 | — | 1.21 | Boeing B-767 flutter (needs data) |

### Group 3: Scalable Examples

| Example | N | Mean (μs) | σ (μs) | Description |
|---------|--:|----------:|--:|-------------|
| 3.1 | 39 | 18.51 | 0.36 | String of high-speed vehicles |
| 3.2 | 64 | 32.82 | 0.27 | Circulant matrices |

---

## BD01AD — CTDSX Descriptor System Generator

Generates continuous-time dynamical system benchmark examples.

### Group 1 & 2: Fixed/Parameter-Dependent

| Example | N | Mean (μs) | Info |
|---------|--:|----------:|------|
| 1.1 | 2 | 0.02 | ✓ |
| 1.2 | 2 | 0.02 | ✓ |
| 2.1 | 4 | 0.05 | ✓ |
| 2.2 | 4 | 0.05 | ✓ |
| 2.4 | 3 | 0.05 | ✓ |

### Group 3: Scalable

| Example | N | Mean (μs) | σ (μs) | Throughput |
|---------|--:|----------:|--:|------------|
| 3.1 | 39 | 1.67 | 0.02 | 23.4 M elem/s |
| 3.2 | 100 | 9.10 | 0.03 | 11.0 M elem/s |

---

## Performance Summary

| Routine | Best Case | Worst Case | Typical |
|---------|----------:|----------:|--------:|
| **SB02MD** (Riccati) | 4.7 μs (n=4) | 210 μs (n=30) | ~33 μs (n=8) |
| **BB01AD** (CAREX gen) | 0.15 μs (n=2) | 33 μs (n=64) | ~0.3 μs |
| **BD01AD** (CTDSX gen) | 0.02 μs (n=2) | 9.1 μs (n=100) | ~0.05 μs |

### Throughput Estimates

For Riccati solver (2n × 2n Schur decomposition):
```
n=30:  210 μs → 4,762 solves/sec
       Matrix ops: ~54,000 elements → 257 M elem/s
```
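The figures above follow from simple division, reproduced here as a check. This is a sketch using the report's own numbers; the helper names are hypothetical, and reading the ~54,000 element count as 2n³ at n=30 is an inference that the report does not state.

```python
def solves_per_second(mean_us):
    """Calls per second for a routine that takes mean_us microseconds."""
    return 1e6 / mean_us


def elements_per_second(elements, mean_us):
    """Element throughput given a per-call element count."""
    return elements * solves_per_second(mean_us)


n, mean_us = 30, 210.0
elements = 2 * n**3  # assumption: one way to arrive at the report's ~54,000
print(f"{solves_per_second(mean_us):,.0f} solves/sec")
print(f"{elements_per_second(elements, mean_us) / 1e6:.0f} M elem/s")
```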

---

## How to Run

```bash
# Build
meson setup build && ninja -C build

# Meson benchmark suite
meson test -C build --benchmark

# Individual runs
./build/benchmarks/bench_sb02md SLICOT-Reference/benchmark_data/BB01*.dat
./build/benchmarks/bench_bb01ad
./build/benchmarks/bench_bd01ad

# Python runner
python scripts/run_benchmarks.py
```

---

## Observations

1. **Sub-cubic scaling**: SB02MD scales better than the O(n³) expectation on M1, likely due to Accelerate's optimized BLAS L3 routines.
2. **Generator routines are fast**: BB01AD and BD01AD are dominated by setup cost; the actual matrix generation is memory-bound.
3. **Timer resolution**: the minimum measurable interval on macOS is ~40 ns, so some BD01AD results show quantization.
4. **Ill-conditioned cases**: BB02110, BB02111, and BB02113 correctly report numerical difficulties (`info=3` or `info=4`).
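
The quantization mentioned in item 3 can be illustrated with a small model. This sketch assumes a tick of 125/3 ns (a 24 MHz counter), the ratio `mach_timebase_info()` commonly reports on Apple Silicon; the report does not state it. It also assumes each interval starts on a tick boundary, and `quantize_ns` is a hypothetical helper.

```python
from fractions import Fraction

# Assumption: one timer tick = 125/3 ns (~41.67 ns), i.e. a 24 MHz counter.
TICK_NS = Fraction(125, 3)


def quantize_ns(true_ns):
    """Duration a tick-counting timer would report for a true duration."""
    ticks = int(Fraction(true_ns) / TICK_NS)  # whole ticks elapsed
    return float(ticks * TICK_NS)


for true_ns in (20, 50, 100, 1670):
    print(f"true {true_ns:>5} ns -> measured {quantize_ns(true_ns):8.2f} ns")
```

Under this model a call of roughly 20 ns is shorter than one tick, which is why the smallest BD01AD timings are the ones most affected.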

---

Benchmark infrastructure added in commit dd608f3.


