Draft
30 commits
c3abaaf
Skip the slowest tests by default
May 19, 2026
d915b74
Trim n parametrize lists to {smallest, 12}
May 19, 2026
4b75e7e
Mark more single-test outliers as slow (round 2)
May 19, 2026
75a15d2
Slow-mark long-tail test_tile16 outliers (round 3)
May 19, 2026
9062dbc
Revert function-level @pytest.mark.slow decorators
May 19, 2026
9dfa84d
[Demo] cholesky_blocked: take N / N_ENVS / WARMUP / ITERS via argparse
May 19, 2026
6e936aa
[Test] test_tile16_cholesky_blocked_demo: invoke demo in smoke-mode
May 19, 2026
21c2877
[Test] test_matmul_chain_qipc_sizes: parametrize on matrix shapes
May 19, 2026
27a86a1
[Test] test_gdar_mpm: parametrize on particles_side / n_grid / num_steps
May 19, 2026
2ea335f
[Test] test_device_{reduce,exclusive_scan}: fuse {add,min,max} into o…
May 19, 2026
64fcbb0
[Style] black: reformat test_tile16_cholesky_blocked_demo cmd list + …
May 19, 2026
3031e14
[Test] test_subgroup_full_matches_tiled: fuse 20 thin subgroup-op wra…
May 19, 2026
edf53ea
[Test] test_block_reduce{,_all}: fuse {add,min,max} into op-parametri…
May 19, 2026
8fd433e
[Test] test_block_inclusive: fuse {add,min,max} into one op-parametri…
May 19, 2026
9238652
[Test] test_block_exclusive_minmax: fuse {min,max} into one op-parame…
May 19, 2026
62eb3aa
[Test] @pytest.mark.sample: per-test stochastic parametrize subsampling
May 19, 2026
a4badcf
[Test] test_tile16: bump @pytest.mark.sample(n=4) -> n=6
May 19, 2026
446aef7
[Style] Reflow @sample comment blocks to 120c (from AI default ~110c)
May 19, 2026
61718fe
[BugFix] @sample: propagate seed to xdist workers via workerinput
May 19, 2026
5205ef0
[BugFix] @sample: propagate seed to xdist workers via QD_SAMPLE_SEED …
May 19, 2026
7dd7a6d
[BugFix] @sample: pick seed in run_tests.py, propagate via --sample-s…
May 19, 2026
c4b81bd
[DebugTemp] @sample: log seed/argv/workerinput in each pytest_collect…
May 19, 2026
fb27ec5
[BugFix] @sample: sample from sorted(group) so xdist workers see iden…
May 19, 2026
908e3a2
[DebugTemp2] re-add sample debug log
May 19, 2026
06e50cd
[BugFix] @sample: derive per-test RNG via sha256, not tuple-seeding o…
May 19, 2026
21c56d2
[Style] Reflow CI-flagged 80c-wrapped comments to 120c
May 19, 2026
9d7518c
[Doc] contributing.md: shorten testing bullet per PR review
May 19, 2026
b4e3355
[Doc] unit_testing.md: apply PR review feedback
May 19, 2026
ca6e9f0
Merge branch 'main' into hp/mark-slow-tests
hughperkins May 19, 2026
5f4e664
[Doc] address second round of PR review on unit_testing + contributing
May 19, 2026
16 changes: 2 additions & 14 deletions docs/source/user_guide/contributing.md
@@ -2,25 +2,13 @@

## Good practice reminder

-* *testing*: Any new features or modified code should be tested. You have to run the test suite using `python tests/run_tests.py` which sets up the right test environment for `pytest`. CLI arguments are forwarded to `pytest`. Do not use `pytest` directly as it behaves differently. To see a per-file timing breakdown (useful for identifying slow test files), set `QD_FILE_TIMING=1` — e.g. `QD_FILE_TIMING=1 python tests/run_tests.py`. This is enabled by default in the Mac CI job and the results appear in the GitHub Actions job summary.
+* *testing*: Any new features or modified code should be tested. see [unit_testing.md](unit_testing.md)
* *format/linter*: Before pushing any commits, ensure you set up `pre-commit` and run it using `pre-commit run -a`
* No need to force push to keep a clean history as the merging is eventually done by squashing commits.

## Running tests

-Run the test suite with `python tests/run_tests.py`. CLI arguments are forwarded to pytest. For example, to run only Metal tests matching a keyword:
-
-```
-python tests/run_tests.py --arch metal -k "test_tile16_cholesky"
-```
-
-The target architecture can also be set via the `QD_WANTED_ARCHS` environment variable (comma-separated, e.g. `QD_WANTED_ARCHS=metal,vulkan`).
-
-### Kernel compilation cache
-
-During test runs, compiled kernels are cached to disk so that the same kernel is not recompiled after each `qd.reset()`/`qd.init()` cycle.
-
-A fresh, empty cache directory is created for each test session by pytest's [`tmp_path_factory`](https://docs.pytest.org/en/stable/how-to/tmp_path.html) (typically under `/tmp/pytest-of-<user>/pytest-<N>/qdcache0/`). Old session directories are cleaned up automatically by pytest's retention policy. This cache is separate from the user-facing `~/.cache/quadrants/` cache.
+See [unit_testing.md](unit_testing.md).

## Creating your build/dev environment

1 change: 1 addition & 0 deletions docs/source/user_guide/index.md
@@ -82,6 +82,7 @@ init_options
:maxdepth: 1
:titlesonly:

unit_testing
kernel_coverage
```

189 changes: 189 additions & 0 deletions docs/source/user_guide/unit_testing.md
@@ -0,0 +1,189 @@
# Unit testing

This page documents how to run, write, and tune the Quadrants Python unit test suite. For setup of the build / dev environment, see [contributing.md](contributing.md).

## Running the tests

The test suite is run via the project's launcher, **not** by invoking `pytest` directly:

```
python tests/run_tests.py
```

The launcher sets up the test-only env vars (kernel offline cache, watchdog, xdist worker count, etc.) and forwards any unrecognised flags to pytest. Calling `pytest` directly skips that setup and behaves differently.

Common one-liners:

```
# run one file
python tests/run_tests.py test_tile16

# run one test (any pytest -k expression)
python tests/run_tests.py -k test_tile16_cholesky

# run on a specific backend (or comma-separated list)
python tests/run_tests.py --arch cuda
python tests/run_tests.py --arch metal -k tile16

# same, via env var (handy for CI)
QD_WANTED_ARCHS=metal,vulkan python tests/run_tests.py

# rerun the last failing tests first
python tests/run_tests.py -f

# stop at the first failure
python tests/run_tests.py -x
```

The target architecture can also be set via `QD_WANTED_ARCHS` (comma-separated; supports `^arch` to exclude rather than include).
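As a rough illustration of the include / exclude behaviour described above, the following sketch parses a `QD_WANTED_ARCHS`-style string. It is not the launcher's code: the function name `parse_wanted_archs` and the arch names in the usage note are hypothetical, and it assumes that a leading `^` negates the whole comma-separated list and that an empty value means "no restriction", neither of which this page spells out.

```python
def parse_wanted_archs(value, available):
    """Return the entries of `available` selected by a QD_WANTED_ARCHS-style string.

    Hypothetical helper: the real parsing lives in the test launcher and may differ.
    """
    value = value.strip()
    if not value:
        # Assumption: an unset or empty variable places no restriction.
        return list(available)
    # Assumption: a leading "^" turns the whole list into an exclusion list.
    exclude = value.startswith("^")
    names = {name.strip() for name in value.lstrip("^").split(",") if name.strip()}
    if exclude:
        return [arch for arch in available if arch not in names]
    return [arch for arch in available if arch in names]
```

With an illustrative `available = ["cpu", "cuda", "metal", "vulkan"]`, the string `"metal,vulkan"` keeps only those two backends, while `"^cuda"` keeps everything except `cuda`.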

## Markers

Tests can opt into two project-specific markers, in addition to pytest's built-in ones (`skip`, `xfail`, etc.).

### `@pytest.mark.slow`

Marks a test as **slow**. `tests/run_tests.py` adds `-m "not slow"` to the pytest invocation by default; pass `--run-slow` to opt back in:

```
# default: skip slow
python tests/run_tests.py

# include slow
python tests/run_tests.py --run-slow

# slow ONLY (e.g. nightly job)
python tests/run_tests.py -m slow --run-slow
```

The marker is used in two patterns:

1. **Whole-test slow**: the whole test takes a long time.

```python
@pytest.mark.slow
def test_thing_that_is_always_slow():
    ...
```

2. **Slow-marked parametrize case**: only the expensive parameter values are marked slow, so the cheaper ones still run by default.

```python
@pytest.mark.parametrize("n", [4, pytest.param(12, marks=pytest.mark.slow)])
def test_sym_eig_general(n):
    ...
```

In this specific example the default suite still exercises the code path; the slow lane just adds the larger-size variant for full coverage.
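The default-skip behaviour can be pictured with a small sketch of how a launcher might translate its own `--run-slow` flag into a pytest marker expression. This is a simplified illustration, not the code in `tests/run_tests.py`: the function name `build_pytest_args` is hypothetical, and the sketch does not try to merge the default filter with a user-supplied `-m` expression.

```python
import argparse


def build_pytest_args(argv):
    """Return the argument list to hand to pytest (hypothetical helper)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--run-slow", action="store_true")
    # Flags the launcher does not recognise are collected and forwarded as-is.
    known, forwarded = parser.parse_known_args(argv)
    if not known.run_slow:
        # Default lane: deselect everything carrying the `slow` marker.
        forwarded = ["-m", "not slow", *forwarded]
    return forwarded
```

Under these assumptions `build_pytest_args(["-k", "tile16"])` yields `["-m", "not slow", "-k", "tile16"]`, while `build_pytest_args(["-m", "slow", "--run-slow"])` forwards only `["-m", "slow"]`, which corresponds to the slow-only invocation shown earlier.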

### `@pytest.mark.sample(...)`

Marks a single heavily-parametrized test as opting in to **per-run stochastic sub-selection** of its parametrize cases. Use when:
Review comment (Contributor):
I don't like this idea at all. But you do what you want.

Review comment (Collaborator, Author):
Your concern is that when one makes a PR, and tests run multiple times, sometimes they might succeed, sometimes they might fail?

Thoughts on a compromise where the random seed is deterministically fixed by the name of the branch? So, re-running the tests on a specific PR won't change which tests run.


- the test's parametrize space is large (≥ ~16 cases),
- each parametrize case is roughly independent (covering an independent corner case rather than a single bug class),
- running every case every CI run is overkill, and
Review comment (Contributor):
I'm more in favour of nightly vs PR jobs. With a defined selection of required subset running on all PRs.

- asymptotic coverage over many runs is acceptable.
Review comment (Contributor):
I don't think it is ever the case in my opinion.


Apply it like any other marker:

```python
@pytest.mark.sample(n=6)  # keep 6 of N cases per run
# Alternative (pass exactly one of `n` or `fraction`, never both):
# @pytest.mark.sample(fraction=0.25)  # keep 25% of cases per run, min 1
@pytest.mark.parametrize("size", [...])
@pytest.mark.parametrize("dtype", [...])
@pytest.mark.parametrize("layout", [...])
@test_utils.test(arch=qd.gpu)
def test_thing(size, dtype, layout):
    ...
```
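To make the two marker arguments concrete, here is a minimal sketch of how the number of kept cases could be computed. It is an assumption-laden illustration rather than the code in `tests/python/conftest.py`: the helper name `sample_size` is hypothetical, and both the rounding rule for `fraction` (floor) and the capping of `n` at the number of available cases are guesses.

```python
import math


def sample_size(total, n=None, fraction=None):
    """Number of parametrize cases to keep out of `total` (hypothetical helper)."""
    # The marker takes exactly one of `n` or `fraction`.
    if (n is None) == (fraction is None):
        raise ValueError("pass exactly one of `n` or `fraction`")
    if n is not None:
        if n < 1:
            raise ValueError("`n` must be >= 1")
        # Assumption: asking for more cases than exist keeps all of them.
        return min(n, total)
    if not 0 < fraction <= 1:
        raise ValueError("`fraction` must be in (0, 1]")
    # Assumption: round down, but never below one case.
    return min(total, max(1, math.floor(total * fraction)))
```

For a test with 24 parametrize cases, both `sample_size(24, n=6)` and `sample_size(24, fraction=0.25)` keep 6 cases under these assumptions.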

**How to reproduce failing tests.** Three levels of reproducibility:

1. **One failing case** — paste the failing nodeid from the CI log. Pytest already prints the full nodeid on failure:

```
FAILED tests/python/test_tile16.py::test_tile16_load_store[arch=cuda-qd_dtype0-ndarray-16-32-4-8-7-11]
```

Just rerun it directly:

```
python tests/run_tests.py -k "test_tile16_load_store and ndarray-16-32-4-8-7-11"
# or, if you want the exact nodeid (bypasses -k matching):
pytest "tests/python/test_tile16.py::test_tile16_load_store[arch=cuda-qd_dtype0-ndarray-16-32-4-8-7-11]"
```

When pytest narrows collection to a single nodeid, the sampler's `len(group) <= 1` short-circuit keeps it. **No `--sample-seed` flag needed.**

2. **The exact subset of a failing run** — useful when several cases failed and you want to bisect or reproduce the whole sample locally. The report header of every run prints the seed used:

```
sample-seed=1834729104 (reproduce the same sample: --sample-seed=1834729104; ...)
```

Then locally:

```
python tests/run_tests.py --sample-seed=1834729104
```

3. **Exhaustive run** — for release gates, coverage-debt audits, or a periodic "did anything regress in any branch of the parametrize space" sweep. Disables the sampler entirely; every `@sample`-marked test runs every parametrize case:

```
python tests/run_tests.py --no-sample
```

**Per-test RNG independence.** Each `@sample`-marked test's subsample is seeded from `(global_seed, test_nodeid_prefix)`, so adding / renaming / tweaking the mark on `test_A` does NOT shift the sample of `test_B`. Routine refactors don't cause samples to migrate file-wide.

**Composition with `slow`.** Sampling runs **after** marker-based filtering. When `--run-slow` is not passed (the default), slow-marked parametrize cases drop out first, and the sampler then sub-selects from the remaining (fast) cases. The two filters therefore intersect: only `--no-sample --run-slow` together runs every parametrize case of every test.
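The ordering of the two filters can be sketched as a two-step selection. This is only a model of the behaviour described above, not the actual collection hook: `select_cases` and its parameters are hypothetical names, and cases are represented as plain `(case_id, is_slow)` pairs instead of pytest items.

```python
import random


def select_cases(cases, *, run_slow, no_sample, keep, rng):
    """Model of 'filter by marker first, then sample' (hypothetical helper).

    `cases` is a list of (case_id, is_slow) pairs; `keep` is the sample size.
    """
    # Step 1: marker filtering. Slow cases drop out unless --run-slow was given.
    remaining = [case_id for case_id, is_slow in cases if run_slow or not is_slow]
    # Step 2: sampling. --no-sample, or a group that is already small enough,
    # keeps everything that survived step 1.
    if no_sample or len(remaining) <= keep:
        return sorted(remaining)
    return sorted(rng.sample(sorted(remaining), keep))


cases = [(f"n{i}", i >= 8) for i in range(12)]  # n8..n11 stand in for slow cases
default_run = select_cases(cases, run_slow=False, no_sample=False, keep=3, rng=random.Random(0))
exhaustive = select_cases(cases, run_slow=True, no_sample=True, keep=3, rng=random.Random(0))
```

In this model `default_run` contains three of the eight fast cases and never a slow one, while `exhaustive` contains all twelve.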

## Writing new tests

The standard recipe combines `@test_utils.test(...)` (arch / option matrix) with `@pytest.mark.parametrize`:

```python
import pytest
import quadrants as qd
from tests import test_utils


@pytest.mark.parametrize("n", [4, pytest.param(12, marks=pytest.mark.slow)])
@test_utils.test(arch=qd.gpu, default_fp=qd.f32)
def test_my_thing(n):
    ...
```

`@test_utils.test` is what wires the test into the per-backend matrix and applies platform exclusions (`exclude=`), extension requirements (`require=`, e.g. `qd.extension.data64` for f64 tests), and per-test options (`default_fp`, `fast_math`, etc.). See `tests/test_utils.py` for the full surface.

Common helpers in `tests/test_utils.py`:

- `test_utils.skip_if_f64_unsupported(dtype)` — skip the current test at runtime if `dtype == qd.f64` and the active backend can't carry f64 through buffer I/O (Metal, MoltenVK on Darwin). Use inside a parametrized test that sweeps both f32 and f64.
- `test_utils.expected_archs()` — list of archs that the current `QD_WANTED_ARCHS` allows. Used to skip tests with no satisfiable arch.

## Advanced

Optional knobs and runtime details. The defaults work for most contributors.

### Per-test timeout

Per-test timeouts default to 600 s and are enforced by `pytest_hardtle`, a CFFI-compiled C watchdog that can kill tests hung in native GPU calls even when the GIL is held.

### Kernel compilation cache

During each test session the kernel compilation cache lives in a fresh, empty temp directory created by pytest's [`tmp_path_factory`](https://docs.pytest.org/en/stable/how-to/tmp_path.html) — typically `/tmp/pytest-of-<user>/pytest-<N>/qdcache0/`. Old session directories are cleaned up automatically by pytest's retention policy. This cache is separate from the user-facing `~/.cache/quadrants/` cache, and avoids recompiling identical kernels after each `qd.reset()` / `qd.init()` cycle within a session.

### Per-file timing breakdown

Set `QD_FILE_TIMING=1` to print a per-file duration summary at the end of the session:

```
QD_FILE_TIMING=1 python tests/run_tests.py
```

This is enabled by default in the Mac CI job; the results appear in the GitHub Actions job summary and are the primary tool for identifying slow test files.
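For readers curious what a per-file summary involves, the sketch below groups per-test durations by the file part of each nodeid. It is illustrative only: how `QD_FILE_TIMING=1` actually collects and prints its numbers is not shown on this page, and `per_file_durations` is a hypothetical name.

```python
from collections import defaultdict


def per_file_durations(durations):
    """Sum (nodeid, seconds) pairs per test file, slowest file first."""
    totals = defaultdict(float)
    for nodeid, seconds in durations:
        # A pytest nodeid looks like "path/to/test_file.py::test_name[params]".
        path = nodeid.split("::", 1)[0]
        totals[path] += seconds
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Feeding it the durations of an entire session gives one row per test file, which is the kind of view that helps spot files worth marking `slow` or sampling.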

### `@sample` + xdist seed propagation

`tests/run_tests.py` picks the per-run sample seed before pytest is launched and passes it via `--sample-seed=<S>` on argv. xdist forwards argv to every worker, so all workers see the same seed and produce identical samples; without this, each worker would subsample independently and `--sample-seed=<S>` wouldn't reproduce. The per-test RNG inside `pytest_collection_modifyitems` is then derived deterministically via `sha256(f"{seed}|{nodeid_prefix}")`, which is what makes the **Per-test RNG independence** property above hold.
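The derivation described above can be sketched as follows. The `sha256(f"{seed}|{nodeid_prefix}")` input string and the idea of sampling from a sorted group come from this PR; everything else is an assumption made for illustration, including the helper names `per_test_rng` and `pick_cases`, the use of the first 8 digest bytes as the RNG seed, and the use of `random.Random`.

```python
import hashlib
import random


def per_test_rng(seed, nodeid_prefix):
    """Derive an RNG that depends only on the run seed and the test's own name."""
    digest = hashlib.sha256(f"{seed}|{nodeid_prefix}".encode()).digest()
    # Assumption: fold the first 8 bytes of the digest into an integer seed.
    return random.Random(int.from_bytes(digest[:8], "big"))


def pick_cases(seed, nodeid_prefix, case_ids, keep):
    """Pick `keep` case ids for one test; identical on every worker."""
    rng = per_test_rng(seed, nodeid_prefix)
    # Sorting first makes the result independent of collection order.
    ordered = sorted(case_ids)
    return sorted(rng.sample(ordered, min(keep, len(ordered))))
```

Because the digest is a pure function of the seed and the nodeid prefix, two processes that receive the same `--sample-seed` compute the same subset without communicating, and changing the mark on one test leaves the RNG of every other test untouched.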
36 changes: 26 additions & 10 deletions misc/demos/cholesky_blocked.py
@@ -1,13 +1,14 @@
#!/usr/bin/env python3
-"""Benchmark 92x92 blocked Cholesky factorization using Tile16x16.
+"""Benchmark NxN blocked Cholesky factorization using Tile16x16.

Three kernels compared:

1. Baseline: scalar Cholesky-Crout, 64 threads, shared memory, 2*N+1 sequential syncs. Thread 0 computes each
diagonal, remaining threads parallelize off-diagonal updates.

-2. Blocked: 6x6 grid of 16x16 tiles, 16 threads, shared memory, scalar Crout for diagonal blocks. Same blocking
-structure as Tile16x16 but all data lives in shared memory with block.sync() between every step.
+2. Blocked: ceil(N/16) x ceil(N/16) grid of 16x16 tiles, 16 threads, shared memory, scalar Crout for diagonal
+blocks. Same blocking structure as Tile16x16 but all data lives in shared memory with block.sync() between
+every step.

3. Tile16x16: same blocked structure but fully register-resident via Tile16x16. No shared memory, zero syncs.
Prior tiles read from global memory (L2).
@@ -20,22 +21,37 @@
tile16 (Tile16x16, no shared memory) 16 533 5.19x

Usage:
-python misc/demos/cholesky_blocked.py
+python misc/demos/cholesky_blocked.py [--n N] [--n-envs N_ENVS] [--num-warmup WARMUP] [--num-iters ITERS]
"""

+import argparse
import time

import numpy as np

import quadrants as qd

-N = 92
+
+def _parse_args():
+    p = argparse.ArgumentParser(
+        description="Blocked Cholesky NxN benchmark (3 kernels: baseline / blocked / tile16).",
+        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+    )
+    p.add_argument("--n", type=int, default=92, help="Matrix dimension N (NxN SPD).")
+    p.add_argument("--n-envs", type=int, default=4096, help="Number of independent environments.")
+    p.add_argument("--num-warmup", type=int, default=50, help="Warmup iterations per kernel.")
+    p.add_argument("--num-iters", type=int, default=200, help="Timed iterations per kernel.")
+    return p.parse_args()
+
+
+_args = _parse_args()
+N = _args.n
TILE = 16
-N_BLOCKS = (N + TILE - 1) // TILE # 6
-N_PADDED = N_BLOCKS * TILE # 96, rounded up for blocked kernel SharedArrays
-N_ENVS = 4096
-WARMUP = 50
-ITERS = 200
+N_BLOCKS = (N + TILE - 1) // TILE
+N_PADDED = N_BLOCKS * TILE # rounded up for blocked kernel SharedArrays
+N_ENVS = _args.n_envs
+WARMUP = _args.num_warmup
+ITERS = _args.num_iters

qd.init(arch=qd.gpu)

6 changes: 6 additions & 0 deletions tests/pytest.ini
@@ -3,3 +3,9 @@ markers =
run_in_serial: mark test to run serially(usually for resource intensive tests).
sm70: Can only run on GPU with compute capability 7.0 or higher.
needs_torch: mark test as requiring PyTorch.
slow: mark test (or parametrize case) as slow. Skipped by default by tests/run_tests.py;
pass --run-slow to include them, or directly `pytest -m slow` to run only the slow ones.
sample(fraction=None, n=None): per-test stochastic parametrize subsampling. Pass exactly one of
`fraction` (0..1) or `n` (>= 1). Implemented in tests/python/conftest.py. See
docs/source/user_guide/unit_testing.md for the reproducibility recipes (--sample-seed,
--no-sample, nodeid-paste).