The CI pipeline maps test categories (st, ut-py, ut-cpp) × hardware tiers to GitHub Actions jobs. See testing.md for the full test organization and hardware classification.
Design principles:
- Separate jobs per test category: st, ut-py, and ut-cpp run as independent jobs for parallelism and clear dashboard visibility.
- Runner matches hardware tier: no-hardware tests run on `ubuntu-latest`; platform-specific tests run on self-hosted runners with the matching label (`a2a3`, `a5`).
- `--platform` is the only filter: pytest uses `--platform` plus the `requires_hardware` marker; ctest uses labels (`-LE` exclusion on the no-hardware runner, `-L` inclusion on device runners). There is no `-m st` and no `-m "not requires_hardware"`.
- sim = no hardware: `a2a3sim`/`a5sim` jobs run on GitHub-hosted runners alongside unit tests.
The complete test-type × hardware-tier matrix. Empty cells have no tests yet; only non-empty jobs exist in ci.yml.
| Category | github-hosted (no hardware) | a2a3 runner | a5 runner |
|---|---|---|---|
| ut-py | ut-py | ut-py-a2a3 | ut-py-a5 |
| ut-cpp | ut-cpp | ut-cpp-a2a3 | ut-cpp-a5 |
| st | st-sim-a2a3, st-sim-a5 | st-a2a3 | st-a5 |
Jobs defined in ci.yml (the a5 jobs are commented out until an a5 runner is available):

```text
PullRequest
├── ut-py          (ubuntu-latest)
├── ut-cpp         (ubuntu-latest)
├── st-sim-a2a3    (ubuntu + macOS)
├── st-sim-a5      (ubuntu + macOS)
├── ut-py-a2a3     (a2a3 self-hosted)
├── ut-cpp-a2a3    (a2a3 self-hosted)
├── st-a2a3        (a2a3 self-hosted)
├── ut-py-a5       (a5 self-hosted, commented out)
├── ut-cpp-a5      (a5 self-hosted, commented out)
└── st-a5          (a5 self-hosted, commented out)
```
| Job | Runner | What it runs |
|---|---|---|
| ut-py | ubuntu-latest | `pytest tests/ut` |
| ut-cpp | ubuntu-latest | `ctest --test-dir tests/ut/cpp/build -LE requires_hardware` |
| st-sim-a2a3 | ubuntu-latest, macos-latest | `pytest examples tests/st --platform a2a3sim` + `ci.py -p a2a3sim` |
| st-sim-a5 | ubuntu-latest, macos-latest | `pytest examples tests/st --platform a5sim` + `ci.py -p a5sim` |
| ut-py-a2a3 | a2a3 self-hosted | `pytest tests/ut --platform a2a3` |
| ut-cpp-a2a3 | a2a3 self-hosted | `ctest --test-dir tests/ut/cpp/build -L "^requires_hardware(_a2a3)?$"` |
| st-a2a3 | a2a3 self-hosted | `pytest examples tests/st --platform a2a3` + `ci.py -p a2a3 -d ...` |
| ut-py-a5 | a5 self-hosted | `pytest tests/ut --platform a5` |
| ut-cpp-a5 | a5 self-hosted | `ctest --test-dir tests/ut/cpp/build -L "^requires_hardware(_a5)?$"` |
| st-a5 | a5 self-hosted | `pytest examples tests/st --platform a5` + `ci.py -p a5 -d ...` |
- Sim scene tests and no-hardware unit tests run on GitHub-hosted runners (no hardware).
- `a2a3` tests (st + ut-py + ut-cpp) run only on the `a2a3` self-hosted machine.
- `a5` tests (st + ut-py + ut-cpp) run only on the `a5` self-hosted machine.
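The job table follows a regular pattern, so each command can be derived from its category and platform. Below is a minimal sketch that reconstructs those commands, using the command strings exactly as they appear in the table. `job_commands` is a hypothetical helper and is not part of ci.yml. The elided `-d ...` device argument of the hardware st jobs is left out.

```python
from typing import List, Optional

CTEST = "ctest --test-dir tests/ut/cpp/build"


def job_commands(category: str, platform: Optional[str]) -> List[str]:
    """Commands for one CI job; platform=None means the no-hardware runner (ut-* only)."""
    if category == "ut-py":
        return ["pytest tests/ut" + (f" --platform {platform}" if platform else "")]
    if category == "ut-cpp":
        if platform is None:
            # No-hardware runner: exclude every requires_hardware* label.
            return [f"{CTEST} -LE requires_hardware"]
        # Device runner: generic hardware tests plus this platform's own label.
        return [f'{CTEST} -L "^requires_hardware(_{platform})?$"']
    if category == "st":
        if platform is None:
            raise ValueError("st jobs always name a platform (sim or hardware)")
        # Hardware st jobs also pass -d <devices>; omitted in this sketch.
        return [f"pytest examples tests/st --platform {platform}", f"ci.py -p {platform}"]
    raise ValueError(f"unknown category: {category}")
```

For example, `job_commands("ut-cpp", "a2a3")` yields the same ctest command as the ut-cpp-a2a3 row.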
Three hardware tiers, applied to all test categories. See testing.md for the full table including per-category mechanisms (pytest markers, ctest labels, folder structure).
| Tier | CI Runner | Job examples |
|---|---|---|
| No hardware | ubuntu-latest (st-sim-* also on macos-latest) | ut-py, ut-cpp, st-sim-* |
| Platform-specific (a2a3) | [self-hosted, a2a3] | ut-py-a2a3, ut-cpp-a2a3, st-a2a3 |
| Platform-specific (a5) | [self-hosted, a5] | ut-py-a5, ut-cpp-a5, st-a5 |
Python unit tests. Run via pytest, filtered by `--platform` and the `requires_hardware` marker.
| File | Content | Hardware? |
|---|---|---|
| test_task_interface.py | nanobind extension API tests | No |
| test_runtime_builder.py (mocked classes) | RuntimeBuilder discovery, error handling, build logic | No |
| test_runtime_builder.py::TestRuntimeBuilderIntegration | Real compilation across platform × runtime | Yes (`@pytest.mark.requires_hardware`) |
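The way `--platform` and `requires_hardware` combine for ut-py can be summarized as a small decision function. This sketch encodes the selection rules stated in this document and is not the project's conftest.py. `should_run` and `is_hardware` are hypothetical names. Treating a platform that ends in `sim` as "no hardware" is an assumption.

```python
from typing import Optional, Union

# Marker value: None = no marker, True = bare @requires_hardware,
# a string such as "a2a3" = @requires_hardware("a2a3").
Requirement = Union[None, bool, str]


def is_hardware(platform: Optional[str]) -> bool:
    # Assumption: any platform ending in "sim" counts as no hardware.
    return platform is not None and not platform.endswith("sim")


def should_run(requirement: Requirement, platform: Optional[str]) -> bool:
    if not is_hardware(platform):
        # No hardware: only unmarked tests run; requires_hardware tests skip.
        return requirement is None
    if requirement is None:
        # Hardware run: no-hw tests skip.
        return False
    if requirement is True:
        return True  # bare marker: any hardware platform
    return requirement == platform  # platform-specific marker
```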
GoogleTest-based tests for pure C++ modules. Run via ctest, filtered by labels: `-LE` exclusion on the no-hardware runner and `-L` inclusion on device runners.
| Runner | Command |
|---|---|
| No hardware | ctest --test-dir tests/ut/cpp/build -LE requires_hardware |
| a2a3 | ctest --test-dir tests/ut/cpp/build -L "^requires_hardware(_a2a3)?$" |
| a5 | ctest --test-dir tests/ut/cpp/build -L "^requires_hardware(_a5)?$" |
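These commands depend on ctest treating the `-L`/`-LE` argument as a regular expression matched against each test's labels. The sketch below models that selection with `re.search` as an approximation of ctest's matching. The labeled test names come from this document. `test_pure_module` is a hypothetical unlabeled test.

```python
import re
from typing import Dict, List, Optional, Set


def select(tests: Dict[str, Set[str]], include: Optional[str] = None,
           exclude: Optional[str] = None) -> List[str]:
    """Approximate ctest -L / -LE: a test matches if any of its labels matches the regex."""
    def matches(labels: Set[str], pattern: str) -> bool:
        return any(re.search(pattern, label) for label in labels)

    selected = []
    for name, labels in sorted(tests.items()):
        if include is not None and not matches(labels, include):
            continue  # -L: unlabeled tests never match
        if exclude is not None and matches(labels, exclude):
            continue  # -LE: drop anything with a matching label
        selected.append(name)
    return selected


TESTS = {
    "test_pure_module": set(),  # hypothetical unlabeled test
    "test_runtime_integration": {"requires_hardware"},
    "test_a2a3_feature": {"requires_hardware_a2a3"},
}
```

The unanchored `-LE requires_hardware` also excludes `requires_hardware_a2a3`, so the no-hardware runner needs only one exclusion pattern. The `^...$` anchors on device runners keep `requires_hardware_a5` tests off the a2a3 runner.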
Small, fast examples that run on both simulation and real hardware. Organized as examples/{arch}/{runtime}/{name}/. Discovered and executed by ci.py (legacy golden.py format) or pytest (@scene_test format).
Large-scale, feature-rich hardware tests that are too slow for the simulator or use instructions it does not support. Organized as tests/st/{arch}/{runtime}/{name}/. Platform compatibility is declared per test via @scene_test(platforms=[...]).
Both examples/ and tests/st/ cases follow the same layout:
```text
{name}/
  golden.py                  # generate_inputs() + compute_golden()
  kernels/
    kernel_config.py         # KERNELS, ORCHESTRATION, RUNTIME_CONFIG
    orchestration/*.cpp
    aic/*.cpp                # optional
    aiv/*.cpp                # optional
```
A legacy case is discoverable by ci.py when both golden.py and kernels/kernel_config.py exist. @scene_test cases are discovered by pytest via test_*.py files.
A single --platform flag controls hardware/non-hardware splitting across all three categories.
```python
import pytest


@pytest.mark.requires_hardware            # any hardware
class TestRuntimeBuilderIntegration:
    ...


@pytest.mark.requires_hardware("a2a3")    # a2a3 specifically
class TestA2A3Feature:
    ...
```

Selection:
```bash
# No hardware (no-hw tests run, requires_hardware tests skip)
pytest tests/ut

# Hardware (no-hw tests skip, hw + platform-specific tests run)
pytest tests/ut --platform a2a3
```

For C++ tests, the equivalent is a ctest label set in CMake:

```cmake
# any hardware
set_tests_properties(test_runtime_integration PROPERTIES LABELS "requires_hardware")
# a2a3-specific
set_tests_properties(test_a2a3_feature PROPERTIES LABELS "requires_hardware_a2a3")
```

Selection uses `-LE` (label exclude) on the no-hardware runner and `-L` (label include) on device runners:
```bash
ctest -LE requires_hardware               # no-hardware runner: only tests without a requires_hardware* label
ctest -L "^requires_hardware(_a2a3)?$"    # a2a3 runner: hw + a2a3-specific
ctest -L "^requires_hardware(_a5)?$"      # a5 runner: hw + a5-specific
```

Scene tests declare their platforms on the decorator:

```python
@scene_test(level=2, platforms=["a2a3sim", "a2a3"], runtime="tensormap_and_ringbuffer")
class TestVectorExample(SceneTestCase):
    ...
```

| `--platform` | Behavior |
|---|---|
| `a2a3sim` | Run if `"a2a3sim"` in `platforms` |
| `a2a3` | Run if `"a2a3"` in `platforms` |
| (none) | Auto-parametrize over all `*sim` entries in `platforms` |
No --platform means "run all sims" — tests with no sim in their platforms list are skipped. No additional markers are used.
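The `--platform` table for `@scene_test` cases reduces to a small resolution function. This is a sketch of the tabulated behavior only. `platforms_to_run` is a hypothetical name and is not the project's pytest plugin.

```python
from typing import List, Optional


def platforms_to_run(declared: List[str], option: Optional[str]) -> List[str]:
    """Platforms a @scene_test case runs on for a given --platform value.

    An empty result means the test is skipped.
    """
    if option is None:
        # No --platform: auto-parametrize over every simulator entry.
        return [p for p in declared if p.endswith("sim")]
    # Explicit --platform: run only if the test declares that platform.
    return [option] if option in declared else []
```

A case declared with `platforms=["a2a3"]` therefore gets an empty list when no `--platform` is given, which matches the rule that tests with no sim entry are skipped.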
tools/test_catalog.py is the single source of truth for platform, runtime, and test case discovery. It is used by tests/conftest.py (via import) and is also available as a CLI for scripting.
```python
from test_catalog import (
    discover_platforms,          # -> ["a2a3", "a2a3sim", "a5", "a5sim"]
    discover_runtimes_for_arch,  # -> ["host_build_graph", "aicpu_build_graph", ...]
    discover_test_cases,         # -> [TestCase(name, dir, arch, runtime, source), ...]
    arch_from_platform,          # "a2a3sim" -> "a2a3"
)
```

```bash
python tools/test_catalog.py platforms
python tools/test_catalog.py runtimes --arch a2a3
python tools/test_catalog.py cases --platform a2a3sim --source example
python tools/test_catalog.py cases --platform a2a3 --source st --format json
```

ci.py handles scene test execution for golden.py-based tests (examples + st). New tests should use @scene_test and run via pytest. ci.py is retained for backward compatibility during the migration.
- ChipWorker reuse: Tasks sharing the same runtime reuse a single ChipWorker within their subprocess, avoiding repeated device init/teardown.
- Subprocess isolation: Different runtimes run in separate subprocesses (the host `.so` cannot be unloaded within a single process).
- Device queue: Hardware tasks are distributed across the devices specified by `-d`. Worker threads pop tasks from a shared queue.
- Retry: Failed tasks are retried up to 3 times. Hardware workers quarantine a device after a failure.
- PTO-ISA pinning: `-c <commit>` pins the PTO-ISA dependency. On the first failure, ci.py re-runs the failed tasks with the pinned commit.
- Watchdog: `-t <seconds>` sets a timeout. The entire run is aborted if it exceeds the limit.
- Summary table: After all tasks complete, a formatted results table is printed with pass/fail status, timing, device, and attempt count.
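The device queue, retry, and quarantine behavior can be sketched with `threading` and `queue.Queue`. This is a simplified model and not ci.py's implementation. It makes the following assumptions:

- `run_task` is a hypothetical callback.
- "Retried up to 3 times" is read as three retries after the first attempt.
- A quarantined worker simply stops.
- Tasks still queued when every device is quarantined are reported as failed.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

MAX_RETRIES = 3  # assumption: 1 initial attempt + up to 3 retries


@dataclass
class Result:
    task: str
    passed: bool
    attempts: int
    device: Optional[int]


def run_on_devices(tasks: List[str], devices: List[int],
                   run_task: Callable[[str, int], bool]) -> Dict[str, Result]:
    pending: "queue.Queue[Tuple[str, int]]" = queue.Queue()
    for task in tasks:
        pending.put((task, 0))  # (task, attempts so far)
    results: Dict[str, Result] = {}
    lock = threading.Lock()

    def worker(device: int) -> None:
        while True:
            with lock:
                if len(results) == len(tasks):
                    return  # every task has a final result
            try:
                task, attempts = pending.get(timeout=0.01)
            except queue.Empty:
                continue  # another worker may still requeue a failed task
            attempts += 1
            passed = run_task(task, device)
            if passed or attempts > MAX_RETRIES:
                with lock:
                    results[task] = Result(task, passed, attempts, device)
            else:
                pending.put((task, attempts))  # retry, possibly on another device
            if not passed:
                return  # quarantine: this device's worker stops taking tasks

    threads = [threading.Thread(target=worker, args=(d,)) for d in devices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Tasks still queued had no healthy device left to run on.
    while not pending.empty():
        task, attempts = pending.get_nowait()
        results[task] = Result(task, False, attempts, None)
    return results
```

Each queue entry carries its own attempt count, so a retried task can land on any remaining healthy device. This mirrors the device and attempt columns of the summary table.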
```bash
# All sim platforms (no -p: auto-discovers a2a3sim, a5sim, etc.)
python ci.py -t 600

# Single sim platform
python ci.py -p a2a3sim -c 6622890 -t 600

# Hardware with device range
python ci.py -p a2a3 -d 4-7 -c 6622890 -t 600

# Filter by runtime
python ci.py -p a2a3sim -r tensormap_and_ringbuffer
```

- macOS libomp collision: on macOS, `ci.py` sets `KMP_DUPLICATE_LIB_OK=TRUE` at the top of the file to work around a duplicate-libomp abort triggered by Homebrew numpy and pip torch coexisting in one process. Do not reorder the imports or remove this workaround without reading macos-libomp-collision.md first.
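The workaround depends on ordering: the variable must be set before the first import that loads libomp. Below is a minimal sketch of that ordering. It assumes the setting is guarded by `sys.platform`; whether ci.py guards it this way is not stated here.

```python
# Must run before numpy/torch are imported anywhere in the process.
import os
import sys

if sys.platform == "darwin":
    # Let the OpenMP runtime continue when a second copy of libomp is loaded
    # instead of aborting the process.
    os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

# Only after this point import libraries that may each bundle their own libomp:
# import numpy
# import torch
```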
ci.py scans two directories:
- `examples/`: included for both sim and onboard platforms.
- `tests/st/`: included only for onboard (non-sim) platforms.
For each directory, it walks subdirectories looking for kernels/kernel_config.py + golden.py. The arch and runtime are extracted from the path: {root}/{arch}/{runtime}/{case_name}/.
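Below is a minimal sketch of this directory walk using `pathlib`. It assumes a case directory sits exactly three levels below the root. `Case`, `discover_cases`, and `roots_for` are hypothetical names. Treating any platform ending in `sim` as a simulator is also an assumption.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class Case:
    arch: str
    runtime: str
    name: str
    path: Path


def roots_for(platform: str) -> List[str]:
    # examples/ for every platform; tests/st/ only for onboard (non-sim) platforms.
    return ["examples"] if platform.endswith("sim") else ["examples", "tests/st"]


def discover_cases(root: Path) -> List[Case]:
    cases = []
    for config in sorted(root.glob("**/kernels/kernel_config.py")):
        case_dir = config.parent.parent
        if not (case_dir / "golden.py").is_file():
            continue  # legacy cases need both golden.py and kernel_config.py
        parts = case_dir.relative_to(root).parts
        if len(parts) != 3:
            continue  # not laid out as {arch}/{runtime}/{case_name}
        arch, runtime, name = parts
        cases.append(Case(arch, runtime, name, case_dir))
    return cases
```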
```text
1. Parse arguments (-p, -d, -r, -c, -t)
2. If no -p: auto-discover all sim platforms and run each
3. For each platform:
   a. Discover tasks from examples/ and tests/st/
   b. Run tasks (subprocess per runtime group for sim, device queue for hw)
      └── On failure + -c flag: pin PTO-ISA, retry failed tasks
4. Print combined summary table
5. Exit 0 if all passed, 1 otherwise
```
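The numbered flow maps onto a short driver. This sketch rests on the following assumptions:

- `run` is a hypothetical callback standing in for steps 3a and 3b.
- The flag destinations and `parse_device_range` are illustrative.
- The sim platform list is hard-coded rather than discovered.
- `-d`, `-r`, and `-t` are parsed but not used by the loop.

```python
import argparse
from typing import Callable, Dict, List, Optional

# Assumption: ci.py discovers the sim platforms; they are hard-coded here.
SIM_PLATFORMS = ["a2a3sim", "a5sim"]

# Hypothetical callback: run(platform, only_these_tasks, pto_isa_commit) -> {task: passed}
Runner = Callable[[str, Optional[List[str]], Optional[str]], Dict[str, bool]]


def parse_device_range(spec: str) -> List[int]:
    """'4-7' -> [4, 5, 6, 7]; '3' -> [3]. Range syntax assumed from `-d 4-7`."""
    first, _, last = spec.partition("-")
    if not last:
        return [int(first)]
    return list(range(int(first), int(last) + 1))


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of the ci.py control flow")
    parser.add_argument("-p", dest="platform")
    parser.add_argument("-d", dest="devices", type=parse_device_range)
    parser.add_argument("-r", dest="runtime")
    parser.add_argument("-c", dest="commit")
    parser.add_argument("-t", dest="timeout", type=int)
    return parser


def main(argv: List[str], run: Runner) -> int:
    args = build_parser().parse_args(argv)                             # step 1
    platforms = [args.platform] if args.platform else SIM_PLATFORMS    # step 2
    summary: Dict[str, bool] = {}
    for platform in platforms:                                         # step 3
        results = run(platform, None, None)                            # steps 3a and 3b
        failed = [name for name, ok in results.items() if not ok]
        if failed and args.commit:                                     # pin PTO-ISA, retry failures
            results.update(run(platform, failed, args.commit))
        for name, ok in results.items():
            summary[f"{platform}:{name}"] = ok
    for key, ok in sorted(summary.items()):                            # step 4
        print(f"{key:<40} {'PASS' if ok else 'FAIL'}")
    return 0 if all(summary.values()) else 1                           # step 5
```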