This document defines the end-to-end validation contract for LOOM, including standalone simulation checks, gem5-backed checks, and regression expectations.
A conforming LOOM validation flow should be able to demonstrate:
- successful graph lowering and mapping
- successful configuration generation and loading
- successful accelerator execution
- correct output or memory side effects
- consistent trace and performance data when enabled
The standalone simulator must support:
- synthetic or generated inputs
- external-memory prefill
- CPU-reference comparison
- trace and statistics generation
The gem5-backed flow must support:
- baremetal host execution
- MMIO-based configuration and launch
- DMA-based memory interaction
- result checking by host code and/or offline inspection
- replay-backed accelerator execution that emits trace and stat artifacts
The LOOM target validation matrix includes at least:
- full lowering from source to DFG artifacts
- ADG generation
- mapping report generation
- standalone simulation with trace and stat output; the standalone flow is also expected to emit a machine-readable result artifact with actual outputs and memory snapshots
- correctness on a representative vecadd-style memory-writing kernel
- gem5 end-to-end host-plus-accelerator execution
  Current smoke anchor: sum-array.mesh-6x6-extmem-1
- visualization rendering with mapping and, when available, trace playback
- focused unit tests for temporal hardware, tagged memory topologies, and decomposable switches, plus FU-body op coverage and exported ADG parseability
- container-config decoding checks for spatial_pe and temporal_pe slices, including mux or demux fields and selected internal FU config bits
- temporal register encoding checks when num_register > 0, including result -> reg and reg -> operand cases
- expected-fail temporal tests for incompatible function_unit configuration reuse
- temporal operand-buffer hardware-parameter checks for enable_share_operand_buffer, operand_buffer_size, num_instruction, num_register, and reg_fifo_depth
- tagged spatial_sw positive and negative tests, including the rule that tagged spatial switches cannot be decomposable, and the hard limit of at most 32 input ports and 32 output ports
- temporal_sw structural validation, including:
  - all ports tagged and same type
  - positive num_route_table
  - valid connectivity_table row dimensions
  - the hard limit of at most 32 input ports and 32 output ports
- focused tagged-path tests that distinguish:
  - source tags that remain distinct after width adaptation
  - source tags that collapse to one observed hardware tag and must be rejected even when the conflicting shared resource only appears after expanding a memory-bridge suffix or prefix
- focused tagged memory and extmemory ingress tests in which tagged spatial_sw performs tag-agnostic request merging while egress uses temporal_sw for tag-dependent response separation
- application-level tagged memory regressions in which a frontend-generated DFG reaches one shared fabric.extmemory through tagged route-stage boundaries and still completes mapping, visualization, and standalone simulation
- focused direct-boundary tests for both fabric.memory and fabric.extmemory, where the shared memory bridge terminates at already tagged route-stage ports on the compute side without requiring an explicit fabric.add_tag or fabric.del_tag at that boundary
- focused tagged-memory egress tests that combine fabric.map_tag, width-adapting tagged fabric.spatial_sw, and one fabric.temporal_sw split, including negative cases where distinct source tags collapse to one observed tag before the temporal split
- structural validation of Fabric definition and instantiation placement, including:
  - fabric.function_unit visibility and instantiation hosts
  - inline-only fabric.mux and tag-boundary ops
  - module-level component inline placement restricted to fabric.module
  - lexical same-host duplicate-definition rejection across operation kinds
  - rejection of fabric.instance in unsupported hosts or with PE-local SSA operands or results
  - rejection of exported Fabric MLIR that uses invalid symbol names
- SciComp regression coverage for SC-FP, SC-SPM, and SC-CTRL collateral, including multi-lane fabric.extmemory bridges and function-unit bodies that use the currently supported arith.minimumf, math.floor, math.rsqrt, and other backend-recognized FU ops
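The temporal_sw structural rules listed above can be illustrated with a small checker. This is a minimal sketch, not LOOM's verifier: the Port and TemporalSwitch classes, their field names other than num_route_table and connectivity_table, and the assumed table shape (one row per output port, one entry per input port) are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_PORTS = 32  # hard limit on input ports and on output ports, per the matrix


@dataclass
class Port:
    type_name: str            # hypothetical: textual port type
    tag_width: Optional[int]  # None models an untagged port


@dataclass
class TemporalSwitch:
    inputs: List[Port]
    outputs: List[Port]
    num_route_table: int
    # Assumed shape: one row per output port, one entry per input port.
    connectivity_table: List[List[int]]


def validate_temporal_sw(sw: TemporalSwitch) -> List[str]:
    """Return a list of violations; an empty list means the switch passes."""
    errors = []
    ports = sw.inputs + sw.outputs
    if len(sw.inputs) > MAX_PORTS:
        errors.append(f"more than {MAX_PORTS} input ports")
    if len(sw.outputs) > MAX_PORTS:
        errors.append(f"more than {MAX_PORTS} output ports")
    if any(p.tag_width is None for p in ports):
        errors.append("all ports must be tagged")
    if len({(p.type_name, p.tag_width) for p in ports}) > 1:
        errors.append("all ports must have the same type")
    if sw.num_route_table <= 0:
        errors.append("num_route_table must be positive")
    rows_ok = len(sw.connectivity_table) == len(sw.outputs)
    cols_ok = all(len(row) == len(sw.inputs) for row in sw.connectivity_table)
    if not (rows_ok and cols_ok):
        errors.append("connectivity_table row dimensions do not match port counts")
    return errors
```

A positive test would assert that the returned list is empty, while a negative test would assert that the specific expected violation is present rather than only that some error occurred.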
Validation must support:
- output-port comparison
- memory-region comparison
- config-slice decoding for temporal register fields
- negative tests that assert mapper failure by exit code and diagnostic text
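A negative test of the kind described in the last item can be sketched as a helper that requires both a nonzero exit code and a diagnostic substring. The helper name and the diagnostic text are hypothetical, and the command below is a stand-in child Python process rather than a real mapper invocation, since the mapper's command line is not specified here.

```python
import subprocess
import sys
from typing import Sequence


def assert_expected_failure(cmd: Sequence[str], expected_text: str) -> None:
    """Run cmd and require a nonzero exit code plus a diagnostic substring."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    diagnostics = proc.stdout + proc.stderr
    if proc.returncode == 0:
        raise AssertionError("command succeeded but failure was expected")
    if expected_text not in diagnostics:
        raise AssertionError(f"diagnostic {expected_text!r} not found in output")


# Stand-in for a mapper run: prints a made-up diagnostic and exits with 1.
fake_mapper = [
    sys.executable,
    "-c",
    "import sys; print('error: tag collision', file=sys.stderr); sys.exit(1)",
]
assert_expected_failure(fake_mapper, "tag collision")
```

Checking both conditions keeps a negative test from passing when the tool fails for an unrelated reason, such as a missing input file.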
Memory comparison is mandatory for kernels whose final observable results are written to memory instead of being returned on output ports.
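For a vecadd-style kernel, memory-region comparison against a CPU reference might look like the following sketch. The function names, the 32-bit wraparound, the 4-byte word size, and the base address are illustrative assumptions and are not taken from the LOOM result-artifact format.

```python
from typing import List, Sequence, Tuple


def vecadd_reference(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """CPU reference for element-wise addition, assuming 32-bit wraparound."""
    return [(x + y) & 0xFFFFFFFF for x, y in zip(a, b)]


def compare_region(
    actual: Sequence[int],
    expected: Sequence[int],
    base: int = 0,
    word_bytes: int = 4,
) -> List[Tuple[int, int, int]]:
    """Return (byte_address, expected, actual) for every mismatching word."""
    if len(actual) != len(expected):
        raise ValueError("region length mismatch")
    return [
        (base + i * word_bytes, e, a)
        for i, (a, e) in enumerate(zip(actual, expected))
        if a != e
    ]


expected = vecadd_reference([1, 2, 3], [10, 20, 30])
snapshot = [11, 22, 34]  # made-up memory snapshot with one wrong word
mismatches = compare_region(snapshot, expected, base=0x1000)
# mismatches == [(0x1008, 33, 34)]
```

Reporting the address alongside both values makes a failing comparison directly actionable when inspecting a memory snapshot offline.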
When deterministic settings are enabled:
- mapping results should be reproducible under the same seed and inputs
- standalone simulation results should be reproducible
- trace ordering should remain stable
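One way to check these expectations is to run the same stage several times with one seed and compare artifact digests. In this sketch the toy_mapper function is a made-up stand-in for a real mapping or simulation run, and the artifact layout is hypothetical; only the comparison technique is the point.

```python
import hashlib
import json
import random
from typing import Callable, Dict


def digest(artifact: Dict) -> str:
    """Hash a JSON-serializable artifact using a stable key order."""
    blob = json.dumps(artifact, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def is_reproducible(run: Callable[[int], Dict], seed: int, repeats: int = 3) -> bool:
    """Return True when every repeat with the same seed yields the same digest."""
    digests = {digest(run(seed)) for _ in range(repeats)}
    return len(digests) == 1


def toy_mapper(seed: int) -> Dict:
    """Stand-in mapper: places four nodes on a 6x6 grid using a seeded RNG."""
    rng = random.Random(seed)  # private RNG, so global state cannot interfere
    nodes = ["n0", "n1", "n2", "n3"]
    tiles = list(range(36))
    rng.shuffle(tiles)
    placement = dict(zip(nodes, tiles))
    # The trace is a list, so its order is part of the digest.
    trace = [f"{node}@{tile}" for node, tile in placement.items()]
    return {"placement": placement, "trace": trace}


assert is_reproducible(toy_mapper, seed=7)
```

Because the trace is serialized as a list, a change in event ordering changes the digest even when the set of events is identical, which matches the stable-ordering expectation.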
The intended LOOM validation family covers:
- IR-stage artifact generation
- mapping JSON and text reports
- visualization HTML generation
- trace and stat generation
- host-runtime or gem5 integration success
The current repository-level gem5 smoke entry point is:
ninja -C build check-loom-gem5
That smoke target is expected to exercise:
- source to mapping flow through the normal LOOM e2e case
- runtime-manifest emission
- baremetal host generation and cross-compilation
- gem5 device launch
- replay-generated accelerator trace and stat export
- host-visible pass or fail reporting through gem5.report.json
This document captures the validation contract and acceptance intent. Implementation batches and project scheduling remain planning concerns and do not need to be part of the normative runtime behavior.