LOOM trace and performance output provides cycle-level visibility into mapped accelerator execution for both standalone and gem5-backed runs.
Trace and performance data should support:
- per-node execution visibility
- route and resource activity debugging
- cycle-level regression comparison
- post-run summary statistics for performance analysis
- visualization playback and highlighting
Trace events conceptually include:
- cycle
- hardware node id
- event kind
- event-specific payload fields
A minimal conceptual event family includes:
- node fire
- input stall
- output stall
- route use
- config write
- invocation start
- invocation done
- runtime error
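The field set and event family above can be sketched as a small record type. This is only an illustration: the names `EventKind`, `TraceEvent`, `node_id`, `payload`, and the `route_id` payload key are hypothetical and are not defined by LOOM. The concrete encoding may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class EventKind(Enum):
    # Hypothetical identifiers mirroring the conceptual event family.
    NODE_FIRE = "node_fire"
    INPUT_STALL = "input_stall"
    OUTPUT_STALL = "output_stall"
    ROUTE_USE = "route_use"
    CONFIG_WRITE = "config_write"
    INVOCATION_START = "invocation_start"
    INVOCATION_DONE = "invocation_done"
    RUNTIME_ERROR = "runtime_error"


@dataclass(frozen=True)
class TraceEvent:
    cycle: int                 # cycle at which the event occurred
    node_id: int               # hardware node id
    kind: EventKind            # event kind
    payload: Dict[str, Any] = field(default_factory=dict)  # event-specific fields


# Assumed example: a route-use event whose payload names the mapped route.
example = TraceEvent(cycle=12, node_id=3, kind=EventKind.ROUTE_USE,
                     payload={"route_id": 7})
```

Keeping event-specific data in a separate payload lets the common fields (cycle, node id, kind) stay stable while individual event kinds evolve.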
Trace events must carry enough information to connect activity back to mapped routes where useful. This is especially important for visualization playback in mapping-on mode.
Performance summaries should cover at least:
- active cycles
- input stall cycles
- output stall cycles
- transferred token counts
- configuration write counts where relevant
Derived metrics such as utilization ratios may be computed from these fields.
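As a minimal sketch of one such derived metric, the following computes a per-node utilization ratio as active cycles divided by total cycles. The `NodeStats` type, its field names, and the choice of denominator are assumptions made for illustration; LOOM may define utilization differently.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    # Hypothetical field names for the summary counters listed above.
    active_cycles: int
    input_stall_cycles: int
    output_stall_cycles: int
    tokens_transferred: int
    config_writes: int = 0


def utilization(stats: NodeStats, total_cycles: int) -> float:
    """Fraction of cycles in which the node was active (0.0 for an empty run)."""
    if total_cycles <= 0:
        return 0.0
    return stats.active_cycles / total_cycles


stats = NodeStats(active_cycles=60, input_stall_cycles=25,
                  output_stall_cycles=15, tokens_transferred=60)
print(utilization(stats, total_cycles=100))  # 0.6
```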
The planned trace-related artifact family includes:
- `.trace` for detailed event streams
- `.stat` for summary statistics
The concrete encoding may evolve, but the semantic field set should remain stable enough for validation and visualization.
Trace ordering must be deterministic under deterministic execution settings.
At minimum:
- events are ordered by cycle
- same-cycle event ordering must be stable
- invocation start precedes its execution events
- invocation completion follows the final functional activity of that invocation
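The ordering rules above lend themselves to a simple validation pass. The sketch below assumes a hypothetical event shape `(cycle, invocation_id, kind)` and treats every event other than invocation start and done as functional activity; both are assumptions, not part of the trace definition. Same-cycle stability cannot be judged from a single trace, so it is not checked here; comparing the traces of two runs with identical settings is one way to test it.

```python
from typing import Iterable, List, Tuple

# Hypothetical event shape: (cycle, invocation_id, kind)
Event = Tuple[int, int, str]


def check_ordering(events: Iterable[Event]) -> List[str]:
    """Return human-readable violations of the minimum ordering rules."""
    problems: List[str] = []
    last_cycle = -1
    started = set()
    done = set()
    for index, (cycle, inv, kind) in enumerate(events):
        # Rule: events are ordered by cycle.
        if cycle < last_cycle:
            problems.append(f"event {index}: cycle {cycle} follows cycle {last_cycle}")
        last_cycle = max(last_cycle, cycle)
        if kind == "invocation_start":
            started.add(inv)
        elif kind == "invocation_done":
            if inv not in started:
                problems.append(f"event {index}: invocation {inv} done before start")
            done.add(inv)
        else:
            # Rule: invocation start precedes its execution events.
            if inv not in started:
                problems.append(f"event {index}: {kind} before start of invocation {inv}")
            # Rule: completion follows the final functional activity.
            if inv in done:
                problems.append(f"event {index}: {kind} after completion of invocation {inv}")
    return problems


good = [(0, 1, "invocation_start"), (1, 1, "node_fire"),
        (1, 1, "route_use"), (2, 1, "invocation_done")]
print(check_ordering(good))  # []
```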
Visualization playback consumes trace semantics but does not redefine them. If trace sampling or filtering is used, the resulting playback limits must be documented explicitly.