Performance Benchmark: Pure C++ Cycle-Accurate TLM against pyCircuit Simulator #44
Description
While validating the 64-node NoC layout with pyCircuit simulation (tb_perf_9round.py), we developed and validated a cycle-accurate, pure C++ Transaction-Level Model (TLM) to measure the maximum simulation throughput achievable for behaviorally equivalent network hardware.
This issue tracks the cycle-accurate equivalence result and proposes adopting C-Models for exploring routing and topology limits, where standard object-oriented simulation is bottlenecked by object-lookup trees.
Methodology & Golden Trace Verification:
- A dump of the 147,456 multicast/broadcast random injections from the 9-round testbench was generated to freeze the randomness into a constant sequence (golden_trace.txt).
- A 350-line monolithic C++ TLM was implemented natively, honoring identical FIFO depth, round-robin logic, routing computation (unicast, multicast, and broadcast splits), and node logic (replication registers).
Performance Baseline:
Tested against pyCircuit version 67774d4ffa57a0dec21676d3de146df2385981c2.
- Native pyCircuit (C++ wrapper): 6.72 s (~24k print ticks before the hard stop)
  ```bash
  bash scripts/build_tb.sh designs/contest_module/tb_perf_9round.py --run-cpp
  ```
- Verilator (flattened RTL bits): 0.81 s (25k-cycle hard stop)
  ```bash
  bash scripts/build_tb.sh designs/contest_module/tb_perf_9round.py --run-verilator
  ```
- Pure behavioral C++ TLM: 0.068 s (9,704 true finish cycles)
The TLM completes 286,441 true local pops natively in under 70 ms.
```bash
# 1. Trace dump
PYTHONPATH=../pyCircuit/compiler/frontend python3 designs/contest_module/dump_trace.py

# 2. Build
g++ -O3 -std=c++11 tests/fair_benchmark/cpp_tlm_noc.cpp -o tests/fair_benchmark/cpp_tlm_noc

# 3. Simulate
./tests/fair_benchmark/cpp_tlm_noc
```

The stark contrast clearly showcases the benefit of a native, architecture-focused TLM layer that avoids explicit wire-state replication. The 24,000-"cycle" pyCircuit duration is an artifact of the timeout rather than a true network limit: traffic actually drains around physical tick ~9,700.
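For step 1, the TLM replays the frozen injections rather than re-rolling randomness. As a sketch of that replay path, one record could be parsed per line; the field layout below (cycle, source, destination, payload, whitespace-separated) is an assumed format, since the actual golden_trace.txt layout is defined by dump_trace.py.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical golden-trace record; the real file layout may differ.
struct Injection {
    uint64_t cycle;   // tick at which the flit enters the network
    int src, dst;     // injecting node and destination node
    uint64_t payload; // flit contents
};

// Parse one whitespace-separated line: "<cycle> <src> <dst> <payload>".
// Returns false on a malformed line so the replay loop can abort early.
bool parse_injection(const std::string& line, Injection& out) {
    std::istringstream iss(line);
    return static_cast<bool>(iss >> out.cycle >> out.src >> out.dst >> out.payload);
}
```

Freezing the stimulus this way is what makes the pop counts directly comparable across pyCircuit, Verilator, and the TLM: all three consume the identical injection sequence.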
Can we introduce a dedicated models/cmodel interface layer that bridges pyCircuit stimulus with external lightweight TLMs for extreme algorithmic exploration?