[RFC] Testbench architecture: expressiveness vs. compilation model #42

@winfredsu

Description

Context

In a real NoC design project (8×8 mesh, 64 nodes), we encountered fundamental limitations with the current @testbench approach when trying to verify complex dynamic behaviors like backpressure handling, multi-round traffic injection, and protocol-level assertions.

The current @testbench API (t.drive(), t.expect(), t.drive_when()) works well for simple smoke tests but struggles with scenarios that require:

  • Reactive stimulus: deciding what to send based on DUT output (e.g. credit-based flow control)
  • Complex state machines: multi-phase test sequences with branching logic
  • Randomized/constrained verification: randomized traffic patterns with coverage tracking
  • Reusable verification components: scoreboards, monitors, protocol checkers

This issue discusses possible architectural directions.


Option A: Full pyCircuit TB — extend MLIR to support dynamic behavior

Extend the @testbench DSL so it can express arbitrary Python-like control flow, which gets compiled via MLIR into C++/SystemVerilog testbenches.
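For concreteness, here is a hedged sketch of the kind of reactive control flow Option A would need the DSL (and thus MLIR) to express. `t.read()` and `t.step()` are illustrative names, not current pyCircuit API, and the `Tb` stub exists only so the sketch runs standalone:

```python
# Minimal stand-in for a testbench handle; NOT pyCircuit API.
class Tb:
    def __init__(self):
        self.cycle = 0
        self.credits = 0
    def read(self, port):          # sample a DUT output (here: a credit count)
        return self.credits
    def step(self):                # advance one clock; DUT grants a credit
        self.cycle += 1
        self.credits += 1
    def drive(self, port, value):  # drive an input; sending consumes a credit
        self.credits -= 1

def tb(t):
    sent = 0
    for rnd in range(4):               # loop with dynamic bounds
        while t.read("credits") == 0:  # branch on a DUT output
            t.step()                   # stall until flow control allows a send
        t.drive("flit_in", rnd)        # reactive, credit-gated injection
        sent += 1
    return sent

print(tb(Tb()))   # 4 flits injected
```

Every construct here (data-dependent loops, branching on DUT outputs, mutable host state) would need a lowering through MLIR, which is the compiler-complexity cost listed below.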

Pros:

  • Single-language design: everything stays in Python/pyCircuit
  • Deterministic: compiled TB has no runtime overhead
  • Portable: same TB works with C++ sim and Verilog sim

Cons:

  • Requires MLIR to support a much richer subset of Python (loops with dynamic bounds, conditional branching on DUT outputs, data structures, etc.)
  • Significant compiler complexity — essentially building a Python→C++ transpiler within the MLIR framework
  • May never match the expressiveness of native C++/Python

Variant A2: Python-in-the-loop simulation

Instead of compiling Python to C++, keep the testbench in Python and call the C++ or Verilog model via FFI (similar to cocotb/PyVerilator).

  • Pros: Full Python expressiveness, rapid iteration
  • Cons: Performance overhead from Python↔C++ boundary crossing on every cycle; may be acceptable for small designs but problematic for large ones (our 64-node NoC runs ~10M cycles)
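The boundary-crossing cost is easy to estimate empirically. The sketch below times a trivial libc call via ctypes as a stand-in for one model `eval()` step; the numbers are machine-dependent and the two-crossings-per-cycle figure is an assumption, but the order of magnitude is the point:

```python
import ctypes
import time

# Rough, illustrative FFI-overhead measurement: one trivial libc call (abs)
# stands in for one Python->C++ model eval() step per cycle.
libc = ctypes.CDLL(None)  # POSIX: symbols of the running process, incl. libc
N = 200_000
t0 = time.perf_counter()
for i in range(N):
    libc.abs(i)
per_call = (time.perf_counter() - t0) / N

# Extrapolate to the 64-node NoC run: ~10M cycles, assuming 2 crossings/cycle
projected = 10_000_000 * 2 * per_call
print(f"~{per_call * 1e9:.0f} ns per FFI call; "
      f"projected crossing overhead over 10M cycles: ~{projected:.1f} s")
```

Even at ~1 µs per crossing, a 10M-cycle run spends tens of seconds purely on the language boundary, before any Python-side stimulus or checking logic runs.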

Option B: Hybrid — pyCircuit DUT + external hand-written TB

pyCircuit focuses on DUT compilation and keeps only an optional lightweight TB for smoke tests. For serious verification, users write their own C++ or SystemVerilog testbench against the generated model.

This requires:

  1. A build command that works without a mandatory @testbench (see the discussion in PR #39, "feat: add drive_when conditional drive API for reactive testbenches")
  2. A well-documented C++ model API so external TBs can instantiate and drive the DUT
  3. Optionally, pyCircuit generates a TB skeleton (clock/reset boilerplate, port declarations) that users extend

Pros:

  • No compiler complexity — leverage existing C++/SV verification ecosystem
  • Full expressiveness: users can use UVM, cocotb, or any framework
  • Natural for agent-assisted workflows: an AI agent can generate C++ TBs using the documented model API

Cons:

  • Two-language workflow (Python for design, C++ for verification)
  • Users need to understand the generated C++ model API

Option C: TB skeleton generation + plugin points

A middle ground: pyCircuit generates a complete TB harness (clock, reset, port bindings, VCD tracing) but provides explicit plugin points where users inject custom stimulus/checking logic.

```python
@testbench
def tb(t: Tb) -> None:
    t.clock("clk")
    t.reset("rst", active_low=True)

    # Built-in: simple drives/expects for smoke test
    t.drive("start", 1, at=0)
    t.expect("done", 1, at=100)

    # Plugin: user-provided C++ function called every cycle
    t.on_cycle("my_stimulus.cpp::inject_traffic")

    # Plugin: user-provided checker
    t.on_cycle("my_checker.cpp::check_output")
```

The generated TB calls user-supplied C++ functions at the appropriate points, with access to the DUT port struct.
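To make the contract concrete, here is a hypothetical model of the plugin mechanism, written in Python for clarity. `DutPorts`, `Harness`, and the hook signature are illustrative, not pyCircuit API; in Option C the generated C++ harness would play the role of `Harness`:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class DutPorts:                 # stand-in for the generated DUT port struct
    start: int = 0
    done: int = 0

@dataclass
class Harness:
    ports: DutPorts = field(default_factory=DutPorts)
    hooks: List[Callable[[DutPorts, int], None]] = field(default_factory=list)

    def on_cycle(self, fn: Callable[[DutPorts, int], None]) -> None:
        self.hooks.append(fn)         # register a stimulus/checker plugin

    def run(self, cycles: int) -> None:
        for cyc in range(cycles):
            # (clocking, reset, and DUT eval would happen here)
            for fn in self.hooks:
                fn(self.ports, cyc)   # each plugin sees ports + cycle count

# Usage: one stimulus plugin, one checker plugin
h = Harness()
h.on_cycle(lambda p, c: setattr(p, "start", 1 if c == 0 else 0))
seen = []
h.on_cycle(lambda p, c: seen.append((c, p.start)))
h.run(3)
print(seen)   # [(0, 1), (1, 0), (2, 0)] — start pulses high on cycle 0 only
```

This also surfaces the open API questions listed below: what the hook receives (ports only, or also simulation time and a way to stop the run) and where plugin state lives.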

Pros:

  • Best of both worlds: pyCircuit handles boilerplate, user handles complex logic
  • Progressive complexity: start with t.drive()/t.expect(), upgrade to plugins when needed
  • Single build flow

Cons:

  • API design complexity (what context to pass to plugins, how to handle state)
  • Still requires C++ for complex verification

Our Experience

In practice, we ended up with Option B organically:

  1. Used @testbench + drive_when for initial smoke tests
  2. Wrote a separate C++ testbench for multi-round all-to-all traffic verification
  3. Used Verilator with a hand-written SV testbench for final validation

The main pain point was that build currently requires @testbench, making it impossible to compile just the DUT without a dummy testbench function.

Questions for Discussion

  1. What is the long-term vision for @testbench — is it intended to grow into a full verification language, or remain a lightweight smoke-test tool?
  2. Is there interest in supporting build without @testbench (DUT-only compilation)?
  3. Would a plugin/hook mechanism (Option C) be worth exploring?
  4. For Python-in-the-loop (Option A2), are there known performance benchmarks for the FFI overhead?
