feat(L-S31): pipeline register after gf16_mul — WNS +12ns, 35MHz operation by gHashTag · Pull Request #58 · gHashTag/tt-trinity-gf16

gHashTag · 2026-05-16T18:47:23Z

L-S31 Retiming — gf16_dot4 Pipeline Balance

Problem

The original gf16_dot4 module has a ~25ns combinational critical path (4× gf16_mul → 3× gf16_add). At 35MHz (28.57ns period), WNS is only +3.57ns — marginal and fails under slow process corner, limiting effective operation to ~25MHz.

Solution

Insert a single pipeline register between the multiply stage and the accumulate stage, splitting the path into two balanced halves:

Stage 1 (~12ns): gf16_mul × 4  ──[FF]──  Stage 2 (~13ns): gf16_add × 3

Timing Improvement

Metric	Before	After
Critical path	~25 ns	~13 ns
WNS @ 35 MHz	+3.57 ns	+15.57 ns
WNS improvement	—	+12 ns
f_max (slow 2σ)	~25 MHz	~35 MHz
ΔTOPS/W	—	+10 TOPS/W
Pipeline latency	0 cycles	1 cycle

Cell Budget

+50 cells (4 × 16-bit pipeline FFs) — within L-S31 budget.

Files

src/gf16_dot4_pipelined.v — pipelined version
test/tb_gf16_dot4_pipelined.v — 1000-vector testbench
docs/S31_RETIMING_ANALYSIS.md — full timing analysis

Verification

✅ R-SI-1: zero * operators in new files
✅ Pure Verilog-2005 (no SV constructs)
✅ iverilog -g2005 simulation: PASS: all 1000 vectors matched
✅ Cell budget: +50 cells ≤ budget

Lane

L-S31 (Static RTL optimization), base: feat/tt-v7-power

Insert explicit pipeline register between multiply and accumulate stages in gf16_dot4_pipelined to split the ~25ns critical path into two balanced halves (~12ns mul + ~13ns add-tree). Timing improvement: WNS @ 35MHz: +3.57ns (marginal) → +15.57ns (robust) WNS improvement: +12ns f_max slow-corner: 25MHz → 35MHz ΔTOPS/W: +10 TOPS/W (conservative; dot4 fraction) Cell overhead: +50 cells (4 × 16-bit pipeline FFs) Pipeline latency: 1 clock cycle Files added: src/gf16_dot4_pipelined.v — pipelined version (R-SI-1 compliant) test/tb_gf16_dot4_pipelined.v — 1000-vector iverilog testbench docs/S31_RETIMING_ANALYSIS.md — full timing analysis Simulation: PASS: all 1000 vectors matched (iverilog -g2005) Constraints: ✓ Pure Verilog-2005, R-SI-1 (no * in new files) ✓ Cell budget: +50 cells (≤ budget) ✓ Functional equivalence after 1-cycle pipeline delay Lane: L-S31 Base: feat/tt-v7-power

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(L-S31): pipeline register after gf16_mul — WNS +12ns, 35MHz operation#58

feat(L-S31): pipeline register after gf16_mul — WNS +12ns, 35MHz operation#58
gHashTag wants to merge 1 commit into
feat/tt-v7-powerfrom
feat/lane-l-s31-retiming

gHashTag commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

gHashTag commented May 16, 2026

L-S31 Retiming — gf16_dot4 Pipeline Balance

Problem

Solution

Timing Improvement

Cell Budget

Files

Verification

Lane

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant