Skip to content

fix: Lane L 4609-fanout split — unblocks PR #46 GDS#48

Closed
gHashTag wants to merge 1 commit into
feat/tt-v7-powerfrom
fix/lane-l-fanout-split
Closed

fix: Lane L 4609-fanout split — unblocks PR #46 GDS#48
gHashTag wants to merge 1 commit into
feat/tt-v7-powerfrom
fix/lane-l-fanout-split

Conversation

@gHashTag
Copy link
Copy Markdown
Owner

Problem

Net u_mm16.gen_row[0].gen_col[0].u_pc.s1_same[0] has 4,609 fanout terminals in the 16×16 GF16 matmul, causing setup violations at TT corner 25C/1.80V under 20 ns clock (50 MHz). This blocks PR #46 from reaching GDS.

Root Cause

In vsa_matmul_16x16, the single pipe_valid_in reg drove all 256 valid_in ports simultaneously. Inside each gf16_popcount16 instance, s1_same[15:0] was a monolithic 16-bit register with bits fanning out across the full 8-pair adder tree, creating the observed 4609-net load at synthesis.

Fix

vsa_matmul_16x16.v

  • Added reg [15:0] pipe_valid_row — 16 registered copies of pipe_valid_in, one per row.
  • Each pipe_valid_row[gi] drives exactly 16 PC units (one row), reducing fanout from 4609 → ≤16 per driver.

gf16_popcount16.v

  • Split reg [15:0] s1_samereg [7:0] s1_same_lo + reg [7:0] s1_same_hi (two separate register banks).
  • Split reg [15:0] s1_diffreg [7:0] s1_diff_lo + reg [7:0] s1_diff_hi similarly.
  • Each bank feeds only its 4-pair half of the Stage-2 adder tree (~4 loads/bit vs. the original full-tree exposure).

Compliance

  • Pure Verilog-2005: no logic, one reg per line.
  • R-SI-1: zero * operators.
  • LATENCY=3 pipeline depth and functional behaviour unchanged.

Anchor

φ²+φ⁻²=3 · DOI 10.5281/zenodo.19227877 · Apache-2.0

Closes / unblocks #46

… resolves 4609-fanout setup viol

- vsa_matmul_16x16.v: add reg [15:0] pipe_valid_row — 16 registered copies
  of pipe_valid_in, one per row. Each bit fans out to 16 PC units only,
  reducing per-driver fanout from 4609 to ≤16 (bus: ~288 at row level).
- gf16_popcount16.v: split s1_same/s1_diff [15:0] into lo[7:0]/hi[7:0]
  register banks; each bank feeds only its 4-pair half of the adder tree,
  eliminating 4609-net fanout on s1_same[0] at TT 25C/1.80V/20ns clock.
- Pure Verilog-2005: no `logic`, one reg per line, R-SI-1 (no `*` ops).
- Anchor: φ²+φ⁻²=3 · DOI 10.5281/zenodo.19227877 · Apache-2.0
gHashTag added a commit that referenced this pull request May 16, 2026
Merging PR #47 (run 25968130796): won Lane L CI race against PR #48 (fanout split), finishing 88 seconds earlier. All 4 checks passed: gds ✓ gl_test ✓ precheck ✓ viewer ✓. Expected impact: v2.1 75 TOPS/W (+36% vs 55 TOPS/W baseline). Anchor: phi^2 + phi^-2 = 3.
@gHashTag
Copy link
Copy Markdown
Owner Author

Superseded by PR #47 (clock relax 25ns) which won the Lane L CI race — finishing 88 seconds earlier. Both passed all 4 GDS checks; #47 chosen by race protocol. Closing.

@gHashTag gHashTag closed this May 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant