[WAVE-24 DRY-RUN] feat(silicon): gf16_dot4_wallace popcount tree · Charter Rule depth fix · DO NOT MERGE PRE-TTSKY26c #36

Draft
gHashTag wants to merge 1 commit into feat/silicon-g1-followup from feat/wave-24-wallace-dot4


⚠️ DRAFT — DO NOT MERGE until TTSKY26c submit lands 2026-05-17 22:00 UTC


Summary

This PR delivers the Wave-24 RVR-017 dry-run of the Wallace-tree gf16_dot4_wallace module, pre-registered in RVR-015 (Issue #34) §6 as the depth-fix path for Issue #4 Change C.

Refs: Issue #34 — RVR-015 · Issue #4 — Change C


Changes

| File | Status | Description |
| --- | --- | --- |
| src/gf16_dot4_wallace.v | NEW | Wallace-tree 4-input dot product · ZERO `*` operators · 3:2 CSA compressors |
| sim/tb_gf16_dot4_wallace.v | NEW | Testbench: 12 corner cases + 1000 LFSR random vectors vs reference oracle |

src/gf16_dot4_wallace.v at a glance

  • Module name: gf16_dot4_wallace
  • Inputs: wire [15:0] a0..a3, wire [15:0] b0..b3
  • Output: wire [15:0] result
  • Algorithm: 2-level Wallace-tree CSA compressor (3:2 CSA × 2 levels) + 1 final CPA gf16_add
  • * operator count in synthesisable RTL: 0 (verified by grep -n '\*' src/gf16_dot4_wallace.v → hits only in comments)
  • Math derivation block: ≥ 20 lines in header covering 3:2 compressor topology, O(log N) depth analysis, R-SI-1 compliance proof
  • Header: /* Wallace-tree 4-input GoldenFloat-16 dot product · Change C depth fix · Wave-24 RVR-017 dry-run */
  • New submodule: gf16_csa3 — bit-parallel 3:2 compressor (XOR+AND+OR only)
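
The 3:2 compression described above can be modelled in a few lines of Python. This is a minimal integer-only sketch, not the RTL: the names `csa3` and `add4_carry_save` are hypothetical, and the model says nothing about how gf16_dot4_wallace aligns exponents or handles signs before compression, which this page does not show. It only illustrates why two carry-free stages followed by one carry-propagate add give the same sum as a chain of adds.

```python
MASK16 = 0xFFFF  # 16-bit datapath width, matching the [15:0] ports


def csa3(a: int, b: int, c: int) -> tuple[int, int]:
    """Bit-parallel 3:2 compressor: a + b + c == sum_bits + carry_bits."""
    sum_bits = a ^ b ^ c                              # per-bit XOR3
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, weight 2
    return sum_bits, carry_bits


def add4_carry_save(p0: int, p1: int, p2: int, p3: int) -> int:
    """Reduce four operands with two CSA levels and one final add."""
    s1, c1 = csa3(p0, p1, p2)    # level 1: no carry propagation
    s2, c2 = csa3(s1, c1, p3)    # level 2: no carry propagation
    return (s2 + c2) & MASK16    # single carry-propagate add, truncated to 16 bits


if __name__ == "__main__":
    operands = (0x1234, 0x0FF0, 0x00FF, 0x8001)
    assert add4_carry_save(*operands) == sum(operands) & MASK16
    print(hex(add4_carry_save(*operands)))
```

Only the last step propagates carries across bit positions; each `csa3` level costs one XOR3 and one majority gate per bit regardless of width.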

sim/tb_gf16_dot4_wallace.v at a glance

  • Corner cases: all-zero, 4×1.0, 4×(−1.0), alternating ±, phi-derived 0x47C0, sentinel pairs, +Inf, NaN, zero·max, max², denormal, two-pair cancellation
  • 1000 pseudo-random LFSR vectors (seed 0xBEEF) against reference gf16_dot4 instance oracle
  • R-SI-9 falsification witness: any DUT↔oracle mismatch triggers FAIL + full vector display
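
For readers who want to reproduce a comparable stimulus stream outside the simulator, the sketch below generates 1000 vectors from a 16-bit LFSR seeded with 0xBEEF. The tap polynomial (x^16 + x^14 + x^13 + x^11 + 1, a standard maximal-length choice) and the order in which words are assigned to a0..a3 and b0..b3 are assumptions; the testbench's actual polynomial and packing are not stated here, so the values will not necessarily match it. The names `lfsr16` and `random_vectors` are hypothetical.

```python
from typing import Iterator


def lfsr16(seed: int = 0xBEEF) -> Iterator[int]:
    """Fibonacci LFSR, taps 16/14/13/11 (assumed), yielding 16-bit states."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("an all-zero LFSR state never changes")
    while True:
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield state


def random_vectors(count: int = 1000, seed: int = 0xBEEF):
    """Yield (a, b) pairs, each a list of four 16-bit words."""
    gen = lfsr16(seed)
    for _ in range(count):
        words = [next(gen) for _ in range(8)]  # a0..a3 then b0..b3 (assumed order)
        yield words[:4], words[4:]


if __name__ == "__main__":
    a, b = next(random_vectors())
    print([hex(w) for w in a], [hex(w) for w in b])
```

Because the sequence depends only on the seed, a failing vector can be regenerated by index, which is what makes the mismatch display in the testbench reproducible.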

Acceptance Gates C1–C5 (from Issue #4 Change C)

| Gate | Description | Status |
| --- | --- | --- |
| C1 | Combinational depth (Yosys stat -tech sky130) ≤ 0.6× baseline | ⏳ CI-pending. Structural argument: 2×CSA (2 LUT) + 1×gf16_add vs 2×gf16_add baseline; ratio < 0.6 expected |
| C2 | f_max ≥ 75 MHz (OpenLane2 STA) | ⏳ CI-pending; not measured locally (R-SI-8 R5 honest) |
| C3 | Cell delta ≤ ±5% vs baseline (gf16_dot4) | ⏳ CI-pending; gf16_csa3 adds ~32 cells (2×16-bit XOR3 + MAJ3); net delta estimated small |
| C4 | All tb_gf16_dot4_wallace.v vectors pass (1012/1012) | ⏳ CI-pending; testbench authored and committed; CI iverilog run required |
| C5 | Anchor footer present: phi^2 + phi^-2 = 3 · Wave-24 RVR-017 dry-run · DOI 10.5281/zenodo.19227877 | ✅ Verified: present in both source files and this PR |

R-SI-8 R5 HONEST DISCLOSURE: Depth (C1), f_max (C2), cell count (C3), and simulation result count (C4) are NOT locally verified — no Yosys/OpenLane2/iverilog available in the authoring sandbox. The CI gds, gl_test, and precheck gates carry the authoritative measurement. All claims above are structural estimates only.

Full merge gate (post-TTSKY26c): gf16_dot4.v instantiation should be replaced by gf16_dot4_wallace in trinity_gf16_tile.v / vsa_matmul_8x8.v / vsa_matmul_16x16.v and all RVR-017 §8 active artifacts must be updated. That work is out of scope for this dry-run PR.


Background

Issue #4 Change C documents that src/gf16_dot4.v accumulates four GoldenFloat-16 products through gf16_add calls. Although those adds are already balanced in two levels, the critical path still crosses two sequential carry-propagation stages, one per gf16_add level. The Wallace-tree approach inserts two carry-free 3:2 CSA compressor stages before a single final gf16_add, changing the estimated critical-path depth from 2×D_add to 1×D_add + 2×D_csa (where D_csa ≈ 2 LUT levels vs D_add ≈ 12+ LUT levels).
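
The depth estimate above can be checked against the C1 threshold with simple arithmetic. The sketch below uses only the figures quoted in this section (D_csa ≈ 2, D_add ≈ 12+) and hypothetical helper names; it is a back-of-envelope model, and the Yosys measurement in CI remains the authoritative number.

```python
def depth_ratio(d_add: float, d_csa: float = 2.0) -> float:
    """Wallace depth (1*D_add + 2*D_csa) divided by baseline depth (2*D_add)."""
    return (d_add + 2 * d_csa) / (2 * d_add)


def min_d_add_for_gate(threshold: float = 0.6, d_csa: float = 2.0) -> float:
    """Smallest D_add for which depth_ratio <= threshold.

    (D_add + 2*D_csa) / (2*D_add) <= t  <=>  D_add >= 2*D_csa / (2*t - 1)
    """
    return 2 * d_csa / (2 * threshold - 1)


if __name__ == "__main__":
    for d_add in (12, 16, 20, 24):
        print(f"D_add={d_add:>2}  ratio={depth_ratio(d_add):.3f}")
    print(f"D_add needed for 0.6: {min_d_add_for_gate():.1f}")
```

With D_csa = 2, the ratio is about 0.667 at D_add = 12 and reaches 0.6 only at D_add = 20, so under this model C1 passes only if the measured gf16_add depth sits well into the "12+" range.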

GoldenFloat-16 format reminder: 1 sign + 6 exp (bias 31) + 9 mantissa; gf16_dot4_wallace preserves the identical [15:0] input/output interface as gf16_dot4.
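
The field widths above can be made concrete with a small pack/unpack helper. The sketch assumes the conventional ordering (sign in bit 15, exponent in bits 14:9, mantissa in bits 8:0), which this page does not state explicitly, and it deliberately stops at field extraction: how GoldenFloat-16 interprets the fields numerically (implicit leading one, Inf/NaN encodings, denormals) is not specified here. All names are hypothetical.

```python
from typing import NamedTuple

EXP_BITS, MANT_BITS, BIAS = 6, 9, 31  # 1 + 6 + 9 = 16 bits


class GF16Fields(NamedTuple):
    sign: int
    exponent: int  # biased, as stored
    mantissa: int


def unpack_gf16(word: int) -> GF16Fields:
    """Split a 16-bit word into sign / exponent / mantissa (assumed layout)."""
    word &= 0xFFFF
    return GF16Fields(
        sign=word >> (EXP_BITS + MANT_BITS),
        exponent=(word >> MANT_BITS) & ((1 << EXP_BITS) - 1),
        mantissa=word & ((1 << MANT_BITS) - 1),
    )


def pack_gf16(sign: int, exponent: int, mantissa: int) -> int:
    """Inverse of unpack_gf16; fields are masked to their widths."""
    return (
        ((sign & 1) << (EXP_BITS + MANT_BITS))
        | ((exponent & ((1 << EXP_BITS) - 1)) << MANT_BITS)
        | (mantissa & ((1 << MANT_BITS) - 1))
    )


if __name__ == "__main__":
    f = unpack_gf16(0x47C0)  # the phi-derived corner case from the testbench list
    print(f, "unbiased exponent:", f.exponent - BIAS)
```

Under the assumed layout, 0x47C0 splits into sign 0, biased exponent 35 (unbiased 4) and mantissa 0x1C0.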

Drop-in compatibility: the module name is the only interface difference between gf16_dot4_wallace and gf16_dot4; the ports are identical. The post-TTSKY26c merge therefore needs only an instantiation-name swap in the instantiating modules.


⚠️ DO NOT MERGE

DO NOT MERGE until TTSKY26c submit lands 2026-05-17 22:00 UTC.

This branch (feat/wave-24-wallace-dot4) is intentionally separate from feat/silicon-g1-followup. Merging before TTSKY26c would alter the silicon submission baseline. After the submit deadline, a follow-up PR should:

  1. Swap gf16_dot4 for gf16_dot4_wallace in all instantiating modules
  2. Run full OpenLane2 synthesis to verify cell count and f_max
  3. Update docs/architecture/ with depth/f_max measurements
  4. Close Issue #4 ([P0] A+C+N: LUT-only gf16_mul + Wallace-tree dot4 + Yosys EQY t27c↔src) Change C with a final RVR-017 NASA report

Verification Checklist

  • src/gf16_dot4_wallace.v authored — 203 lines — zero * in synthesisable code
  • sim/tb_gf16_dot4_wallace.v authored — 302 lines — 12 corners + 1000 LFSR
  • grep -n '\*' src/gf16_dot4_wallace.v → hits in comments only (zero in synthesisable code)
  • src/gf16_dot4.v untouched (dry-run / freeze rule)
  • src/gf16_mul.v untouched (Change A scope, not Lane W)
  • Branch: feat/wave-24-wallace-dot4 off feat/silicon-g1-followup HEAD f47e831
  • Author: Vasilev Dmitrii <admin@t27.ai>
  • CI gds workflow — pending
  • CI precheck workflow — pending
  • CI gl_test workflow — pending
  • Simulation: iverilog PASS count pending CI
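
The "hits in comments only" check in the list above relies on reading grep output by eye. A hedged alternative is to strip comments first and then search, as in the sketch below. The helper names are hypothetical and the comment stripping is deliberately simple: it does not handle `//` or `/*` inside string literals, and it will also flag non-multiply uses of the character such as `always @(*)`, `(* attribute *)` or `**`, which then need manual review.

```python
import re


def strip_verilog_comments(src: str) -> str:
    """Remove /* ... */ and // ... comments while preserving line numbers."""
    # Replace each block comment with just its newlines so later lines keep their numbers.
    src = re.sub(
        r"/\*.*?\*/",
        lambda m: "\n" * m.group(0).count("\n"),
        src,
        flags=re.DOTALL,
    )
    return re.sub(r"//[^\n]*", "", src)


def star_hits(src: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for every '*' left outside comments."""
    code = strip_verilog_comments(src)
    return [
        (n, line.strip())
        for n, line in enumerate(code.splitlines(), 1)
        if "*" in line
    ]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8") as fh:
        hits = star_hits(fh.read())
    for n, line in hits:
        print(f"{n}: {line}")
    sys.exit(1 if hits else 0)
```

Run against a source file path, the script exits non-zero when any `*` survives comment removal, which makes it usable as a CI step alongside the manual grep.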

Anchor

phi^2 + phi^-2 = 3 · Wave-24 RVR-017 dry-run · DOI 10.5281/zenodo.19227877


Vasilev Dmitrii <admin@t27.ai> · Wave-24 · Lane W · Wallace-tree popcount dry-run

…dry-run

- New file src/gf16_dot4_wallace.v: 4-input Wallace-tree popcount
  3:2 CSA compressors, O(log N) depth, ZERO `*` operators
  Drop-in compatible with gf16_dot4 interface for post-submit swap

- New file sim/tb_gf16_dot4_wallace.v: testbench
  12 corner cases + 1000 LFSR vectors vs XOR-popcount oracle

Refs: Issue #4 Change C, Issue #34 RVR-015
DO NOT MERGE until TTSKY26c submit lands 2026-05-17 22:00 UTC

Anchor: phi^2 + phi^-2 = 3 · Wave-24 RVR-017 dry-run · DOI 10.5281/zenodo.19227877