From d1825cbb4c2716c35aeb2f0861b49d3708805ab8 Mon Sep 17 00:00:00 2001
From: Paolo Rondot
Date: Wed, 4 Mar 2026 16:03:54 +0100
Subject: [PATCH 1/4] Update the always-ff description to be more clear about what the difference between Blocking VS Non-blocking assignment is. (co-authored with Claude)
---
 src/lessons/sv/always-ff/description.html | 139 +++++++++++++++-------
 1 file changed, 96 insertions(+), 43 deletions(-)

diff --git a/src/lessons/sv/always-ff/description.html b/src/lessons/sv/always-ff/description.html
index fd8a251..dcf4671 100644
--- a/src/lessons/sv/always-ff/description.html
+++ b/src/lessons/sv/always-ff/description.html
@@ -1,17 +1,38 @@

always @(event) begin ... end is a block that runs every time a specified event fires. If the event is a clock edge (e.g. posedge clk) we typically use always_ff instead, where "ff" stands for "flip-flop".

[Figure (SVG, removed and re-added by this patch): clk waveform with posedge markers]

A flip-flop is a 1-bit memory element that captures its input (d) at a clock edge and holds it until the next edge.
@@ -24,57 +45,89 @@ end // step 2: both mem and out update simultaneously

- We use non-blocking assignment (<=) inside always_ff. It works in two steps: first, all right-hand sides are sampled using current values; then all left-hand sides update simultaneously. So out always captures the value mem held before this edge — creating a true one-cycle delay, not a zero-delay pass-through. The same rule is why a <= b; b <= a; correctly swaps two flip-flops.

-

-An SRAM is an array of flip-flops — one per bit — indexed by address. + We use non-blocking assignment (<=) inside always_ff. It works in two steps: first, all right-hand sides are sampled using current values; then all left-hand sides update simultaneously. So out always captures the value mem held before this edge — creating a true one-cycle delay, not a zero-delay pass-through. The same rule is why a <= b; b <= a; correctly swaps two flip-flops. +

+ +

Blocking vs. Non-Blocking Assignments

+ +

The names describe how each operator behaves in the flow of your procedural code — whether the assignment blocks (pauses) execution until it completes.

+ +

Blocking = — execution stops and waits. The assignment completes immediately, in place, before the next line runs. Think of it like hand-delivering a letter: the recipient has it before you walk away.

+ +
+a = b;  // a gets b's value RIGHT NOW
+c = a;  // c sees the new value of a
+
+ +

Non-blocking <= — execution continues without waiting. The assignment schedules a write for later and immediately moves on. Think of it like dropping a letter in a mailbox: you keep walking and it gets delivered later, when the NBA update region runs.

+ +
+a <= b;  // schedules a write to a, but doesn't apply it yet
+c <= a;  // c gets a's OLD value — the write above hasn't happened yet
+
+ +

All right-hand sides are evaluated first, then all writes happen together at the end of the time step. This is what makes always_ff correctly model real hardware, where all flip-flops in a clocked stage sample their inputs and update their outputs simultaneously.

+ + + + + + + + + + +
Context | Use <= (non-blocking)? | Use = (blocking)?
always_ff / clocked blocks | ✅ Preferred | ⚠️ Avoid
Tasks & functions | ⚠️ Only for static signals | ✅ Correct choice
Automatic task output ports | ❌ Forbidden | ✅ Required
+ +

An SRAM is an array of flip-flops — one per bit — indexed by address. We'll need a slightly more advanced pattern to model that array. We also use a port we (write enable) to control when writes happen, and a separate port rdata for the read result.

[Figure (SVG, removed and re-added by this patch): SRAM timing diagram. Phases: WRITE, READ REQ, READ RESULT. Signals: clk, we, addr (addr = 2), wdata (0x42), rdata (0x42). Annotation: 1-cycle read latency.]

In sram_core.sv fill in the always_ff body with two statements:

The read is registered: drive addr on cycle N and rdata reflects that address on cycle N+1. This is the standard synchronous-read SRAM model.

Testbench
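The two-step rule described in the always-ff lesson text (sample every right-hand side, then apply all writes together) can be modelled outside a simulator. The following is a minimal sketch in JavaScript, not part of the patch; `clockEdgeNonBlocking`, `clockEdgeBlocking`, and the `swap` list are hypothetical names introduced only for illustration.

```javascript
// Hypothetical model of one clock edge. `state` maps register names to their
// current values; `assignments` is a list of [target, rhs] pairs, one per
// statement, where rhs is a function that reads from a state object.

// Non-blocking (<=): sample all right-hand sides first, then write together.
function clockEdgeNonBlocking(state, assignments) {
  // Step 1: evaluate every right-hand side against the pre-edge values.
  const scheduled = assignments.map(([target, rhs]) => [target, rhs(state)]);
  // Step 2: apply all scheduled writes.
  const next = { ...state };
  for (const [target, value] of scheduled) next[target] = value;
  return next;
}

// Blocking (=), for contrast: each write is visible to the next statement.
function clockEdgeBlocking(state, assignments) {
  const next = { ...state };
  for (const [target, rhs] of assignments) next[target] = rhs(next);
  return next;
}

// Models `a <= b; b <= a;` (or `a = b; b = a;` in the blocking case).
const swap = [
  ['a', (s) => s.b],
  ['b', (s) => s.a],
];

const nonBlockingResult = clockEdgeNonBlocking({ a: 1, b: 2 }, swap);
const blockingResult = clockEdgeBlocking({ a: 1, b: 2 }, swap);

console.log(nonBlockingResult); // the two values are swapped
console.log(blockingResult);    // both registers hold b's old value
```

The model ignores everything else a simulator does (event regions, multiple processes); it only shows why the order of statements does not matter for `<=` and does matter for `=`.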

From be3f41a8c7fde4172240e015910dfe3e0b8f8388 Mon Sep 17 00:00:00 2001
From: Paolo Rondot
Date: Wed, 4 Mar 2026 16:05:56 +0100
Subject: [PATCH 2/4] - Update description of tasks-functions to give more details about automatic vs non-automatic and make an analogy with C language - Add a notice to make sure to not write or read directly on the clock edge from the testbench
---
 src/lessons/sv/tasks-functions/description.html | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/lessons/sv/tasks-functions/description.html b/src/lessons/sv/tasks-functions/description.html
index d281958..85618e8 100644
--- a/src/lessons/sv/tasks-functions/description.html
+++ b/src/lessons/sv/tasks-functions/description.html
@@ -8,8 +8,8 @@ // task body endtask

-Here automatic means the task is re-entrant: it can be called recursively or from multiple places without interference. -Non-automatic tasks share state across calls. +Here automatic means the task is re-entrant: it can be called recursively or from multiple places without interference — just like a normal C/C++ function whose local variables live on the stack, with a fresh copy created for each call. +A non-automatic (static) task behaves like a C function where every local variable is declared static: all calls share the same memory, so concurrent calls will overwrite each other's state.

Calling a task write_word(addr, data) is blocking.
@@ -21,6 +21,9 @@

  • write_word(vif, addr, data) — a task that drives one write transaction: assert we, set addr and wdata, wait one clock edge, then de-assert we.
  • read_word(vif, addr, data) — a task that drives one read transaction: set addr, wait one clock edge, then capture rdata.
  • +

    + Waiting for @(posedge clk) is not enough on its own. The testbench and the DUT are both sensitive to the same edge, so driving or sampling signals exactly at the edge creates a race: the simulator does not guarantee which of the two runs first. A safe pattern is to wait for the edge and then advance time by a small delay (@(posedge clk); #1;) so that your assignments land in a quiet moment after the DUT has already reacted to the clock. +

    These are the exact helper routines a UVM driver uses internally. In Part 3 the driver wraps them in a class method that pulls transactions from a sequencer — but the core protocol logic is the same.

    Testbench structure

    The initial block calls write_word and read_word using the shared mem_if virtual interface, then checks the parity of the returned data with parity_check.
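The automatic versus static distinction added to the tasks-functions description can be illustrated with a small model. This is a sketch in JavaScript under stated assumptions, not SystemVerilog and not part of the patch: a generator stands in for a task, `yield` stands in for waiting on a clock edge, and the names `task`, `runTwoCalls`, and `shared` are hypothetical.

```javascript
// Hypothetical model: a "task" stores its argument in a local variable,
// waits one step, then reads the local back.
function* task(storage, value) {
  storage.local = value; // write the task's local variable
  yield;                 // models waiting for a clock edge
  return storage.local;  // read the local back after the wait
}

// Start two overlapping calls. `makeStorage` decides whether the calls get
// separate locals or share one set.
function runTwoCalls(makeStorage) {
  const a = task(makeStorage(), 'A');
  const b = task(makeStorage(), 'B');
  a.next(); // call A runs up to its wait
  b.next(); // call B runs up to its wait
  return [a.next().value, b.next().value];
}

// automatic: every call gets fresh storage (like stack locals in C).
const automaticResult = runTwoCalls(() => ({}));

// static: every call uses the same storage (like C `static` locals).
const shared = {};
const staticResult = runTwoCalls(() => shared);

console.log(automaticResult); // each call keeps its own value
console.log(staticResult);    // the second call overwrote the first
```

In the shared case the first call reads back the value written by the second, which is the interference the description warns about for non-automatic tasks.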

From e55e4813e15be3898af3831673a3459ef726a26b Mon Sep 17 00:00:00 2001
From: Paolo Rondot
Date: Wed, 4 Mar 2026 16:06:49 +0100
Subject: [PATCH 3/4] - Fix the bug that threw an error when trying to open the waves tab - Allow signal names with "."
---
 src/runtime/circt-adapter.js | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/src/runtime/circt-adapter.js b/src/runtime/circt-adapter.js
index 9611c11..ef930d8 100644
--- a/src/runtime/circt-adapter.js
+++ b/src/runtime/circt-adapter.js
@@ -170,11 +170,21 @@ function needsUvmLibrary(files) {
 function removeInlinedPortsFromVcd(vcd) {
   if (typeof vcd !== 'string') return vcd;
   const lines = vcd.split('\n');
-  const skipIds = new Set();
+  const topLevelIds = new Set(); // ids used by non-dotted (top-level) signals
+  const dottedIds = new Map(); // id → true for dotted signal names
   for (const line of lines) {
     const m = line.match(/\$var\s+\S+\s+\d+\s+(\S+)\s+(\S+)(?:\s+\[\S+\])?\s+\$end/);
-    if (m && m[2].includes('.')) skipIds.add(m[1]);
+    if (!m) continue;
+    if (m[2].includes('.')) dottedIds.set(m[1], true);
+    else topLevelIds.add(m[1]);
+  }
+  // Only skip dotted signals whose VCD id is already claimed by a top-level
+  // signal (i.e. inlined port duplicates). Keep interface member signals that
+  // have their own unique id.
+  const skipIds = new Set();
+  for (const id of dottedIds.keys()) {
+    if (topLevelIds.has(id)) skipIds.add(id);
   }
   if (skipIds.size === 0) return vcd;
@@ -243,8 +253,9 @@ function fixLlhdVcdEncoding(vcd) {
     for (let i = 0; i < svWidth; i++) {
       decoded += flagBits[i] === '1' ? 'x' : valBits[i];
     }
-    // 1-bit results use the compact scalar form.
-    return (svWidth === 1 ? decoded : 'b' + decoded) + ' ' + id;
+    // 1-bit results use the compact scalar form (no space before id).
+    if (svWidth === 1) return decoded + id;
+    return 'b' + decoded + ' ' + id;
   }
 }

From 0e0c47178b0d02e273db27cf6f74e2040c42a187 Mon Sep 17 00:00:00 2001
From: Paolo Rondot
Date: Wed, 4 Mar 2026 17:00:57 +0100
Subject: [PATCH 4/4] Update documentation
---
 scripts/toolchain.lock.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/toolchain.lock.sh b/scripts/toolchain.lock.sh
index a9057a2..94ac0bd 100755
--- a/scripts/toolchain.lock.sh
+++ b/scripts/toolchain.lock.sh
@@ -11,4 +11,4 @@ readonly CIRCT_REF_LOCKED="3003e9a7d0af8fe09105fa89b3584bd1e2eb7410"
 readonly CIRCT_LLVM_SUBMODULE_REF_LOCKED="aa3d6b37c7945bfb4c261dd994689de2a2de25bf"
 readonly SURFER_ARTIFACT_URL_LOCKED="https://gitlab.com/surfer-project/surfer/-/jobs/artifacts/main/download?job=pages_build"
-readonly SURFER_ARTIFACT_SHA256_LOCKED="2a684122436e7a7729cc4e57062fdc2ce8ec5fa096d84ca383dd59011012b873"
+readonly SURFER_ARTIFACT_SHA256_LOCKED="abf8d4c3415d445bf86edb39dda9ec9f37d20ccddf4069ec925acb608dcb661b"
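The two behaviours changed in patch 3/4 can be exercised in isolation. The sketch below is an assumption-laden illustration rather than the adapter's actual code: `collectSkipIds` reproduces only the id-selection step of removeInlinedPortsFromVcd (the regex is copied from the diff), `formatValueChange` reproduces only the final formatting step of fixLlhdVcdEncoding, and the VCD header lines are made up.

```javascript
// Regex copied from the patch: captures the VCD id and the signal name.
const VAR_RE = /\$var\s+\S+\s+\d+\s+(\S+)\s+(\S+)(?:\s+\[\S+\])?\s+\$end/;

// Hypothetical helper: return the ids of dotted signals that duplicate a
// top-level signal. Dotted signals with a unique id are not returned.
function collectSkipIds(vcd) {
  const topLevelIds = new Set();
  const dottedIds = new Set();
  for (const line of vcd.split('\n')) {
    const m = line.match(VAR_RE);
    if (!m) continue;
    if (m[2].includes('.')) dottedIds.add(m[1]);
    else topLevelIds.add(m[1]);
  }
  return new Set([...dottedIds].filter((id) => topLevelIds.has(id)));
}

// Hypothetical helper: VCD writes scalar changes as value+id with no space,
// and vector changes as 'b' + bits + ' ' + id.
function formatValueChange(bits, id) {
  return bits.length === 1 ? bits + id : 'b' + bits + ' ' + id;
}

// Made-up header: id `!` is shared by `clk` and `dut.clk`, while id `"`
// belongs only to the interface member `vif.rdata`.
const sampleVcd = [
  '$var wire 1 ! clk $end',
  '$var wire 1 ! dut.clk $end',
  '$var wire 8 " vif.rdata [7:0] $end',
].join('\n');

const skipIds = collectSkipIds(sampleVcd);
console.log([...skipIds]);                  // only the shared id is listed
console.log(formatValueChange('1', '!'));    // scalar form
console.log(formatValueChange('1010', '"')); // vector form
```

Before the patch, every dotted name was marked for skipping, so `vif.rdata` would have been marked as well; with the new selection only ids also claimed by a top-level signal are marked.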