diff --git a/Makefile b/Makefile
index 3eca004..89349a1 100644
--- a/Makefile
+++ b/Makefile
@@ -9,7 +9,7 @@ ifeq ($(AGGRESSIVENESS), 0)
 	COVER_FLOAT_FLAGS += --partial-output
 endif
 
-MODELS := B1 B2 B3 B6 B7 B8 B9 B10 B11 B12 B13 B14 B15 B20 B21 B25 B26 B27 B29
+MODELS := B1 B2 B3 B6 B7 B8 B9 B10 B11 B12 B13 B14 B15 B16 B20 B21 B25 B26 B27 B29
 
 .PHONY: build clean sim all $(MODELS)
 
diff --git a/docs/B16.adoc b/docs/B16.adoc
new file mode 100644
index 0000000..a22ec48
--- /dev/null
+++ b/docs/B16.adoc
@@ -0,0 +1,127 @@
+= Floating Point Model Documentation
+:toc:
+:toclevels: 3
+:sectnums:
+
+== B12 Add: Cancellation
+
+Aharoni et al.
+
+=== Description
+
+B16. Multiply-Add: Cancellation
+This model tests every possible value for cancellation.
+For the difference between the exponent of the intermediate result and the
+maximum between the exponents of the addend and the multiplication result,
+test all values in the range:
+ [-(2 * p + 1), 1].
+
+*Number of tests generated:* 1732 (1E3)
+
+*Precisions Supported:* `BF_16`, `FP_16`, `FP_32`, `FP_64`, `FP_128`
+
+*Operations Supported:* fmadd, fmsub, fnmadd, fnmsub
+
+== Definitions (TODO: CHANGE UP DEFINITIONS)
+
+`a`:: Operand 1
+`b`:: Operand 2
+`m`:: Mantissa. Number of bits in mantissa, excluding the hidden 1
+`p`:: Precision. Number of significant bits, including the hidden 1. p = m + 1
+`d`:: The difference between max(a_exp, b_exp) and exponent of the intermediate result. This is the interest of this model, where we test all d's in [-p, +1]
+`k`:: Magnitude of d. Defined as -d in the code since we deal with d = 1 separately
+`a_m`:: Mantissa of `a`
+`b_m`:: Mantissa of `b`
+`a_exp`:: Exponent of operand `a`, generally in `[min_exp, max_exp]`
+`b_exp`:: Exponent of operand `b`, generally in `[min_exp, max_exp]`
+`exp`:: Exponent of the result, generally in `[min_exp, max_exp]`
+`max_exp`:: Maximum exponent value based on precision
+`min_exp`:: Minimum exponent value based on precision
+
+== Background (TODO: NEED NEW BACKGROUND FOR FMAS)
+
+Cancellation occurs when the most significant digits of two operands subtract to zero, triggering a massive normalization effort as the hardware shifts out the many leading zeros. In high-performing floating point processors, a Leading Zero Anticipator (LZA) is used to predict the location of the most significant bit of the result to perform normalization. Generating operands that force specific cancellation depths thus tests the accuracy of the LZA, especially for the most difficult edge cases where the depth of cancellation is close to the precision.
+
+=== Design Choices (TODO: UPDATE DESIGN CHOICES)
+
+. If a_exp and b_exp are both negative, their sum must not exceed the lower exponent bound. This means that a_exp + b_exp > min_exp
+. If a_exp and b_exp are both positive, their sum must not exceed the upper exponent bound. This means that a_exp + b_exp < max_exp
+. Even though theoretically whether the product exponent is greater or the addend operand is greater should be random, we have the addend exponent to be greater for most cases to make controlling cancellation easier. Since we're only interested in the difference between the exponent of the operands and the intermediate result, this difference should not be of main concern, but could be a future improvement
+. For extremely deep cancellation cases the only possible result is Zero
+
+=== Notes (TODO: CHANGE NOTES)
+
+All explanation in this document will be done with one instead of both operations because the only difference will be sign assignments. All exponents used in the examples are unbiased exponents.
+
+== General Procedure
+We need to ensure that the product exponent `a_raw` and `b_raw` is high enough that even at the most extreme cancellation `d = -(2p+1)`, the result stays above the subnormal floor `min_raw`. Conversely, it must be low enough that at `d=1`, we don't overflow `max_raw`.
+
+Two situations: ab_exp is max, or c_exp is max. When we multiply a, b, the intermediate product is 2p bits wide
+
+product of a and b cannot be too different from desired, otherwise c would dominate and cancellation would be 0
+
+Unlike B2, we cannot use softfloat to generate desired result because ab has 2p precision and softfloat will remove that and round it to p.
+
+For single precision, the `c` only has `m` bits of precision (23 bits for f32). It can only cancel the top `m` bits of the product `a*b`. If you want a cancellation depth of 40, bits 24 through 39 of `a*b` must be exactly zero. If we use purely random mantissas for `a`` and `b`, their cross-multiplication will fill that gap with random noise, `c` won't be able to reach deep enough to cancel it, and the cancellation depth will permanently collapse back to around -24.
+
+Case 1: d < -2m, then we have to generate subnormal numbers, so a_raw and b_raw both need to be the smallest exponent possible
+Case 2: d < -m, meaning the cancellation is deeper than the precision. Set all operands to be power of 2 is an easy way to solve this
+Case 3: shallow cancellation d= -6, -5, -4, -3, -2, -1. (values are experimentally determined) Because shallow bins are very sensitive to
+1. whether c_exp becomes larger than ab_exp
+2. whether the final rounded result carries up one exponent
+3. whether cancellation becomes slightly deeper than intended
+4. whether product rounding changes ab_exp
+Case 4: Other d, random generation
+Case 5: d = 0. We make the exponent of c larger than ab_exp, so the result can be the same exp as c
+Case 6: d = 1. Again we make the exponent of c larger than ab_exp, and force c to be the largest mantissa so the result definitely carries
+
+== Specific Test Procedure
+Case 3: for d = -6, -5, -4, -3, -2, -1, the process looks like
+1. Generate a and b
+2. Compute the real rounded product exponent ab_exp.
+3. Choose a desired result with exponent ab_exp + d
+4. Solve for c
+5. Reject unless c_exp == ab_exp
+6. Reject unless actual_b16_d(...) == d
+
+== Test Count Breakdown
+
+Since each precision has tests going from -(2*p+1) to 1, the amount of test case per precision per operation is (1- -(2*p+1))/1+1 = 2*p+3
+
+[cols="1,1,1",options="header"]
+|===
+| Precision | p | Total (2*p+3)
+
+| BF_16
+| 8
+| 19
+
+| FP_16
+| 11
+| 25
+
+| FP_32
+| 24
+| 51
+
+| FP_64
+| 53
+| 109
+
+| FP_128
+| 113
+| 229
+|===
+
+=== Overall Test Count
+
+[cols="1,1",options="header"]
+|===
+| Description | Value
+
+| Total cancellation tests
+| 433
+
+| Accounting for FMADD, FMSUB, FNMADD, and FNMSUB (×4)
+| 1732
+|===
diff --git a/src/cover_float/testgen/B16.py b/src/cover_float/testgen/B16.py
new file mode 100644
index 0000000..6cb2ac4
--- /dev/null
+++ b/src/cover_float/testgen/B16.py
@@ -0,0 +1,234 @@
+"""
+Angela Zheng (angela20061015@gmail.com)
+
+Created: 4/28/2026
+Last Modified: 4/28/2026
+"""
+
+import logging
+import random
+from dataclasses import dataclass
+from pathlib import Path
+from random import seed
+from typing import TextIO, cast
+
+import cover_float.common.log as log
+from cover_float.common.constants import (
+    BIAS,
+    EXPONENT_BITS,
+    FLOAT_FMTS,
+    MANTISSA_BITS,
+    OP_FMADD,
+    OP_FMSUB,
+    OP_FNMADD,
+    OP_FNMSUB,
+    OP_MUL,
+    UNBIASED_EXP,
+)
+from cover_float.common.util import (
+    decimal_components_to_hex,
+    generate_test_vector,
+    get_result_from_ref,
+    reproducible_hash,
+)
+from cover_float.reference import run_and_store_test_vector
+from cover_float.testgen.model import register_model
+
+logger: log.ModelLogger = cast(log.ModelLogger, logging.getLogger("B16"))
+
+OPS = [OP_FMADD, OP_FMSUB, OP_FNMADD, OP_FNMSUB]
+SOLVER_OPS = {
+    OP_FMADD: OP_FNMSUB,
+    OP_FMSUB: OP_FMSUB,
+    OP_FNMADD: OP_FNMADD,
+    OP_FNMSUB: OP_FMADD,
+}
+
+
+@dataclass(frozen=True)
+class FloatFormat:
+    name: str
+    m_bits: int
+    e_bits: int
+    bias: int
+    min_exp: int
+    max_exp: int
+
+    @property
+    def p(self) -> int:
+        return self.m_bits + 1
+
+    @classmethod
+    def from_name(cls, name: str) -> "FloatFormat":
+        min_e, max_e = UNBIASED_EXP[name]
+        return cls(name, MANTISSA_BITS[name], EXPONENT_BITS[name], BIAS[name], min_e, max_e)
+
+    def to_hex(self, sign: int, exp: int, mant: int) -> str:
+        return decimal_components_to_hex(self.name, sign, exp + self.bias, mant)
+
+    def get_exp(self, fp_hex: str) -> int:
+        bits = int(fp_hex, 16)
+        return ((bits >> self.m_bits) & ((1 << self.e_bits) - 1)) - self.bias
+
+
+class B16Generator:
+    def __init__(self, fmt: str, test_f: TextIO, cover_f: TextIO) -> None:
+        self.f = FloatFormat.from_name(fmt)
+        self.test_f, self.cover_f = test_f, cover_f
+
+    def get_op_details(self, op: str, a: str, b: str, c: str) -> tuple[int, int]:
+        """Helper to get actual product exp and final result exp."""
+        p_hex = get_result_from_ref(OP_MUL, a, b, "0", self.f.name)
+        r_hex = get_result_from_ref(op, a, b, c, self.f.name)
+        return self.f.get_exp(p_hex), self.f.get_exp(r_hex)
+
+    def store(self, op: str, a: str, b: str, c: str) -> None:
+        v = generate_test_vector(op, int(a, 16), int(b, 16), int(c, 16), self.f.name, self.f.name)
+        run_and_store_test_vector(v, self.test_f, self.cover_f)
+
+    def get_random_split(self, target_exp: int) -> tuple[int, int]:
+        """Splits a target product exponent into two valid operand exponents."""
+        lo, hi = self.f.min_exp, self.f.max_exp
+        a_min, a_max = max(lo, target_exp - hi), min(hi, target_exp - lo)
+        a = random.randint(a_min, a_max)
+        return a, target_exp - a
+
+    def generate_same_exp(self, d: int, op: str) -> bool:
+        f, m = self.f, self.f.m_bits
+        a_s, b_s = random.randint(0, 1), random.randint(0, 1)
+        c_s = (a_s ^ b_s) if op in [OP_FMADD, OP_FNMADD] else (a_s ^ b_s) ^ 1
+        target_p_exp = random.randint(0, f.max_exp)
+        a_r = random.randint(0, target_p_exp)
+        b_r = target_p_exp - a_r
+        c_r = a_r + b_r + 3
+        a_m, b_m, c_m = random.getrandbits(m), random.getrandbits(m), random.getrandbits(m)
+        a_h = f.to_hex(a_s, a_r, a_m)
+        b_h = f.to_hex(b_s, b_r, b_m)
+        c_h = f.to_hex(c_s, c_r, c_m)
+        r_hex = get_result_from_ref(op, a_h, b_h, c_h, f.name)
+        r_exp = f.get_exp(r_hex)
+        if r_exp != c_r:
+            return False
+        self.store(op, a_h, b_h, c_h)
+        return True
+
+    def generate_shallow_cancel(self, d: int, op: str) -> bool:
+        f, m = self.f, self.f.m_bits
+        a_s, b_s = random.randint(0, 1), random.randint(0, 1)
+
+        # pick a safe product exponent away from underflow/overflow
+        target_p_exp = random.randint(f.min_exp + 10, f.max_exp - 10)
+        a_min = max(f.min_exp + 5, target_p_exp - (f.max_exp - 5))
+        a_max = min(f.max_exp - 5, target_p_exp - (f.min_exp + 5))
+        a_r = random.randint(a_min, a_max)
+        b_r = target_p_exp - a_r
+
+        # use non-extreme mantissas to avoid accidental exponent carry in a*b
+        a_m = random.randint(1 << (m - 2), (1 << m) - 1)
+        b_m = random.randint(1 << (m - 2), (1 << m) - 1)
+        a_h = f.to_hex(a_s, a_r, a_m)
+        b_h = f.to_hex(b_s, b_r, b_m)
+        p_exp = self.f.get_exp(get_result_from_ref(OP_MUL, a_h, b_h, "0", f.name))
+        res_raw = p_exp + d
+
+        # pick a mid-range result mantissa so rounding is less likely to change exponent
+        r_s = random.randint(0, 1)
+        res_m = random.randint(1 << (m - 2), (1 << (m - 1)) - 1)
+        res_h = f.to_hex(r_s, res_raw, res_m)
+        c_h = get_result_from_ref(SOLVER_OPS[op], a_h, b_h, res_h, f.name)
+        c_exp = f.get_exp(c_h)
+
+        # for shallow cancellation, c should be aligned with the product
+        if c_exp != p_exp:
+            return False
+        # Final validation
+        p_exp2, r_exp = self.get_op_details(op, a_h, b_h, c_h)
+        if (r_exp - max(p_exp2, c_exp)) == d:
+            self.store(op, a_h, b_h, c_h)
+            return True
+        return False
+
+    def generate_deep_cancel(self, d: int, op: str) -> bool:
+        f = self.f
+        a_s, b_s, r_s = random.randint(0, 1), random.randint(0, 1), random.randint(0, 1)
+        res_raw, res_m, a_m, b_m = f.min_exp - 1, 0, 0, 0
+        target_p_exp = res_raw - d
+        split = self.get_random_split(target_p_exp)
+        a_r, b_r = split
+        a_h, b_h = f.to_hex(a_s, a_r, a_m), f.to_hex(b_s, b_r, b_m)
+        p_exp = self.f.get_exp(get_result_from_ref(OP_MUL, a_h, b_h, "0", f.name))
+        res_h = f.to_hex(r_s, p_exp + d, res_m)
+        c_h = get_result_from_ref(SOLVER_OPS[op], a_h, b_h, res_h, f.name)
+        p_exp, r_exp = self.get_op_details(op, a_h, b_h, c_h)
+        c_exp = f.get_exp(c_h)
+        if (r_exp - max(p_exp, c_exp)) == d:
+            self.store(op, a_h, b_h, c_h)
+            return True
+        return False
+
+    def generate(self, d: int, op: str) -> bool:
+        f, m = self.f, self.f.m_bits
+        a_s, b_s, r_s = random.randint(0, 1), random.randint(0, 1), random.randint(0, 1)
+
+        if d <= -(2 * f.p - 1):
+            return self.generate_deep_cancel(d, op)
+        elif d in [-6, -5, -4, -3, -2, -1]:
+            return self.generate_shallow_cancel(d, op)
+        elif d == 0:
+            return self.generate_same_exp(d, op)
+        elif d == 1:  # need result > operands
+            a_raw, b_raw = random.randint(0, f.max_exp // 2), random.randint(0, f.max_exp // 2)
+            c_s = 0 if op in [OP_FMADD, OP_FNMADD] else 1
+            a_h = f.to_hex(a_s, a_raw, random.getrandbits(m))
+            b_h = f.to_hex(a_s, b_raw, random.getrandbits(m))
+            c_h = f.to_hex(c_s, a_raw + b_raw + 1, (1 << m) - 1)
+            self.store(op, a_h, b_h, c_h)
+            return True
+        else:
+            valid_lo = max(f.min_exp, (f.min_exp - 1) - d)
+            valid_hi = min(f.max_exp, f.max_exp - d)
+            target_p_exp = random.randint(valid_lo, valid_hi)
+
+        # generate a and b
+        split = self.get_random_split(target_p_exp)
+        a_r, b_r = split
+
+        # special mantissas for deep cancellation
+        if d < -m:
+            target_depth = abs(d)
+            k = max(0, 2 * m - target_depth) // 2
+            a_m, b_m, res_m = 1 << k, 1 << (max(0, 2 * m - target_depth) - k), 0
+        else:
+            a_m, b_m, res_m = [random.getrandbits(m) for _ in range(3)]
+
+        a_h, b_h = f.to_hex(a_s, a_r, a_m), f.to_hex(b_s, b_r, b_m)
+
+        # solve for c
+        p_exp = self.f.get_exp(get_result_from_ref(OP_MUL, a_h, b_h, "0", f.name))
+        res_h = f.to_hex(r_s, p_exp + d, res_m)
+        c_h = get_result_from_ref(SOLVER_OPS[op], a_h, b_h, res_h, f.name)
+
+        p_exp, r_exp = self.get_op_details(op, a_h, b_h, c_h)
+        c_exp = f.get_exp(c_h)
+
+        if (r_exp - max(p_exp, c_exp)) == d:
+            self.store(op, a_h, b_h, c_h)
+            return True
+        return False
+
+
+@register_model("B16")
+def main(test_f: TextIO, cover_f: TextIO) -> None:
+    with (
+        Path("./tests/testvectors/B16_tv.txt").open("w") as tf,
+        Path("./tests/covervectors/B16_cv.txt").open("w") as cf,
+    ):
+        for fmt_name in FLOAT_FMTS:
+            gen = B16Generator(fmt_name, tf, cf)
+            for d in range(-(2 * gen.f.p + 1), 2):
+                retries = 15
+                for op in OPS:
+                    seed(reproducible_hash(f"{fmt_name}_b16_{d}_{op}"))
+                    for _ in range(retries):
+                        if gen.generate(d, op):
+                            break
diff --git a/src/cover_float/testgen/__init__.py b/src/cover_float/testgen/__init__.py
index d87d958..542ed81 100644
--- a/src/cover_float/testgen/__init__.py
+++ b/src/cover_float/testgen/__init__.py
@@ -26,6 +26,7 @@
 import cover_float.testgen.B13 as B13
 import cover_float.testgen.B14 as B14
 import cover_float.testgen.B15 as B15
+import cover_float.testgen.B16 as B16
 import cover_float.testgen.B20 as B20
 import cover_float.testgen.B21 as B21
 import cover_float.testgen.B25 as B25
@@ -48,6 +49,7 @@
     "B13",
     "B14",
     "B15",
+    "B16",
     "B20",
     "B21",
     "B25",