Virtual experts#45

Merged
chrishayuk merged 80 commits into main from virtual-experts
May 4, 2026

@chrishayuk
Owner

No description provided.

  # walk_path_audit — baseline index

  Per-path equivalence audit for `WalkFfn` dispatch paths. Each entry
  below records a measurement of one (model, vindex variant) pair against
  the `WeightFfn` dense matmul reference, with the assertion bounds
  locked in from that measurement.

  ## Methodology

  For each `WalkFfn` path a forced-dispatch measurement is taken via a
  `MaskedGateIndex` wrapper that hides the `has_*` flags above the target
  path in the routing ladder. Three prompts (anchor + factual + code)
  are run end-to-end through `predict_with_ffn`, with a per-layer
  `DualFfn` capturing the diff between the path's output and the
  reference at every (layer, position).
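A minimal sketch of the forced-dispatch idea, under stated assumptions. The real `MaskedGateIndex` wraps individual `has_*` flags; here they are condensed into one hypothetical `has(rung)` method, and the `Rung` enum, `LADDER` order, `PlainIndex`, and `dispatch` are illustrative stand-ins rather than the crate's actual API.

```rust
/// Hypothetical ladder rungs, highest priority first. The ordering is an
/// assumption for illustration, not the crate's real routing order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Rung {
    Fp4Storage,
    InterleavedQ4k,
    FullMmap,
    Sparse,
    Exact,
}

pub const LADDER: [Rung; 5] = [
    Rung::Fp4Storage,
    Rung::InterleavedQ4k,
    Rung::FullMmap,
    Rung::Sparse,
    Rung::Exact,
];

/// Condensed stand-in for the per-path `has_*` capability flags.
pub trait GateIndex {
    fn has(&self, rung: Rung) -> bool;
}

/// Toy index that simply lists which rungs it can serve.
pub struct PlainIndex {
    pub available: Vec<Rung>,
}

impl GateIndex for PlainIndex {
    fn has(&self, rung: Rung) -> bool {
        self.available.contains(&rung)
    }
}

/// Wrapper that reports `false` for every rung above `target`, so the
/// router falls through to the target (if the inner index supports it).
pub struct MaskedGateIndex<'a, G: GateIndex> {
    pub inner: &'a G,
    pub target: Rung,
}

impl<'a, G: GateIndex> GateIndex for MaskedGateIndex<'a, G> {
    fn has(&self, rung: Rung) -> bool {
        // Derived `Ord` follows declaration order, so `>=` means
        // "at or below the target in the ladder".
        rung >= self.target && self.inner.has(rung)
    }
}

/// Pick the first rung in ladder order that the index claims to support.
pub fn dispatch<G: GateIndex>(index: &G) -> Option<Rung> {
    LADDER.iter().copied().find(|&r| index.has(r))
}
```

In this sketch, an index that supports `FullMmap`, `Sparse`, and `Exact` dispatches to `FullMmap` unmasked, and to `Sparse` once wrapped with `target: Rung::Sparse`. Masking only hides higher rungs; it cannot force a path the inner index lacks.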

  Assertion metrics are **cos** and **relative L2** (`L2 / ‖primary‖`),
  both magnitude-invariant. Absolute L2 and max-element drift are kept
  as diagnostic columns to surface residual-magnitude outliers (e.g. the
  L11/code/1 ` fibonacci` spike on Gemma 3 4B) without driving the
  verdict. Per-path bounds use a measure-then-tighten rule: the cosine
  floor is the measured worst truncated to one fewer decimal place
  (e.g. 0.999997 → 0.99999), and the rel_L2 ceiling is the measured
  worst × 4.
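The four metrics named above can be sketched for a single (layer, position) observation as follows. The `DiffMetrics` struct and `diff_metrics` function are hypothetical names for illustration, not the harness's actual types; accumulating in `f64` and returning NaN for a zero-norm input are also assumptions.

```rust
/// Per-observation diff between a path output and the dense reference.
#[derive(Debug, Clone, Copy)]
pub struct DiffMetrics {
    pub cos: f64,     // assertion metric, magnitude-invariant
    pub rel_l2: f64,  // assertion metric: L2 / ‖primary‖
    pub abs_l2: f64,  // diagnostic only
    pub max_abs: f64, // diagnostic only: max-element drift
}

/// Compare `candidate` (the path under test) against `primary` (reference).
pub fn diff_metrics(primary: &[f32], candidate: &[f32]) -> DiffMetrics {
    assert_eq!(primary.len(), candidate.len(), "vectors must match in length");
    let (mut dot, mut np2, mut nc2, mut d2, mut max_abs) = (0.0f64, 0.0f64, 0.0f64, 0.0f64, 0.0f64);
    for (&p, &c) in primary.iter().zip(candidate) {
        // Accumulate in f64 to keep rounding noise out of the comparison.
        let (p, c) = (p as f64, c as f64);
        dot += p * c;
        np2 += p * p;
        nc2 += c * c;
        let d = p - c;
        d2 += d * d;
        max_abs = max_abs.max(d.abs());
    }
    let (np, nc, abs_l2) = (np2.sqrt(), nc2.sqrt(), d2.sqrt());
    // Both ratios are undefined for a zero-norm input; NaN flags that case.
    let cos = if np > 0.0 && nc > 0.0 { dot / (np * nc) } else { f64::NAN };
    let rel_l2 = if np > 0.0 { abs_l2 / np } else { f64::NAN };
    DiffMetrics { cos, rel_l2, abs_l2, max_abs }
}
```

Scaling both inputs by the same factor leaves `cos` and `rel_l2` unchanged while `abs_l2` and `max_abs` scale with it, which is why only the first two drive the verdict.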

  Source: `crates/larql-inference/examples/walk_path_audit.rs`.

  ## Baselines

  | date | model | vindex | paths tested | min cos | max rel L2 | n_obs | verdict |
  |---|---|---|---|---|---|---|---|
  | 2026-05-01 | google/gemma-3-4b-it | gemma3-4b-f16 | sparse, full_mmap, exact | 0.999997 | 1.881e-3 | 1,326 | 3/3 PASS |

  ### 2026-05-01 — Gemma 3 4B f16 (canonical baseline)

  The f32 paths agree at cos = 0.999997 across 1,326 observations: three
  independent code paths land on identical assertion values, and the
  dispatch trace is verified at 102/102 layers per path. The worst
  rel_L2 was observed at L32/paris/0 (BOS position of the Paris prompt).
  The top-1 token matches on all three prompts × three paths, and the
  Paris probability holds to within 1.4e-4 of dense.

  Bounds locked: `cos ≥ 0.99999, rel_L2 ≤ 1e-2` for the exact bucket.
  The rel_L2 ceiling is intentionally loose pending Q4K and FP4 baseline
  measurements — see inline comment at `BOUND_EXACT` for the sequencing
  rule. Target post-matrix tightening: ~7.5e-3 (= measured × 4).
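The arithmetic behind these bounds can be sketched as below. This is one reading of the measure-then-tighten rule, not the code at `BOUND_EXACT`: `Bounds`, `tighten`, `passes`, and the `reported_decimals` parameter are hypothetical, and truncation (rather than rounding) of the cosine is an assumption that happens to reproduce the locked 0.99999 floor.

```rust
/// Assertion bounds derived from one measured baseline.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bounds {
    pub cos_floor: f64,
    pub rel_l2_ceiling: f64,
}

/// Measure-then-tighten, as read from the methodology:
/// - cosine floor: measured worst truncated to one fewer decimal place
/// - rel_L2 ceiling: measured worst × 4
///
/// `reported_decimals` is the number of decimals in the measured cosine
/// (6 for 0.999997); it is an assumed parameter for this sketch.
pub fn tighten(worst_cos: f64, reported_decimals: u32, worst_rel_l2: f64) -> Bounds {
    assert!(reported_decimals >= 1, "need at least one decimal to drop");
    let scale = 10f64.powi(reported_decimals as i32 - 1);
    Bounds {
        cos_floor: (worst_cos * scale).floor() / scale,
        rel_l2_ceiling: worst_rel_l2 * 4.0,
    }
}

/// A measurement passes when it clears both bounds.
pub fn passes(bounds: &Bounds, min_cos: f64, max_rel_l2: f64) -> bool {
    min_cos >= bounds.cos_floor && max_rel_l2 <= bounds.rel_l2_ceiling
}
```

With the f16 baseline values, `tighten(0.999997, 6, 1.881e-3)` gives a cosine floor of 0.99999 and a rel_L2 ceiling of about 7.524e-3, which sits inside the currently locked 1e-2.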

  Artifacts: `walk_path_audit_gemma3_4b_f16_baseline.{md,json}`.

  ## Sequenced follow-ups

  Each is its own measure-bound-commit cycle, in a separate PR:

  1. `gemma3-4b-q4k-v2.vindex` → measure `interleaved_q4k:dequant`, set
     quantized rel_L2 bound at measured × 4.
  2. `gemma3-4b-fp4a.vindex` → measure `fp4_storage:sparse`, set fp4
     bound at measured × 4.
  3. Single cross-bucket bound-tightening commit once all three
     measurements are in (will tighten the f16 exact rel_L2 from the
     intentionally-loose 1e-2 to ~7.5e-3).

  ---
  Commit message (paste into a HEREDOC or your editor):

  docs(audits): walk path equivalence index — f16 baseline cos=0.999997

  Adds docs/audits/walk_path_audit/INDEX.md documenting the per-path
  equivalence audit methodology and recording the canonical Gemma 3 4B
  f16 baseline measurement.

  Headline finding: the f32 paths (sparse, full_mmap, exact) agree at
  cos = 0.999997 across 1,326 observations: three independent code
  paths land on identical assertion values, and the dispatch trace is
  verified at 102/102 layers per path. All three pass cos ≥ 0.99999,
  rel_L2 ≤ 1e-2 with comfortable margin. Top-1 matches on every
  prompt × path; Paris probability holds to within 1.4e-4 of dense.
  Worst rel_L2 observed at L32/paris/0.

  The harness (walk_path_audit.rs example), the MaskedGateIndex
  wrapper, and the per-path baseline artifacts landed in 84aee5a,
  bundled with unrelated working-tree work. This commit is a follow-up
  to make the audit and its baseline discoverable via `git log` and
  repo search.

  Searchable terms: walk path equivalence, walk_path_audit, f16
  baseline, MaskedGateIndex, cos 0.999997, 1326 observations, dispatch
  trace, WeightFfn / WalkFfn parity, rel_L2 1.881e-3, L32/paris/0.

  Sequenced follow-ups (separate commits, one per vindex variant):

    - gemma3-4b-q4k-v2.vindex → measure interleaved_q4k:dequant
    - gemma3-4b-fp4a.vindex → measure fp4_storage:sparse
    - then a single cross-bucket bound-tightening commit (will close
      the deliberately-loose f16 exact rel_L2 ceiling once Q4K and FP4
      measurements have set their own measured-worst-×-4 bounds)
@chrishayuk chrishayuk merged commit ae93375 into main May 4, 2026
1 of 4 checks passed