Virtual experts by chrishayuk · Pull Request #45 · chrishayuk/larql

chrishayuk · 2026-05-04T20:07:38Z

No description provided.

# walk_path_audit: baseline index

Per-path equivalence audit for `WalkFfn` dispatch paths. Each entry below records a measurement of one (model, vindex variant) pair against the `WeightFfn` dense matmul reference, with the assertion bounds locked in from that measurement.

## Methodology

For each `WalkFfn` path a forced-dispatch measurement is taken via a `MaskedGateIndex` wrapper that hides the `has_*` flags above the target path in the routing ladder. Three prompts (anchor + factual + code) are run end-to-end through `predict_with_ffn`, with a per-layer `DualFfn` capturing the diff between the path's output and the reference at every (layer, position).

Assertion metrics are **cos** and **relative L2** (`L2 / ‖primary‖`), both magnitude-invariant. Absolute L2 and max-element drift are kept as diagnostic columns to surface residual-magnitude outliers (e.g. the L11/code/1 ` fibonacci` spike on Gemma 3 4B) without driving the verdict.

Per-path bounds use a measure-then-tighten rule: cosine floor at one decimal less precise than the measured worst; rel_L2 ceiling at measured worst × 4.

Source: `crates/larql-inference/examples/walk_path_audit.rs`.

## Baselines

| date | model | vindex | paths tested | min cos | max rel L2 | n_obs | verdict |
|---|---|---|---|---|---|---|---|
| 2026-05-01 | google/gemma-3-4b-it | gemma3-4b-f16 | sparse, full_mmap, exact | 0.999997 | 1.881e-3 | 1,326 | 3/3 PASS |

### 2026-05-01: Gemma 3 4B f16 (canonical baseline)

The f32 paths agree at cos = 0.999997 across 1,326 observations, three independent code paths land on identical assertion values, dispatch trace verified 102/102 layers per path. Worst rel_L2 observed at L32/paris/0 (BOS position of the Paris prompt). Top-1 token matches on all three prompts × three paths; Paris probability holds to within 1.4e-4 of dense.

Bounds locked: `cos ≥ 0.99999, rel_L2 ≤ 1e-2` for the exact bucket. The rel_L2 ceiling is intentionally loose pending Q4K and FP4 baseline measurements (see the inline comment at `BOUND_EXACT` for the sequencing rule). Target post-matrix tightening: ~7.5e-3 (= measured × 4).

Artifacts: `walk_path_audit_gemma3_4b_f16_baseline.{md,json}`.

## Sequenced follow-ups

Each is its own measure-bound-commit cycle, separate PR:

1. `gemma3-4b-q4k-v2.vindex` → measure `interleaved_q4k:dequant`, set quantized rel_L2 bound at measured × 4.
2. `gemma3-4b-fp4a.vindex` → measure `fp4_storage:sparse`, set fp4 bound at measured × 4.
3. Single cross-bucket bound-tightening commit once all three measurements are in (will tighten the f16 exact rel_L2 from the intentionally-loose 1e-2 to ~7.5e-3).

---

Commit message (paste into a HEREDOC or your editor):

docs(audits): walk path equivalence index - f16 baseline cos=0.999997

Adds docs/audits/walk_path_audit/INDEX.md documenting the per-path equivalence audit methodology and recording the canonical Gemma 3 4B f16 baseline measurement.

Headline finding: the f32 paths (sparse, full_mmap, exact) agree at cos = 0.999997 across 1,326 observations, three independent code paths land on identical assertion values, dispatch trace verified 102/102 layers per path. All three pass cos ≥ 0.99999, rel_L2 ≤ 1e-2 with comfortable margin. Top-1 matches on every prompt × path; Paris probability holds to within 1.4e-4 of dense. Worst rel_L2 observed at L32/paris/0.

The harness (walk_path_audit.rs example), the MaskedGateIndex wrapper, and the per-path baseline artifacts landed in 84aee5a, bundled with unrelated working-tree work. This commit is a follow-up to make the audit and its baseline discoverable via `git log` and repo search.

Searchable terms: walk path equivalence, walk_path_audit, f16 baseline, MaskedGateIndex, cos 0.999997, 1326 observations, dispatch trace, WeightFfn / WalkFfn parity, rel_L2 1.881e-3, L32/paris/0.

Sequenced follow-ups (separate commits, one per vindex variant):

- gemma3-4b-q4k-v2.vindex → measure interleaved_q4k:dequant
- gemma3-4b-fp4a.vindex → measure fp4_storage:sparse
- then a single cross-bucket bound-tightening commit (will close the deliberately-loose f16 exact rel_L2 ceiling once Q4K and FP4 measurements have set their own measured-worst-×-4 bounds)
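
For readers unfamiliar with the two assertion metrics and the measure-then-tighten rule described under Methodology, the following is a minimal sketch of how they could be computed. It is not taken from `walk_path_audit.rs`: the function names (`cosine`, `rel_l2`, `rel_l2_ceiling`, `cos_floor`) are hypothetical, `primary` is assumed to be the dense reference output, and the "one decimal less precise" rule is interpreted here as truncating the measured worst cosine to one fewer decimal digit.

```rust
/// Cosine similarity between the reference (`primary`) and a path's output.
/// Returns None on a length mismatch or a zero-norm input.
/// Hypothetical helper, not the harness's actual API.
fn cosine(primary: &[f32], candidate: &[f32]) -> Option<f64> {
    if primary.len() != candidate.len() {
        return None;
    }
    let (mut dot, mut np, mut nc) = (0.0f64, 0.0f64, 0.0f64);
    for (&p, &c) in primary.iter().zip(candidate) {
        let (p, c) = (p as f64, c as f64);
        dot += p * c;
        np += p * p;
        nc += c * c;
    }
    if np == 0.0 || nc == 0.0 {
        return None;
    }
    Some(dot / (np.sqrt() * nc.sqrt()))
}

/// Relative L2: ‖primary − candidate‖ / ‖primary‖.
/// Scaling both vectors by the same factor leaves the value unchanged.
fn rel_l2(primary: &[f32], candidate: &[f32]) -> Option<f64> {
    if primary.len() != candidate.len() {
        return None;
    }
    let (mut diff, mut np) = (0.0f64, 0.0f64);
    for (&p, &c) in primary.iter().zip(candidate) {
        let (p, c) = (p as f64, c as f64);
        diff += (p - c) * (p - c);
        np += p * p;
    }
    if np == 0.0 {
        return None;
    }
    Some(diff.sqrt() / np.sqrt())
}

/// rel_L2 ceiling under the measure-then-tighten rule: measured worst × 4.
fn rel_l2_ceiling(measured_worst: f64) -> f64 {
    measured_worst * 4.0
}

/// Cosine floor: truncate the measured worst to one fewer decimal digit
/// than it was reported with (assumed interpretation of the rule).
/// `measured_decimals` must be at least 1.
fn cos_floor(measured_worst: f64, measured_decimals: u32) -> f64 {
    let scale = 10f64.powi(measured_decimals as i32 - 1);
    (measured_worst * scale).floor() / scale
}
```

Applied to the f16 baseline figures above, `rel_l2_ceiling(1.881e-3)` gives roughly 7.5e-3 (the stated post-matrix target) and `cos_floor(0.999997, 6)` gives 0.99999 (the locked cosine floor).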

chrishayuk added 30 commits April 24, 2026 22:36

working on fp4

770db26

working on q4

06e2063

improving testing of compute

10ff401

working on kernel tests

8c60fe0

roadmap.md

b225d08

working on shaders and kernels

ee0c4af

working on quantization

14e8d04

working on vindex and compute

96225c6

working on clean up

87106a2

compute refactor

dabd484

more metal improvements

2fe1a39

cleaning up compute and vindex

19bc6e7

performance

60f14ed

improved performance

a0d77d0

docs cleanup, and refactor cleanup

bdd34c1

vindex cleanup

2a3bce4

improvements to vindex

c2afc0d

performance improvements

09ebff6

improved vindex

79fe9c7

performance

ea4a112

more performance optimizations

1362bf5

improving testing

173f893

improved performance

ca429d3

improved test coverage

b043834

larql models test coverage

9b82681

workig on larql-server and performance

1e010ed

docs

41ae236

working on coverage

b41663a

performance improvements, working on moe

6b42237

working on refactor

daf3452

chrishayuk added 28 commits April 30, 2026 23:51

core

7ba6f8c

17 tokens per eecond 26B cpu

beb99e3

working on grid

a7996cf

improving performance and cleanliness

84aee5a

cleaning up vindex

29d2d8f

roadamp tidyup

ff82c0a

working on larql-server

d3a8bc6

improved larq-server with refactor

2e5ba51

working on openai compiance

b21a3da

updated docs on performance

953f85b

openai compliance

6f98292

working on lql cleanup

b1d039f

cleaning up ov_rd

c814e24

adding more mechanistic interpretability capabiltiies

846b593

working on ov_rd

505b131

tidied up lql

f250ce0

cleanup of magix strings

dd64ce8

updating roadmap

16a0f02

cleanup

38f3e93

clean up

3054509

clippy

18edf8a

working on video scripts

3cc559c

fixed shard demo

69d450a

cleanup for script and remote ffn

c224008

performance improvements for script

24cd90f

working on demo script

6cb7c33

fixed bench

4064bf4

chrishayuk merged commit ae93375 into main May 4, 2026
1 of 4 checks passed

deem0n mentioned this pull request May 5, 2026

fix(build): restore deleted extract/build.rs and align stale test/exa… #46

Open

chrishayuk merged 80 commits into main from virtual-experts
