Merged
80 commits
770db26
working on fp4
chrishayuk Apr 24, 2026
06e2063
working on q4
chrishayuk Apr 24, 2026
10ff401
improving testing of compute
chrishayuk Apr 25, 2026
8c60fe0
working on kernel tests
chrishayuk Apr 25, 2026
b225d08
roadmap.md
chrishayuk Apr 25, 2026
ee0c4af
working on shaders and kernels
chrishayuk Apr 25, 2026
14e8d04
working on quantization
chrishayuk Apr 25, 2026
96225c6
working on vindex and compute
chrishayuk Apr 25, 2026
87106a2
working on clean up
chrishayuk Apr 25, 2026
dabd484
compute refactor
chrishayuk Apr 25, 2026
2fe1a39
more metal improvements
chrishayuk Apr 25, 2026
19bc6e7
cleaning up compute and vindex
chrishayuk Apr 25, 2026
60f14ed
performance
chrishayuk Apr 25, 2026
a0d77d0
improved performance
chrishayuk Apr 25, 2026
bdd34c1
docs cleanup, and refactor cleanup
chrishayuk Apr 25, 2026
2a3bce4
vindex cleanup
chrishayuk Apr 25, 2026
c2afc0d
improvements to vindex
chrishayuk Apr 25, 2026
09ebff6
performance improvements
chrishayuk Apr 25, 2026
79fe9c7
improved vindex
chrishayuk Apr 25, 2026
ea4a112
performance
chrishayuk Apr 25, 2026
1362bf5
more performance optimizations
chrishayuk Apr 25, 2026
173f893
improving testing
chrishayuk Apr 25, 2026
ca429d3
improved performance
chrishayuk Apr 25, 2026
b043834
improved test coverage
chrishayuk Apr 26, 2026
9b82681
larql models test coverage
chrishayuk Apr 26, 2026
1e010ed
working on larql-server and performance
chrishayuk Apr 26, 2026
41ae236
docs
chrishayuk Apr 26, 2026
b41663a
working on coverage
chrishayuk Apr 26, 2026
6b42237
performance improvements, working on moe
chrishayuk Apr 26, 2026
daf3452
working on refactor
chrishayuk Apr 26, 2026
e1b95ac
working on test coverage
chrishayuk Apr 26, 2026
fbb5a70
huge update on quality
chrishayuk Apr 26, 2026
faf9ad6
models done
chrishayuk Apr 26, 2026
64eec18
fixed performance issue
chrishayuk Apr 26, 2026
6116e79
cleaning up inference
chrishayuk Apr 26, 2026
c12c59f
moe
chrishayuk Apr 26, 2026
ec40814
working on inference
chrishayuk Apr 26, 2026
077884b
working on performance
chrishayuk Apr 27, 2026
d768039
working on inference harness
chrishayuk Apr 27, 2026
ff7148b
working on moe performance
chrishayuk Apr 27, 2026
deb1b63
working on moe
chrishayuk Apr 27, 2026
6add16b
working on grpc
chrishayuk Apr 27, 2026
5f35276
moe improvements for grpc
chrishayuk Apr 28, 2026
a854151
working on expert sharding
chrishayuk Apr 29, 2026
be7222d
working on accuracy for moe
chrishayuk Apr 29, 2026
56a7cd1
paris
chrishayuk Apr 29, 2026
ee49d8d
working on cleanliness
chrishayuk Apr 29, 2026
d251ce9
working on inference accuracy
chrishayuk Apr 30, 2026
da44f4e
working on performance and cleanup
chrishayuk Apr 30, 2026
e77a23d
working on cleanup
chrishayuk Apr 30, 2026
35aed33
improved core
chrishayuk Apr 30, 2026
bff2190
working on mechanistic
chrishayuk Apr 30, 2026
7ba6f8c
core
chrishayuk Apr 30, 2026
beb99e3
17 tokens per second 26B cpu
chrishayuk May 1, 2026
a7996cf
working on grid
chrishayuk May 1, 2026
84aee5a
improving performance and cleanliness
chrishayuk May 1, 2026
aa42a36
File: docs/audits/walk_path_audit/INDEX.md
chrishayuk May 1, 2026
29d2d8f
cleaning up vindex
chrishayuk May 1, 2026
ff82c0a
roadmap tidy-up
chrishayuk May 1, 2026
d3a8bc6
working on larql-server
chrishayuk May 1, 2026
2e5ba51
improved larql-server with refactor
chrishayuk May 1, 2026
b21a3da
working on openai compliance
chrishayuk May 1, 2026
953f85b
updated docs on performance
chrishayuk May 1, 2026
6f98292
openai compliance
chrishayuk May 2, 2026
b1d039f
working on lql cleanup
chrishayuk May 2, 2026
c814e24
cleaning up ov_rd
chrishayuk May 2, 2026
846b593
adding more mechanistic interpretability capabilities
chrishayuk May 2, 2026
505b131
working on ov_rd
chrishayuk May 2, 2026
f250ce0
tidied up lql
chrishayuk May 2, 2026
dd64ce8
cleanup of magic strings
chrishayuk May 2, 2026
16a0f02
updating roadmap
chrishayuk May 2, 2026
38f3e93
cleanup
chrishayuk May 2, 2026
3054509
clean up
chrishayuk May 2, 2026
18edf8a
clippy
chrishayuk May 3, 2026
3cc559c
working on video scripts
chrishayuk May 3, 2026
69d450a
fixed shard demo
chrishayuk May 3, 2026
c224008
cleanup for script and remote ffn
chrishayuk May 3, 2026
24cd90f
performance improvements for script
chrishayuk May 4, 2026
6cb7c33
working on demo script
chrishayuk May 4, 2026
4064bf4
fixed bench
chrishayuk May 4, 2026
9 changes: 9 additions & 0 deletions .dockerignore
@@ -0,0 +1,9 @@
target/
output/
.git/
.claude/
knowledge/
experiments/
docs/
*.vindex
/tmp/
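Docker matches `.dockerignore` patterns against paths relative to the build-context root, and (following Go's `filepath.Match`) a bare `*` does not cross `/`. The sketch below is a simplified Python model of that matching, written for illustration only: it is not Docker's implementation, and it ignores `!` exceptions, `?` and character classes. Under those assumptions it suggests that `*.vindex` only excludes vindexes at the repository root (nested ones would need `**/*.vindex`), and that `/tmp/` names a `tmp` directory in the context root rather than the host's `/tmp`. The example paths are hypothetical.

```python
import re

# Patterns copied from the .dockerignore above.
PATTERNS = ["target/", "output/", ".git/", ".claude/", "knowledge/",
            "experiments/", "docs/", "*.vindex", "/tmp/"]


def to_regex(pattern):
    """Translate one simplified dockerignore pattern into an anchored regex.

    Simplified model (an assumption, not Docker's code): leading and trailing
    slashes are dropped, `**` matches anything including `/`, and `*`
    matches anything except `/`.
    """
    p = pattern.strip("/")
    out, i = [], 0
    while i < len(p):
        if p.startswith("**", i):
            out.append(".*")
            i += 2
        elif p[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(p[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")


def is_ignored(path, patterns=PATTERNS):
    """True if `path` (relative to the context root) or any parent matches.

    Checking every parent prefix models the rule that excluding a directory
    also excludes everything beneath it.
    """
    parts = path.strip("/").split("/")
    prefixes = ["/".join(parts[: n + 1]) for n in range(len(parts))]
    regexes = [to_regex(p) for p in patterns]
    return any(r.match(prefix) for r in regexes for prefix in prefixes)


# Hypothetical paths inside the build context.
print(is_ignored("target/debug/build"))            # True: under target/
print(is_ignored("gemma.vindex"))                  # True: root-level *.vindex
print(is_ignored("crates/larql-vindex/x.vindex"))  # False: `*` stops at `/`
print(is_ignored("tmp/scratch.txt"))               # True: `/tmp/` is context-relative
```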
98 changes: 98 additions & 0 deletions .github/workflows/bench-regress.yml
@@ -0,0 +1,98 @@
# Bench regression detector — runs `make bench-check` on every PR
# against a baseline saved on `main`. Fails the workflow if any cell
# in the criterion bench suite regresses past Criterion's noise
# threshold.
#
# Surface covered (`make bench` = `make bench-quant + bench-matmul + bench-linalg`):
# - `quant_matvec`: Q4_0 / Q4_K / Q4_KF / Q6_K × 3 shapes × cpu/metal
# - `matmul`: f32 matmul + f32_gemv (lm-head) — cpu vs metal
# - `linalg`: cholesky + ridge solve (cpu only)
#
# That's the surface where the next throughput cliff would show up
# first. The 75%-row drop in `q4_matvec_v4` would have shown as a 4×
# regression at `quant_matvec_q4_0/metal/lm_head_262144` weeks before
# goldens caught it.

name: bench-regress

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  # Manual trigger so a maintainer can re-baseline after intentional
  # perf changes without waiting for the next merge to main.
  workflow_dispatch: {}

jobs:
  bench:
    # macos-14 = Apple Silicon (M1+). Required for the metal cells —
    # without it, drop --features metal from FEATURES to skip them
    # and run only the CPU surface on any runner.
    runs-on: macos-14
    timeout-minutes: 90

    steps:
      - uses: actions/checkout@v4

      # Cargo deps are big and stable across PRs — separate cache.
      - name: Cache cargo deps
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-cargo-bench-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-bench-

      # Criterion baselines: write-through on main, read-only on PRs.
      # Keyed by the run number so each main push refreshes the cache.
      - name: Cache criterion baseline (main only)
        if: github.ref == 'refs/heads/main'
        uses: actions/cache@v4
        with:
          path: target/criterion
          key: ${{ runner.os }}-criterion-baseline-${{ github.run_number }}
          restore-keys: |
            ${{ runner.os }}-criterion-baseline-

      - name: Restore criterion baseline (PRs only)
        if: github.event_name == 'pull_request'
        uses: actions/cache/restore@v4
        with:
          path: target/criterion
          key: ${{ runner.os }}-criterion-baseline-
          restore-keys: |
            ${{ runner.os }}-criterion-baseline-

      - name: Save baseline (main only)
        if: github.ref == 'refs/heads/main'
        run: make bench-save

      - name: Check vs baseline (PRs + manual)
        if: github.event_name == 'pull_request' || github.event_name == 'workflow_dispatch'
        run: |
          # Cold cache → bench-check prints "no baseline found" and
          # exits 2. Treat as neutral: the first PR after CI is stood
          # up shouldn't fail just because there's no baseline yet.
          set +e
          make bench-check
          rc=$?
          set -e
          if [ "$rc" -eq 2 ]; then
            echo "::warning::no criterion baseline cached; skipping regression check"
            exit 0
          fi
          exit "$rc"

      # On regression, attach the criterion HTML report so reviewers
      # can see the per-cell delta without re-running locally.
      - name: Upload criterion report on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: criterion-report
          path: target/criterion/
          retention-days: 14
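The baseline hand-off between `main` and PRs depends on how `actions/cache` picks an entry: an exact `key` hit wins; otherwise `key` is tried as a prefix, then each `restore-keys` prefix in order, and the most recently created match is restored. A PR run may also read caches created on its base branch. The Python sketch below is a simplified model of that lookup (the `CacheEntry` and `resolve_cache` names are hypothetical, and branch scoping and cache versions are ignored). It illustrates why the PR step, whose `key` never matches exactly, restores the newest `criterion-baseline-<run_number>` entry, chosen by creation time rather than by string order.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: str
    created_at: int  # larger = created more recently


def resolve_cache(key, restore_keys, entries):
    """Simplified model of actions/cache key matching (branch scoping ignored).

    1. An entry whose key equals `key` exactly wins.
    2. Otherwise `key` itself, then each restore key in order, is tried as a
       prefix; for the first prefix with matches, the newest entry wins.
    Returns the chosen CacheEntry, or None on a full miss.
    """
    for entry in entries:
        if entry.key == key:
            return entry
    for prefix in [key, *restore_keys]:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda e: e.created_at)
    return None


# Hypothetical caches left behind by two earlier pushes to main.
saved = [
    CacheEntry("macOS-criterion-baseline-99", created_at=1),
    CacheEntry("macOS-criterion-baseline-100", created_at=2),
]

# PR step: `key` is only a prefix, so the newest baseline is restored even
# though "100" sorts before "99" as a string.
print(resolve_cache("macOS-criterion-baseline-",
                    ["macOS-criterion-baseline-"], saved).key)
# macOS-criterion-baseline-100

# main step: run 101 has no cache yet, so the restore key falls back to 100.
print(resolve_cache("macOS-criterion-baseline-101",
                    ["macOS-criterion-baseline-"], saved).key)
# macOS-criterion-baseline-100
```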
68 changes: 68 additions & 0 deletions .github/workflows/larql-models.yml
@@ -0,0 +1,68 @@
# larql-models cross-platform CI
#
# Runs check + clippy + tests + bench test-mode on Linux, Windows, and macOS
# for every change to the larql-models crate. Validates cross-platform compatibility:
# - Linux (x86_64-unknown-linux-gnu)
# - Windows (x86_64-pc-windows-msvc) — HF cache path, mmap, path separators
# - macOS (aarch64-apple-darwin) — NEON SIMD paths

name: larql-models

on:
  push:
    branches: [main]
    paths:
      - 'crates/larql-models/**'
      - 'Cargo.toml'
      - 'Cargo.lock'
      - '.github/workflows/larql-models.yml'
  pull_request:
    branches: [main]
    paths:
      - 'crates/larql-models/**'
      - 'Cargo.toml'
      - 'Cargo.lock'
      - '.github/workflows/larql-models.yml'
  workflow_dispatch: {}

jobs:
  test:
    name: test · ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    timeout-minutes: 20

    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest, macos-14]

    steps:
      - uses: actions/checkout@v4

      - name: Install stable Rust
        uses: dtolnay/rust-toolchain@stable
        with:
          components: clippy

      - name: Cache cargo registry + build artefacts
        uses: actions/cache@v4
        with:
          path: |
            ~/.cargo/registry
            ~/.cargo/git
            target
          key: ${{ runner.os }}-cargo-models-${{ hashFiles('**/Cargo.lock') }}
          restore-keys: |
            ${{ runner.os }}-cargo-models-

      - name: Check (all targets)
        run: cargo check -p larql-models --all-targets

      - name: Clippy (warnings as errors)
        run: cargo clippy -p larql-models --all-targets -- -D warnings

      - name: Test
        run: cargo test -p larql-models

      - name: Test benches
        run: cargo test -p larql-models --benches
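The `paths:` filter means the matrix only runs when a changed file matches one of the listed patterns. In GitHub's filter-pattern syntax `*` does not match `/` while `**` does, so the bare `Cargo.toml` entry matches only the workspace-root manifest. The following Python sketch approximates that matching for reasoning about which PRs will trigger the job; the helper names are hypothetical, and it handles only `*` and `**` (not `?`, `+`, `[]` or `!` negation, which GitHub also supports).

```python
import re

# `paths:` entries from the larql-models workflow above.
MODELS_PATHS = [
    "crates/larql-models/**",
    "Cargo.toml",
    "Cargo.lock",
    ".github/workflows/larql-models.yml",
]


def glob_to_regex(pattern):
    """Approximate GitHub filter patterns, handling only `*` and `**`."""
    parts, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # any characters, including `/`
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # any characters except `/`
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(parts) + "$")


def triggers(changed_files, patterns=MODELS_PATHS):
    """True if any changed file matches any pattern (no `!` negation)."""
    regexes = [glob_to_regex(p) for p in patterns]
    return any(r.match(f) for r in regexes for f in changed_files)


# Hypothetical changed-file lists for three PRs.
print(triggers(["crates/larql-models/src/lib.rs"]))     # True
print(triggers(["crates/larql-inference/Cargo.toml"]))  # False: root-only pattern
print(triggers(["Cargo.lock", "README.md"]))            # True
```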
83 changes: 79 additions & 4 deletions Makefile
@@ -1,4 +1,4 @@
.PHONY: build release test check clean fmt lint demos
.PHONY: build release test test-fast test-full test-integration test-models check clean fmt lint demos bench bench-save bench-check coverage coverage-summary

# Build
build:
@@ -8,9 +8,31 @@ release:
	cargo build --release -p larql-cli

# Test
test:
#
# Default test target is intentionally fast: no integration binaries, no
# model-backed ignored tests. Use `test-full` for the historical full
# workspace run, and `test-models` for real-model/vindex checks.
test: test-fast

test-fast:
	cargo test --workspace --lib --bins

test-full:
	cargo test --workspace

test-integration:
	cargo test --workspace --tests

test-models:
	cargo test -p larql-inference --test test_arch_golden -- --ignored
	cargo test -p larql-inference --test test_logits_goldens -- --ignored
	cargo test -p larql-inference --test test_gemma3_smoke -- --ignored
	cargo test -p larql-inference --test test_generate_q4k_cpu -- --ignored
	cargo test -p larql-inference --test bench_probe_latency -- --ignored --nocapture
	cargo test -p larql-inference --test test_llm_dispatch -- --ignored --nocapture
	cargo test -p larql-inference --test test_constrained_dispatch -- --ignored --nocapture
	cargo test -p larql-inference --test test_trie_dispatch -- --ignored --nocapture

# Check (compile without building)
check:
	cargo check --workspace
@@ -26,12 +48,29 @@ lint:
	cargo clippy --workspace --tests -- -D warnings

# All quality checks
ci: fmt-check lint test
ci: fmt-check lint test-full

# Clean
clean:
	cargo clean

# Benchmarks
#
# `bench` runs the full quant_matvec suite and writes HTML reports under
# `target/criterion/`. `bench-save` records a baseline named `main`;
# `bench-check` re-runs and fails if any cell regresses past Criterion's
# default noise threshold. Plug `bench-check` into CI to catch the next
# 4× throughput cliff (the kind the q4_matvec_v4 row-drop bug caused) at
# PR time, not at goldens-fail time weeks later.
bench:
	cargo bench -p larql-compute --bench quant_matvec --features metal

bench-save:
	bash scripts/bench-regress.sh save

bench-check:
	bash scripts/bench-regress.sh check

# Demos
demos:
	cargo run --release -p larql-models --example architecture_demo
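`scripts/bench-regress.sh` is not part of this diff, so its internals are unknown. One plausible shape, assuming it wraps Criterion's `--save-baseline main` and `--baseline main` options and honours the exit-code contract the workflow relies on (0 = pass, 2 = no baseline, anything else = failure), is modelled below in Python. Criterion itself reports a regression without exiting non-zero, so a checker has to inspect the report; this sketch scans Criterion's plain-text output for its `Performance has regressed.` verdict. The function names, the `target/criterion/**/main` layout check and the sample report are assumptions for illustration.

```python
import sys
from pathlib import Path

# Exit-code contract the bench-regress workflow relies on (2 = no baseline).
EXIT_OK, EXIT_REGRESSED, EXIT_NO_BASELINE = 0, 1, 2
REGRESSION_VERDICT = "Performance has regressed."


def has_baseline(criterion_dir: Path, name: str = "main") -> bool:
    """True if some benchmark under `criterion_dir` has a saved `name` baseline."""
    return criterion_dir.is_dir() and any(
        p.is_dir() for p in criterion_dir.rglob(name)
    )


def regressed_benches(report: str) -> list:
    """Collect bench IDs whose Criterion report contains a regression verdict.

    Assumes Criterion's plain-text layout: the bench ID starts a line (possibly
    followed by `time:` on the same line) and the verdict is an indented line.
    """
    current, regressed = None, []
    for line in report.splitlines():
        if line and not line[0].isspace():
            current = line.split("time:")[0].strip()
        elif line.strip() == REGRESSION_VERDICT and current:
            regressed.append(current)
    return regressed


def check(report: str, baseline_present: bool) -> int:
    """Map one `cargo bench ... -- --baseline main` run onto the exit codes."""
    if not baseline_present:
        print("no baseline found", file=sys.stderr)
        return EXIT_NO_BASELINE
    bad = regressed_benches(report)
    for bench in bad:
        print(f"regressed: {bench}", file=sys.stderr)
    return EXIT_REGRESSED if bad else EXIT_OK


# Synthetic report in Criterion's layout; the numbers are made up.
SAMPLE = """quant_matvec_q4_0/metal/lm_head_262144
                        time:   [1.9000 ms 1.9200 ms 1.9500 ms]
                        change: [+295.10% +301.70% +308.00%] (p = 0.00 < 0.05)
                        Performance has regressed.
quant_matvec_q4_k/cpu/example_shape
                        time:   [410.20 us 411.00 us 412.10 us]
                        change: [-0.8000% +0.1000% +1.0000%] (p = 0.84 > 0.05)
                        No change in performance detected.
"""

print(regressed_benches(SAMPLE))  # ['quant_matvec_q4_0/metal/lm_head_262144']
print(check(SAMPLE, baseline_present=True))  # 1
```

Checking for the baseline before running the bench matters because, per Criterion's documentation, `--baseline` fails outright when a benchmark has no saved baseline of that name.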
@@ -52,7 +91,43 @@ bench-core:
bench-inference:
	cargo run --release -p larql-inference --example bench_inference

bench-all: bench-core bench-inference
# Vindex micro-benches — synthetic, fast, safe under load.
bench-vindex:
	cargo bench -p larql-vindex --bench vindex_ops

# Vindex production-dim scaling bench. Refuses if larql-server / router
# are alive (they distort 1-2 GB matmuls). Run alone, on a cool host;
# results feed PERFORMANCE.md.
bench-vindex-scaling:
	@if pgrep -fl 'larql-(server|router)' >/dev/null 2>&1; then \
	    echo "Refusing bench-vindex-scaling: larql daemons running. Stop them first."; \
	    pgrep -fl 'larql-(server|router)'; \
	    exit 2; \
	fi
	cargo bench -p larql-vindex --bench vindex_scaling

bench-all: bench-core bench-inference bench-vindex

# Coverage — uses cargo-llvm-cov (install with `cargo install cargo-llvm-cov`).
# Writes an HTML report to coverage/ that can be opened in a browser.
# Scoped to larql-vindex by default since the audit owner cares about
# that crate; pass COVERAGE_CRATE=… to scope elsewhere.
COVERAGE_CRATE ?= larql-vindex
coverage:
	@if ! command -v cargo-llvm-cov >/dev/null 2>&1; then \
	    echo "cargo-llvm-cov not installed. Install with:"; \
	    echo " cargo install cargo-llvm-cov"; \
	    exit 1; \
	fi
	cargo llvm-cov --package $(COVERAGE_CRATE) --html --output-dir coverage
	@echo "Report: coverage/html/index.html"

coverage-summary:
	@if ! command -v cargo-llvm-cov >/dev/null 2>&1; then \
	    echo "cargo-llvm-cov not installed."; \
	    exit 1; \
	fi
	cargo llvm-cov --package $(COVERAGE_CRATE) --summary-only

# Python extension (managed via uv)
python-setup:
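The `bench-vindex-scaling` guard relies on `pgrep -f`, which searches each process's full command line with the extended regex `larql-(server|router)`. The small Python model below (hypothetical helpers fed a hand-written process list instead of querying the OS) makes two properties of that regex explicit: the literal text `larql-(server|router)` in the recipe's own shell command line does not match the pattern, so in this model the guard does not trip on itself; and any process whose arguments merely mention `larql-server`, such as an editor, counts as a match.

```python
import re

# The same extended regex the Makefile passes to `pgrep -fl`.
DAEMON_PATTERN = re.compile(r"larql-(server|router)")


def running_daemons(cmdlines):
    """Return the command lines `pgrep -f` would report for DAEMON_PATTERN.

    `pgrep -f` searches (it does not anchor) each full command line.
    """
    return [c for c in cmdlines if DAEMON_PATTERN.search(c)]


def guard(cmdlines) -> int:
    """Mirror the recipe: return 2 (refuse) if a daemon matches, else 0."""
    hits = running_daemons(cmdlines)
    if hits:
        print("Refusing bench-vindex-scaling: larql daemons running. Stop them first.")
        for hit in hits:
            print(hit)
        return 2
    return 0


# Hypothetical process table.
procs = [
    "target/release/larql-server",
    # The recipe's own shell: `larql-(` is not `larql-server`/`larql-router`,
    # so this line does not match.
    "/bin/sh -c if pgrep -fl 'larql-(server|router)' >/dev/null 2>&1; then ...",
    # False positive: an argument that merely contains `larql-server`.
    "vim notes/larql-server.md",
]
print(guard(procs))  # prints the refusal and the two matching lines, then 2
```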