64 changes: 64 additions & 0 deletions .github/workflows/harness.yml
@@ -0,0 +1,64 @@
name: isolation-harness

on:
  push:
    branches: ["**"]
  pull_request:

jobs:
  gates:
    # Skip forks to avoid secrets leaking to untrusted PRs
    if: github.event.pull_request.head.repo.full_name == github.repository || github.event_name == 'push'
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: dtolnay/rust-toolchain@stable

      - uses: Swatinem/rust-cache@v2
        with:
          key: harness-v1

      - name: build larql-server
        run: cargo build --release -p larql-server

      - name: checkout isolation-harness
        uses: actions/checkout@v4
        with:
          repository: Divinci-AI/larql-isolation-harness
          token: ${{ secrets.HARNESS_REPO_TOKEN }}
          path: harness
          ref: main

      - name: build isolation-harness
        run: cargo build --release --manifest-path harness/Cargo.toml

      - name: start larql-server
        run: |
          ./target/release/larql-server testdata/tiny-vindex --port 8787 &
          echo $! > larql.pid
          for i in {1..40}; do
            curl -sf http://localhost:8787/v1/health && break
            sleep 0.5
          done
          curl -sf http://localhost:8787/v1/health || (echo "server failed to start" && exit 1)

      - name: T2 — concurrent read-lock (no serialization)
        env:
          LARQL_URL: http://localhost:8787
        run: ./harness/target/release/isolation-harness concurrent --iterations 200

      - name: T3 — session global-leak isolation
        env:
          LARQL_URL: http://localhost:8787
        run: ./harness/target/release/isolation-harness global-leak

      - name: T5 — patch revert down/up override leak
        env:
          LARQL_URL: http://localhost:8787
        run: ./harness/target/release/isolation-harness revert

      - name: stop larql-server
        if: always()
        run: kill $(cat larql.pid) 2>/dev/null || true
5 changes: 4 additions & 1 deletion .gitignore
@@ -30,4 +30,7 @@ build/
# output
output/
data/
experiments/
experiments/
vindexes/
.pids/
docs/replay/*.bak.bak
26 changes: 20 additions & 6 deletions AGENTS.md
@@ -13,6 +13,7 @@ Three extraction levels gate which LQL statements work: `browse` (DESCRIBE/WALK/
Cargo workspace at repo root with a strict dependency chain — respect this when adding modules:

```
# LARQL-specific (depend on vindex, LQL, etc.)
larql-models model config, architecture traits, weight loading, quant/dequant
larql-compute CPU/Metal matmul backends, pipeline
@@ -28,8 +29,21 @@ larql-server HTTP + gRPC server serving vindexes
larql-cli top-level `larql` binary (every subcommand lives in commands/)
larql-python PyO3 bindings (maturin-built, module name `larql._native`)
kv-cache-benchmark standalone benchmark crate

# Portable (no LARQL deps; extract to sibling repo later, name stable)
model-compute bounded native kernels (arithmetic/datetime) and optional
wasmtime-hosted WASM modules (features: `native`/`wasm`)
```

**`model-compute` never imports `larql-*`.** Dependency flow is one-way:
LARQL may consume it (e.g. for compile-time `sum(1..100)` resolution); it
knows nothing about vindex or LQL. When it moves to a sibling repo, the
name stays the same so imports don't churn. The `install_edge` primitive
that stamps a compiled edge into gate/up/down tensors lives at
[crates/larql-cli/src/commands/extraction/compile_cmd/edge.rs](crates/larql-cli/src/commands/extraction/compile_cmd/edge.rs) —
it's the lowest-level step of the `COMPILE` verb and isn't a separate crate
until a second consumer needs it.
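
The one-way dependency shape can be sketched with a bounded kernel of the kind `model-compute` is described as providing. This is a hypothetical illustration, not the crate's real API: `sum_range` and its `MAX_SPAN` bound are invented names standing in for an arithmetic kernel that LARQL could call to resolve `sum(1..100)` at compile time.

```rust
// Hypothetical sketch of a bounded arithmetic kernel in the style the text
// describes for `model-compute`. All names here are illustrative.
fn sum_range(lo: u64, hi: u64) -> Option<u64> {
    // "Bounded": refuse degenerate or oversized ranges instead of looping.
    const MAX_SPAN: u64 = 1_000_000;
    if lo > hi || hi - lo > MAX_SPAN {
        return None;
    }
    // Closed-form sum of the inclusive range lo..=hi.
    let n = hi - lo + 1;
    Some(n * (lo + hi) / 2)
}

fn main() {
    // A COMPILE-time caller could fold sum(1..100) — read here as the
    // inclusive range 1..=100 — down to a constant.
    assert_eq!(sum_range(1, 100), Some(5050));
    // Out-of-bounds requests fail loudly rather than hanging compilation.
    assert_eq!(sum_range(0, 10_000_000), None);
}
```

The point of the sketch is the direction of the call: LARQL invokes the kernel, and nothing in the kernel mentions vindexes or LQL.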

The CLI is a thin dispatcher: each `larql <cmd>` lives in [crates/larql-cli/src/commands/extraction/](crates/larql-cli/src/commands/extraction/) or [crates/larql-cli/src/commands/query/](crates/larql-cli/src/commands/query/) and is wired into the `Commands` enum in [crates/larql-cli/src/main.rs](crates/larql-cli/src/main.rs). `larql serve` exec's into `larql-server`. `larql repl` and `larql lql` delegate to `larql_lql::run_repl`/`run_statement`.
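
The thin-dispatcher pattern can be sketched as follows. The variants and handler bodies are illustrative stand-ins, not the real `Commands` enum from [crates/larql-cli/src/main.rs](crates/larql-cli/src/main.rs) (which is presumably clap-based); the sketch only shows the shape — each arm forwards immediately to logic that lives elsewhere.

```rust
// Simplified, hypothetical stand-in for the CLI's `Commands` enum.
// Real subcommands live in commands/ and are only *wired* here.
enum Commands {
    Describe { target: String },
    Repl,
}

fn dispatch(cmd: Commands) -> String {
    // The dispatcher owns no logic: every arm is a single forwarding call.
    match cmd {
        Commands::Describe { target } => format!("describe {target}"),
        Commands::Repl => "delegating to larql_lql::run_repl".to_string(),
    }
}

fn main() {
    let out = dispatch(Commands::Describe { target: "tiny-vindex".to_string() });
    assert_eq!(out, "describe tiny-vindex");
}
```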

LQL parser and executor are split symmetrically: [crates/larql-lql/src/parser/](crates/larql-lql/src/parser/) and [crates/larql-lql/src/executor/](crates/larql-lql/src/executor/) both have matching `lifecycle.rs`, `query.rs`, `mutation.rs`, `introspection.rs`, `trace.rs`. When adding a statement, touch the AST in [crates/larql-lql/src/ast.rs](crates/larql-lql/src/ast.rs), then both sides.
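
The "touch the AST, then both sides" workflow can be sketched like this. `Statement`, `parse`, and `execute` are hypothetical miniatures, not the real types in [crates/larql-lql/src/ast.rs](crates/larql-lql/src/ast.rs); the sketch shows why a new statement forces symmetric edits.

```rust
// Hypothetical miniature of the AST: adding a statement starts here...
enum Statement {
    Describe(String),
    Explain(String), // ...the new variant...
}

// ...then the parser side grows a matching arm...
fn parse(input: &str) -> Option<Statement> {
    let (verb, rest) = input.split_once(' ')?;
    match verb {
        "DESCRIBE" => Some(Statement::Describe(rest.to_string())),
        "EXPLAIN" => Some(Statement::Explain(rest.to_string())),
        _ => None,
    }
}

// ...and the executor side grows its mirror arm, keeping the halves symmetric.
fn execute(stmt: &Statement) -> String {
    match stmt {
        Statement::Describe(t) => format!("describing {t}"),
        Statement::Explain(t) => format!("explaining {t}"),
    }
}

fn main() {
    let stmt = parse("EXPLAIN feature:42").unwrap();
    assert_eq!(execute(&stmt), "explaining feature:42");
}
```

Forgetting either side is a compile error in the real crate too, as long as the `match` over the AST stays exhaustive — which is the practical value of the symmetric layout.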
@@ -68,15 +82,15 @@ Or via the Makefile: `make python-setup | python-build | python-test | python-cl
- **Storage is mmap-first.** Gate vectors, embeddings, down weights are zero-copy `mmap`'d. f16 is the default dtype (`--f16` halves size with negligible accuracy loss). Don't load entire tensors into RAM unless an operation requires it.
- **Three extraction levels, not features.** `browse` (~3 GB), `inference` (~6 GB), `all` (~10 GB) — gated by `ExtractLevel` enum in [crates/larql-vindex/src/config/types.rs](crates/larql-vindex/src/config/types.rs). Check level before attempting an operation; fail loudly if weights aren't present.
- **Walk FFN is sparse-by-design and can beat dense** (517ms vs 535ms on Gemma 4B) because gate KNN (K≈10) skips most of the 10,240 features per layer. If you touch FFN code, preserve this invariant — see [docs/ffn-graph-layer.md](docs/ffn-graph-layer.md).
- **MXFP4 quantized MoE (GPT-OSS) has degraded DESCRIBE/WALK** due to 4-bit precision; `INFER` is the supported path. Don't assume all model families are equivalent — see [docs/vindex-operations-spec.md](docs/vindex-operations-spec.md).
- **MXFP4 quantized MoE (GPT-OSS) has degraded DESCRIBE/WALK** due to 4-bit precision; `INFER` is the supported path. Don't assume all model families are equivalent — see [docs/specs/vindex-operations-spec.md](docs/specs/vindex-operations-spec.md).
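
The mmap-first rule above amounts to: address one element by offset, never materialize the tensor. A std-only sketch of that access pattern, under stated assumptions — the real code memory-maps the file (e.g. via a crate like memmap2) rather than seeking, `read_gate` is an invented name, and the bits-to-f32 conversion below is a minimal f16 decode (subnormals omitted), not the project's dtype handling:

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom, Write};

// Minimal f16 -> f32 decode: rebias the 5-bit exponent (15 -> 127, i.e. +112)
// and widen the 10-bit fraction to 23 bits. Subnormals are skipped for brevity.
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = ((bits >> 15) as u32) << 31;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let frac = (bits & 0x3ff) as u32;
    let f32_bits = if exp == 0 && frac == 0 {
        sign // ±0
    } else {
        sign | ((exp + 112) << 23) | (frac << 13)
    };
    f32::from_bits(f32_bits)
}

// Fetch one f16 gate value by element index without loading the file.
fn read_gate(path: &str, index: u64) -> std::io::Result<f32> {
    let mut f = File::open(path)?;
    f.seek(SeekFrom::Start(index * 2))?; // 2 bytes per f16 element
    let mut buf = [0u8; 2];
    f.read_exact(&mut buf)?;
    Ok(f16_bits_to_f32(u16::from_le_bytes(buf)))
}

fn main() -> std::io::Result<()> {
    // Write two f16 values (1.0 = 0x3c00, 2.0 = 0x4000), then read the second.
    let mut f = File::create("/tmp/gates.bin")?;
    f.write_all(&0x3c00u16.to_le_bytes())?;
    f.write_all(&0x4000u16.to_le_bytes())?;
    assert_eq!(read_gate("/tmp/gates.bin", 1)?, 2.0);
    Ok(())
}
```

With a real mmap the seek disappears too: the OS pages in only the bytes touched, which is what makes f16-by-default and zero-copy access compose.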

## Where to find things

- LQL language spec: [docs/lql-spec.md](docs/lql-spec.md) (v0.3)
- Vindex file format: [docs/vindex-format-spec.md](docs/vindex-format-spec.md)
- Operations + patches: [docs/vindex-operations-spec.md](docs/vindex-operations-spec.md)
- Ecosystem (HF publish, Vindexfile): [docs/vindex-ecosystem-spec.md](docs/vindex-ecosystem-spec.md)
- LQL language spec: [docs/specs/lql-spec.md](docs/specs/lql-spec.md) (v0.3)
- Vindex file format: [docs/specs/vindex-format-spec.md](docs/specs/vindex-format-spec.md)
- Operations + patches: [docs/specs/vindex-operations-spec.md](docs/specs/vindex-operations-spec.md)
- Ecosystem (HF publish, Vindexfile): [docs/specs/vindex-ecosystem-spec.md](docs/specs/vindex-ecosystem-spec.md)
- Inference engine internals: [docs/inference-engine.md](docs/inference-engine.md), [docs/ffn-graph-layer.md](docs/ffn-graph-layer.md)
- Trace format (.bin/.bndx/.ctxt): [docs/trace-format-spec.md](docs/trace-format-spec.md), [docs/residual-trace.md](docs/residual-trace.md)
- Trace format (.bin/.bndx/.ctxt): [docs/specs/trace-format-spec.md](docs/specs/trace-format-spec.md), [docs/residual-trace.md](docs/residual-trace.md)
- Experimental work: [experiments/](experiments/) — numbered 01-07, each self-contained
- Python bindings docs: [crates/larql-python/README.md](crates/larql-python/README.md), [docs/larql-python.md](docs/larql-python.md)
8 changes: 8 additions & 0 deletions Cargo.toml
@@ -1,6 +1,7 @@
[workspace]
resolver = "2"
members = [
# larql-specific
"crates/larql-models",
"crates/larql-compute",
"crates/larql-core",
@@ -9,8 +10,12 @@ members = [
"crates/larql-lql",
"crates/larql-cli",
"crates/larql-server",
"crates/larql-router",
"crates/larql-router-protocol",
"crates/larql-python",
"crates/kv-cache-benchmark",
# portable (extract to sibling repos later, names stable)
"crates/model-compute",
]
default-members = [
"crates/larql-models",
@@ -21,7 +26,10 @@ default-members = [
"crates/larql-lql",
"crates/larql-cli",
"crates/larql-server",
"crates/larql-router",
"crates/larql-router-protocol",
"crates/kv-cache-benchmark",
"crates/model-compute",
]

[workspace.package]