ChelatedAI is a Python research repository for adaptive retrieval, post-hoc embedding correction, multi-dataset evaluation, and computational-storage experiments.
Primary research path (2026-06): the Liquified Lattice program — self-annealing retrieval pools steerable by quant-like shims, linked as a DAG/GNN evidence graph, with disk-scale precomputed pools as the endgame. Active execution is tracked in docs/ROADMAP_EXECUTION.md (Phase I core queue now; Phase II lattice slices after step 8).
The repo still carries substantial work on road-course tuning, learned gates, Model-Scope steering, computational storage, and agentic remediation. Those tracks remain on the books and are not abandoned; they are sequenced after or alongside the primary lattice milestones as capacity allows. See Research baseline and queued work below.
The codebase spans two connected themes that feed the lattice path:
- improving vector retrieval quality through chelation, sedimentation, distillation, topology analysis, and online correction
- exploring whether parts of model execution can be pushed toward storage-resident node graphs, deterministic transport paths, and multi-drive speculative execution
Note: The computational-storage track includes drive-resident graph execution experiments and RP2040 transport tooling. It does not yet prove full on-device LLM inference on physical hard drives or SSDs. The current merged hardware claim is scope-locked to a deterministic transport proof. See docs/computational-storage-transport-scope-decision.md.
Most embedding systems assume the base embedding model is fixed and that retrieval quality is mainly a search-index problem. ChelatedAI treats retrieval failures as a dynamic systems problem:
- detect when a query enters a noisy neighborhood
- rerank or adapt before collapse propagates
- track structural drift over time
- benchmark whether improvements generalize across datasets
- test whether some inference primitives can move closer to storage media
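The detect-then-rerank idea in the list above can be illustrated with a small, dependency-free sketch. It assumes that "noisy" means high per-dimension variance across the top-k neighbors and that correction means masking those dimensions before reranking. Both are assumptions made for illustration, not the engine's confirmed method, and the names `noisy_dimensions`, `mask`, and `retrieve` and the `threshold` value are hypothetical.

```python
from statistics import pvariance


def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, docs, k):
    """Return the k doc ids most similar to the query, best first."""
    return sorted(docs, key=lambda doc_id: dot(query, docs[doc_id]), reverse=True)[:k]


def noisy_dimensions(vectors, threshold):
    """Dimensions whose population variance across the neighborhood exceeds threshold."""
    return {
        d for d in range(len(vectors[0]))
        if pvariance([v[d] for v in vectors]) > threshold
    }


def mask(vector, dims):
    """Zero out the given dimensions."""
    return [0.0 if i in dims else x for i, x in enumerate(vector)]


def retrieve(query, docs, k=3, threshold=0.5):
    """Rank normally, then rerank with noisy dimensions masked if any are found."""
    neighbors = top_k(query, docs, k)
    noisy = noisy_dimensions([docs[d] for d in neighbors], threshold)
    if not noisy:
        return neighbors, False  # stable neighborhood: keep the standard ranking
    masked_docs = {doc_id: mask(vec, noisy) for doc_id, vec in docs.items()}
    return top_k(mask(query, noisy), masked_docs, k), True


# Toy corpus (hypothetical): dimension 0 carries signal, dimension 1 is noise.
docs = {
    "relevant": [1.0, 0.0],
    "distractor_a": [0.2, 3.0],
    "distractor_b": [0.1, -3.0],
}
query = [1.0, 0.5]

baseline = top_k(query, docs, 3)
reranked, corrected = retrieve(query, docs)
print(baseline[0], "->", reranked[0], "(corrected)" if corrected else "(unchanged)")
# prints: distractor_a -> relevant (corrected)
```

In this toy case the noise dimension lets a distractor outrank the relevant document until it is masked. Real neighborhoods are rarely this clean, which is why the benchmark campaigns described later are needed before any setting is promoted.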
The Liquified Lattice program is the current focus. It unifies retrieval correction, self-healing (SEAL/EGGROLL), Model-Scope steering, shims, and the disk-first endgame into one phased program.
| Phase | Scope | Status (2026-06-06) |
|---|---|---|
| Phase I (steps 1–8) | ML correctness, infra hygiene, Model-Scope shadow pilot, E2E learning loop | Steps 1–6 and 8 largely complete on live branch; step 7 (Model-Scope) in progress |
| Phase I defer | SHIM substrate (production SIP wiring) | Open, env-guarded; resumes after step 8 |
| Phase II (steps 9–17) | Annealing controller, evidence DAG, disintegration loop, drift experiment, GNN, quant shim routing, disk pool slice | Documented; starts after Phase I exit |
Key docs: VISION_LIQUIFIED_LATTICE.md · ROADMAP_EXECUTION.md · CHANGELOG.md
What “liquified lattice” means in practice today
| Lattice piece | Repo surface today | Next milestone |
|---|---|---|
| Crystal pool | vector_store.py, sedimentation, adapters | Evidence DAG schema (Phase II #12) |
| Laser / refraction | antigravity_engine.py chelation + masks | Annealing controller (Phase II #11) |
| Annealing | sedimentation, online_updater.py, ES optimizer | Unified temperature schedule |
| Disintegration | isomer_detector.py, masking | Drift-triggered prune loop (Phase II #13) |
| Shims (quant-like) | adapters, model_scope_steering.py, chelated_shim_research.py | Production SHIM DoD (Phase I defer / II #10) |
| Disk pools | computational_storage_poc/block_graph.py | One pool shard + parity (Phase II #17) |
# Primary-path validation (live branch)
python -m unittest discover -s tests -p "test_*.py" -v
python scripts/check_block_flag.py
python scripts/phase_development_loop.py --once

All tracks below remain active parts of the portfolio. Primary = lattice program; Queued = tackle on schedule, not dropped.
| Priority | Track | What it covers | Main entrypoints |
|---|---|---|---|
| Primary | Liquified lattice | Self-annealing pools, shims, evidence DAG, disk-scale endgame | VISION_LIQUIFIED_LATTICE.md, self_healing_chelation.py, build_attribution_pool.py, chelated_shim_research.py |
| Queued | Adaptive retrieval | Chelation, sedimentation, adapter-based correction, vector-store integration | antigravity_engine.py, chelation_adapter.py, vector_store.py, config.py |
| Queued | Distillation and correction | Teacher guidance, cross-lingual routing, online updates, schedule tuning | teacher_distillation.py, cross_lingual_distillation.py, teacher_weight_scheduler.py, online_updater.py |
| Queued | Evaluation and reporting | BEIR runs, comparative benchmarks, sweeps, and dashboards | benchmark_beir.py, benchmark_comparative.py, benchmark_multitask.py, run_sweep.py, run_large_sweep.py, dashboard_server.py |
| Queued | Structural analysis | Topology cohesion, isomer drift, embedding quality, stability diagnostics | topology_analyzer.py, isomer_detector.py, embedding_quality.py, stability_tracker.py |
| Queued | Computational storage and drive nodes | Block-graph execution, mock NVMe path, multi-drive array simulation, RP2040 firmware, emulator, host reader, evidence capture | computational_storage_poc/, test_computational_storage_poc.py, test_computational_storage_payload.py, test_computational_storage_emulation.py |
| Queued | Process and remediation | Agentic review workflow, tracker docs, session logs, verification evidence | aep_orchestrator.py, docs/ARCH AGENTIC ENGINEERING AND PLANNING/ |
Windows PowerShell:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
pip install -e .

macOS / Linux:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -e .

requirements.txt installs the full research stack, including requests, mteb, and scikit-learn. pyproject.toml exposes the installable package metadata and optional dependency groups.
If you want to use the Ollama-backed embedding path:
docker run -d --name ollama -p 11434:11434 ollama/ollama
docker exec ollama ollama pull nomic-embed-text

Use model names like ollama:nomic-embed-text to route through the HTTP embedding backend.
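The prefix convention can be sketched as follows. This is a minimal illustration, assuming the backend splits on the first colon and posts to Ollama's `/api/embed` endpoint on the port published above. The helper names `split_model_name` and `build_ollama_request` are hypothetical, and the repository's embedding_backend.py may route differently.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: the port published by the docker command


def split_model_name(model_name):
    """Return (backend, model) for a name such as 'ollama:nomic-embed-text'."""
    prefix, sep, rest = model_name.partition(":")
    if sep and prefix == "ollama" and rest:
        return "ollama", rest
    return "local", model_name  # anything else is treated as a local model name


def build_ollama_request(model, text, base_url=OLLAMA_URL):
    """Build (but do not send) a POST request for Ollama's /api/embed endpoint."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


backend, model = split_model_name("ollama:nomic-embed-text")
request = build_ollama_request(model, "hello world")
print(backend, model, request.full_url)
# prints: ollama nomic-embed-text http://localhost:11434/api/embed
```

Sending the request with `urllib.request.urlopen(request)` requires the container above to be running. The sketch stops at building the request so it can be run without a server.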
python -m unittest discover -s . -p "test_*.py" -v
python run_live_fire_diagnostics.py --output live_fire_results.json
python run_safety_testbed.py
python run_road_course_campaign.py --task SciFact --max-queries 20 --sample-docs 1200 --output experiment_runs\roadcourse-small\roadcourse_profile_grid.json
python run_road_course_tuning_loop.py --task SciFact --max-queries 100 --sample-docs 1200 --rounds 2 --output experiment_runs\roadcourse-small\scifact_hundred_tuning_loop.json
python run_road_course_tuning_loop.py --task SciFact --max-queries 100 --sample-docs 1200 --rounds 2 --initial-grid modules --output experiment_runs\roadcourse-small\scifact_hundred_module_tuning_loop.json
python run_road_course_tuning_loop.py --task SciFact --max-queries 100 --sample-docs 1200 --rounds 2 --initial-grid calibrated --output experiment_runs\roadcourse-small\scifact_hundred_calibrated_tuning_loop.json
python run_thousand_query_tuning.py --loop-queries 200 --window-queries 50 --sample-docs 400 --output experiment_runs\roadcourse-small\adaptive_thousand_query_tuning.json
python run_thousand_query_tuning.py --phase-queries 5000 --loop-queries 200 --window-queries 50 --sample-docs 250 --output experiment_runs\roadcourse-small\adaptive_fivek_query_tuning.json
python -m unittest test_computational_storage_poc.py -v
python -m unittest test_computational_storage_emulation.py -v
python computational_storage_poc/run_all_tests.py
python computational_storage_poc/emulation/validate_emulation_path.py

python benchmark_beir.py --tier small --output benchmark_beir_small.json
python benchmark_multitask.py --tasks small --epochs 5 --max-queries 100
python dashboard_server.py --port 8080

flowchart TD
A[Documents] --> B[Embedding backend]
B --> C[Vector store ingestion]
Q[Query] --> E[AntigravityEngine]
E --> F[Neighborhood retrieval]
F --> G{Variance / structure check}
G -->|Stable| H[Standard ranking]
G -->|Noisy| I[Chelation / reranking]
I --> J[Noise-center logging]
J --> K[Sedimentation or online update]
K --> L[Adapter weights / corrected behavior]
H --> M[Result set]
I --> M
flowchart LR
A[Train or define graph] --> B[Compile matrix blocks]
B --> C[Flash or file-backed payload]
C --> D[Software block-graph validation]
C --> E[Mock NVMe latency model]
C --> F[RP2040 firmware or emulator]
F --> G[Sector 100 payload contract]
G --> H[Host reader / evidence capture]
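The last two steps of the storage diagram (a fixed-sector payload contract checked by a host reader) can be sketched as below. Only the sector number 100 comes from the diagram. The 512-byte sector size, the magic bytes, the header layout, and the CRC32 trailer are hypothetical choices for illustration and do not describe the actual format in payload_contract.py.

```python
import struct
import zlib

SECTOR_SIZE = 512                 # assumption: conventional 512-byte sectors
TRIGGER_SECTOR = 100              # sector index named in the diagram
MAGIC = b"CSPC"                   # hypothetical marker bytes
HEADER = struct.Struct("<4sIH")   # magic, sector index, body length (little-endian)


def build_payload(sector, body):
    """Pack a deterministic sector: header + body + CRC32, zero-padded to SECTOR_SIZE."""
    header = HEADER.pack(MAGIC, sector, len(body))
    crc = struct.pack("<I", zlib.crc32(header + body))
    raw = header + body + crc
    if len(raw) > SECTOR_SIZE:
        raise ValueError("body too large for one sector")
    return raw.ljust(SECTOR_SIZE, b"\x00")


def parse_payload(raw, expected_sector):
    """Validate a raw sector read by the host and return its body."""
    if len(raw) != SECTOR_SIZE:
        raise ValueError("unexpected sector size")
    magic, sector, length = HEADER.unpack_from(raw)
    if magic != MAGIC or sector != expected_sector:
        raise ValueError("not the expected trigger sector")
    end = HEADER.size + length
    if end + 4 > SECTOR_SIZE:
        raise ValueError("declared length exceeds sector")
    (crc,) = struct.unpack_from("<I", raw, end)
    if crc != zlib.crc32(raw[:end]):
        raise ValueError("checksum mismatch")
    return raw[HEADER.size:end]


sector_bytes = build_payload(TRIGGER_SECTOR, b"hello")
print(parse_payload(sector_bytes, TRIGGER_SECTOR))
# prints: b'hello'
```

Because the same bytes are produced for the same input, firmware, emulator, and host reader can all be checked against one expected sector image, which is the property a deterministic transport proof relies on.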
Progress branch: feat/live-progress-tracker-20260606 · PR #257
| Area | Status | Notes |
|---|---|---|
| ML correctness (InfoNCE, projection, adapter isolation) | Done on branch | Regression tests guard against reversion |
| Sweep / packaging / docs truth | Done on branch | run_large_sweep bounded; pyproject.toml py-modules updated |
| Model-Scope pilot (Phase I #7) | In progress | Runtime, steering, bridge, and provenance paths tested on fixtures |
| E2E learning loop (Phase I #8) | Done on branch | tests/test_learning_loop_e2e.py covers ingest → sedimentation → metric delta |
| SHIM research (Phase I defer) | Partial, env-guarded | chelated_shim_research.py, promoted SIP probe, evidence recorders |
| Phase / BHS loops | Running | scripts/phase_development_loop.py, bash scripts/loop_core_10m.sh |
| Liquified Lattice vision + Phase II plan | Documented | VISION_LIQUIFIED_LATTICE.md, ROADMAP_EXECUTION.md |
Block flag: CLEAR (8 open SHIM carried-debt rows, non-blocking per operator queue). See docs/next-session.md.
Progress log: CHANGELOG.md
| Surface | Purpose |
|---|---|
| chelated_shim_research.py | Env-guarded (CHELATED_SHIM_RESEARCH=1) SIP preflight and promoted registry probe |
| shim_node_promoted.py | Promoted shim registry copy (CHELATED_SHIM_PROMOTED=1) |
| scripts/record_shim_*_evidence.py | Writes dated artifacts/bhs_shim_evidence_*.json from production seams |
| scripts/run_five_worker_shim_gate.py | In-repo five-worker shim gate (SHIM-CD-06 partial) |
| scripts/phase_development_loop.py | CORE-SLICE / SHIM-SLICE orchestrator with artifacts/phase_loop/ state |
| reports/ARCH_AEP_REMEDIATION_FINDINGS*.md | AEP remediation findings and merge-readiness notes |
The sections below describe established results on main and work still on the books. They are not the day-to-day execution queue — that is the Liquified Lattice path above — but they remain valid research context and will be revisited (road-course campaigns, learned gates, RP2040 evidence, etc.) as Phase I/II milestones clear.
- the adaptive retrieval, benchmarking, and distillation surfaces are implemented on main
- the EGGROLL-inspired optimizer, retrieval-fitness gates, adaptive workflow orchestration, and AI-engineering runtime diagnostics are implemented on main
- deterministic live-fire diagnostics validate that engine controls and reporting are wired end-to-end; the tiny fixture is saturated, so proof of chelation lift still requires benchmark campaigns
- the project-car safety testbed now covers instrumentation, component benches, dyno sweeps, non-saturated closed-course loops, calibration profiles, and failure-injection ravine tests
- the first small-model road-course campaign supports a conservative chelation threshold guardrail (0.01) and rejects always-on chelation for MiniLM/SciFact
- module-aware hundred-query loops exercise query reformulation, guard+reformulation, and temperature-centered profiles; they currently preserve baseline or regress, so no module profile is promoted
- calibrated actuator loops now prove query reformulation fusion and chelation percentile masks mechanically affect rankings, but those effects reduce quality on first-hundred SciFact/NFCorpus loops
- an adaptive 1,000-query cycle with 50-query checkpoints found directional FiQA lift for adaptive_p85_t0.002, but cross-task instability blocks default/profile promotion
- adaptive 5,000-query and FiQA-focused confirmation phases found no global winner; the earlier FiQA-like adaptive_p85_t0.002 / adaptive_p85_t0.002_reform_rrf_v2 prospect did not survive repeat confirmation, so no route-specific promotion is justified
- tuning summaries now include fault classifications (no_op_tied, actuator_active_positive, actuator_active_negative, and metric_changed_without_actuator) so future runs can separate safe no-ops, working-but-harmful actuators, and implementation/instrumentation faults
- a fault-aware 5,000-query golden-setting search found no default-promotable or golden profile; adaptive_p99_t0.0015 produced large positive SciFact windows but also larger active-negative regressions, confirming the next path is learned/query-conditional gating rather than another global threshold default
- a gate-learning 5,000-query campaign now emits gate_feature_rows, gate_candidate_report, and shippable_gate_candidates; no shippable diagnostic gate was found, and the result points to a supervised gate trained on held-out windows rather than another hand-written threshold
- conservative learned-gate tooling is now implemented: chelatedai-train-gate trains holdout-validated gate artifacts and run_thousand_query_tuning.py --strategy learned_gate --gate-artifact ... consumes them; the first trained artifact rejected all 140 candidate rules, so it correctly fails closed instead of promoting an unsafe actuator
- two alternative validation tracks are now implemented: tuning artifacts emit query_attribution_rows for per-query actuator/gate learning, and chelatedai-synthetic-collapse provides a deterministic semantic-collapse fixture where masking the known noisy dimension recovers NDCG/MRR/Recall from 0.0 to 1.0
- all six follow-up research pathways now have working surfaces: query attribution, synthetic collapse, learned mask smoke, selective reformulation, benchmark-family meta-analysis, and candidate-profile proposals; the first 200-query SciFact meta probe still finds no golden setting, but it identifies always-on reform_rrf_v2 as the only retest candidate while treating chelation profiles as training data only
- follow-on reformulation-policy and static-mask probes did not produce a new candidate: reformulation policies were neutral/negative across the next 100-query search, and supervised static masks showed train-slice hints but hurt held-out SciFact retrieval
- conditional static-mask gates can reduce damage but are not stable enough yet: the recurring low-stopword gate tied or slightly improved holdout in some compact probes, but one repeat regressed and no run crossed the promotion threshold
- regularized conditional static-mask gates now require an internal train/validation split before holdout application; compact repeats produced one small holdout lift (+0.0014), one tie, and one fail-closed run, so this remains a weak research lead rather than a shippable setting
- classifier-gated conditional masks are now implemented with logistic scoring, internal validation, and a minimum-positive-example floor; 50 compact SciFact loops found no lift, and the safer floor failed closed on all seeds, so this branch is rejected as a current candidate but retained as guarded research tooling
- the remaining non-hardware work is broader road-course campaign execution and evidence review before any aggressive profile promotion, not missing feature delivery
- the computational-storage follow-through is narrowed to real RP2040 evidence capture and a dated retention review
- the repository includes credible storage-node experiments, but not a shipped hard-drive-hosted LLM runtime
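The four fault classifications named in the list above can be read as a small decision rule over "did the actuator fire?" and "did the metric move?". The sketch below is one plausible reading. The function name `classify_window`, the tolerance, and the choice to treat an active actuator with an unchanged metric as no_op_tied are assumptions, not the repository's confirmed logic.

```python
def classify_window(baseline_metric, candidate_metric, actuator_active, tol=1e-9):
    """Label one evaluation window with a fault classification.

    Assumed semantics (hypothetical):
      - no_op_tied: the metric did not move
      - actuator_active_positive / actuator_active_negative: the actuator fired
        and the metric improved / regressed
      - metric_changed_without_actuator: the metric moved although no actuator
        fired, which points at an implementation or instrumentation fault
    """
    delta = candidate_metric - baseline_metric
    if abs(delta) <= tol:
        return "no_op_tied"
    if not actuator_active:
        return "metric_changed_without_actuator"
    return "actuator_active_positive" if delta > 0 else "actuator_active_negative"


# Hypothetical NDCG values for four 50-query windows.
windows = [
    (0.62, 0.62, False),
    (0.62, 0.66, True),
    (0.62, 0.57, True),
    (0.62, 0.60, False),
]
for baseline, candidate, active in windows:
    print(classify_window(baseline, candidate, active))
```

Separating the last label from the others is what lets a tuning summary distinguish a working-but-harmful actuator from a measurement bug.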
| Area | Status | When / how it returns |
|---|---|---|
| Road-course profile promotion | No global golden setting yet; learned/query-conditional gating is the lead | Ongoing campaigns; Phase II drift experiment (#14) |
| Learned gates and static masks | Tooling exists; first artifacts fail closed or hurt holdout | Attribution pool → evidence DAG (Phase II #12) |
| SEAL/EGGROLL self-healing depth | Advisory + sandbox; cloned-adapter execution pending | Phase II annealing controller (#11) + seal-eggroll doc |
| Computational storage / RP2040 | Software transport proof strong; physical evidence capture pending | Phase II disk pool slice (#17); storage track |
| Disk-first CPU program | Architecture docs exist; not fully reflected in runtime | After evidence DAG + pool shard milestones |
| Agentic remediation (AEP/BHS) | Active process layer | Continuous; see docs/next-session.md |
For the current live-fire validation plan, see docs/live-fire-diagnostics-2026-04-27.md. For the safety testbed road-course gates, see docs/safety-testbed-road-course-plan.md. For the first small-model road-course result, see docs/road-course-results-2026-04-27.md. For the earlier post-feature evaluation plan, see docs/roadmap-audit-and-weight-refinement-plan-2026-03-06.md. Full track inventory: docs/RESEARCH_TRACKS.md.
- antigravity_engine.py: central engine for ingestion, inference, adaptive chelation, logging, and training hooks
- embedding_backend.py: routes embeddings to Ollama or local SentenceTransformers
- vector_store.py: Qdrant abstraction used by the retrieval engine
- chelation_adapter.py: near-identity adapter variants for post-hoc correction
- config.py: presets and validation for retrieval, distillation, online updates, topology, and BEIR
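A "near-identity adapter" is commonly built as a residual correction that starts at zero, so the corrected embedding equals the base embedding until training moves it. The sketch below shows that idea in plain Python. The class name `ResidualAdapter` and its layout are hypothetical and are not taken from chelation_adapter.py.

```python
class ResidualAdapter:
    """Computes y = x + W·x, with W initialized to zero (identity at start)."""

    def __init__(self, dim):
        self.dim = dim
        self.weights = [[0.0] * dim for _ in range(dim)]

    def __call__(self, vector):
        if len(vector) != self.dim:
            raise ValueError("dimension mismatch")
        # Correction term W·x, one dot product per output dimension.
        correction = [sum(w * x for w, x in zip(row, vector)) for row in self.weights]
        return [x + c for x, c in zip(vector, correction)]


adapter = ResidualAdapter(3)
embedding = [0.5, -1.0, 2.0]
print(adapter(embedding))      # prints: [0.5, -1.0, 2.0]  (identity before any update)

adapter.weights[0][2] = 0.25   # a single learned correction weight
print(adapter(embedding))      # prints: [1.0, -1.0, 2.0]
```

Starting from the identity means an untrained adapter cannot degrade retrieval, which fits a post-hoc correction layer placed on top of a fixed embedding model.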
- teacher_distillation.py: offline, hybrid, and teacher-guided correction helpers
- cross_lingual_distillation.py: language-aware teacher routing
- online_updater.py: inference-time update mechanisms and diagnostics
- self_healing_chelation.py: SEAL/EGGROLL-inspired self-edit planning for advisory adapter-only repair directives
- topology_analyzer.py and isomer_detector.py: structural drift analysis
- stability_tracker.py, embedding_quality.py, convergence_monitor.py: health and learning diagnostics
- benchmark_beir.py, benchmark_multitask.py, benchmark_comparative.py, benchmark_distillation.py: retrieval-quality evaluation
- run_sweep.py and run_large_sweep.py: grid-search style parameter studies
- run_live_fire_diagnostics.py: deterministic live-fire harness for engine controls, telemetry, gates, and reporting
- run_safety_testbed.py: staged safety testbed for non-saturated closed-course loops, calibration profiles, failure gates, and road-course campaign planning
- run_road_course_campaign.py: small-model road-course profile grid for threshold/default decisions
- run_road_course_tuning_loop.py: iterative first-hundred-query profile tuning loop with adaptive and module-aware next-grid selection
- run_thousand_query_tuning.py: five-loop adaptive 1,000-query road-course cycle with 50-query validation windows
- dashboard_server.py and dashboard/index.html: local research dashboard
- computational_storage_poc/block_graph.py: flash-friendly block packing and traversal
- computational_storage_poc/mock_nvme.py: software parity and latency model for computational-storage reads
- computational_storage_poc/mock_array.py: speculative multipath racing across storage nodes
- computational_storage_poc/payload_contract.py: deterministic trigger-sector payload used by firmware and emulator
- computational_storage_poc/usb_host_inference.py: host-side raw-sector reader
- computational_storage_poc/capture_hardware_evidence.py: auditable RP2040 evidence capture tool
- computational_storage_poc/firmware/: RP2040/TinyUSB transport firmware
- computational_storage_poc/emulation/: dependency-light emulator validation path
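"Speculative multipath racing" can be pictured as issuing the same read to several storage nodes and keeping whichever answer arrives first. The sketch below simulates that with threads and fixed sleeps. The node names, latencies, and the functions `read_block` and `race_read` are hypothetical and do not reflect the interface of mock_array.py.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_block(node_name, latency_s, payload):
    """Simulate one storage node returning a block after a fixed latency."""
    time.sleep(latency_s)
    return node_name, payload


def race_read(nodes, payload):
    """Issue the same read to every node and return the first result to complete."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [
            pool.submit(read_block, name, latency, payload)
            for name, latency in nodes.items()
        ]
        for future in as_completed(futures):
            # Leaving the with-block still waits for the slower reads to finish.
            return future.result()


# Hypothetical per-node latencies in seconds.
nodes = {"drive_a": 0.20, "drive_b": 0.01, "drive_c": 0.10}
winner, block = race_read(nodes, b"\x01\x02\x03")
print(winner)
```

With these latencies the fastest node, drive_b, is expected to win. A fuller model would also cancel or ignore the losing reads and check that all nodes return identical bytes.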
GitHub Actions currently verifies:
- Python linting with ruff
- full unittest discovery across Python 3.9, 3.10, 3.11, and 3.12
- computational-storage fundamentals and the script harness
- computational-storage emulation validation
- RP2040 firmware build and artifact upload
See .github/workflows/test.yml and .github/workflows/build_firmware.yml.
Start here:
- docs/README.md: canonical docs home and legacy-to-canonical map
- docs/VISION_LIQUIFIED_LATTICE.md: north-star architecture (self-annealing lattice, shims, disk pools)
- docs/ROADMAP_EXECUTION.md: Phase I + Phase II execution queue
- docs/SYSTEM_BLUEPRINT.md: architecture, stack, and information flows
- docs/MODULE_GUIDE.md: module-by-module inventory
- docs/RESEARCH_TRACKS.md: active and historical research tracks
- docs/COMPUTATIONAL_STORAGE_DRIVE_NODES.md: hard-drive / storage-node research summary
- docs/INDEX.md: broader index, including the AEP process archive
- compare standard vs. chelated ranking behavior
- run cross-dataset BEIR evaluations
- refine adapter schedules and teacher weights
- test whether block-graph traversal can remain correct when moved toward storage media
- compare host-driven vs. storage-driven latency models
- validate deterministic firmware or emulator transport surfaces
- use the canonical docs set first
- fall back to the AEP archive for process evidence, session logs, and prior decisions
This repository is distributed under the MIT license. See LICENSE.