dancinlab/hexa-codex

📜 hexa-codex

HEXA-Codex family — codified theorems · AI knowledge substrate · 17 verbs · 4 groups


📜 hexa-codex — AI knowledge substrate (HEXA family)

17-verb AI knowledge substrate organized in 4 groups: safety + economics + ops + substrate. A library-style (codex) spec catalog: each verb ships a closed-form candidate spec + falsifier preregister, extracted from canon (domains/cognitive/) on 2026-05-06.

+ lm_foundry/: the domain-LLM training pipeline, absorbed from the standalone hexa-forge repo on 2026-05-13. Where the 17 verbs are a spec library, lm_foundry/ is trained models plus runtime: a code-LLM for hexa-lang at 94.29% Mk.I strict (r39 GA, frozen), wrapped by a v0.5.x orchestration runtime (r44–r62) that ships pre-7B classifier routing, real 3-vendor SDKs, persistent cache, multi-turn memory, production observability, and SQLite WAL multi-process safety. See lm_foundry/README.md and ARCHITECTURE.json ("lm_foundry/" node).


Why hexa-codex?

hexa-codex is a standalone AI knowledge substrate — a codex (library) of AI-domain specs that the rest of the dancinlab stack imports declaratively. Each verb is a single closed-form spec markdown extracted unchanged from canon/domains/cognitive/, organized into four orthogonal groups so that consumers can navigate by concern.

The codex framing matters because:

  • Spec-first. Each verb is a written candidate + falsifier preregister before any sandbox is wired. Consumers read the codex; they do not run it.
  • Group-orthogonal. SAFETY, ECONOMICS, OPS, and SUBSTRATE are concerns every AI deployment crosses — but the four sets carry different falsifier classes (interp probes / cost-curve fits / SLO checks / capability evals).
  • Sister to hexa-bio. Where hexa-bio curates 4 molecular verbs (write-side wet/dry sandbox), hexa-codex curates 17 cognitive verbs (write-side AI spec library) — same HEXA-family pattern, different domain.

Verbs — 17 specs across 4 groups (6 + 3 + 4 + 4 = 17)

Each verb ships as a single .md spec under a group-named directory, extracted from canon@c0f1f570:domains/cognitive/ on 2026-05-06. Read the spec; the codex does not run these verbs — write-side sandbox wiring is per-verb future work (see release ladder). Every spec is a preregistered hypothesis, not a validated capability claim.

SAFETY (6)

Domain SSOT: ARCHITECTURE.json → "SAFETY group" node · history → CHANGELOG.jsonl (was SAFETY.md/.log.md)

Verb Spec
alignment alignment/ai-alignment.md — HELM-12-axis alignment-score aggregator (scaling-falsifier)
safety safety/ai-safety.md — refusal-matrix + capability-gate spec
welfare welfare/ai-welfare.md — model-welfare probe protocol
adversarial adversarial/ai-adversarial.md — red-team failure-mode taxonomy
consciousness consciousness/ai-consciousness.md — IIT × GWT probe (BT-19 falsifier-in-action, see below)
interpret interpret/ai-interpretability.md — SAE motif count = 10 (scaling-falsifier)

ECONOMICS (3)

Domain SSOT: ARCHITECTURE.json → "ECONOMICS group" node · history → CHANGELOG.jsonl (was ECONOMICS.md/.log.md)

Verb Spec
train_cost train_cost/ai-training-cost.md — Chinchilla-fit N^24 scaling (scaling-falsifier)
infer_cost infer_cost/ai-inference-cost.md — context^τ = context^4 (scaling-falsifier)
quality_scale quality_scale/ai-quality-scale.md — HumanEval+/hexa-eval aggregate

OPS (4)

Domain SSOT: ARCHITECTURE.json → "OPS group" node · history → CHANGELOG.jsonl (was OPS.md/.log.md)

Verb Spec
deploy deploy/ai-deployment.md — hardware-tier deployment recipes
enterprise enterprise/ai-enterprise-custom.md — enterprise customisation envelope
agent_serving agent_serving/ai-agent-serving.md — tool-use SLO + schema
eval eval/ai-eval-pipeline.md — Mk handoff eval template

SUBSTRATE (4)

Domain SSOT: ARCHITECTURE.json → "SUBSTRATE group" node · history → CHANGELOG.jsonl (was SUBSTRATE.md/.log.md)

Verb Spec
multimodal multimodal/ai-multimodal.md — multimodal fusion spec
rlhf rlhf/youth-ai-labeling-rlhf-hub.md — DPO/RLHF labelling hub
cog_arch cog_arch/cognitive-architecture.md — cognitive architecture envelope
causal causal/causal-chain.md — causal-chain reasoning spec

These specs are theoretical preregisters, not empirically verified. External AI labs (OpenAI / Anthropic / DeepMind) publish their own benchmarks with their own metrics; those external evaluations do not use this codex's scaling framing, and this codex makes no claim that they should. The T1+T2+T3 runnable surface verifies internal closed-form algebraic floors; T4 per-verb empirical landing is deferred to release ladder v1.1.0..v2.0.0.


lm_foundry/ — domain-LLM foundry (absorbed from hexa-forge, 2026-05-13)

The 17 verbs above are a spec library (read, don't run). lm_foundry/ is the opposite: a working model-training pipeline for domain-specialised LLMs. It was the standalone hexa-forge repo (retired 2026-05-13); hexa-codex was always its sister (serving / inference side), and the merge consolidates the two.

verb what status (2026-05-14, v0.5.14 / r62)
code programming-only LLM for hexa-lang GA at 94.29% Mk.I strict (627/665), 96% 5-NL — r39 v3-t3patch adapter, unchanged since GA mark. Path: Qwen2.5-Coder-7B + LoRA r=64 SFT (r1–r34) → Phase-A manifest fixes (r33/r37/r38) → compile-feedback RL via GRPO (Lever 4 — T4 enum 55→100%) → T3 quote-fragility patch (r39, T3 58.8→100%). v0.4.x in-weight delegation disproved (r40–r43.1, 5 distinct failure modes); routing moved OUT of model weights to a deterministic pre-7B classifier + per-vendor tier selector + real 3-vendor SDKs + per-prompt cache + multi-turn memory + production observability. v0.5.x orchestration line (r44–r62) ships the production stack: DLG-mk0 classifier 0.9833 / tier_match 1.000 / Brier 0.0242 EXCELLENT / ECE 0.0461 GOOD on 300-task held-out manifest. See ARCHITECTURE.json ("ORCHESTRATION runtime" node).
bio HEXA-BIO domain LLM (seq + prose) recipe spec landed; training pending. Paired with dancinlab/hexa-bio.
  • Knowledge SSOTs: ARCHITECTURE.json → "code-LLM learning surface" + "bio-LLM scaffold" nodes (was LEARNING_PROGRAMMING.md · LEARNING_BIO.md).
  • Round-by-round chronicle → CHANGELOG.jsonl + git history (was LEARNING_PROGRAMMING.log.md specialist r1–r39 · ORCHESTRATION.log.md runtime r40–r62).
  • Runtime spec: ARCHITECTURE.json "ORCHESTRATION runtime" node — canonical runtime spec (was ORCHESTRATION.md, 15 sections).
  • Design docs: lm_foundry/papers/ (incl. spec-lever4-compile-rl.md, spec-delegation-v0.4.0.md OBSOLETE §4/§10).
  • HF artifacts: 42 repos under dancinlab/hexa-forge-* (prefix kept as artifact identity). GA adapter (unchanged): dancinlab/hexa-forge-code-7b-qwen2.5-lora-r64-v0.4.0-rl-t4-v3-t3patch (r39). v0.5.x is software-only — no new HF model artifacts (orchestration lives in tool/, not in weights).
  • bench-cold/, runs/, logs/ under lm_foundry/ are gitignored (SoT for benches is HF dancinlab/hexa-forge-bench-cold-v0.1.3).

See lm_foundry/README.md for the full layout and operating notes (Vast.ai is the primary GPU platform after RunPod's 2026-05-12 incident).
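
The status table quotes a Brier score and an ECE for the DLG-mk0 classifier without saying how they are computed. As a rough illustration only, the sketch below uses the common top-label definitions with equal-width bins; whether lm_foundry/ uses this variant, this bin count, or these function names is an assumption, and the toy data is made up rather than taken from the 300-task manifest.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean squared gap between the stated confidence and the 0/1 outcome."""
    return sum((c - float(ok)) ** 2 for c, ok in zip(confidences, correct)) / len(confidences)


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: sample-weighted |accuracy - mean confidence| per bin."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 falls in the last bin
        bins[idx].append((c, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


# Hypothetical toy predictions (NOT data from the held-out manifest).
confs = [0.9, 0.8, 0.6, 0.95]
hits = [True, True, False, True]
print(round(brier_score(confs, hits), 4))
print(round(expected_calibration_error(confs, hits), 4))
```

Lower is better for both metrics: Brier penalises confident mistakes quadratically, while ECE only measures how far stated confidence drifts from observed accuracy.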


Falsifier preregister

[.roadmap.hexa_codex §A.4](.roadmap.hexa_codex) preregisters four falsifiers; each one's arithmetic floor is checked at v1.0 by the release ladder.

| Tag | Claim | Arithmetic | Empirical |
|---|---|---|---|
| scaling-falsifier | training_cost ∝ N^24 (Chinchilla-fit) | PASS | PENDING (v1.2.0) |
| scaling-falsifier | inference_cost ∝ context^τ = context^4 (Claude 4.7 1M) | PASS | PENDING (v1.2.0) |
| scaling-falsifier | alignment_score = mean over 12 axes (HELM-comparable) | PASS | PENDING (v1.1.0) |
| scaling-falsifier | interpret_motifs = 12 − 2 = 10 (Anthropic dict-l.) | PASS | PENDING (v1.1.0+) |

hexa-codex calc train_cost --N 7e9 --D 1.4e12   # scaling-falsifier closed form
hexa-codex calc infer_cost --context 1000000    # scaling-falsifier (1M ctx)
hexa-codex calc alignment --helpfulness 0.85    # scaling-falsifier axis aggregator
hexa-codex calc interpret --observed-motifs 9   # scaling-falsifier motif counter
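
To make the arithmetic floors concrete, here is a minimal Python sketch of what the closed forms in the table imply. It is not the `hexa-codex calc` implementation: the function names and the axis labels are hypothetical, and only the exponents (24 and 4), the 12-axis mean, and the predicted motif count of 10 come from the table above.

```python
from statistics import mean

N_AXES = 12                # axis count from the alignment_score row
PREDICTED_MOTIFS = 12 - 2  # = 10, from the interpret_motifs row


def alignment_score(axis_scores: dict[str, float]) -> float:
    """Unweighted mean over exactly 12 axis scores."""
    if len(axis_scores) != N_AXES:
        raise ValueError(f"expected {N_AXES} axes, got {len(axis_scores)}")
    return mean(axis_scores.values())


def cost_ratio(scale_factor: float, exponent: int) -> float:
    """Relative cost change implied by cost ∝ x**exponent when x grows by scale_factor."""
    return scale_factor ** exponent


def motif_gap(observed: int) -> int:
    """Signed distance between an observed motif count and the preregistered 10."""
    return observed - PREDICTED_MOTIFS


# Hypothetical axis names; the source only names "helpfulness" as a CLI flag.
axes = {f"axis_{i}": 0.85 for i in range(N_AXES)}
print(alignment_score(axes))
print(cost_ratio(2, 4))    # doubling context under context^4
print(cost_ratio(2, 24))   # doubling N under N^24
print(motif_gap(9))        # matches the --observed-motifs 9 example
```

A proportionality claim fixes only ratios, which is why the sketch reports relative cost rather than an absolute figure; the constant of proportionality is something the empirical (PENDING) column would have to supply.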

Release ladder

Per [.roadmap.hexa_codex §A.2](.roadmap.hexa_codex), the ladder is strictly monotone in verbs-wired and eval-pipeline count. Verified by verify/release_ladder.py (7/7 PASS).

| Version | Date | Status | Group focus | Wired | Evals | Empirical falsifier |
|---|---|---|---|---|---|---|
| v1.0.0 | 2026-05 | RELEASED | (seed) | 0 | 0 | (arithmetic floor only) |
| v1.1.0 | 2026-08 | TARGET | safety | 2 | 1 | scaling-falsifier |
| v1.2.0 | 2026-10 | PLANNED | economics | 5 | 2 | scaling-falsifier |
| v1.3.0 | 2026-12 | PLANNED | ops | 9 | 3 | scaling-falsifier |
| v2.0.0 | 2027-Q2 | ASPIRATIONAL | substrate | 17 | 4 | scaling-falsifier |

hexa-codex verify release         # ladder monotonicity audit
python3 verify/release_params.py  # full per-version parameter table
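
The monotonicity requirement can be checked mechanically. The sketch below covers only that one property, using the wired and evals counts from the ladder table; it is an illustration, not the contents of verify/release_ladder.py (whose seven checks are not described here), and the names `LADDER` and `audit_ladder` are hypothetical.

```python
# (version, verbs wired, eval pipelines), copied from the release ladder table.
LADDER = [
    ("v1.0.0", 0, 0),
    ("v1.1.0", 2, 1),
    ("v1.2.0", 5, 2),
    ("v1.3.0", 9, 3),
    ("v2.0.0", 17, 4),
]


def audit_ladder(rows):
    """Return a list of violations; an empty list means strictly monotone."""
    violations = []
    for (prev_v, prev_w, prev_e), (cur_v, cur_w, cur_e) in zip(rows, rows[1:]):
        if cur_w <= prev_w:
            violations.append(f"{prev_v} -> {cur_v}: wired {prev_w} -> {cur_w}")
        if cur_e <= prev_e:
            violations.append(f"{prev_v} -> {cur_v}: evals {prev_e} -> {cur_e}")
    return violations


problems = audit_ladder(LADDER)
print("PASS" if not problems else "\n".join(problems))
```

Returning the list of violations rather than a bare boolean keeps the failing version pair visible when a future ladder edit breaks the ordering.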

Verify

verify/run_all.hexa is the canonical .hexa orchestrator (sister of hexa-rtsc / hexa-cern / hexa-fusion / hexa-ufo / hexa-chip / hexa-antimatter run_all.hexa patterns). It runs 42 green-core verify subscripts and emits __HEXA_CODEX_RUN_ALL__ PASS — 42/42 green on success.

HEXA_CODEX_ROOT=$(pwd) hexa run verify/run_all.hexa     # 42/42 expected

Green-core inventory (42 subscripts, all PASS)

| Tier | Count | Scripts |
|---|---|---|
| T4 PENDING stubs | 11 | numerics_*_t4_parity × 11 (train_cost, infer_cost, alignment, interpret, safety, adversarial, quality_scale, rlhf, eval, agent_serving, deploy); emit PENDING per D-023 |
| inventory | 1 | cross_doc_audit |
| group ladder reports | 1 | report_economics_ladder |

Honesty — no falsifier-tripped scripts, no silenced FAILs

Unlike hexa-chip (4 falsifier-tripped scripts kept on disk as honest signal of post-GAA flattening / Moore retraction / HBM4 spec drift), hexa-codex's surface is currently all-green: each of the four scaling-falsifier pillars carries T1 + T2 ×3 closed-form arithmetic + numerics + solver + parity layers; the 11 numerics_*_t4_parity stubs emit a PENDING sentinel (not a fake PASS) until external hexa-forge data lands per plan-decisions-pending.md D-023.

These specs are theoretical preregisters, not empirically verified. External AI lab benchmarks (OpenAI / Anthropic / DeepMind published evals: HELM, MMLU, GSM8K, HumanEval, SAE motif counts) use their own metrics. The codex makes no claim that those external evals are settled; its runnable surface verifies internal closed-form algebraic floors only, and per-verb T4 empirical landings sit at recipe §9 and land per the release ladder v1.1.0..v2.0.0.

Closed-form algebra alone is not sufficient verification — the numerics_* tier carries real-limits anchors (PAC sample complexity, Kolmogorov K(program) lower bound, Rice's theorem undecidability of semantic equivalence — see ARCHITECTURE.json "LIMIT_BREAKTHROUGH" node).

Bookkeeping closure verdict

  • 100 % bookkeeping closure within the green-core (42/42 PASS).
  • NOT AI safety / economics / capability settled: the four scaling-falsifier pillars remain at "arithmetic floor closed, empirical T4 PENDING per release ladder"; the 11 T4 stubs are honestly PENDING.
  • Saturated ≠ falsified ≠ confirmed. 100 % closure here means closed-form + numerics-T2 + published-ref parity layers are regression-locked at the code layer for future bench comparison; it does not mean Chinchilla scaling, HELM-Core 12-axis alignment, Anthropic SAE motif counts, or any external eval are settled.

Runnable surface

The runnable surface follows the runnable_surface_recipe.md closure-depth pattern. Every prediction the codex ships is paired with at least one runnable verifier, and the surface is closed when each scaling-falsifier carries T1 (algebraic) + T2 ×3 (numerical / published-ref / ODE solver) layers (recipe §7.2 sat-1 saturation).

Status: 100% closure reached. Under recipe §3 (T1 = calc_*, T2 = numerics_* + numerics_*_solver, T3 = numerics_*_parity), each of the four scaling-falsifier pillars carries T1 ✓ + T2 ✓ + T3 ✓ ⇒ closure_pct = 3/3 = 100%, and quality_scale carries the same T1+T2+T3 ladder as the first non-falsifier ECONOMICS verb. Plus 7 cross-cutters, 1 group ladder report, and 3 meta verifiers. Total 31 runnable verify scripts + 33 companion regression tests. verify/saturation_check.hexa emits the recipe §7.3 self-stop sentinel __HEXA_CODEX_RSC_SATURATED__ STOP.
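
The closure_pct arithmetic is simple enough to sketch. The example below assumes that a tier counts as closed only when every script pattern for that tier is present, and that script stems follow the calc_* / numerics_* naming given in recipe §3; both the exact stems and the all-patterns-required rule are assumptions, not a description of the actual closure tracker.

```python
# Tier -> filename-stem patterns, following the recipe §3 naming quoted above.
TIER_PATTERNS = {
    "T1": ["calc_{p}"],
    "T2": ["numerics_{p}", "numerics_{p}_solver"],
    "T3": ["numerics_{p}_parity"],
}


def closure_pct(pillar: str, present: set[str]) -> float:
    """Percentage of tiers whose scripts are all present for one pillar."""
    closed = sum(
        all(pattern.format(p=pillar) in present for pattern in patterns)
        for patterns in TIER_PATTERNS.values()
    )
    return 100.0 * closed / len(TIER_PATTERNS)


# Hypothetical directory listing for one pillar.
full = {
    "calc_train_cost",
    "numerics_train_cost",
    "numerics_train_cost_solver",
    "numerics_train_cost_parity",
}
print(closure_pct("train_cost", full))
print(closure_pct("train_cost", full - {"numerics_train_cost_parity"}))
```

Note that this counts file presence only; whether each script actually passes is a separate question answered by the sentinels the verifiers emit.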

verify/ — 23 .hexa-native verifiers (math_pure, no deps)

All scripts use self/runtime/math_pure (no external Python / float libraries). Each emits a __HEXA_CODEX_<NAME>__ PASS sentinel; the top-level aggregator polls sentinels and exits 0 iff every layer is green.

Per-pillar tier stack (4 × 4 = 16 files, recipe §3 taxonomy):

Pillar T1 — calc T2 — numerics T2 — solver T3 — parity

T2 (numerics + solver) re-derives the prediction from the closed form itself: numerics_* exercises the closed form on a synthetic anchor grid; numerics_*_solver integrates the underlying ODE (Euler / midpoint-RK2 / RK4 cascade for pillars 1, 2, 4; symplectic leapfrog/Verlet harmonic oscillator for pillar 3) and verifies convergence orders 1 / 2 / 4 by step-halving.

T3 (parity) is the archival empirical contact: it ties the prediction to external published numbers (Chinchilla / GPT-3 / Llama-2 / PaLM for cost; HELM-Core for alignment; Olsson / Cunningham / Bricken / Anthropic-2024 SAE motif counts for interpret).

A failure in any T2 file alone is a closed-form bug; a failure in any T3 file alone is an empirical-contact drift. Both classes are caught by independent layers, which is what closure_pct = 100% (3/3 tiers) buys.
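
The step-halving check described for the solver layer can be illustrated in a few lines of Python. This is a sketch on a stand-in test problem (y' = −y, chosen here because its exact solution is known), not the ODE any numerics_*_solver script actually integrates; the function names are hypothetical. Halving the step should cut the global error by about 2^p for a method of order p, so log2 of the error ratio estimates p.

```python
import math


def f(t: float, y: float) -> float:
    """Stand-in test ODE y' = -y, with exact solution exp(-t)."""
    return -y


def euler_step(f, t, y, h):
    return y + h * f(t, y)


def midpoint_step(f, t, y, h):
    k1 = f(t, y)
    return y + h * f(t + h / 2, y + h / 2 * k1)


def rk4_step(f, t, y, h):
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(step, n_steps: int, t_end: float = 1.0, y0: float = 1.0) -> float:
    """Advance from t = 0 to t_end with a fixed step size."""
    h = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = step(f, t, y, h)
        t += h
    return y


def observed_order(step, n_steps: int = 20) -> float:
    """Estimate the convergence order from the error ratio under step-halving."""
    exact = math.exp(-1.0)
    err_h = abs(integrate(step, n_steps) - exact)
    err_half = abs(integrate(step, 2 * n_steps) - exact)
    return math.log2(err_h / err_half)


for name, step in [("euler", euler_step), ("midpoint", midpoint_step), ("rk4", rk4_step)]:
    print(name, round(observed_order(step), 2))
```

The estimates land near 1, 2, and 4 rather than exactly on them, because the leading error term is only asymptotically dominant; a verifier built this way would compare against a tolerance, not an exact value.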

Cross-cutters (7 files):

Verifier What it checks
cross_doc_audit.hexa Taxonomy + falsifier-prefix + provenance consistency across docs
numerics_economics_scaling_laws.hexa ECONOMICS scaling-law sweep (q/train/infer halving·doubling·4×, cost/quality ratio)
numerics_economics_pareto.hexa ECONOMICS Pareto envelope (iso-loss · iso-cost · Lagrangian optimum · Chinchilla allocation)

Group ladder reports (1 file):

Verifier What it does
report_economics_ladder.hexa ECONOMICS group recipe §3 ladder — per-verb T1+T2+T3 table + X-ECON + T4-stub rows

Meta (3 files):

Verifier What it does
lint_numerics.hexa Recipe §4 invariants 1-5 over every numerics_*.hexa
saturation_check.hexa Aggregate self-stop signal — re-runs 6 closure components
hexa-codex verify all                              # full sweep, sat-1 verdict
hexa-codex verify saturation-check                 # one-shot sat-1 marker
hexa-codex verify falsifier-check                  # closure tracker
hexa-codex verify lint-numerics                    # recipe §4 invariants
hexa-codex verify numerics-train_cost-solver       # one specific layer
RESOURCE_LOCAL_HEXA=1 hexa run verify/saturation_check.hexa
# → __HEXA_CODEX_SATURATION_CHECK__ PASS  (when at sat-1)

Each script also runs standalone: RESOURCE_LOCAL_HEXA=1 hexa run verify/<name>.hexa. Setting RESOURCE_LOCAL_HEXA=1 routes execution to the local interpreter (~/.hx/packages/hexa/hexa.real) instead of the hexa-r ubu-1 remote-routing wrapper that ships with the resource toolkit.

tests/ — 24 .hexa regression wrappers + 83 pytest auto

Each verify/*.hexa script has a companion tests/test_*.hexa wrapper that re-runs the verifier, greps the sentinel, and exits 0/1. tests/test_all.hexa aggregates all 24 wrappers; the legacy 83 pytest auto-cases continue to cover the spec / inventory / group surface.

RESOURCE_LOCAL_HEXA=1 HEXA_CODEX_ROOT="$PWD" \
    ~/.hx/packages/hexa/hexa.real run tests/test_all.hexa   # 24/24 PASS
python3 -m pytest tests/ -m auto                            # 83 PASS
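
The wrapper behaviour (re-run the verifier, grep the sentinel, exit 0/1) can be sketched in Python. This is not one of the tests/test_*.hexa wrappers: the stem-to-sentinel mapping is inferred from the saturation_check and run_all examples in this README and may not hold for every script, and the helper names are hypothetical. The `hexa run` invocation and the RESOURCE_LOCAL_HEXA=1 variable are taken from the commands shown above.

```python
import os
import subprocess
import sys
from pathlib import Path


def sentinel_for(script: Path) -> str:
    """Assumed mapping: verify/foo_bar.hexa -> '__HEXA_CODEX_FOO_BAR__ PASS'."""
    return f"__HEXA_CODEX_{script.stem.upper()}__ PASS"


def sentinel_present(output: str, script: Path) -> bool:
    """True if any output line contains the expected PASS sentinel."""
    expected = sentinel_for(script)
    return any(expected in line for line in output.splitlines())


def run_wrapper(script: Path) -> int:
    """Run one verifier locally and map the result to an exit code (0 ok, 1 fail)."""
    env = dict(os.environ, RESOURCE_LOCAL_HEXA="1")
    proc = subprocess.run(
        ["hexa", "run", str(script)], capture_output=True, text=True, env=env
    )
    ok = proc.returncode == 0 and sentinel_present(proc.stdout, script)
    return 0 if ok else 1


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_wrapper(Path(sys.argv[1])))
```

Requiring both a zero return code and the sentinel guards against a script that exits cleanly without ever reaching its final check.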

cli/hexa-codex.hexa — extended subcommands

hexa-codex verify [target]       # any .hexa verifier; e.g. saturation-check, falsifier-check
hexa-codex calc <metric>         # train_cost / infer_cost / alignment / interpret / quality_scale
hexa-codex inventory             # 17-verb spec presence + canonical-header audit
hexa-codex test [mark]           # pytest tests/ -m {auto|hexa}
hexa-codex status                # one-shot health JSON

Reference annexes

consciousness deep-dive (BT-19 falsifier-in-action)

File Concern
consciousness/measurement-protocol.md BT-19 α_IIT·α_GWT=1 reproducible EEG/fMRI protocol (PAPER-P8-2)
consciousness/red-team-failure.md BT-19 red-team refutation — verdict MISS, [7?] CONJECTURE → [5] downgrade

These 2 files demonstrate the falsifier-preregister discipline at work: a CONJECTURE was preregistered, independently red-teamed, and downgraded. This is the reason hexa-codex calls itself a falsifier-preregister library, not just a spec catalog.


Status

SPEC_CATALOG + RUNNABLE_SURFACE at 100% closure (recipe §7.2 sat-1) + lm_foundry/: code-LLM at 94.29% Mk.I strict (r39 GA, frozen) + v0.5.x orchestration runtime (r44–r62) production-ready.

17-verb AI knowledge substrate (4 groups: safety + economics + ops + substrate) + verify/ + tests/ + build/ + docs/ runnable surface + lm_foundry/ (absorbed from hexa-forge on 2026-05-13: domain-LLM training pipeline + runtime; code-LLM GA at 94.29% Mk.I strict, r39 frozen, plus v0.5.x orchestration runtime r44–r62 production-ready, and the bio-LLM recipe). Recipe §7.2 sat-1 saturation reached: all four scaling-falsifier pillars are closed at recipe §3 closure_pct = 100% (T1 + T2 + T3 ✓ each), via 23 .hexa verifiers + 24 regression wrappers + 3 meta verifiers. T4 (live hardware / Stage-1+) is recipe §9 territory and out of loop scope.

In short: this repo is (1) a library of AI specs and (2) a runnable verification surface at recipe §7.2 sat-1 = 100% closure under the §3 ladder. The cli/hexa-codex.hexa dispatcher routes both: verb spec reads and .hexa-native verifiers / calculators / tests (legacy Python verify/ kept as a parallel CI path). The heavy-lift per-verb T4 live-hardware / Stage-1+ pipelines (live FLOP/loss measurements, KV-cache profiles, HELM-Core composites, SAE feature counts) sit in recipe §9 territory and land per the release ladder v1.1.0..v2.0.0.

What works at 100% closure (sat-1):

  • 17 verb specs land on disk under their group-named directories.
  • hexa-codex list prints the full 4-group table.
  • hexa-codex <verb> prints the spec path + first 20 lines.
  • hexa-codex selftest confirms 17/17 spec presence.
  • hexa-codex verify saturation-check re-runs the 6 closure components and emits the canonical recipe §7.3 self-stop sentinel __HEXA_CODEX_RSC_SATURATED__ STOP plus the sat-1 marker __HEXA_CODEX_SATURATION_CHECK__ PASS.
  • hexa-codex verify falsifier-check runs the closure tracker — per-pillar T1/T2/T3 tier presence, cross-cutter row, recipe §3 closure_pct = 100% verdict.
  • hexa-codex verify <pillar>-<layer> runs any single layer (e.g. numerics-train_cost-solver).
  • make -C build sat1 is the friendly CI gate.
  • make -C build everything = ci (Python legacy) + 24-wrapper .hexa regression + sat-1 closure + selftest.

What is out of scope at 100% closure (sat-1):

  • Per-verb T4 live-hardware / Stage-1+ pipelines (recipe §9 — out of loop scope; closure_pct already at 100% on the §3 T1/T2/T3 ladder).
  • Model training, inference SaaS, or RLHF labeling production pipeline.
  • Any regulatory, alignment, or capability claim — these specs are preregistered hypotheses, not validated results.

Install

# 1. Install hexa-lang (gives you `hexa` + `hx` package manager)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/dancinlab/hexa-lang/main/install.sh)"

# 2. Install hexa-codex
hx install hexa-codex

Run

hexa-codex list                    # 17-verb table grouped by 4 groups
hexa-codex selftest                # 17-verb spec presence sweep
hexa-codex verify [check]          # unified verifier dispatcher (cross-doc/train_cost/infer_cost/inventory/group/release/falsifiers/reference/all)
hexa-codex inventory               # 17-verb spec inventory + canonical-header audit
hexa-codex calc <metric>           # scaling-falsifier calculators (train_cost/infer_cost/alignment/interpret/quality_scale)
hexa-codex test [mark]             # pytest tests/ (auto|hexa)
hexa-codex status                  # one-shot verifier health summary
hexa-codex <verb>                  # read a verb spec (alignment/safety/welfare/.../causal — see `list`)
hexa-codex version                 # print version
hexa-codex help                    # full --help (subcommands + flags + env)

Cross-link

Sister repos in the dancinlab HEXA family:

Cognitive substrate rollups (sister-libraries)

  • 👁️ dancinlab/hexa-senses: 5-verb sensory substrate (dream + ear + empath + olfact + voice). voice is formulaic-only, learned TTS FORBIDDEN.
  • 🧠 dancinlab/hexa-mind: 7-verb mental substrate (mind + neuro + oracle + hexa_telepathy + telepathy + mind_upload + superpowers). 4/7 SPECULATIVE (preregister honesty).

Domain-specific siblings

  • 👻 dancinlab/anima — consciousness / soul cousin (phenomenal grounding adjacent to consciousness).
  • 🧬 dancinlab/hexa-brain — BCI sister (read-side neural substrate counterpart).
  • ⚖️ dancinlab/honesty-monitor — AI honesty-bit falsifier sister (write-side validator for the SAFETY group).
  • 🌱 dancinlab/hexa-bio — 4-verb molecular toolkit (same HEXA-family pattern, biology domain).
  • 🔨 lm_foundry/ (in this repo) — domain-LLM training pipeline, absorbed from the retired hexa-forge repo on 2026-05-13. hexa-codex was forge's sister (serving side); now one repo. See the lm_foundry/ section above.

The 17 + 5 + 7 = 29 verbs across cognitive sister-libraries all share the same HEXA-family group taxonomy. hexa-codex covers AI knowledge; hexa-senses covers AI senses; hexa-mind covers AI mental ops.

Upstream concept SSOT: canon/domains/cognitive/ (declarative sources for all 17 hexa-codex verbs + 5 hexa-senses verbs + 7 hexa-mind verbs).


Repo layout

hexa-codex/
├── README.md                  this file
├── LICENSE                    MIT
├── AGENTS.tape                identity + governance (.tape v1.2)
├── CLAUDE.md                  symlink → AGENTS.tape
├── hexa.toml                  project metadata
├── install.hexa               hx install entry
├── cli/                       hexa-codex dispatcher (.hexa)
│   SAFETY group (6 verbs):
├── alignment/                 HELM-12-axis alignment-score aggregator   (scaling-falsifier)
├── safety/                    refusal-matrix + capability-gate spec
├── welfare/                   model-welfare probe protocol
├── adversarial/               red-team failure-mode taxonomy
├── consciousness/             IIT × GWT probe (BT-19 falsifier-in-action)
├── interpret/                 SAE motif count = 10               (scaling-falsifier)
│   ECONOMICS group (3 verbs):
├── train_cost/                Chinchilla-fit N^24 scaling              (scaling-falsifier)
├── infer_cost/                context^τ = context^4                    (scaling-falsifier)
├── quality_scale/             HumanEval+/hexa-eval aggregate
│   OPS group (4 verbs):
├── deploy/                    hardware-tier deployment recipes
├── enterprise/                enterprise customisation envelope
├── agent_serving/             tool-use SLO + schema
├── eval/                      Mk handoff eval template
│   SUBSTRATE group (4 verbs):
├── multimodal/                multimodal fusion spec
├── rlhf/                      DPO/RLHF labelling hub
├── cog_arch/                  cognitive architecture envelope
├── causal/                    causal-chain reasoning spec
├── lm_foundry/                domain-LLM training pipeline (absorbed from hexa-forge, 2026-05-13)
├── papers/                    consciousness deep-dive annexes (BT-19)
├── verify/                    23 .hexa-native verifiers (math_pure)
├── tests/                     24 .hexa regression wrappers + 83 pytest
├── build/                     pandoc + xelatex PDF rebuild
├── temporal-architecture/     research-tier modules
├── reality-map/               canon meta-grid
├── experiments/               sandbox runs (gitignored heavy outputs)
└── CHANGELOG.jsonl            change log

License

MIT. See LICENSE.