Merged
4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/config.yml
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
 blank_issues_enabled: true
 contact_links:
   - name: "📖 Architecture Reference"
-    url: https://github.com/devlux76/cortex/blob/main/DESIGN.md
-    about: Review DESIGN.md before proposing architectural changes.
+    url: https://github.com/devlux76/cortex/wiki
+    about: Review the architecture wiki before proposing architectural changes.
5 changes: 3 additions & 2 deletions .github/copilot-instructions.md
@@ -14,11 +14,12 @@ The engine models three biological brain regions:
| File | Purpose |
|---|---|
| `README.md` | Product vision and quick start |
-| `DESIGN.md` | Complete architecture specification and design principles |
+| **CORTEX Wiki** | Canonical architecture specification and design principles |
+| `DESIGN.md` | Repo landing page / TOC into the wiki |
| `PLAN.md` | Module-by-module implementation status and development phases |
| `TODO.md` | Prioritized actionable tasks to ship v1.0 |

-Keep `DESIGN.md` synchronized with the real code state after every implementation pass.
+Keep the wiki and `DESIGN.md` synchronized with the real code state after every implementation pass.

## Project Management

920 changes: 23 additions & 897 deletions DESIGN.md

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion README.md
@@ -113,7 +113,8 @@ bun run dev:harness # start the browser runtime harness at http://127.0.0.1:4173

| Document | Purpose |
|---|---|
-| [`DESIGN.md`](DESIGN.md) | Architecture specification and core design principles |
+| [CORTEX Wiki](https://github.com/devlux76/cortex/wiki) | Canonical design documentation (architecture, algorithms, and math). |
+| [`DESIGN.md`](DESIGN.md) | Static repo landing page / TOC into the wiki |
| [`PLAN.md`](PLAN.md) | Module-by-module implementation status and development phases |
| [`TODO.md`](TODO.md) | Prioritized actionable tasks to ship v1.0 |
| [`docs/api.md`](docs/api.md) | API reference for developers integrating with CORTEX |
9 changes: 5 additions & 4 deletions docs/development.md
@@ -170,10 +170,11 @@ This command will fail the build if it detects numeric literals that are likely

At the end of every implementation pass, update documents in this order:

-1. **`DESIGN.md`** — update if architecture changes.
-2. **`README.md`** — confirm the project description still reflects reality.
-3. **`docs/api.md`** — update if new public APIs are added or existing ones change.
-4. **GitHub Issues** — close completed tasks, create new ones as needed via `gh` CLI or the web UI.
+1. **`DESIGN.md`** — update if the design landing/TOC changes.
+2. **Wiki** — update the relevant wiki page(s) for any architecture or algorithm changes.
+3. **`README.md`** — confirm the project description still reflects reality.
+4. **`docs/api.md`** — update if new public APIs are added or existing ones change.
+5. **GitHub Issues** — close completed tasks, create new ones as needed via `gh` CLI or the web UI.

> Numeric examples in design docs are illustrative unless explicitly sourced from model metadata.

30 changes: 30 additions & 0 deletions docs/wiki-draft/Architecture.md
@@ -0,0 +1,30 @@
# Architecture Overview

This page describes the high-level architecture of CORTEX and the major subsystems.

## The Three Living Regions

CORTEX models three biological brain regions working in concert:

- **Hippocampus** — Fast associative encoding and incremental prototype construction.
- **Cortex** — Intelligent routing, dialectical retrieval, and coherence.
- **Daydreamer** — Background consolidation and maintenance.

Each region is responsible for a distinct phase of the memory lifecycle. Together they form a pipeline from ingestion through retrieval.

## Core Concepts

### Medoid vs. Centroid vs. Metroid

- **Medoid** — An actual memory node (page) selected as the representative of a cluster.
- **Centroid** — A computed geometric average (never stored as a real node).
- **Metroid** — A transient, structured dialectical search probe (`{ m1, m2, c }`) used at query time.
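
The difference between a medoid and a centroid can be made concrete with a small sketch. The helper names (`centroid`, `medoidIndex`) and the choice of Euclidean distance are assumptions made for illustration; they are not taken from the CORTEX codebase.

```typescript
// Hypothetical helpers for illustration only.
type Vec = number[];

// Centroid: the component-wise mean. It is a computed point and is usually
// not equal to any of the input vectors.
function centroid(points: Vec[]): Vec {
  if (points.length === 0) throw new Error("centroid of an empty set is undefined");
  const dim = points[0].length;
  const sum = new Array<number>(dim).fill(0);
  for (const p of points) {
    for (let i = 0; i < dim; i++) sum[i] += p[i];
  }
  return sum.map((s) => s / points.length);
}

function euclidean(a: Vec, b: Vec): number {
  let total = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    total += d * d;
  }
  return Math.sqrt(total);
}

// Medoid: the index of the member whose total distance to all members is smallest.
// The result always refers to a real element of `points`.
function medoidIndex(points: Vec[]): number {
  if (points.length === 0) throw new Error("medoid of an empty set is undefined");
  let best = 0;
  let bestCost = Infinity;
  points.forEach((candidate, i) => {
    const cost = points.reduce((acc, p) => acc + euclidean(candidate, p), 0);
    if (cost < bestCost) {
      bestCost = cost;
      best = i;
    }
  });
  return best;
}
```

For the points `[0, 0]`, `[1, 0]`, and `[5, 0]`, the centroid is `[2, 0]`, which is not one of the inputs, while the medoid is the existing point `[1, 0]`.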

### How the subsystems interact

1. **Ingestion:** Hippocampus embeds content and creates or updates prototypes.
2. **Retrieval:** Cortex constructs Metroids and performs dialectical search for coherent context.
3. **Consolidation:** Daydreamer updates prototypes, prunes edges, and maintains stability.


> For the full algorithmic detail, see **Retrieval & Metroid Algorithm**.
14 changes: 14 additions & 0 deletions docs/wiki-draft/Consolidation.md
@@ -0,0 +1,14 @@
# Consolidation (Daydreamer)

This page covers background consolidation and maintenance mechanisms.

## Daydreamer Responsibilities

- **Long-term potentiation (LTP):** strengthen important connections.
- **Long-term depression (LTD):** decay and prune weak edges.
- **Medoid/centroid recomputation:** keep prototypes coherent as the graph evolves.
- **Experience replay:** rehearse recent data in background when idle.
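
The LTP and LTD bullets above can be sketched as a single pass over the edge list. Everything here is hypothetical: the `WeightedEdge` shape, the parameter names, and the additive-gain / multiplicative-decay rule are assumptions chosen for illustration, not the Daydreamer's actual update rule.

```typescript
// Hypothetical edge shape and parameters, for illustration only.
interface WeightedEdge {
  from: string;
  to: string;
  weight: number;
}

interface PlasticityParams {
  ltpGain: number;    // added to an edge that was reinforced in this pass
  ltdDecay: number;   // multiplicative decay (0..1) for edges that were not reinforced
  pruneBelow: number; // edges weaker than this threshold are removed
}

function consolidate(
  edges: WeightedEdge[],
  reinforced: Set<string>,
  params: PlasticityParams,
): WeightedEdge[] {
  return edges
    .map((edge) => {
      const key = `${edge.from}->${edge.to}`;
      const weight = reinforced.has(key)
        ? edge.weight + params.ltpGain   // LTP: strengthen
        : edge.weight * params.ltdDecay; // LTD: decay
      return { ...edge, weight };
    })
    .filter((edge) => edge.weight >= params.pruneBelow); // prune weak edges
}
```

Returning a new array rather than mutating in place keeps the sketch easy to run in small batches, which fits the throttled, background style of work described on this page.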

## Stability & Throttling

The Daydreamer is designed to run opportunistically without blocking foreground query performance. Its work is throttled and batch-sized according to the current memory graph complexity.
38 changes: 38 additions & 0 deletions docs/wiki-draft/Home.md
@@ -0,0 +1,38 @@
# CORTEX Design Wiki

This wiki is the **canonical home for CORTEX architecture and design documentation**.

> The repository includes `DESIGN.md` as a small **TOC/landing page** that points here. Use this wiki for all deep dives, algorithm descriptions, and mathematical reasoning.

## Where to start

- **If you’re writing an issue or PR that affects the architecture:** start with **Architecture Overview**.
- **If you need to understand retrieval behavior:** see **Retrieval & Metroid Algorithm**.
- **If you’re changing ingestion or indexing:** see **Ingestion (Hippocampus)**.
- **If you’re optimizing or debugging consolidation:** see **Consolidation (Daydreamer)**.
- **If you’re changing storage formats or persistence:** see **Storage Architecture**.
- **If you’re tuning performance budgets:** see **Performance Model & Constraints**.
- **If you need definitions or constant values:** see **Terminology + Numerics**.
- **If you need the math behind the design:** see **Math Appendix**.

---

## Wiki Pages (Table of Contents)

- [Architecture Overview](Architecture.md)
- [Retrieval & Metroid Algorithm](Retrieval.md)
- [Ingestion (Hippocampus)](Ingestion.md)
- [Consolidation (Daydreamer)](Consolidation.md)
- [Storage Architecture](Storage.md)
- [Performance Model & Constraints](Performance.md)
- [Security & Trust](Security.md)
- [Terminology + Numerics](Terminology.md)
- [Math Appendix](Math-Appendix.md)

---

## How to use this wiki

- **Edit on GitHub**: Edit the markdown directly in the wiki repo (`cortex.wiki`) and push.
- **Linking from issues/PRs**: Link to the relevant wiki page (e.g., `https://github.com/devlux76/cortex/wiki/Retrieval-&-Metroid-Algorithm`).
- **Keeping it up to date**: When the code changes, update the relevant wiki page; include the wiki link in the PR description.
16 changes: 16 additions & 0 deletions docs/wiki-draft/Ingestion.md
@@ -0,0 +1,16 @@
# Ingestion (Hippocampus)

This page describes how CORTEX ingests new observations and integrates them into memory.

## Ingest Path

1. **Chunking/Parsing** — Raw inputs are segmented into pages/blocks.
2. **Embedding** — Each chunk is embedded using a Matryoshka-capable model.
3. **Fast Neighbor Insert** — New vectors are connected into the semantic neighbor graph.
4. **Hierarchical Prototypes** — Pages are organized into Books, Volumes, and Shelves.

## Hierarchy & Promotion

CORTEX manages a hierarchical prototype structure to keep hot (frequently accessed) concepts in memory while relying on disk-backed storage for the long tail.

> For implementation details, see the code in `hippocampus/` and the `HierarchyBuilder` design notes.
37 changes: 37 additions & 0 deletions docs/wiki-draft/Math-Appendix.md
@@ -0,0 +1,37 @@
# Math Appendix

This appendix contains the mathematical background that motivates several of CORTEX’s key design decisions.

## Curse of Dimensionality

In high-dimensional spaces, the volume of a unit ball collapses rapidly. For even dimension `n = 2m`:

```
V_n = π^m / m!
```

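Stirling’s approximation gives `V_n ≈ (πe/m)^m / √(2πm)`, which shrinks faster than exponentially in `n`. Separately, because volume scales as `r^n`, the fraction of a ball’s volume lying within radius `1 − ε` is `(1 − ε)^n`, so for large `n` nearly all the volume is concentrated in a thin shell near the surface.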
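
Both effects are easy to check numerically. The sketch below evaluates the even-dimension volume formula above and the standard shell fraction `1 − (1 − ε)^n`, which follows from volume scaling as `r^n`. The function names are hypothetical, and the dimension 768 used in the note afterwards is only an example value, not a CORTEX constant.

```typescript
// Volume of the unit ball in even dimension n = 2m: V_n = π^m / m!
function unitBallVolumeEven(m: number): number {
  let volume = 1;
  // Multiply by π/k for k = 1..m, which builds π^m / m! without overflow.
  for (let k = 1; k <= m; k++) volume *= Math.PI / k;
  return volume;
}

// Fraction of an n-dimensional ball's volume that lies within `shellWidth`
// of the surface (radius taken as 1): 1 - (1 - ε)^n.
function shellFraction(n: number, shellWidth: number): number {
  return 1 - Math.pow(1 - shellWidth, n);
}
```

For example, `shellFraction(768, 0.01)` is above 0.999: in 768 dimensions, more than 99.9% of the volume sits in the outer 1% of the radius.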

## Hypersphere Volume and the Hollow Sphere

CORTEX leverages this “hollow sphere” phenomenon: in high dimensions, the interior of a ball is essentially empty, so nearest-neighbor search can focus on the surface shell.

## Williams 2025 Sublinear Bound

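Williams (2025) showed that a multitape Turing machine running in time `t` can be simulated in space:

```
S = O(√(t · log t))
```

CORTEX applies this growth law to bound its space requirements (hotpath capacity, fanout limits, maintenance budgets) in a way that maintains on-device performance.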

## Why This Matters

These mathematical observations drive several design decisions in CORTEX:

- Matryoshka dimension protection (to prevent domain drift)
- Sublinear fanout quotas (to avoid explosion in edge counts)
- The Metroid dialectical search pattern (to avoid confirmation bias in high-D retrieval)

> For full details, see the source code and the other wiki pages.
28 changes: 28 additions & 0 deletions docs/wiki-draft/Performance.md
@@ -0,0 +1,28 @@
# Performance Model & Constraints

This page explains the performance budget model and the key formulas that keep CORTEX sublinear.

## Williams Sublinear Bound

CORTEX uses the Williams 2025 result:

> **S = O(√(t · log t))**

This bound is applied to multiple budgets (hotpath index size, hierarchy fanout, neighbor degrees, maintenance batch sizes) to ensure the system stays efficient as the graph grows.

## Hotpath Capacity

The resident hotpath index is capped to a sublinear growth function, often expressed as:

```
H(t) = ⌈c · √(t · log₂(1 + t))⌉
```
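
A direct transcription of this formula might look like the sketch below. The function name `hotpathCapacity`, the default `c = 1`, and the interpretation of `t` as a non-negative size measure (for example, a node count) are assumptions; the concrete implementation lives in the code referenced at the end of this page.

```typescript
// Sketch of H(t) = ⌈c · √(t · log₂(1 + t))⌉.
// `c` is a tuning constant; its name and default value are assumptions.
function hotpathCapacity(t: number, c: number = 1): number {
  if (t <= 0) return 0; // nothing stored yet, so no resident capacity is needed
  return Math.ceil(c * Math.sqrt(t * Math.log2(1 + t)));
}
```

The capacity grows much more slowly than `t` itself: with `c = 1`, a million items yields a cap in the low thousands.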

## Budgeting & Fanout Limits

The same sublinear law is used for:
- Hierarchy fanout limits
- Semantic neighbor degree caps
- Daydreamer maintenance batch sizing

> See the code in `core/HotpathPolicy.ts` and `hippocampus/HierarchyBuilder.ts` for the concrete implementations.
29 changes: 29 additions & 0 deletions docs/wiki-draft/Retrieval.md
@@ -0,0 +1,29 @@
# Retrieval & Metroid Algorithm

This page explains the retrieval pipeline and the Metroid-based dialectical search mechanism.

## Metroid Overview

A **Metroid** is a structured search primitive: it contains a thesis (`m1`), an antithesis (`m2`), and a frozen centroid (`c`).

- `m1` is the medoid closest to the query.
- `m2` is an opposite medoid found via cosine-opposite medoid search.
- `c` is the frozen centroid between `m1` and `m2`.
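
The three components can be assembled as in the sketch below. It makes several assumptions that the text above does not confirm: `m1` is chosen by highest cosine similarity to the query, "cosine-opposite" is read as "lowest cosine similarity to `m1`", and `c` is the midpoint of `m1` and `m2`. All type and function names are hypothetical.

```typescript
// Hypothetical types and names, for illustration only.
type Embedding = number[];

interface MetroidProbe {
  m1: Embedding;         // thesis
  m2: Embedding;         // antithesis
  c: readonly number[];  // frozen centroid
}

function cosineSimilarity(a: Embedding, b: Embedding): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

function buildMetroid(query: Embedding, medoids: Embedding[]): MetroidProbe | null {
  if (medoids.length < 2) return null; // no antithesis candidate exists

  // m1: the medoid most similar to the query.
  let m1 = medoids[0];
  let bestSim = cosineSimilarity(m1, query);
  for (const m of medoids) {
    const sim = cosineSimilarity(m, query);
    if (sim > bestSim) {
      bestSim = sim;
      m1 = m;
    }
  }

  // m2: the medoid least similar to m1 (assumed reading of "cosine-opposite").
  let m2: Embedding | null = null;
  let lowestSim = Infinity;
  for (const m of medoids) {
    if (m === m1) continue;
    const sim = cosineSimilarity(m, m1);
    if (sim < lowestSim) {
      lowestSim = sim;
      m2 = m;
    }
  }
  if (m2 === null) return null;

  // c: midpoint of m1 and m2, frozen so later steps cannot mutate it.
  const antithesis = m2;
  const c = Object.freeze(m1.map((x, i) => (x + antithesis[i]) / 2));
  return { m1, m2: antithesis, c };
}
```

Returning `null` when no second medoid exists keeps the "no antithesis found" case explicit for the caller.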

## Dialectical Search Zones

From the centroid `c`, the system classifies candidates into three zones:

- **Thesis zone:** closer to `m1` than to `c`.
- **Antithesis zone:** closer to `m2` than to `c`.
- **Synthesis zone:** near `c`, balanced between both poles.
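
The zone rules above translate into a short classifier. This is a sketch under stated assumptions: it uses Euclidean distance (the page does not name a metric) and treats every candidate that is not strictly closer to a pole than to `c` as synthesis. The names `Zone` and `classifyZone` are hypothetical.

```typescript
// Hypothetical names, for illustration only.
type Zone = "thesis" | "antithesis" | "synthesis";

function euclideanDistance(a: readonly number[], b: readonly number[]): number {
  let total = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    total += d * d;
  }
  return Math.sqrt(total);
}

function classifyZone(
  candidate: readonly number[],
  m1: readonly number[],
  m2: readonly number[],
  c: readonly number[],
): Zone {
  const toM1 = euclideanDistance(candidate, m1);
  const toM2 = euclideanDistance(candidate, m2);
  const toC = euclideanDistance(candidate, c);
  if (toM1 < toC) return "thesis";     // closer to m1 than to c
  if (toM2 < toC) return "antithesis"; // closer to m2 than to c
  return "synthesis";                  // nearest to c, balanced between the poles
}
```

When `c` is the midpoint of `m1` and `m2`, the thesis and antithesis conditions cannot both hold for the same candidate, so the order of the two checks does not affect the result.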

## Matryoshka Dimensional Unwinding

CORTEX uses Matryoshka embeddings with protected dimensions (lower dimensions that anchor domain context). The retrieval algorithm progressively frees dimensions to explore antithesis candidates while keeping the centroid frozen.

## Knowledge Gap Detection

When no suitable `m2` can be found within constraints, the system flags a **knowledge gap** and may broadcast a P2P curiosity request.

> See the **Math Appendix** for the geometric intuition behind why this approach is necessary in high-dimensional spaces.
16 changes: 16 additions & 0 deletions docs/wiki-draft/Security.md
@@ -0,0 +1,16 @@
# Security & Trust

This page covers trust assumptions, cryptographic integrity, and smart sharing guardrails.

## Cryptographic Integrity

CORTEX supports cryptographic signing (and optional verification) for stored vectors and metadata to help detect tampering and integrity issues.

## Smart Sharing Guardrails

When sharing memory fragments over P2P, CORTEX enforces:
- MIME type validation
- Model URN compatibility checks
- Eligibility filtering (to avoid leaking sensitive or irrelevant data)
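
The three checks can be pictured as a single gate that a fragment must pass before it is shared. The sketch below is hypothetical throughout: the `SharedFragment` and `SharingPolicy` shapes, the `sensitive` flag, and the use of exact string equality for model URN compatibility are simplifying assumptions, not the behavior of the code in `sharing/`.

```typescript
// Hypothetical shapes, for illustration only.
interface SharedFragment {
  mimeType: string;
  modelUrn: string;
  sensitive: boolean; // stand-in for the result of eligibility classification
}

interface SharingPolicy {
  allowedMimeTypes: ReadonlySet<string>;
  localModelUrn: string;
}

type GuardrailResult = { ok: true } | { ok: false; reason: string };

function checkFragment(fragment: SharedFragment, policy: SharingPolicy): GuardrailResult {
  // 1. MIME type validation
  if (!policy.allowedMimeTypes.has(fragment.mimeType)) {
    return { ok: false, reason: "mime-type" };
  }
  // 2. Model URN compatibility (assumed here to mean exact equality)
  if (fragment.modelUrn !== policy.localModelUrn) {
    return { ok: false, reason: "model-urn" };
  }
  // 3. Eligibility filtering
  if (fragment.sensitive) {
    return { ok: false, reason: "not-eligible" };
  }
  return { ok: true };
}
```

Returning a reason string rather than a bare boolean makes it easier to log why a fragment was rejected.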

> See `sharing/` for the implementation details of peer exchange and eligibility classification.
19 changes: 19 additions & 0 deletions docs/wiki-draft/Storage.md
@@ -0,0 +1,19 @@
# Storage Architecture

This page describes how CORTEX stores vectors and metadata in the browser.

## Vector Storage (OPFS)

- Vectors are stored in an append-only OPFS file for fast writes.
- The OPFS backend is designed to be zero-copy for WebGPU and WebGL consumption.
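
To illustrate the append-only layout and what "zero-copy" means for readers, the sketch below uses an in-memory `Float32Array` as a stand-in for the OPFS file. It does not use the OPFS API, and the class name `AppendOnlyVectorLog` and its methods are hypothetical. Fixed-width vectors make the byte offset of any vector a simple multiplication, and `subarray` returns a view that shares memory instead of copying.

```typescript
// In-memory stand-in for an append-only vector file; names are hypothetical.
class AppendOnlyVectorLog {
  private readonly dim: number;
  private buffer: Float32Array;
  private count = 0;

  constructor(dim: number, initialCapacity: number = 16) {
    this.dim = dim;
    this.buffer = new Float32Array(dim * initialCapacity);
  }

  // Appends one vector at the end and returns its index. Existing data is never rewritten.
  append(vector: ArrayLike<number>): number {
    if (vector.length !== this.dim) throw new Error("dimension mismatch");
    const offset = this.count * this.dim;
    if (offset + this.dim > this.buffer.length) {
      // Grow the backing store; views taken before growth refer to the old buffer.
      const grown = new Float32Array(this.buffer.length * 2);
      grown.set(this.buffer);
      this.buffer = grown;
    }
    this.buffer.set(vector, offset);
    return this.count++;
  }

  // Byte offset a vector would have in a flat float32 file.
  byteOffset(index: number): number {
    return index * this.dim * Float32Array.BYTES_PER_ELEMENT;
  }

  // Zero-copy read: the returned view shares memory with the backing buffer.
  view(index: number): Float32Array {
    if (index < 0 || index >= this.count) throw new RangeError("index out of range");
    const offset = index * this.dim;
    return this.buffer.subarray(offset, offset + this.dim);
  }
}
```

A view obtained this way can be handed to a consumer without an intermediate copy, which is the property the second bullet above describes for GPU consumption.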

## Metadata Storage (IndexedDB)

- Metadata (nodes, edges, schema) is stored in IndexedDB.
- IndexedDB is used for fast subgraph retrieval and persistence across sessions.

## Maintenance & Corruption Resistance

Today, CORTEX relies on the browser’s OPFS and IndexedDB durability guarantees, with limited, optional integrity checks (for example, content-hash verification on incoming peer fragments).

Planned: a broader integrity verification and corruption-detection/recovery flow for OPFS/IndexedDB-backed data, including cryptographic integrity validation of stored payloads and automated remediation of partial failures.
23 changes: 23 additions & 0 deletions docs/wiki-draft/Terminology.md
@@ -0,0 +1,23 @@
# Terminology + Numerics

This page collects key terms, model-derived numeric constants, and policy-derived constants.

## Key Terms

- **Medoid** — an existing memory node selected as a cluster representative.
- **Centroid** — a computed average vector (not necessarily a real node).
- **Metroid** — a transient search construct `{ m1, m2, c }` used during retrieval.
- **Hotpath** — the in-memory resident index of active nodes.

## Model-Derived Numerics

These values are derived from the embedding model profile (not hardcoded):
- Embedding dimensionality
- Matryoshka protected dimension boundary
- Query context length limits
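
One way to picture "derived from the model profile" is a small profile object that the rest of the code reads instead of hardcoding numbers. The sketch below is hypothetical: the `ModelProfile` field names and the `splitAtProtectedBoundary` helper are assumptions, and it assumes the protected dimensions are the leading components of the embedding.

```typescript
// Hypothetical profile shape, for illustration only.
interface ModelProfile {
  embeddingDim: number;   // embedding dimensionality
  protectedDims: number;  // Matryoshka protected dimension boundary
  maxQueryTokens: number; // query context length limit
}

// Splits an embedding at the protected boundary taken from the profile,
// rather than from a numeric literal in the calling code.
function splitAtProtectedBoundary(
  embedding: number[],
  profile: ModelProfile,
): { protectedPart: number[]; freePart: number[] } {
  if (embedding.length !== profile.embeddingDim) {
    throw new Error("embedding length does not match the model profile");
  }
  if (profile.protectedDims < 0 || profile.protectedDims > profile.embeddingDim) {
    throw new Error("protected boundary lies outside the embedding");
  }
  return {
    protectedPart: embedding.slice(0, profile.protectedDims),
    freePart: embedding.slice(profile.protectedDims),
  };
}
```

Swapping in a different embedding model then only requires a different profile object, not edits to the callers.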

## Policy-Derived Constants

Policy constants (e.g. fanout caps, quota ratios) are defined in the code and kept in sync with the design.

> For the authoritative source of policy constants, see `core/HotpathPolicy.ts`.