
VectorRAG: Semantic Memory Architecture

Last Updated: 7 February 2026
Purpose: Technical documentation for Athena's semantic memory system


Executive Summary

VectorRAG (Vector Retrieval-Augmented Generation) is Athena's semantic memory system. Unlike traditional keyword search, it finds meaning—retrieving contextually relevant information even when exact words don't match.

| Component | Technology | Purpose |
|---|---|---|
| Vector Database | Supabase + pgvector | Cloud-native, persistent storage |
| Embeddings | Google `text-embedding-004` | 3072-dimension semantic vectors |
| Similarity | Cosine distance (`<=>`) | Meaning-based matching |
| Sync | Python scripts | Automated indexing pipeline |

Why VectorRAG vs. GraphRAG?

> [!NOTE]
> Athena chose VectorRAG over Microsoft's GraphRAG after evaluating both approaches.

| Feature | VectorRAG (Supabase) | GraphRAG (Microsoft) |
|---|---|---|
| Setup | ~30 mins | Hours (entity extraction) |
| Cost | Free tier viable | Expensive LLM calls |
| Query Speed | <100 ms | Seconds (graph traversal) |
| Best For | Semantic recall | Entity relationships |
| Maintenance | Low | High (graph updates) |

Verdict: For a personal knowledge base where finding relevant context fast is the goal, VectorRAG wins. GraphRAG excels at entity relationship mapping but adds complexity we don't need.


System Architecture

```mermaid
flowchart TB
    subgraph LOCAL["📁 Local Workspace"]
        direction TB
        SESSIONS["Session Logs<br/>(300+ files)"]
        CASES["Case Studies<br/>(40+ files)"]
        PROTOCOLS["Protocols<br/>(170+ files)"]
        PROFILE["User Profile<br/>(Preferences, Settings)"]
        ENTITIES["Entity Data<br/>(External Imports)"]
    end
    
    subgraph SYNC["⚙️ Sync Pipeline"]
        direction TB
        PARSE["Parse Markdown"]
        EMBED["Generate Embeddings<br/>(text-embedding-004)"]
        UPLOAD["Upsert to Supabase"]
    end
    
    subgraph CLOUD["☁️ Supabase (pgvector)"]
        direction TB
        T1["sessions"]
        T2["case_studies"]
        T3["protocols"]
        T4["capabilities"]
        T5["playbooks"]
        T6["references"]
        T7["frameworks"]
        T8["workflows"]
        T9["user_profile"]
        T10["system_docs"]
        T11["entities"]
    end
    
    SESSIONS --> PARSE
    CASES --> PARSE
    PROTOCOLS --> PARSE
    PROFILE --> PARSE
    ENTITIES --> PARSE
    
    PARSE --> EMBED
    EMBED --> UPLOAD
    UPLOAD --> T1 & T2 & T3 & T4 & T5 & T6 & T7 & T8 & T9 & T10 & T11
    
    style LOCAL fill:#1e3a5f,stroke:#4a9eff,color:#fff
    style SYNC fill:#2d4a3e,stroke:#4ade80,color:#fff
    style CLOUD fill:#4a2040,stroke:#f472b6,color:#fff
```

How Semantic Search Works

The Query Flow

```mermaid
sequenceDiagram
    participant U as 👤 User
    participant A as 🏛️ Athena
    participant G as 🔮 Gemini API
    participant S as ☁️ Supabase
    
    U->>A: "What did we discuss about project architecture?"
    
    Note over A: Autonomic Trigger: §0.7.1
    
    A->>G: Embed query text
    G-->>A: Return 3072-dim vector
    
    A->>S: search_sessions(embedding, threshold=0.3)
    S-->>A: Top 5 matches with similarity scores
    
    A->>S: search_case_studies(embedding)
    S-->>A: Related case studies
    
    A->>S: search_protocols(embedding)
    S-->>A: Relevant protocols
    
    Note over A: Synthesize context
    
    A->>U: "In Session 15, we designed the API structure..."
```
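For readers who prefer code to diagrams, a minimal sketch of the same flow is shown below. It assumes the `supabase-py` client and the `get_embedding()` helper defined later in this document; the RPC names match the search functions in the schema section, and the parameter names for `search_case_studies` and `search_protocols` are assumed to mirror `search_sessions`.

```python
# Minimal sketch of the query flow (assumes supabase-py and the
# get_embedding() helper defined in the sync pipeline section below).
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def recall(query: str, threshold: float = 0.3, limit: int = 5) -> dict:
    """Embed the query once, then search several domains via RPC."""
    embedding = get_embedding(query)  # 3072-dim vector from Gemini
    results = {}
    for rpc in ("search_sessions", "search_case_studies", "search_protocols"):
        results[rpc] = supabase.rpc(rpc, {
            "query_embedding": embedding,
            "match_threshold": threshold,
            "match_count": limit,
        }).execute().data
    return results
```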

The Math: Cosine Similarity

```
Similarity = 1 - (embedding_a <=> embedding_b)
```

Where <=> is pgvector's cosine distance operator.

| Similarity Score | Interpretation |
|---|---|
| > 0.7 | Highly relevant |
| 0.5 - 0.7 | Moderately relevant |
| 0.3 - 0.5 | Loosely related |
| < 0.3 | Likely noise |
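As a quick sanity check, the same computation can be reproduced in plain Python. This is a toy example with 3-dimensional vectors (real embeddings are 3072-dimensional), included only to show what the score means:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Equivalent to 1 - (a <=> b) in pgvector."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dim vectors for illustration only (real embeddings are 3072-dim).
query_vec = np.array([0.2, 0.8, 0.1])
doc_vec = np.array([0.25, 0.75, 0.05])
print(round(cosine_similarity(query_vec, doc_vec), 3))  # ~0.995 -> highly relevant
```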

Database Schema

Core Tables

Each table follows this pattern:

```sql
CREATE TABLE IF NOT EXISTS sessions (
    id SERIAL PRIMARY KEY,
    date DATE NOT NULL,
    session_number INTEGER NOT NULL,
    title TEXT,
    content TEXT NOT NULL,
    embedding VECTOR(3072),  -- pgvector type
    file_path TEXT UNIQUE NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- IVFFlat index for fast similarity search
CREATE INDEX ON sessions 
USING ivfflat (embedding vector_cosine_ops) 
WITH (lists = 100);
```

Search Functions (RPC)

```sql
CREATE OR REPLACE FUNCTION search_sessions(
    query_embedding VECTOR(3072),
    match_threshold FLOAT DEFAULT 0.3,
    match_count INT DEFAULT 5
)
RETURNS TABLE (
    id INT,
    date DATE,
    title TEXT,
    content TEXT,
    similarity FLOAT
)
LANGUAGE plpgsql
AS $$
BEGIN
    RETURN QUERY
    SELECT
        s.id,
        s.date,
        s.title,
        s.content,
        1 - (s.embedding <=> query_embedding) AS similarity
    FROM sessions s
    WHERE 1 - (s.embedding <=> query_embedding) > match_threshold
    ORDER BY s.embedding <=> query_embedding
    LIMIT match_count;
END;
$$;
```
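Because these functions are exposed through Supabase's PostgREST layer, they can also be called without a client library. The sketch below is a hedged example using `requests` against the standard `/rest/v1/rpc/` endpoint; the environment variable names are assumptions.

```python
# Hedged sketch: invoking the search_sessions RPC over Supabase's REST endpoint.
# SUPABASE_URL / SUPABASE_KEY are assumed to come from the environment.
import os
import requests

def search_sessions(query_embedding: list[float], threshold: float = 0.3, count: int = 5) -> list[dict]:
    url = f"{os.environ['SUPABASE_URL']}/rest/v1/rpc/search_sessions"
    headers = {
        "apikey": os.environ["SUPABASE_KEY"],
        "Authorization": f"Bearer {os.environ['SUPABASE_KEY']}",
    }
    payload = {
        "query_embedding": query_embedding,
        "match_threshold": threshold,
        "match_count": count,
    }
    response = requests.post(url, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()  # rows of (id, date, title, content, similarity)
```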

The 11 Searchable Domains

```
                              ┌─────────────────────────────────────┐
                              │            VectorRAG                │
                              └──────────────┬──────────────────────┘
              ┌──────────────────────────────┼──────────────────────────────┐
              │                              │                              │
     ┌────────▼────────┐           ┌─────────▼─────────┐          ┌─────────▼─────────┐
     │     Memory      │           │      Skills       │          │    Reference      │
     └────────┬────────┘           └─────────┬─────────┘          └─────────┬─────────┘
              │                              │                              │
    ┌─────────┼─────────┐          ┌─────────┼─────────┐          ┌─────────┼─────────┐
    │         │         │          │         │         │          │         │         │
sessions  case_studies entities  protocols capabilities playbooks frameworks references workflows
                                                                                 │
                              ┌──────────────────────────────────────────────────┘
                              │
                     ┌────────▼────────┐
                     │    Identity     │
                     └────────┬────────┘
                              │
                    ┌─────────┼─────────┐
                    │                   │
              user_profile         system_docs
```

| Domain | Table | Count | Description |
|---|---|---|---|
| Sessions | `sessions` | ~468 | Daily interaction logs |
| Case Studies | `case_studies` | ~75 | Pattern analysis documents |
| Entities | `entities` | ~100 chunks | External data imports |
| Protocols | `protocols` | ~226 | Reusable thinking patterns |
| Capabilities | `capabilities` | ~10 | Tool/skill definitions |
| Playbooks | `playbooks` | ~5 | Strategic guides |
| Frameworks | `frameworks` | ~5 | Core Identity modules |
| References | `references` | ~10 | External citations |
| Workflows | `workflows` | ~20 | Automation scripts |
| User Profile | `user_profile` | ~10 | Preferences, settings |
| System Docs | `system_docs` | ~10 | TAG_INDEX, manifests |

Total Indexed Documents: ~850+


The Sync Pipeline

Script: supabase_sync.py

```bash
# Reference: python3 scripts/supabase_sync.py --all
```

```mermaid
flowchart LR
    subgraph INPUT["📂 Input"]
        F1["Session Logs"]
        F2["Case Studies"]
        F3["Protocols"]
        F4["...more"]
    end
    
    subgraph PROCESS["⚙️ Processing"]
        P1["Read Markdown"]
        P2["Extract Metadata"]
        P3["Generate Embedding<br/>(Gemini API)"]
        P4["Check Exists"]
    end
    
    subgraph OUTPUT["☁️ Supabase"]
        O1["INSERT if new"]
        O2["SKIP if exists"]
    end
    
    F1 & F2 & F3 & F4 --> P1 --> P2 --> P3 --> P4
    P4 -->|New| O1
    P4 -->|Exists| O2
```
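A condensed sketch of the per-file step is shown below. It is illustrative rather than a copy of `supabase_sync.py`: the metadata handling is simplified, and it reuses the `get_embedding()` function from the next section.

```python
# Illustrative per-file sync step (not the actual supabase_sync.py logic).
import os
from pathlib import Path
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def sync_file(path: Path, meta: dict, table: str = "sessions") -> str:
    """Index one markdown file: skip if already present, else embed and insert."""
    # file_path is UNIQUE in the schema, so it doubles as the dedup key.
    existing = supabase.table(table).select("id").eq("file_path", str(path)).execute()
    if existing.data:
        return "SKIP"

    content = path.read_text(encoding="utf-8")
    row = {
        **meta,                               # e.g. date, session_number, title (parsed upstream)
        "content": content,
        "embedding": get_embedding(content),  # defined in the next section
        "file_path": str(path),
    }
    supabase.table(table).insert(row).execute()
    return "INSERT"
```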

Embedding Generation

```python
import os
import requests

# API key assumed to be provided via the environment.
GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]

def get_embedding(text: str) -> list[float]:
    """Generate 3072-dim embedding using Google Gemini."""
    text = text[:32000]  # crude guard against the model's input size limit
    
    url = f"https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent?key={GOOGLE_API_KEY}"
    
    payload = {
        "model": "models/text-embedding-004",
        "content": {"parts": [{"text": text}]}
    }
    
    response = requests.post(url, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["embedding"]["values"]
```
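A single call per document or query is enough; checking the returned length against the `VECTOR(3072)` column is a cheap sanity check that the model and schema agree:

```python
vec = get_embedding("What did we discuss about project architecture?")
print(len(vec))  # should match the VECTOR(3072) column defined in the schema
```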

Query Interface

Script: supabase_search.py

```bash
# Search everything
# Reference: python3 scripts/supabase_search.py "API design"

# Search specific domain
# Reference: python3 scripts/supabase_search.py "authentication" --sessions-only

# Adjust sensitivity
# Reference: python3 scripts/supabase_search.py "database schema" --threshold 0.5 --limit 10
```
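The flags above suggest a small argparse surface. The sketch below is an assumption about how `supabase_search.py` might parse them, not a copy of the actual script:

```python
# Hypothetical CLI parsing for the commands shown above
# (not the actual supabase_search.py implementation).
import argparse

parser = argparse.ArgumentParser(description="Semantic search across VectorRAG domains")
parser.add_argument("query", help="natural-language search text")
parser.add_argument("--sessions-only", action="store_true", help="restrict to sessions")
parser.add_argument("--cases-only", action="store_true", help="restrict to case_studies")
parser.add_argument("--protocols-only", action="store_true", help="restrict to protocols")
parser.add_argument("--threshold", type=float, default=0.3, help="minimum similarity score")
parser.add_argument("--limit", type=int, default=5, help="matches returned per domain")
args = parser.parse_args()
```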

Output Example

```text
🔍 Searching for: "API design"
============================================================

📚 SESSIONS:
  1. [72.3%] 2025-01-15 - Session 42: API Architecture Review
     📁 .context/memories/session_logs/2025-01-15-session-42.md
  2. [68.1%] 2025-01-20 - Session 47: REST Endpoint Design
     📁 .context/memories/session_logs/2025-01-20-session-47.md

📖 CASE STUDIES:
  1. [61.2%] CS-005 - Authentication Flow Analysis
     📁 .context/memories/case_studies/CS-005-auth-flow.md

📋 PROTOCOLS:
  1. [58.7%] Agentic Engineering Strategy: API Design Patterns
     📁 .agent/skills/protocols/architecture/42-api-design.md
```

Autonomic Integration (Protocol §0.7.1)

VectorRAG is not optional. Per Core Identity §0.7.1:

> **Semantic Context Protocol:** Run `supabase_search.py` at the start of EVERY query to inject relevant context.

Trigger Matrix

| User Query Pattern | Automatic Action |
|---|---|
| "What did we discuss about X?" | `supabase_search.py "X"` |
| "Find sessions where..." | `supabase_search.py` |
| "Remember when we talked about..." | `supabase_search.py` |
| Case study lookup | `supabase_search.py --cases-only` |
| Protocol recall | `supabase_search.py --protocols-only` |
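A minimal sketch of how such a trigger matrix could be wired up is shown below; the regex patterns and dispatch helper are illustrative assumptions, not the actual autonomic implementation.

```python
# Illustrative routing of query patterns to search commands
# (an assumption about how the trigger matrix could be applied, not the real code).
import re
import subprocess

TRIGGERS = [
    (r"what did we discuss about (.+)", []),
    (r"find sessions where", []),
    (r"remember when we talked about (.+)", []),
    (r"case study", ["--cases-only"]),
    (r"protocol", ["--protocols-only"]),
]

def autonomic_search(user_query: str) -> None:
    """Run supabase_search.py with flags chosen from the first matching pattern."""
    for pattern, extra_flags in TRIGGERS:
        match = re.search(pattern, user_query, re.IGNORECASE)
        if match:
            topic = match.group(1) if match.groups() else user_query
            subprocess.run(
                ["python3", "scripts/supabase_search.py", topic, *extra_flags],
                check=True,
            )
            return
```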

Temporal Continuity: The Killer Feature

```text
 Session Progression: 302 Sessions in 14 Days (Dec 11-24, 2025)

 ┌──────────────────────────────────────────────────────────────────────────────┐
 │  Day 1-3           Day 4-7            Day 8-11           Day 12-14           │
 │  (~30 sessions)    (~90 sessions)     (~120 sessions)    (~60 sessions)      │
 │  ──────────────    ──────────────     ──────────────     ──────────────      │
 │  • Core Identity   • Protocol Lib     • VectorRAG        • Full Production   │
 │  • First Workflows • Supabase Setup   • 170 Protocols    • 730+ Documents    │
 │  • Session Logs    • Case Studies     • Semantic Search  • Complete Recall   │
 │                                                                              │
 │  [FOUNDATION]      [ACCELERATION]     [INTEGRATION]      [PRODUCTION]        │
 └──────────────────────────────────────────────────────────────────────────────┘

 Average: ~22 sessions/day | Peak: 33 sessions (Dec 18)
```

Why This Matters

| Traditional LLM | Athena + VectorRAG |
|---|---|
| "Who are you?" (every session) | "Welcome back. Per Session 42, let's continue the API design." |
| Context resets each chat | Context persists forever |
| Manual memory via user uploads | Automatic sync pipeline |
| Generic responses | Personalized, historically-aware responses |

Implementation Files

| File | Purpose | Lines |
|---|---|---|
| `supabase_sync.py` | Indexing pipeline | ~970 |
| `supabase_search.py` | Query interface | ~390 |
| `migrations/*.sql` | Table/function definitions | ~168 |

Cost Analysis

| Resource | Free Tier | Paid Tier |
|---|---|---|
| Supabase | 500 MB DB, 2 GB bandwidth | $25/mo for 8 GB |
| Gemini Embeddings | 1,500 req/day | N/A (no cost beyond free) |
| Total | $0/month | ~$25/month at scale |

> [!TIP]
> At ~730 documents, we're well within free tier limits. Embeddings are generated once per document, so ongoing costs are minimal.
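A rough back-of-envelope check of why the free tier is comfortable (the ~850 document count and 4-byte floats are assumptions drawn from the figures above):

```python
# Back-of-envelope storage estimate for the embedding column alone.
docs = 850                 # approximate total indexed documents (from the table above)
dims = 3072                # embedding dimensionality used throughout this document
bytes_per_float = 4        # pgvector stores single-precision (4-byte) floats

vector_mb = docs * dims * bytes_per_float / 1_000_000
print(f"~{vector_mb:.1f} MB of raw vector data")  # ~10.4 MB, far below the 500 MB free tier
```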


Comparison: Athena VectorRAG vs. Alternatives

```text
                        ┌─────────────────────────────────────┐
                        │     RAG Approaches Comparison       │
                        ├─────────────────────────────────────┤
        Capability      │                                     │
            ▲           │   IDEAL          │    OVERKILL      │
            │           │                  │                  │
        Advanced        │  ★ Athena        │   ○ GraphRAG     │
            │           │    VectorRAG     │   ○ Pinecone+    │
            │           │                  │     OpenAI       │
            ├───────────┼──────────────────┼──────────────────┤
            │           │   INSUFFICIENT   │    ENTERPRISE    │
        Limited         │                  │                  │
            │           │  ○ Local Chroma  │                  │
            │           │  ○ Context Only  │                  │
            │           │                  │                  │
            └───────────┴──────────────────┴──────────────────┘
                             Low Cost          High Cost

  ★ = Athena's choice   ○ = Alternative approaches
```

Real-World Application: Contextual Intelligence (vs. Generic AI)

Scenario: A user asks for advice on a business deal involving a commission-only agent. Query: "Should I accept this commission-based partnership where the agent takes no risk?"

1. The Generic AI Response (Standard LLM)

  • Focus: General business principles.
  • Output: "Commission-based partnerships can be effective for motivation. Ensure you have clear contracts. Pros include low fixed costs, while cons include potential short-term focus by the agent."
  • Verdict: Safe but generic. It offers a balanced pros/cons list but lacks strategic conviction.

2. The Athena Response (VectorRAG)

  • Retrieval Mechanism: The system searches memory for relevant patterns, mirroring how the user would recall past analysis:
    • Protocol 33: Principal-Agent Problem (Theory)
    • CS044_ILP_Trust_Arbitrage (Case Study of a similar bad deal)
    • Protocol 121: Amoral Realism (Framework)
  • Output: "Stop. This structure matches the Principal-Agent Problem we analyzed in the ILP Trust Arbitrage Case (CS044). The counterparty has zero downside risk while you bear all liability. This mirrors the 'Tuition Trap' mechanic we identified in Dec 2024. Recommendation: Reject the deal unless they accept a clawback clause (shared risk) per Protocol 33."
  • Verdict: Contextualized & Decisive. It uses past organizational memory to prevent recurring errors, functioning as a high-level consultant rather than a chatbot.

Future Enhancements

  • Hybrid Search: Combine vector + keyword + TAG_INDEX for precision → See SEMANTIC_SEARCH.md
  • Auto-Reindex: Trigger sync on file save (via GitHub webhook)
  • Cross-Reference: Link sessions to protocols to case studies
  • Chunking Strategy: Split large documents for finer retrieval (see the sketch below)
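As a rough illustration of that chunking idea (an assumption about one possible approach, not an implemented feature), a simple fixed-size splitter with overlap might look like this:

```python
# Hypothetical chunking helper for the "Chunking Strategy" enhancement above.
def chunk_document(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split a document into overlapping character windows for finer retrieval."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Each chunk would then be embedded and stored as its own row,
# so retrieval can return the most relevant passage rather than a whole file.
```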

#vectorrag #supabase #semantic-search #memory #architecture