Skip to content

AzazAhmedLipu79/openmemory

Repository files navigation

OpenMemory

Universal memory protocol and engine for AI systems.

Claude, ChatGPT, Cursor, custom agents. Any AI can read and write shared, structured memory through a standard protocol. Think of it as the common memory layer every AI system should have had from the start.

"Three different AI systems sharing one OpenMemory instance, each retrieving context written by another." This is what we set out to build.


Philosophy

  • Memory is external. AI models are stateless reasoning nodes. All continuity comes from the memory layer.
  • Local-first, cloud-optional. Default embeddings run locally (Apache 2.0, 384-dim, CPU). Zero API cost. Zero data leaving your machine.
  • Never delete. Status transitions only. Full immutable event log.
  • Protocol, not product. REST + MCP. Any AI that speaks HTTP can use it.

Quick Start

git clone https://github.com/AzazAhmedLipu79/openmemory
cd openmemory
cp .env.example .env
docker compose up -d
docker compose exec api alembic upgrade head

Verify:

curl localhost:8000/health
# {"status":"healthy","db":"connected","redis":"connected"}

curl -X POST localhost:8000/memories \
  -H 'Content-Type: application/json' \
  -d '{"type":"preference","key":"language","value":"Go","confidence":0.9}'

curl -X POST localhost:8000/memories/retrieve \
  -H 'Content-Type: application/json' \
  -d '{"prompt":"What language does the user prefer?","top_k":3}'

Features

Memory Store

  • 6 memory types: preference, belief, fact, task, episodic, procedural
  • 4 scopes: user, project, session, team
  • Status lifecycle: active → stale → superseded → archived (never deleted)
  • Confidence scoring: 0.0–1.0, evolves with contradictions
  • Immutable event log: every action traced, cursor-paginated

Confidence and Importance Standards

Agents use consistent scoring so memory quality stays uniform across all AIs:

Memory Type Confidence Importance Reason
preference 0.85-0.90 0.80-0.90 User stated, affects every decision
fact 0.85-1.00 0.70-0.85 Verifiable project knowledge
procedural 0.85-0.95 0.80-0.90 Tested workflow, critical
task 0.70-0.90 0.50-0.70 Commitment, matters until done
belief 0.50-0.80 0.40-0.60 Educated guess, might change
episodic 0.60-0.80 0.20-0.40 Past event, context that fades
Source Confidence Effect
explicit (directly stated) +0.15 to +0.20 boost
observed (noticed pattern) Baseline
inferred (deduced) -0.15 to -0.20 penalty

Full details: documents/12-confidence-importance-standards.md

Retrieval Engine

  • 4-signal composite scoring: semantic similarity + recency + importance + frequency
  • 4 retrieval modes: smart, focused, deep, working
  • Two-phase pipeline: pgvector ANN pre-filter → score + re-rank
  • Graph traversal: depth-decay scoring ($0.7^d$) across memory_relations
  • Assembled context: ready-to-inject string for AI system prompts

Embeddings

  • Default: all-MiniLM-L6-v2 (Apache 2.0, 384 dimensions, 80MB, runs on CPU)
  • Configurable: OpenAI text-embedding-3-small, Ollama nomic-embed-text, any sentence-transformers model
  • Async generation: memories written immediately, embeddings generated within 10s by background worker

Background Workers (5 jobs)

Job Interval Purpose
Auto-embedding 10s Generate embeddings for new memories
Access flush 60s Flush Redis access counters → DB
Recency recompute Hourly Update $e^{-\lambda \cdot \text{age}}$ scores
Session cleanup Hourly Archive expired session memories
Confidence decay 30 days Decay confidence for untouched memories

Performance

  • Cold retrieval: ~26s (full pipeline)
  • Cache hit: ~14ms (~1,800× faster)
  • Rate limiting: Redis sliding window, 100 req/s retrieval, 20 req/s writes
  • Connection pool: 5–20 connections, 5s statement timeout

AI Integration

  • MCP adapter: 4 tools (retrieve, write, link, health). Claude and Cursor auto-discover them via VS Code
  • REST API: 10 endpoints, any HTTP client
  • Demo: demo/demo.py shows two agents sharing the same memory system

How It Works, End to End

  WRITE                    BACKGROUND                 RETRIEVE
 ───────                  ──────────                ─────────
 
 AI writes          Worker picks up           AI asks question
 memory via         un-embedded               via REST or MCP
 REST or MCP        memories every 10s
     │                    │                        │
     ▼                    ▼                        ▼
 ┌─────────┐      ┌──────────────┐         ┌──────────────┐
 │ Validate│      │ Embed via    │         │ Embed query  │
 │+Contra- │      │ all-MiniLM-  │         │ (same model) │
 │ diction │      │ L6-v2 (CPU)  │         │ → 384d vec   │
 └────┬────┘      └──────┬───────┘         └──────┬───────┘
      │                  │                        │
      ▼                  ▼                        ▼
 ┌─────────┐      ┌──────────────┐         ┌──────────────┐
 │ INSERT  │      │ UPDATE       │         │ Phase A: ANN │
 │ memories│      │ embedding    │         │ pre-filter   │
 │+events  │      │ column       │         │ 36 candidates│
 └─────────┘      └──────────────┘         └──────┬───────┘
                                                  │
      ┌───────────────────────────────────────────┘
      │
      ▼
 ┌────────────────────────────────────┐
 │ Phase B: Score 36 candidates       │
 │ score = sem×0.40 + rec×0.20       │
 │       + imp×0.20 + freq×0.10      │
 │ × scope_bonus × status_multiplier │
 │ → keep top 12                      │
 └────────────┬───────────────────────┘
              │
              ▼
 ┌────────────────────────────┐
 │ Phase C: Graph traversal   │
 │ Follow relations depth 2   │
 │ score × weight × 0.7^depth │
 │ Deduplicate, sort           │
 └────────────┬───────────────┘
              │
              ▼
 ┌────────────────────────────┐       ┌──────────────────┐
 │ Assemble response          │       │ Cache in Redis   │
 │ + assembled_context string │──────▶│ 60s TTL          │
 │ + scored memory list       │       │ Next query: 14ms │
 └────────────────────────────┘       └──────────────────┘

Every memory flows through Store → Embed → Index → Retrieve → Score → Deliver.

1. Store

An AI writes a memory via REST or MCP:

POST /memories
{
  "type": "fact",
  "key": "database",
  "value": "PostgreSQL 16 with pgvector",
  "confidence": 0.95
}

What happens:

  • Contradiction check: The engine looks for an existing memory with the same (subject_id, key). If found with a different value:
    • New confidence > old + 0.2 → old is superseded
    • Source is explicit → old is superseded
    • Otherwise → old confidence decays 20%, both kept
  • Immutable event: Every write and status change appends to memory_events. This table is append-only, never updated, never deleted. It uses a BIGINT auto-increment ID with a space-efficient BRIN index.
  • Columns populated: recency_score = 1.0, frequency_score = 0.0, embedding = NULL (filled asynchronously)

2. Embed

The background worker picks up un-embedded memories every 10 seconds:

Worker: SELECT * FROM memories WHERE embedding IS NULL LIMIT 50
        → call all-MiniLM-L6-v2 (Apache 2.0, 384-dim, CPU)
        → UPDATE memories SET embedding = $vector

The memory becomes semantically searchable within about 10 seconds. Until then, it's still findable through recency, importance, and keyword search. Only the semantic similarity score stays at zero during that brief window.

3. Index

Eight PostgreSQL indexes make retrieval fast:

Index Type What It Speeds Up Speedup at 100K rows
ix_memories_subject_scope_status Composite B-tree Filtering by owner + scope + status ~400× vs seq scan
ix_memories_embedding_ivfflat IVFFlat vector ANN cosine similarity search ~50× vs brute force
ix_memories_subject_key B-tree Contradiction detection on write O(log n)
ix_memories_session_active Partial B-tree Session cleanup (only 5% of table) ~20× smaller
ix_memory_relations_from/to B-tree ×2 Bidirectional graph traversal O(log n)
ix_memory_events_memory_ts Composite B-tree Per-memory event history O(log n)
ix_memory_events_ts_brin BRIN Chronological event scans ~1,000× smaller than B-tree

4. Retrieve

An AI asks a question:

POST /memories/retrieve
{ "prompt": "What database does this project use?", "mode": "smart", "top_k": 12 }

Phase A: ANN Pre-Filter (PostgreSQL, about 3ms)

SELECT * FROM memories
WHERE subject_id = $1 AND scope = ANY($2) AND status = ANY($3)
ORDER BY embedding <=> $query_vector   -- pgvector cosine distance
LIMIT 36                                -- 3× top_k

Returns 36 candidates. The IVFFlat index avoids scanning all 100K vectors.

Phase B: Score and Re-Rank (Python, about 2ms)

Each candidate gets a composite score:

score = (
    semantic_similarity × 0.40   ← cosine(query_vec, memory_vec), mapped to [0,1]
  + recency_score       × 0.20   ← e^(-λ × age_hours), pre-computed, hourly refresh
  + importance          × 0.20   ← user-set at write time
  + frequency_score     × 0.10   ← LOG2(access_count + 1) / 10, pre-computed
) × scope_bonus × status_multiplier

Weights vary by mode:

Mode sem rec imp freq Graph Use
smart 0.40 0.20 0.20 0.10 depth 2 General queries
focused 0.55 0.15 0.15 0.15 depth 1 Coding assistants
deep 0.30 0.10 0.25 0.05 depth 3 Summarization
working 0.50 0.40 0.05 0.05 none Turn-by-turn chat

Scope bonus: exact match ×1.20, parent scope ×1.10, unrelated ×1.00.
Status penalty: active ×1.0, stale ×0.4, superseded ×0.1, archived ×0.0 (excluded).

The 36 candidates are sorted by score. Only the top 12 proceed to graph traversal.

Phase C: Graph Traversal (5 to 10ms)

For each top-12 memory, the engine follows memory_relations edges:

traversed_score = parent_score × relation_weight × 0.7^depth

Depth 1: ×0.70   closely related
Depth 2: ×0.49   moderately related
Depth 3: ×0.34   distantly related

This surfaces connected memories that pure vector search would miss. A memory about "PostgreSQL 15" can pull in a related memory about "Flyway migrations" through a depends_on edge, even if that memory wouldn't rank highly on vector similarity alone.

Traversed nodes already in the result set are deduplicated, keeping the higher score. Running graph traversal only on the top 12, instead of all 36 ANN candidates, cuts SQL queries by 67%.

5. Deliver

The final response is assembled:

{
  "memories": [
    {
      "key": "database",
      "value": "PostgreSQL 16 with pgvector",
      "score": 0.687,
      "confidence": 0.95,
      "status": "active"
    }
  ],
  "assembled_context": "PostgreSQL 16 with pgvector. Deploy via docker compose...",
  "graph_nodes_traversed": 2,
  "conflicts_suppressed": 0
}

assembled_context is the most important field. The AI never parses memory internals. It takes this string and places it directly into its system prompt, right before the user's message:

System: You are a coding assistant. User context: PostgreSQL 16 with pgvector.
        Deploy via docker compose up -d on VPS. Preferred language is Go.
User:   How should I set up the database?

6. Cache (Redis)

Every retrieval result is cached in Redis:

Key:   om:retrieve:{sha256(prompt:subject:mode:top_k)[:16]}
TTL:   60 seconds
Invalidated: on any write to the same subject

First query (cold): around 26 seconds (embedding, ANN, scoring, graph).
Second query (cache hit): around 14 milliseconds. That's about 1,800 times faster.

The cache hit rate is 40–60% in active sessions with repeated or similar queries. Combined with rate limiting (Redis sliding window, 100 req/s retrieval burst), the system stays responsive under load.


API (12 Endpoints)

Method Endpoint Description
GET /health System health check
POST /memories Write a single memory
POST /memories/batch Write up to 50 memories atomically
GET /memories List memories with pagination + filters (new)
GET /memories/{id} Read a specific memory
PATCH /memories/{id}/status Mark stale, superseded, or archived
POST /memories/retrieve Smart retrieval (embed → ANN → score → graph → context)
POST /memories/context Assembled context string only
GET /memories/search Keyword search (ILIKE)
GET /memories/graph/{id} Graph traversal from a memory
POST /memories/relations Link two memories (graph edge) (new)
GET /events Immutable event log (cursor-paginated)

Full API reference: documents/03-api-reference.md

Example: Smart Retrieval

curl -X POST localhost:8000/memories/retrieve \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "What database does this project use?",
    "mode": "smart",
    "top_k": 5
  }'
{
  "memories": [
    {
      "key": "database",
      "value": "PostgreSQL 16 + pgvector for semantic search",
      "score": 0.687,
      "status": "active",
      "confidence": 0.95
    }
  ],
  "assembled_context": "PostgreSQL 16 + pgvector for semantic search. Deploy via docker compose...",
  "graph_nodes_traversed": 2
}

Configuration

All settings in .env (copy from .env.example):

# Embedding provider (local = free, open-source, zero API cost)
EMBEDDING_PROVIDER=local
EMBEDDING_MODEL=all-MiniLM-L6-v2
EMBEDDING_DIM=384

# To switch to OpenAI:
# EMBEDDING_PROVIDER=openai
# EMBEDDING_MODEL=text-embedding-3-small
# EMBEDDING_DIM=1536
# OPENAI_API_KEY=sk-...

Architecture

AI Clients (Claude, ChatGPT, Cursor, Custom)
        │
        ├── MCP Protocol (stdio) ──► mcp_server/server.py
        │                              │
        └── REST (HTTP) ──────────────┤
                                       ▼
                              ┌─────────────────┐
                              │  FastAPI :8000   │
                              │  10 endpoints    │
                              └───┬────┬────┬───┘
                                  │    │    │
                    ┌─────────────┘    │    └──────────────┐
                    ▼                  ▼                   ▼
            ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
            │ PostgreSQL 16 │  │   Redis 7    │  │   Worker     │
            │  + pgvector   │  │  Cache/Rate  │  │  5 BG Jobs   │
            └──────────────┘  └──────────────┘  └──────────────┘

Demo

cd demo
python3 demo.py

Shows multi-agent shared memory: Agent A (Claude) writes project knowledge, Agent B (Cursor) retrieves it. Contradiction, caching, and lifecycle all demonstrated. Optionally uses LM Studio (localhost:1234) for AI-generated content.


Stack

Layer Technology
API FastAPI (Python 3.12), async
Database PostgreSQL 16 + pgvector
Cache Redis 7
Embeddings sentence-transformers (Apache 2.0)
Background asyncio (5 concurrent jobs)
MCP mcp 1.27.1 (stdio transport)
DevOps Docker Compose (4 services)

Documentation

12 detailed technical documents in documents/:

# Document Content
1 Architecture Overview System design, Docker, data flow
2 Schema Design Tables, indexes, storage analysis
3 API Reference All 10 endpoints with curl examples
4 Retrieval Scoring Formula, modes, two-phase pipeline
5 Performance Analysis Latency budget, index speedups, cache
6 Contradiction Engine Detection + resolution algorithms
7 Implementation Log Phase 1-4 build history, bugs, decisions
8 Embedding Strategy Local-first, model comparison, config
9 Phase 2: Retrieval Pipeline, graph, code reference
10 Phase 3: Jobs and Caching Workers, Redis, quantitative impact
11 Phase 4: MCP Adapter Claude and Cursor integration, tools
12 Confidence & Importance Standards Agent scoring guidelines, contradiction impact

Testing

pip install pytest pytest-asyncio
python -m pytest tests/ -v
# 12 passed

License

Apache 2.0

The default embedding model (all-MiniLM-L6-v2) is separately licensed under Apache 2.0 by its authors.


Status

v0.5 - All four build phases are complete. The system has 10 API endpoints, 5 background workers, 4 retrieval modes, an MCP adapter for Claude and Cursor, a full test suite, and 12 technical documents.

Built and tested. Ready to use.

About

Claude, ChatGPT, Cursor, custom agents. Any AI can read and write shared, structured memory through a standard protocol. Think of it as the common memory layer every AI system should have had from the start.

Topics

Resources

License

Stars

Watchers

Forks

Contributors