OpenMemory

Universal memory protocol and engine for AI systems.

Claude, ChatGPT, Cursor, custom agents. Any AI can read and write shared, structured memory through a standard protocol. Think of it as the common memory layer every AI system should have had from the start.

"Three different AI systems sharing one OpenMemory instance, each retrieving context written by another." This is what we set out to build.

Philosophy

Memory is external. AI models are stateless reasoning nodes. All continuity comes from the memory layer.
Local-first, cloud-optional. Default embeddings run locally (Apache 2.0, 384-dim, CPU). Zero API cost. Zero data leaving your machine.
Never delete. Status transitions only. Full immutable event log.
Protocol, not product. REST + MCP. Any AI that speaks HTTP can use it.

Quick Start

git clone https://github.com/AzazAhmedLipu79/openmemory
cd openmemory
cp .env.example .env
docker compose up -d
docker compose exec api alembic upgrade head

Verify:

curl localhost:8000/health
# {"status":"healthy","db":"connected","redis":"connected"}

curl -X POST localhost:8000/memories \
  -H 'Content-Type: application/json' \
  -d '{"type":"preference","key":"language","value":"Go","confidence":0.9}'

curl -X POST localhost:8000/memories/retrieve \
  -H 'Content-Type: application/json' \
  -d '{"prompt":"What language does the user prefer?","top_k":3}'

Features

Memory Store

6 memory types: preference, belief, fact, task, episodic, procedural
4 scopes: user, project, session, team
Status lifecycle: active → stale → superseded → archived (never deleted)
Confidence scoring: 0.0–1.0, evolves with contradictions
Immutable event log: every action traced, cursor-paginated

Confidence and Importance Standards

Agents use consistent scoring so memory quality stays uniform across all AIs:

Memory Type	Confidence	Importance	Reason
`preference`	0.85-0.90	0.80-0.90	User stated, affects every decision
`fact`	0.85-1.00	0.70-0.85	Verifiable project knowledge
`procedural`	0.85-0.95	0.80-0.90	Tested workflow, critical
`task`	0.70-0.90	0.50-0.70	Commitment, matters until done
`belief`	0.50-0.80	0.40-0.60	Educated guess, might change
`episodic`	0.60-0.80	0.20-0.40	Past event, context that fades

Source	Confidence Effect
`explicit` (directly stated)	+0.15 to +0.20 boost
`observed` (noticed pattern)	Baseline
`inferred` (deduced)	-0.15 to -0.20 penalty

Full details: documents/12-confidence-importance-standards.md

Retrieval Engine

4-signal composite scoring: semantic similarity + recency + importance + frequency
4 retrieval modes: smart, focused, deep, working
Two-phase pipeline: pgvector ANN pre-filter → score + re-rank
Graph traversal: depth-decay scoring ($0.7^d$) across memory_relations
Assembled context: ready-to-inject string for AI system prompts

Embeddings

Default: all-MiniLM-L6-v2 (Apache 2.0, 384 dimensions, 80MB, runs on CPU)
Configurable: OpenAI text-embedding-3-small, Ollama nomic-embed-text, any sentence-transformers model
Async generation: memories written immediately, embeddings generated within 10s by background worker

Background Workers (5 jobs)

Job	Interval	Purpose
Auto-embedding	10s	Generate embeddings for new memories
Access flush	60s	Flush Redis access counters → DB
Recency recompute	Hourly	Update $e^{-\lambda \cdot \text{age}}$ scores
Session cleanup	Hourly	Archive expired session memories
Confidence decay	30 days	Decay confidence for untouched memories

Performance

Cold retrieval: ~26s (full pipeline)
Cache hit: ~14ms (~1,800× faster)
Rate limiting: Redis sliding window, 100 req/s retrieval, 20 req/s writes
Connection pool: 5–20 connections, 5s statement timeout

AI Integration

MCP adapter: 4 tools (retrieve, write, link, health). Claude and Cursor auto-discover them via VS Code
REST API: 10 endpoints, any HTTP client
Demo: demo/demo.py shows two agents sharing the same memory system

How It Works, End to End

  WRITE                    BACKGROUND                 RETRIEVE
 ───────                  ──────────                ─────────
 
 AI writes          Worker picks up           AI asks question
 memory via         un-embedded               via REST or MCP
 REST or MCP        memories every 10s
     │                    │                        │
     ▼                    ▼                        ▼
 ┌─────────┐      ┌──────────────┐         ┌──────────────┐
 │ Validate│      │ Embed via    │         │ Embed query  │
 │+Contra- │      │ all-MiniLM-  │         │ (same model) │
 │ diction │      │ L6-v2 (CPU)  │         │ → 384d vec   │
 └────┬────┘      └──────┬───────┘         └──────┬───────┘
      │                  │                        │
      ▼                  ▼                        ▼
 ┌─────────┐      ┌──────────────┐         ┌──────────────┐
 │ INSERT  │      │ UPDATE       │         │ Phase A: ANN │
 │ memories│      │ embedding    │         │ pre-filter   │
 │+events  │      │ column       │         │ 36 candidates│
 └─────────┘      └──────────────┘         └──────┬───────┘
                                                  │
      ┌───────────────────────────────────────────┘
      │
      ▼
 ┌────────────────────────────────────┐
 │ Phase B: Score 36 candidates       │
 │ score = sem×0.40 + rec×0.20       │
 │       + imp×0.20 + freq×0.10      │
 │ × scope_bonus × status_multiplier │
 │ → keep top 12                      │
 └────────────┬───────────────────────┘
              │
              ▼
 ┌────────────────────────────┐
 │ Phase C: Graph traversal   │
 │ Follow relations depth 2   │
 │ score × weight × 0.7^depth │
 │ Deduplicate, sort           │
 └────────────┬───────────────┘
              │
              ▼
 ┌────────────────────────────┐       ┌──────────────────┐
 │ Assemble response          │       │ Cache in Redis   │
 │ + assembled_context string │──────▶│ 60s TTL          │
 │ + scored memory list       │       │ Next query: 14ms │
 └────────────────────────────┘       └──────────────────┘

Every memory flows through Store → Embed → Index → Retrieve → Score → Deliver.

1. Store

An AI writes a memory via REST or MCP:

POST /memories
{
  "type": "fact",
  "key": "database",
  "value": "PostgreSQL 16 with pgvector",
  "confidence": 0.95
}

What happens:

Contradiction check: The engine looks for an existing memory with the same (subject_id, key). If found with a different value:
- New confidence > old + 0.2 → old is superseded
- Source is explicit → old is superseded
- Otherwise → old confidence decays 20%, both kept
Immutable event: Every write and status change appends to memory_events. This table is append-only, never updated, never deleted. It uses a BIGINT auto-increment ID with a space-efficient BRIN index.
Columns populated: recency_score = 1.0, frequency_score = 0.0, embedding = NULL (filled asynchronously)

2. Embed

The background worker picks up un-embedded memories every 10 seconds:

Worker: SELECT * FROM memories WHERE embedding IS NULL LIMIT 50
        → call all-MiniLM-L6-v2 (Apache 2.0, 384-dim, CPU)
        → UPDATE memories SET embedding = $vector

The memory becomes semantically searchable within about 10 seconds. Until then, it's still findable through recency, importance, and keyword search. Only the semantic similarity score stays at zero during that brief window.

3. Index

Eight PostgreSQL indexes make retrieval fast:

Index	Type	What It Speeds Up	Speedup at 100K rows
`ix_memories_subject_scope_status`	Composite B-tree	Filtering by owner + scope + status	~400× vs seq scan
`ix_memories_embedding_ivfflat`	IVFFlat vector	ANN cosine similarity search	~50× vs brute force
`ix_memories_subject_key`	B-tree	Contradiction detection on write	O(log n)
`ix_memories_session_active`	Partial B-tree	Session cleanup (only 5% of table)	~20× smaller
`ix_memory_relations_from/to`	B-tree ×2	Bidirectional graph traversal	O(log n)
`ix_memory_events_memory_ts`	Composite B-tree	Per-memory event history	O(log n)
`ix_memory_events_ts_brin`	BRIN	Chronological event scans	~1,000× smaller than B-tree

4. Retrieve

An AI asks a question:

POST /memories/retrieve
{ "prompt": "What database does this project use?", "mode": "smart", "top_k": 12 }

Phase A: ANN Pre-Filter (PostgreSQL, about 3ms)

SELECT * FROM memories
WHERE subject_id = $1 AND scope = ANY($2) AND status = ANY($3)
ORDER BY embedding <=> $query_vector   -- pgvector cosine distance
LIMIT 36                                -- 3× top_k

Returns 36 candidates. The IVFFlat index avoids scanning all 100K vectors.

Phase B: Score and Re-Rank (Python, about 2ms)

Each candidate gets a composite score:

score = (
    semantic_similarity × 0.40   ← cosine(query_vec, memory_vec), mapped to [0,1]
  + recency_score       × 0.20   ← e^(-λ × age_hours), pre-computed, hourly refresh
  + importance          × 0.20   ← user-set at write time
  + frequency_score     × 0.10   ← LOG2(access_count + 1) / 10, pre-computed
) × scope_bonus × status_multiplier

Weights vary by mode:

Mode	sem	rec	imp	freq	Graph	Use
smart	0.40	0.20	0.20	0.10	depth 2	General queries
focused	0.55	0.15	0.15	0.15	depth 1	Coding assistants
deep	0.30	0.10	0.25	0.05	depth 3	Summarization
working	0.50	0.40	0.05	0.05	none	Turn-by-turn chat

Scope bonus: exact match ×1.20, parent scope ×1.10, unrelated ×1.00.
Status penalty: active ×1.0, stale ×0.4, superseded ×0.1, archived ×0.0 (excluded).

The 36 candidates are sorted by score. Only the top 12 proceed to graph traversal.

Phase C: Graph Traversal (5 to 10ms)

For each top-12 memory, the engine follows memory_relations edges:

traversed_score = parent_score × relation_weight × 0.7^depth

Depth 1: ×0.70   closely related
Depth 2: ×0.49   moderately related
Depth 3: ×0.34   distantly related

This surfaces connected memories that pure vector search would miss. A memory about "PostgreSQL 15" can pull in a related memory about "Flyway migrations" through a depends_on edge, even if that memory wouldn't rank highly on vector similarity alone.

Traversed nodes already in the result set are deduplicated, keeping the higher score. Running graph traversal only on the top 12, instead of all 36 ANN candidates, cuts SQL queries by 67%.

5. Deliver

The final response is assembled:

{
  "memories": [
    {
      "key": "database",
      "value": "PostgreSQL 16 with pgvector",
      "score": 0.687,
      "confidence": 0.95,
      "status": "active"
    }
  ],
  "assembled_context": "PostgreSQL 16 with pgvector. Deploy via docker compose...",
  "graph_nodes_traversed": 2,
  "conflicts_suppressed": 0
}

assembled_context is the most important field. The AI never parses memory internals. It takes this string and places it directly into its system prompt, right before the user's message:

System: You are a coding assistant. User context: PostgreSQL 16 with pgvector.
        Deploy via docker compose up -d on VPS. Preferred language is Go.
User:   How should I set up the database?

6. Cache (Redis)

Every retrieval result is cached in Redis:

Key:   om:retrieve:{sha256(prompt:subject:mode:top_k)[:16]}
TTL:   60 seconds
Invalidated: on any write to the same subject

First query (cold): around 26 seconds (embedding, ANN, scoring, graph).
Second query (cache hit): around 14 milliseconds. That's about 1,800 times faster.

The cache hit rate is 40–60% in active sessions with repeated or similar queries. Combined with rate limiting (Redis sliding window, 100 req/s retrieval burst), the system stays responsive under load.

API (12 Endpoints)

Method	Endpoint	Description
`GET`	`/health`	System health check
`POST`	`/memories`	Write a single memory
`POST`	`/memories/batch`	Write up to 50 memories atomically
`GET`	`/memories`	List memories with pagination + filters (new)
`GET`	`/memories/{id}`	Read a specific memory
`PATCH`	`/memories/{id}/status`	Mark stale, superseded, or archived
`POST`	`/memories/retrieve`	Smart retrieval (embed → ANN → score → graph → context)
`POST`	`/memories/context`	Assembled context string only
`GET`	`/memories/search`	Keyword search (ILIKE)
`GET`	`/memories/graph/{id}`	Graph traversal from a memory
`POST`	`/memories/relations`	Link two memories (graph edge) (new)
`GET`	`/events`	Immutable event log (cursor-paginated)

Full API reference: documents/03-api-reference.md

Example: Smart Retrieval

curl -X POST localhost:8000/memories/retrieve \
  -H 'Content-Type: application/json' \
  -d '{
    "prompt": "What database does this project use?",
    "mode": "smart",
    "top_k": 5
  }'

{
  "memories": [
    {
      "key": "database",
      "value": "PostgreSQL 16 + pgvector for semantic search",
      "score": 0.687,
      "status": "active",
      "confidence": 0.95
    }
  ],
  "assembled_context": "PostgreSQL 16 + pgvector for semantic search. Deploy via docker compose...",
  "graph_nodes_traversed": 2
}

Configuration

All settings in .env (copy from .env.example):

# Embedding provider (local = free, open-source, zero API cost)
EMBEDDING_PROVIDER=local
EMBEDDING_MODEL=all-MiniLM-L6-v2
EMBEDDING_DIM=384

# To switch to OpenAI:
# EMBEDDING_PROVIDER=openai
# EMBEDDING_MODEL=text-embedding-3-small
# EMBEDDING_DIM=1536
# OPENAI_API_KEY=sk-...

Architecture

AI Clients (Claude, ChatGPT, Cursor, Custom)
        │
        ├── MCP Protocol (stdio) ──► mcp_server/server.py
        │                              │
        └── REST (HTTP) ──────────────┤
                                       ▼
                              ┌─────────────────┐
                              │  FastAPI :8000   │
                              │  10 endpoints    │
                              └───┬────┬────┬───┘
                                  │    │    │
                    ┌─────────────┘    │    └──────────────┐
                    ▼                  ▼                   ▼
            ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
            │ PostgreSQL 16 │  │   Redis 7    │  │   Worker     │
            │  + pgvector   │  │  Cache/Rate  │  │  5 BG Jobs   │
            └──────────────┘  └──────────────┘  └──────────────┘

Demo

cd demo
python3 demo.py

Shows multi-agent shared memory: Agent A (Claude) writes project knowledge, Agent B (Cursor) retrieves it. Contradiction, caching, and lifecycle all demonstrated. Optionally uses LM Studio (localhost:1234) for AI-generated content.

Stack

Layer	Technology
API	FastAPI (Python 3.12), async
Database	PostgreSQL 16 + pgvector
Cache	Redis 7
Embeddings	sentence-transformers (Apache 2.0)
Background	asyncio (5 concurrent jobs)
MCP	mcp 1.27.1 (stdio transport)
DevOps	Docker Compose (4 services)

Documentation

12 detailed technical documents in documents/:

#	Document	Content
1	Architecture Overview	System design, Docker, data flow
2	Schema Design	Tables, indexes, storage analysis
3	API Reference	All 10 endpoints with curl examples
4	Retrieval Scoring	Formula, modes, two-phase pipeline
5	Performance Analysis	Latency budget, index speedups, cache
6	Contradiction Engine	Detection + resolution algorithms
7	Implementation Log	Phase 1-4 build history, bugs, decisions
8	Embedding Strategy	Local-first, model comparison, config
9	Phase 2: Retrieval	Pipeline, graph, code reference
10	Phase 3: Jobs and Caching	Workers, Redis, quantitative impact
11	Phase 4: MCP Adapter	Claude and Cursor integration, tools
12	Confidence & Importance Standards	Agent scoring guidelines, contradiction impact

Testing

pip install pytest pytest-asyncio
python -m pytest tests/ -v
# 12 passed

License

Apache 2.0

The default embedding model (all-MiniLM-L6-v2) is separately licensed under Apache 2.0 by its authors.

Status

v0.5 - All four build phases are complete. The system has 10 API endpoints, 5 background workers, 4 retrieval modes, an MCP adapter for Claude and Cursor, a full test suite, and 12 technical documents.

Built and tested. Ready to use.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
api		api
demo		demo
documents		documents
mcp_server		mcp_server
migrations		migrations
tests		tests
worker		worker
.env.example		.env.example
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
alembic.ini		alembic.ini
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OpenMemory

Philosophy

Quick Start

Features

Memory Store

Confidence and Importance Standards

Retrieval Engine

Embeddings

Background Workers (5 jobs)

Performance

AI Integration

How It Works, End to End

1. Store

2. Embed

3. Index

4. Retrieve

5. Deliver

6. Cache (Redis)

API (12 Endpoints)

Example: Smart Retrieval

Configuration

Architecture

Demo

Stack

Documentation

Testing

License

Status

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

OpenMemory

Philosophy

Quick Start

Features

Memory Store

Confidence and Importance Standards

Retrieval Engine

Embeddings

Background Workers (5 jobs)

Performance

AI Integration

How It Works, End to End

1. Store

2. Embed

3. Index

4. Retrieve

5. Deliver

6. Cache (Redis)

API (12 Endpoints)

Example: Smart Retrieval

Configuration

Architecture

Demo

Stack

Documentation

Testing

License

Status

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages