Universal memory protocol and engine for AI systems.
Claude, ChatGPT, Cursor, custom agents. Any AI can read and write shared, structured memory through a standard protocol. Think of it as the common memory layer every AI system should have had from the start.
"Three different AI systems sharing one OpenMemory instance, each retrieving context written by another." This is what we set out to build.
- Memory is external. AI models are stateless reasoning nodes. All continuity comes from the memory layer.
- Local-first, cloud-optional. Default embeddings run locally (Apache 2.0, 384-dim, CPU). Zero API cost. Zero data leaving your machine.
- Never delete. Status transitions only. Full immutable event log.
- Protocol, not product. REST + MCP. Any AI that speaks HTTP can use it.
git clone https://github.com/AzazAhmedLipu79/openmemory
cd openmemory
cp .env.example .env
docker compose up -d
docker compose exec api alembic upgrade headVerify:
curl localhost:8000/health
# {"status":"healthy","db":"connected","redis":"connected"}
curl -X POST localhost:8000/memories \
-H 'Content-Type: application/json' \
-d '{"type":"preference","key":"language","value":"Go","confidence":0.9}'
curl -X POST localhost:8000/memories/retrieve \
-H 'Content-Type: application/json' \
-d '{"prompt":"What language does the user prefer?","top_k":3}'- 6 memory types: preference, belief, fact, task, episodic, procedural
- 4 scopes: user, project, session, team
- Status lifecycle: active → stale → superseded → archived (never deleted)
- Confidence scoring: 0.0–1.0, evolves with contradictions
- Immutable event log: every action traced, cursor-paginated
Agents use consistent scoring so memory quality stays uniform across all AIs:
| Memory Type | Confidence | Importance | Reason |
|---|---|---|---|
preference |
0.85-0.90 | 0.80-0.90 | User stated, affects every decision |
fact |
0.85-1.00 | 0.70-0.85 | Verifiable project knowledge |
procedural |
0.85-0.95 | 0.80-0.90 | Tested workflow, critical |
task |
0.70-0.90 | 0.50-0.70 | Commitment, matters until done |
belief |
0.50-0.80 | 0.40-0.60 | Educated guess, might change |
episodic |
0.60-0.80 | 0.20-0.40 | Past event, context that fades |
| Source | Confidence Effect |
|---|---|
explicit (directly stated) |
+0.15 to +0.20 boost |
observed (noticed pattern) |
Baseline |
inferred (deduced) |
-0.15 to -0.20 penalty |
Full details: documents/12-confidence-importance-standards.md
- 4-signal composite scoring: semantic similarity + recency + importance + frequency
- 4 retrieval modes: smart, focused, deep, working
- Two-phase pipeline: pgvector ANN pre-filter → score + re-rank
-
Graph traversal: depth-decay scoring (
$0.7^d$ ) acrossmemory_relations - Assembled context: ready-to-inject string for AI system prompts
- Default:
all-MiniLM-L6-v2(Apache 2.0, 384 dimensions, 80MB, runs on CPU) - Configurable: OpenAI
text-embedding-3-small, Ollamanomic-embed-text, any sentence-transformers model - Async generation: memories written immediately, embeddings generated within 10s by background worker
| Job | Interval | Purpose |
|---|---|---|
| Auto-embedding | 10s | Generate embeddings for new memories |
| Access flush | 60s | Flush Redis access counters → DB |
| Recency recompute | Hourly | Update |
| Session cleanup | Hourly | Archive expired session memories |
| Confidence decay | 30 days | Decay confidence for untouched memories |
- Cold retrieval: ~26s (full pipeline)
- Cache hit: ~14ms (~1,800× faster)
- Rate limiting: Redis sliding window, 100 req/s retrieval, 20 req/s writes
- Connection pool: 5–20 connections, 5s statement timeout
- MCP adapter: 4 tools (retrieve, write, link, health). Claude and Cursor auto-discover them via VS Code
- REST API: 10 endpoints, any HTTP client
- Demo:
demo/demo.pyshows two agents sharing the same memory system
WRITE BACKGROUND RETRIEVE
─────── ────────── ─────────
AI writes Worker picks up AI asks question
memory via un-embedded via REST or MCP
REST or MCP memories every 10s
│ │ │
▼ ▼ ▼
┌─────────┐ ┌──────────────┐ ┌──────────────┐
│ Validate│ │ Embed via │ │ Embed query │
│+Contra- │ │ all-MiniLM- │ │ (same model) │
│ diction │ │ L6-v2 (CPU) │ │ → 384d vec │
└────┬────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
▼ ▼ ▼
┌─────────┐ ┌──────────────┐ ┌──────────────┐
│ INSERT │ │ UPDATE │ │ Phase A: ANN │
│ memories│ │ embedding │ │ pre-filter │
│+events │ │ column │ │ 36 candidates│
└─────────┘ └──────────────┘ └──────┬───────┘
│
┌───────────────────────────────────────────┘
│
▼
┌────────────────────────────────────┐
│ Phase B: Score 36 candidates │
│ score = sem×0.40 + rec×0.20 │
│ + imp×0.20 + freq×0.10 │
│ × scope_bonus × status_multiplier │
│ → keep top 12 │
└────────────┬───────────────────────┘
│
▼
┌────────────────────────────┐
│ Phase C: Graph traversal │
│ Follow relations depth 2 │
│ score × weight × 0.7^depth │
│ Deduplicate, sort │
└────────────┬───────────────┘
│
▼
┌────────────────────────────┐ ┌──────────────────┐
│ Assemble response │ │ Cache in Redis │
│ + assembled_context string │──────▶│ 60s TTL │
│ + scored memory list │ │ Next query: 14ms │
└────────────────────────────┘ └──────────────────┘
Every memory flows through Store → Embed → Index → Retrieve → Score → Deliver.
An AI writes a memory via REST or MCP:
POST /memories
{
"type": "fact",
"key": "database",
"value": "PostgreSQL 16 with pgvector",
"confidence": 0.95
}
What happens:
- Contradiction check: The engine looks for an existing memory with the
same
(subject_id, key). If found with a different value:- New confidence > old + 0.2 → old is superseded
- Source is
explicit→ old is superseded - Otherwise → old confidence decays 20%, both kept
- Immutable event: Every write and status change appends to
memory_events. This table is append-only, never updated, never deleted. It uses aBIGINTauto-increment ID with a space-efficient BRIN index. - Columns populated:
recency_score= 1.0,frequency_score= 0.0,embedding= NULL (filled asynchronously)
The background worker picks up un-embedded memories every 10 seconds:
Worker: SELECT * FROM memories WHERE embedding IS NULL LIMIT 50
→ call all-MiniLM-L6-v2 (Apache 2.0, 384-dim, CPU)
→ UPDATE memories SET embedding = $vector
The memory becomes semantically searchable within about 10 seconds. Until then, it's still findable through recency, importance, and keyword search. Only the semantic similarity score stays at zero during that brief window.
Eight PostgreSQL indexes make retrieval fast:
| Index | Type | What It Speeds Up | Speedup at 100K rows |
|---|---|---|---|
ix_memories_subject_scope_status |
Composite B-tree | Filtering by owner + scope + status | ~400× vs seq scan |
ix_memories_embedding_ivfflat |
IVFFlat vector | ANN cosine similarity search | ~50× vs brute force |
ix_memories_subject_key |
B-tree | Contradiction detection on write | O(log n) |
ix_memories_session_active |
Partial B-tree | Session cleanup (only 5% of table) | ~20× smaller |
ix_memory_relations_from/to |
B-tree ×2 | Bidirectional graph traversal | O(log n) |
ix_memory_events_memory_ts |
Composite B-tree | Per-memory event history | O(log n) |
ix_memory_events_ts_brin |
BRIN | Chronological event scans | ~1,000× smaller than B-tree |
An AI asks a question:
POST /memories/retrieve
{ "prompt": "What database does this project use?", "mode": "smart", "top_k": 12 }
Phase A: ANN Pre-Filter (PostgreSQL, about 3ms)
SELECT * FROM memories
WHERE subject_id = $1 AND scope = ANY($2) AND status = ANY($3)
ORDER BY embedding <=> $query_vector -- pgvector cosine distance
LIMIT 36 -- 3× top_kReturns 36 candidates. The IVFFlat index avoids scanning all 100K vectors.
Phase B: Score and Re-Rank (Python, about 2ms)
Each candidate gets a composite score:
score = (
semantic_similarity × 0.40 ← cosine(query_vec, memory_vec), mapped to [0,1]
+ recency_score × 0.20 ← e^(-λ × age_hours), pre-computed, hourly refresh
+ importance × 0.20 ← user-set at write time
+ frequency_score × 0.10 ← LOG2(access_count + 1) / 10, pre-computed
) × scope_bonus × status_multiplier
Weights vary by mode:
| Mode | sem | rec | imp | freq | Graph | Use |
|---|---|---|---|---|---|---|
| smart | 0.40 | 0.20 | 0.20 | 0.10 | depth 2 | General queries |
| focused | 0.55 | 0.15 | 0.15 | 0.15 | depth 1 | Coding assistants |
| deep | 0.30 | 0.10 | 0.25 | 0.05 | depth 3 | Summarization |
| working | 0.50 | 0.40 | 0.05 | 0.05 | none | Turn-by-turn chat |
Scope bonus: exact match ×1.20, parent scope ×1.10, unrelated ×1.00.
Status penalty: active ×1.0, stale ×0.4, superseded ×0.1, archived ×0.0 (excluded).
The 36 candidates are sorted by score. Only the top 12 proceed to graph traversal.
Phase C: Graph Traversal (5 to 10ms)
For each top-12 memory, the engine follows memory_relations edges:
traversed_score = parent_score × relation_weight × 0.7^depth
Depth 1: ×0.70 closely related
Depth 2: ×0.49 moderately related
Depth 3: ×0.34 distantly related
This surfaces connected memories that pure vector search would miss.
A memory about "PostgreSQL 15" can pull in a related memory about
"Flyway migrations" through a depends_on edge, even if that memory
wouldn't rank highly on vector similarity alone.
Traversed nodes already in the result set are deduplicated, keeping the higher score. Running graph traversal only on the top 12, instead of all 36 ANN candidates, cuts SQL queries by 67%.
The final response is assembled:
{
"memories": [
{
"key": "database",
"value": "PostgreSQL 16 with pgvector",
"score": 0.687,
"confidence": 0.95,
"status": "active"
}
],
"assembled_context": "PostgreSQL 16 with pgvector. Deploy via docker compose...",
"graph_nodes_traversed": 2,
"conflicts_suppressed": 0
}assembled_context is the most important field. The AI never parses
memory internals. It takes this string and places it directly into its
system prompt, right before the user's message:
System: You are a coding assistant. User context: PostgreSQL 16 with pgvector.
Deploy via docker compose up -d on VPS. Preferred language is Go.
User: How should I set up the database?
Every retrieval result is cached in Redis:
Key: om:retrieve:{sha256(prompt:subject:mode:top_k)[:16]}
TTL: 60 seconds
Invalidated: on any write to the same subject
First query (cold): around 26 seconds (embedding, ANN, scoring, graph).
Second query (cache hit): around 14 milliseconds. That's about 1,800 times faster.
The cache hit rate is 40–60% in active sessions with repeated or similar queries. Combined with rate limiting (Redis sliding window, 100 req/s retrieval burst), the system stays responsive under load.
| Method | Endpoint | Description |
|---|---|---|
GET |
/health |
System health check |
POST |
/memories |
Write a single memory |
POST |
/memories/batch |
Write up to 50 memories atomically |
GET |
/memories |
List memories with pagination + filters (new) |
GET |
/memories/{id} |
Read a specific memory |
PATCH |
/memories/{id}/status |
Mark stale, superseded, or archived |
POST |
/memories/retrieve |
Smart retrieval (embed → ANN → score → graph → context) |
POST |
/memories/context |
Assembled context string only |
GET |
/memories/search |
Keyword search (ILIKE) |
GET |
/memories/graph/{id} |
Graph traversal from a memory |
POST |
/memories/relations |
Link two memories (graph edge) (new) |
GET |
/events |
Immutable event log (cursor-paginated) |
Full API reference: documents/03-api-reference.md
curl -X POST localhost:8000/memories/retrieve \
-H 'Content-Type: application/json' \
-d '{
"prompt": "What database does this project use?",
"mode": "smart",
"top_k": 5
}'{
"memories": [
{
"key": "database",
"value": "PostgreSQL 16 + pgvector for semantic search",
"score": 0.687,
"status": "active",
"confidence": 0.95
}
],
"assembled_context": "PostgreSQL 16 + pgvector for semantic search. Deploy via docker compose...",
"graph_nodes_traversed": 2
}All settings in .env (copy from .env.example):
# Embedding provider (local = free, open-source, zero API cost)
EMBEDDING_PROVIDER=local
EMBEDDING_MODEL=all-MiniLM-L6-v2
EMBEDDING_DIM=384
# To switch to OpenAI:
# EMBEDDING_PROVIDER=openai
# EMBEDDING_MODEL=text-embedding-3-small
# EMBEDDING_DIM=1536
# OPENAI_API_KEY=sk-...AI Clients (Claude, ChatGPT, Cursor, Custom)
│
├── MCP Protocol (stdio) ──► mcp_server/server.py
│ │
└── REST (HTTP) ──────────────┤
▼
┌─────────────────┐
│ FastAPI :8000 │
│ 10 endpoints │
└───┬────┬────┬───┘
│ │ │
┌─────────────┘ │ └──────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ PostgreSQL 16 │ │ Redis 7 │ │ Worker │
│ + pgvector │ │ Cache/Rate │ │ 5 BG Jobs │
└──────────────┘ └──────────────┘ └──────────────┘
cd demo
python3 demo.pyShows multi-agent shared memory: Agent A (Claude) writes project knowledge,
Agent B (Cursor) retrieves it. Contradiction, caching, and lifecycle all
demonstrated. Optionally uses LM Studio (localhost:1234) for AI-generated content.
| Layer | Technology |
|---|---|
| API | FastAPI (Python 3.12), async |
| Database | PostgreSQL 16 + pgvector |
| Cache | Redis 7 |
| Embeddings | sentence-transformers (Apache 2.0) |
| Background | asyncio (5 concurrent jobs) |
| MCP | mcp 1.27.1 (stdio transport) |
| DevOps | Docker Compose (4 services) |
12 detailed technical documents in documents/:
| # | Document | Content |
|---|---|---|
| 1 | Architecture Overview | System design, Docker, data flow |
| 2 | Schema Design | Tables, indexes, storage analysis |
| 3 | API Reference | All 10 endpoints with curl examples |
| 4 | Retrieval Scoring | Formula, modes, two-phase pipeline |
| 5 | Performance Analysis | Latency budget, index speedups, cache |
| 6 | Contradiction Engine | Detection + resolution algorithms |
| 7 | Implementation Log | Phase 1-4 build history, bugs, decisions |
| 8 | Embedding Strategy | Local-first, model comparison, config |
| 9 | Phase 2: Retrieval | Pipeline, graph, code reference |
| 10 | Phase 3: Jobs and Caching | Workers, Redis, quantitative impact |
| 11 | Phase 4: MCP Adapter | Claude and Cursor integration, tools |
| 12 | Confidence & Importance Standards | Agent scoring guidelines, contradiction impact |
pip install pytest pytest-asyncio
python -m pytest tests/ -v
# 12 passedApache 2.0
The default embedding model (all-MiniLM-L6-v2) is separately licensed under Apache 2.0 by its authors.
v0.5 - All four build phases are complete. The system has 10 API endpoints, 5 background workers, 4 retrieval modes, an MCP adapter for Claude and Cursor, a full test suite, and 12 technical documents.
Built and tested. Ready to use.