Memories

Local semantic memory for AI assistants. Zero-cost, <50ms, hybrid BM25+vector search.

Works with Claude Code, Claude Desktop, Claude Chat, Codex, Cursor, ChatGPT, OpenClaw, and anything that can speak HTTP or MCP.

Key capabilities (v5.3.0):

  • Hybrid search — BM25 + vector + recency + feedback + confidence + graph (6-signal RRF fusion with PPR-scored graph expansion)
  • Graph-aware retrieval — automatic related_to links between memories, PPR-scored multi-hop traversal, +20% retrieval lift on 2-hop benchmarks
  • Temporal reasoning — document_at timestamps, version preservation on UPDATE (archive + supersedes link), since/until date-range filters
  • Automatic extraction — LLM-powered AUDN (Add/Update/Delete/Noop) with dry-run, per-fact approval, and auto-linking
  • Operator workbench — create, edit, merge, bulk actions, extraction trigger, lifecycle panel, conflict resolution
  • Feedback-weighted ranking — search learns from useful/not_useful signals
  • Lifecycle policies — per-prefix TTL and confidence-based auto-archive with operator-visible evidence
  • Quality benchmarks — LongMemEval + MuSiQue eval harnesses with graph and temporal benchmarking
  • Full audit trail — every mutation tracked, lifecycle timeline in UI, version chains via supersedes links
  • Self-hosted — your data, your infrastructure, no cloud dependency

Start here:


API Quick Start

# 1. Clone and build
git clone git@github.com:divyekant/memories.git
cd memories
docker compose -f docker-compose.snippet.yml up -d

# 2. Verify
curl http://localhost:8900/health

# 3. Add a memory
curl -X POST http://localhost:8900/memory/add \
  -H "Content-Type: application/json" \
  -d '{"text": "Always use TypeScript strict mode", "source": "standards.md"}'

# 4. Search
curl -X POST http://localhost:8900/search \
  -H "Content-Type: application/json" \
  -d '{"query": "TypeScript config", "k": 3, "hybrid": true}'

The service runs at http://localhost:8900. API docs at http://localhost:8900/docs. Web UI at http://localhost:8900/ui.

Web UI

The built-in UI at /ui provides:

  • Dashboard — memory stats, extraction metrics, server info
  • Memories — browse, search, filter, and manage memories with list+detail or grid view
    • Create, inline edit, pin/archive with undo, bulk actions (archive/delete/retag/re-source/merge)
    • Extraction trigger with dry-run preview and per-fact approve/reject
    • Tabbed detail panel: Overview (edit) | Lifecycle (origin, confidence, audit timeline, feedback history) | Links
    • Conflict resolution modal (Keep A/B/Merge/Defer with soft archive)
  • Extractions — extraction job stats and token usage
  • API Keys — configure authentication
  • Health — conflicts, problem queries (negative feedback), stale memories (retrieved but never useful), evidence strength badges
  • Settings — provider config, server info, theme toggle (dark/light/system), export and maintenance

No build step — vanilla JS + CSS served directly from webui/.


CLI

The memories CLI provides full access to the API from your terminal.

Install

pip install -e .
# Or if using the Docker image, the CLI is included

Usage

# Search
memories search "TypeScript config"

# Add a memory
memories add "Always use strict mode" --source standards

# List memories
memories list --source standards

# Check novelty before adding
memories is-novel "TypeScript strict mode"

# Batch operations
memories batch add memories.jsonl

# Admin
memories admin stats
memories admin health

# Backups
memories backup create
memories backup list

# Full help
memories --help

Export & Import

# Export all memories
memories export -o backup.jsonl

# Export filtered by source
memories export --source "claude-code/" -o project.jsonl

# Export with date range
memories export --source "proj/" --since 2026-01-01 -o recent.jsonl

# Import (clean migration)
memories import backup.jsonl

# Import with smart dedup
memories import backup.jsonl --strategy smart

# Import with source remapping
memories import backup.jsonl --source-remap "old/=new/"

Agent Integration

The CLI auto-detects when piped and outputs JSON:

# JSON output for agents (automatic when piped)
memories search "auth" | jq '.data.results[0].text'

# Force JSON in any context
memories --json search "auth"

# Force human-readable when piped
memories --pretty list

Configuration

# Set server URL
memories config set url http://localhost:8900

# Set API key
memories config set api_key your-key-here

# View resolved config
memories config show

Config resolution: CLI flags > ~/.config/memories/config.json > env vars > defaults.
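
For reference, a minimal ~/.config/memories/config.json matching the commands above might look like this (the url and api_key keys mirror memories config set; the file may support more fields):

{
  "url": "http://localhost:8900",
  "api_key": "your-key-here"
}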


Architecture

AI Client (Claude, Codex, Cursor, ChatGPT, OpenClaw)
    |
    |-- MCP protocol (Claude Code / Desktop / Codex / Cursor)
    |-- REST API (everything else)
    v
MCP Server (mcp-server/index.js)
    |
    v
Memories Service (Docker :8900)
    |-- FastAPI REST API
    |-- Hybrid Search (Memories vector + BM25 keyword, RRF fusion)
    |-- Markdown-aware chunking
    |-- Event Bus (SSE stream + webhook delivery)
    |-- Audit Log (append-only trail)
    |-- Memory Relationships (graph edges between memories)
    |-- Confidence Decay (time-based relevance attenuation)
    |-- Auto-backups
    v
Persistent Storage (data/)
    |-- Qdrant vector store (embeddings + metadata)
    |-- metadata.json (memory text + metadata)
    |-- backups/ (auto, keeps last 10)

Detailed docs live in docs/: api.md, architecture.md, decisions.md, deployment.md, and api-coverage.md.


Integration Guides

Claude Code (CLI)

The MCP server gives Claude Code native memory_search, memory_add, memory_extract, memory_delete, memory_delete_batch, memory_delete_by_source, memory_count, memory_list, memory_stats, memory_is_novel, memory_is_useful, and memory_conflicts tools.

Setup:

  1. Install the MCP server dependencies:
cd memories/mcp-server
npm install
  2. Register the MCP server with Claude Code (user scope — available in every project):
claude mcp add -s user \
  -e MEMORIES_URL=http://localhost:8900 \
  -e MEMORIES_API_KEY=your-api-key-here \
  -- memories node /path/to/memories/mcp-server/index.js

This writes to ~/.claude.json. Do not add MCP servers to ~/.claude/settings.json or ~/.claude/.mcp.json — Claude Code CLI does not read MCP config from those files (Claude Desktop uses separate config, see below).

  3. Restart Claude Code. The tools are now available in every project.

  4. (Optional) Install the Memories skill for disciplined memory capture and proactive recall:

mkdir -p ~/.claude/skills/memories
ln -s /path/to/memories/skills/memories ~/.claude/skills/memories

The skill teaches the assistant three responsibilities: when to search (proactive recall), when and how to store (hybrid memory_add + memory_extract), and when to maintain (updates, deletes, cleanup via AUDN). It adds ~11% token overhead but improves memory discipline by ~43% in eval benchmarks.

  5. (Recommended) Install the CC plugin for a single-step setup that bundles hooks, skills, and CLAUDE.md:

Hooks, skills, and CLAUDE.md are now packaged as a Claude Code plugin in the plugin/ directory. See plugin/INSTALL.md for details.

Usage (Claude Code will call these automatically when relevant):

  • "Search my memory for authentication patterns"
  • "Remember that we decided to use Prisma for the ORM"
  • "Check if this pattern is already in memory before adding it"
  • "Show me all memories from the bug-fixes source"

For a single project only, use project scope instead:

claude mcp add -s project \
  -e MEMORIES_URL=http://localhost:8900 \
  -e MEMORIES_API_KEY=your-api-key-here \
  -- memories node /path/to/memories/mcp-server/index.js

Claude Desktop (Chat / Cowork)

Same MCP server, different config file. Claude Desktop reads MCP config from its own config file — not from ~/.claude.json (which is Claude Code CLI only).

Setup:

  1. Install dependencies (same as above):
cd memories/mcp-server
npm install
  2. Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "memories": {
      "command": "node",
      "args": ["/path/to/memories/mcp-server/index.js"],
      "env": {
        "MEMORIES_URL": "http://localhost:8900",
        "MEMORIES_API_KEY": "your-api-key-here"
      }
    }
  }
}
  3. Restart the Claude Desktop app. Memory tools appear in chat and cowork mode.

Claude Chat (Web at claude.ai)

Claude Chat on the web does not support MCP directly. Two options:

Option A: Remote MCP via Cloudflare Tunnel (recommended)

If you expose the Memories service via a tunnel (e.g., memory.yourdomain.com), you can use Claude's remote MCP connector feature to connect to it. See the Remote Access section below.

Option B: Manual curl in prompts

Paste curl commands in your messages and ask Claude to interpret the results:

Search my memory service for React patterns:

curl -X POST https://memory.yourdomain.com/search \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_KEY" \
  -d '{"query": "React patterns", "k": 5, "hybrid": true}'

Codex (OpenAI)

Codex supports MCP natively via ~/.codex/config.toml.

Repo-local Codex plugin (optional):

This repository now includes a repo-local Codex plugin at plugins/memories, exposed through .agents/plugins/marketplace.json. If you're working inside this checkout, install the memories plugin from the repo marketplace and run $memories:setup. That skill bootstraps the canonical Codex installer from the repo root rather than duplicating machine-specific paths inside the cached plugin copy.

Setup:

  1. Install dependencies:
cd memories/mcp-server
npm install
  2. Add to ~/.codex/config.toml:
[mcp_servers.memories]
command = "node"
args = ["/path/to/memories/mcp-server/index.js"]

[mcp_servers.memories.env]
MEMORIES_URL = "http://localhost:8900"
MEMORIES_API_KEY = "your-api-key-here"

If your API key is prefix-scoped and does not allow codex/*, set hook source overrides in ~/.config/memories/env:

MEMORIES_SOURCE_PREFIXES="your-authorized-prefix/{project},learning/{project},wip/{project}"
MEMORIES_EXTRACT_SOURCE="your-authorized-prefix/{project}"
  3. Restart Codex. The memory_search, memory_add, memory_extract, memory_delete, memory_delete_by_source, memory_count, memory_list, memory_stats, memory_is_novel, and other tools will be available.

Automatic memory layer for Codex:

cd memories/mcp-server
npm install
cd ..
./integrations/claude-code/install.sh --codex

This configures:

  • 5 Codex hooks in ~/.codex/hooks.json (SessionStart, UserPromptSubmit, Stop, PreToolUse, PostToolUse)
  • hook scripts in ~/.codex/hooks/memory/
  • MCP server registration in ~/.codex/config.toml
  • default developer_instructions (if not already set) to bias memory_search usage on each turn
  • hook env loading from ~/.config/memories/env (or MEMORIES_ENV_FILE) for MEMORIES_URL, MEMORIES_API_KEY, and optional source overrides (MEMORIES_SOURCE_PREFIXES, MEMORIES_EXTRACT_SOURCE)

The installer requires jq, curl, and a running Memories service (/health must respond). For scoped API keys, set MEMORIES_SOURCE_PREFIXES and MEMORIES_EXTRACT_SOURCE so hook reads/writes stay inside authorized prefixes.

Codex uses ~/.codex/hooks.json for lifecycle hooks, ~/.codex/settings.json for permissions, and ~/.codex/config.toml for MCP + developer instructions.

Multi-backend: Codex uses its own hook scripts in integrations/codex/hooks/, and they honor the same multi-backend routing env/config described in multi-backend routing.

Usage (Codex will discover the tools automatically):

  • "Search memory for how we handle error logging"
  • "Store this architecture decision in memory"
  • "List all memories from the project-setup source"

Cursor

Cursor supports MCP with the same server.

Setup:

  1. Install dependencies:
cd memories/mcp-server
npm install
  2. Add to Cursor MCP config:
  • Global: ~/.cursor/mcp.json
  • Project: .cursor/mcp.json
{
  "mcpServers": {
    "memories": {
      "command": "node",
      "args": ["/path/to/memories/mcp-server/index.js"],
      "env": {
        "MEMORIES_URL": "http://localhost:8900",
        "MEMORIES_API_KEY": "your-api-key-here"
      }
    }
  }
}
  3. Restart Cursor.

Cursor also supports the full hook lifecycle via its "Third-party skills" feature. Run ./integrations/claude-code/install.sh --cursor to install hooks alongside the MCP config.

Multi-backend: Cursor uses the same hook scripts as Claude Code, so multi-backend routing works automatically when configured.


ChatGPT (Custom GPT)

ChatGPT uses Custom Actions (OpenAPI schema) rather than MCP. This requires exposing the Memories service over the internet.

Prerequisites: Memories service accessible via HTTPS (see Remote Access).

Setup:

  1. Enable API key auth on the Memories service (set API_KEY env var in docker-compose).

  2. In ChatGPT, go to Explore GPTs > Create a GPT > Configure > Actions.

  3. Import this OpenAPI schema (replace memory.yourdomain.com with your URL):

openapi: 3.0.0
info:
  title: Memories
  version: 2.0.0
  description: Semantic memory search and storage
servers:
  - url: https://memory.yourdomain.com
paths:
  /search:
    post:
      operationId: searchMemory
      summary: Search memories by semantic similarity
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [query]
              properties:
                query:
                  type: string
                  description: Natural language search query
                k:
                  type: integer
                  default: 5
                  description: Number of results
                hybrid:
                  type: boolean
                  default: true
                  description: Use hybrid BM25+vector search
      responses:
        '200':
          description: Search results

  /memory/add:
    post:
      operationId: addMemory
      summary: Store a new memory
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [text, source]
              properties:
                text:
                  type: string
                  description: Memory content
                source:
                  type: string
                  description: Source identifier
                deduplicate:
                  type: boolean
                  default: true
      responses:
        '200':
          description: Memory added

  /memory/is-novel:
    post:
      operationId: isNovel
      summary: Check if text is already known
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [text]
              properties:
                text:
                  type: string
                threshold:
                  type: number
                  default: 0.88
      responses:
        '200':
          description: Novelty check result

  /memories:
    get:
      operationId: listMemories
      summary: Browse stored memories with pagination
      parameters:
        - name: offset
          in: query
          schema:
            type: integer
            default: 0
        - name: limit
          in: query
          schema:
            type: integer
            default: 20
            maximum: 5000
        - name: source
          in: query
          description: Source prefix filter
          schema:
            type: string
      responses:
        '200':
          description: List of memories
    delete:
      operationId: deleteMemoriesByPrefix
      summary: Bulk delete all memories matching a source prefix
      parameters:
        - name: source
          in: query
          required: true
          description: Source prefix to match
          schema:
            type: string
      responses:
        '200':
          description: Delete count

  /memories/count:
    get:
      operationId: countMemories
      summary: Count memories optionally filtered by source prefix
      parameters:
        - name: source
          in: query
          description: Source prefix filter
          schema:
            type: string
      responses:
        '200':
          description: Memory count

  /stats:
    get:
      operationId: getStats
      summary: Memory index statistics
      responses:
        '200':
          description: Index stats
  4. Under Authentication, choose API Key with header name X-API-Key.

  5. Add instructions to the GPT system prompt:

You have access to a persistent memory system. Use it to:
- Search for relevant context before answering questions (searchMemory)
- Store important decisions, patterns, and learnings (addMemory)
- Check if something is already known before adding (isNovel)
- Browse what's stored (listMemories)

Always search memory at the start of conversations to load context.

OpenClaw

OpenClaw uses a Skill (SKILL.md) with shell helper functions that call the REST API directly.

Setup:

  1. Create the skill directory and copy the skill file:
mkdir -p ~/.openclaw/skills/memories
cp integrations/openclaw-skill.md ~/.openclaw/skills/memories/SKILL.md

Or see the full SKILL.md in this repo at integrations/openclaw-skill.md.

  2. Add the API key to the OpenClaw gateway config so skill exec calls can authenticate:
openclaw config patch '{"env": {"vars": {"MEMORIES_URL": "http://localhost:8900", "MEMORIES_API_KEY": "your-api-key-here"}}}'

Or edit ~/.openclaw/openclaw.json directly under env.vars, then restart the gateway. The SKILL.md reads $MEMORIES_API_KEY from the environment and includes automatic lifecycle guidance for when to recall, extract, and sync memories.

Key commands available to OpenClaw agents:

memory_search_memories "query" [k] [threshold] [hybrid]
memory_add_memories "text" "source" [deduplicate]
memory_is_novel "text" [threshold]
memory_delete_memories <id>
memory_delete_source_memories "pattern"
memory_delete_by_prefix "source_prefix"
memory_count_memories [source_prefix]
memory_list_memories [offset] [limit] [source]
memory_rebuild_index
memory_dedup_memories [dry_run] [threshold]
memory_stats
memory_health
memory_backup [prefix]
memory_restore "backup_name"

All functions use jq for safe JSON construction and read auth from $MEMORIES_API_KEY in the gateway environment (no hardcoded secrets).
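
For example, a recall-then-store flow inside an OpenClaw session (argument order follows the signatures above; values are illustrative):

# Recall up to 5 related memories before acting
memory_search_memories "deployment checklist" 5

# Store a decision only if it is novel
memory_is_novel "We deploy via blue-green on Fridays"
memory_add_memories "We deploy via blue-green on Fridays" "openclaw/ops"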

Multi-backend: Not yet supported for OpenClaw. OpenClaw uses skill-based extraction (not hooks), so multi-backend routing does not apply. This is planned for a future release.


Remote Access

To use Memories from anywhere (Claude Chat web, ChatGPT, mobile, other machines), expose it via a Cloudflare Tunnel or similar.

Setup with Cloudflare Tunnel

  1. Enable API key auth in your docker-compose:
environment:
  - API_KEY=your-secret-key-here

Rebuild and restart: docker compose build memories && docker compose up -d memories

  2. Add to your Cloudflare tunnel config (e.g., in ~/.cloudflared/config.yml):
ingress:
  - hostname: memory.yourdomain.com
    service: http://localhost:8900
  3. Update MCP server env to use the remote URL:
{
  "env": {
    "MEMORIES_URL": "https://memory.yourdomain.com",
    "MEMORIES_API_KEY": "your-secret-key-here"
  }
}

Now every client — Claude Code on your laptop, Cursor, Claude Desktop on your phone, ChatGPT, OpenClaw — hits the same memory store running on your Mac mini.


Multi-Backend Routing (Optional)

A single agent session can talk to multiple Memories instances simultaneously. This is useful when you want to:

  • Dev + Prod: search both local dev and remote production, extract to dev only
  • Personal + Shared: search both personal and team memories, route decisions to shared

Multi-backend is configured via YAML files and is fully backward compatible — no config file means single-backend behavior from environment variables, exactly as before.

Config locations:

  • Global: ~/.config/memories/backends.yaml
  • Per-project: .memories/backends.yaml (should be gitignored)

Three tiers:

  1. Scenario-based — pick a preset (dev+prod, personal+shared, single) and routing is automatic
  2. Scenario + overrides — start from a scenario, then customize routing rules
  3. DIY — define backends and routing rules from scratch

Quick example (dev + prod):

backends:
  dev:
    url: http://localhost:8900
    api_key: ${MEMORIES_DEV_KEY}
    scenario: dev
  prod:
    url: https://memory.yourdomain.com
    api_key: ${MEMORIES_PROD_KEY}
    scenario: prod

Config supports env var interpolation (${VAR_NAME}) for API keys and URLs.
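
The quick example above reads those keys from the environment, e.g.:

export MEMORIES_DEV_KEY="dev-key-here"
export MEMORIES_PROD_KEY="prod-key-here"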

Multi-backend works automatically with Claude Code, Codex, and Cursor because they all use the same hook scripts. OpenClaw does not yet support multi-backend (it uses skill-based extraction, not hooks).

For full setup instructions, config format, and verification steps, see the Multi-Backend Setup section in the LLM quickstart guide.


Authentication

Memories supports multiple API keys with role-based access control:

  • Three tiers: read-only (search/list), read-write (search/list + add/delete), admin (full access + key management)
  • Prefix scoping: keys can be restricted to specific source prefixes for tenant isolation
  • Key management: create, list, update, and revoke keys via POST/GET/PATCH/DELETE /api/keys or the Web UI (admin-only)
  • Backward compatible: the existing API_KEY env var still works as an implicit admin key

See the multi-auth design doc for details.
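
As a sketch, creating a prefix-scoped read-write key could look like the following — the /api/keys endpoint is documented above, but the request field names (name, role, source_prefixes) are illustrative assumptions; check the OpenAPI schema at /docs for the real payload:

curl -X POST http://localhost:8900/api/keys \
  -H "X-API-Key: $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "team-a", "role": "read-write", "source_prefixes": ["team-a/"]}'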


API Reference

All endpoints accept/return JSON. Auth via X-API-Key header.

Search

POST /search
{"query": "...", "k": 5, "hybrid": true, "threshold": 0.3,
 "vector_weight": 0.7, "recency_weight": 0.1, "recency_half_life_days": 30,
 "source_prefix": "team/project/"}

POST /search/batch
{"queries": [{"query": "...", "k": 5}, {"query": "...", "hybrid": true}]}

Add Memory

POST /memory/add
{"text": "...", "source": "file.md", "deduplicate": true}

Add Batch

POST /memory/add-batch
{"memories": [{"text": "...", "source": "..."}, ...], "deduplicate": true}

Delete

DELETE /memory/{id}
DELETE /memories?source=<prefix>              # Bulk delete by source prefix; returns {"count": N}
POST /memory/delete-batch     {"ids": [1, 2, 3]}
POST /memory/delete-by-source  {"source_pattern": "credentials"}
POST /memory/delete-by-prefix {"source_prefix": "team/project/"}

Get

GET  /memory/{id}
POST /memory/get-batch {"ids": [1, 2, 3]}

Upsert / Patch

POST  /memory/upsert
{"text":"...", "source":"team/project/file", "key":"entity-1", "metadata": {"owner":"team"}}

POST  /memory/upsert-batch
{"memories":[{"text":"...", "source":"...", "key":"..."}]}

PATCH /memory/{id}
{"text":"optional", "source":"optional", "metadata_patch":{"tag":"v2"}}

Novelty Check

POST /memory/is-novel
{"text": "...", "threshold": 0.88}

Browse

GET /memories?offset=0&limit=20&source=filter   # limit up to 5000; source uses prefix matching
GET /memories/count?source=<prefix>             # returns {"count": N}

Deduplication

POST /memory/deduplicate
{"threshold": 0.90, "dry_run": true}

Index Operations

POST /index/build    {"sources": ["file1.md", "file2.md"]}
GET  /stats
GET  /health
GET  /health/ready
GET  /metrics
POST /maintenance/embedder/reload

Backups

GET  /backups
POST /backup?prefix=manual
POST /restore          {"backup_name": "manual_20260213_120000"}

Extraction

POST /memory/extract    {"messages": "...", "source": "proj", "context": "stop", "debug": true}  # 202 queued
GET  /memory/extract/{job_id}
GET  /extract/status

Memory Relationships

POST   /memory/{id}/link          {"target_id": N, "type": "related"}
GET    /memory/{id}/links
DELETE /memory/{id}/link/{link_id}

Conflict Detection

GET /memory/conflicts?limit=10

Events (SSE + Webhooks)

GET    /events/stream              # SSE stream (auth-filtered)
POST   /webhooks                   # Register webhook
GET    /webhooks
DELETE /webhooks/{id}
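
For example, you can tail the event stream with curl (-N disables output buffering so SSE events print as they arrive):

curl -N -H "X-API-Key: $MEMORIES_API_KEY" http://localhost:8900/events/stream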

Search Explainability

POST /search/explain               # Admin-only scoring breakdown

Quality & Metrics

GET  /metrics/search-quality?period=7d
POST /search/feedback              # Submit relevance feedback
GET  /metrics/quality-summary?period=7d
GET  /metrics/failures?type=retrieval&limit=10

Maintenance

POST /maintenance/reembed          # Migrate embedding model
POST /maintenance/compact          # Find similar clusters (dry-run); clusters tightened to prevent chain-connected outliers
POST /maintenance/consolidate      # LLM-powered merge

Audit

GET /audit/log?limit=50&source=prefix

Full OpenAPI schema at http://localhost:8900/docs.

Future API Candidates (Swarm Scale)

  • POST /memory/compare (pairwise conflict scoring for concurrent agent writes)
  • POST /memory/resolve-conflicts (policy-driven merge: latest/manual/model)
  • POST /memory/lock + DELETE /memory/lock/{key} (explicit lock reservation APIs)
  • POST /memory/events + GET /memory/events/stream (change feed for agent synchronization)
  • POST /search/stream (progressive search responses for very large corpora)
  • POST /memory/ttl (time-bound memories with auto-expiry)

MCP Tools Reference

When connected via MCP (Claude Code, Claude Desktop, Codex, Cursor), these tools are available:

| Tool | Description |
| --- | --- |
| memory_search | Hybrid search (BM25 + vector). Default mode. |
| memory_add | Store a memory with auto-dedup. |
| memory_extract | LLM-based extraction with AUDN (Add/Update/Delete/Noop/Conflict) from conversation text. |
| memory_delete | Delete by ID. |
| memory_delete_batch | Delete multiple IDs in one operation. |
| memory_delete_by_source | Bulk delete all memories matching a source prefix. |
| memory_count | Count memories, optionally filtered by source prefix. |
| memory_list | Browse with pagination and source prefix filter. |
| memory_stats | Index stats (count, model, last updated). |
| memory_is_novel | Check if text is already known. |
| memory_is_useful | Submit search feedback (positive/negative). |
| memory_conflicts | List memories with unresolved conflicts. |

Configuration

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| DATA_DIR | /data | Persistent storage path |
| WORKSPACE_DIR | /workspace | Read-only workspace for index rebuilds |
| API_KEY | (empty) | API key for auth. Empty = no auth. |
| EMBED_PROVIDER | onnx | Embedding provider: onnx (local) or openai (BYOK) |
| EMBED_MODEL | (unset) | Provider-specific embedding model override |
| MODEL_NAME | all-MiniLM-L6-v2 | Default ONNX model used when EMBED_PROVIDER=onnx and EMBED_MODEL is unset |
| MODEL_CACHE_DIR | (unset; Docker image sets /data/model-cache) | Optional writable cache path for downloaded model files |
| PRELOADED_MODEL_CACHE_DIR | (unset; Docker image sets /opt/model-cache) | Optional read-only cache to seed MODEL_CACHE_DIR when empty |
| MAX_BACKUPS | 10 | Number of backups to keep |
| MAX_EXTRACT_MESSAGE_CHARS | 120000 | Max characters accepted by /memory/extract |
| EXTRACT_MAX_INFLIGHT | 2 | Max concurrent extraction jobs |
| MEMORY_TRIM_ENABLED | true | Run post-extract GC/allocator trim |
| MEMORY_TRIM_COOLDOWN_SEC | 15 | Minimum seconds between trim attempts |
| MEMORY_TRIM_PERIODIC_SEC | 5 | Periodic trim probe interval (seconds). Set 0 to disable background trim loop. |
| EMBEDDER_AUTO_RELOAD_ENABLED | false | Enable periodic auto-reload of in-process embedder runtime |
| EMBEDDER_AUTO_RELOAD_RSS_KB_THRESHOLD | 1200000 | RSS threshold (KB) required before auto-reload decisions |
| EMBEDDER_AUTO_RELOAD_CHECK_SEC | 15 | Seconds between auto-reload checks |
| EMBEDDER_AUTO_RELOAD_HIGH_STREAK | 3 | Consecutive high-RSS checks required before trigger |
| EMBEDDER_AUTO_RELOAD_MIN_INTERVAL_SEC | 900 | Cooldown between reload attempts |
| EMBEDDER_AUTO_RELOAD_WINDOW_SEC | 3600 | Rolling window size for reload cap |
| EMBEDDER_AUTO_RELOAD_MAX_PER_WINDOW | 2 | Max reloads allowed per rolling window |
| EMBEDDER_AUTO_RELOAD_MAX_ACTIVE_REQUESTS | 2 | Skip reload when active HTTP requests exceed this |
| EMBEDDER_AUTO_RELOAD_MAX_QUEUE_DEPTH | 0 | Skip reload when extract queue depth exceeds this |
| METRICS_LATENCY_SAMPLES | 200 | Per-route latency sample window for /metrics percentiles |
| METRICS_TREND_SAMPLES | 120 | Memory trend sample window exposed by /metrics |
| AUDIT_LOG | (none) | Path to audit log file |
| CONFIDENCE_DECAY_HALF_LIFE_DAYS | 90 | Half-life for confidence decay |
| PORT | 8000 | Internal service port |
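
Most of these are set through the compose file; a minimal sketch using variables from the table above (values illustrative):

environment:
  - API_KEY=your-secret-key-here
  - EMBED_PROVIDER=onnx
  - MAX_BACKUPS=10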

Docker Compose guardrails

Default compose files now include:

  • mem_limit: ${MEMORIES_MEM_LIMIT:-3g} to bound container memory growth
  • MALLOC_ARENA_MAX=2 to reduce glibc arena fragmentation in multithreaded workloads
  • MALLOC_TRIM_THRESHOLD_=131072 and MALLOC_MMAP_THRESHOLD_=131072 to encourage earlier allocator release
  • extraction env passthrough (EXTRACT_PROVIDER, EXTRACT_MODEL, provider keys/URL) so deploys keep extraction enabled when set in shell or .env
  • embedder auto-reload env passthrough with anti-loop defaults (EMBEDDER_AUTO_RELOAD_*)

MCP Server Environment

| Variable | Default | Description |
| --- | --- | --- |
| MEMORIES_URL | http://localhost:8900 | Memories service URL |
| MEMORIES_API_KEY | (empty) | API key if auth is enabled |

Automatic Memory Layer

Memories supports automatic retrieval/extraction, with client-specific behavior:

  • Claude Code: full 12-hook lifecycle (session start, each prompt, after response, pre-compact, post-compact, subagent start, subagent stop, tool use, tool observe, file write guard, config change, session end)
  • Cursor: same 12-hook lifecycle via Third-party skills (loads from ~/.claude/settings.json)
  • Codex: 5-hook lifecycle via ~/.codex/hooks.json + permissions in ~/.codex/settings.json + MCP/developer instructions in ~/.codex/config.toml
  • OpenClaw: skill-driven retrieval/extraction flow

Claude Code / Cursor Hook Lifecycle

| Event | Hook | What happens |
| --- | --- | --- |
| Session start | memory-recall.sh | Loads project-scoped memories, hydrates MEMORY.md, checks service health, warns if backend version is outdated |
| Every prompt | memory-query.sh | Retrieves relevant memories with transcript context |
| After response | memory-extract.sh | Extracts facts via AUDN |
| Before compaction | memory-flush.sh | Aggressive extraction before context loss |
| After compaction | memory-rehydrate.sh | Re-injects memories using compact summary |
| Subagent start | memory-subagent-recall.sh | Injects project memories into subagents at spawn |
| Subagent stop | memory-subagent-capture.sh | Captures decisions from Plan/Explore agents |
| Tool use observed | memory-observe.sh | Logs MCP tool invocations (observability) |
| Tool use (Write/Edit/Bash) | memory-tool-observe.sh | Logs tool observations to session file |
| File write attempt | memory-guard.sh | Blocks direct MEMORY.md writes |
| Config changed | memory-config-guard.sh | Warns if hooks removed from settings |
| Session end | memory-commit.sh | Final extraction pass |

Cursor compatibility note: Cursor sends workspace_roots[] (not cwd) and transcript_path (not inline messages) in hook payloads. The hook scripts handle both formats automatically — no separate configuration needed.

Codex Lifecycle

| Event | Mechanism | What happens |
| --- | --- | --- |
| Session start | hooks.json -> memory-recall.sh | Loads project-scoped memories and recall guidance for the session |
| Every prompt | hooks.json -> memory-query.sh | Retrieves relevant memories using transcript context for short follow-ups |
| After response | hooks.json -> memory-extract.sh | Extracts facts via AUDN with beefier Stop sampling to compensate for missing compaction/session-end hooks |
| Memory MCP tool calls | hooks.json -> memory-observe.sh (PostToolUse matcher mcp__memories__) | Logs memory MCP tool calls for observability |
| File writes | hooks.json -> memory-guard.sh (PreToolUse matcher `Write\|Edit`) | Blocks direct MEMORY.md writes |
| On new turns | MCP tools + developer instructions | Encourages focused memory_search before implementation-heavy responses |

Codex uses ~/.codex/hooks.json for these hooks, ~/.codex/settings.json for permissions, and ~/.codex/config.toml for MCP + developer instructions. Its Stop hook is intentionally beefier because Codex does not expose PreCompact or SessionEnd.

Quick setup

Prerequisites:

  • jq and curl installed (required by installer)
  • running Memories service (curl -s http://localhost:8900/health | jq .)
  • if installing Codex integration, MCP deps installed:
cd memories/mcp-server
npm install

One-command auto-detect installer (recommended):

./integrations/claude-code/install.sh --auto

This detects and configures any available targets on your machine:

  • Claude Code hooks (~/.claude/settings.json)
  • Codex hooks (~/.codex/hooks.json) + permissions (~/.codex/settings.json) + MCP/developer instructions (~/.codex/config.toml)
  • OpenClaw skill (~/.openclaw/skills/memories/SKILL.md)

Cursor is supported via manual MCP config (~/.cursor/mcp.json or .cursor/mcp.json). If you're inside this repo, you can also install the repo-local Codex plugin from .agents/plugins/marketplace.json and run $memories:setup, which bootstraps the same --codex installer flow from the checkout root.

The installer writes runtime config to:

  • ~/.config/memories/env for hook vars (MEMORIES_URL, optional MEMORIES_API_KEY, optional MEMORIES_SOURCE_PREFIXES / MEMORIES_EXTRACT_SOURCE to override default source families)
  • repo .env for extraction vars (EXTRACT_PROVIDER, provider keys/URL)

Claude/Cursor read hooks also support an optional MEMORIES_SOURCE_PREFIXES env var in ~/.config/memories/env. It is a comma-separated list of source prefix templates and defaults to claude-code/{project},learning/{project},wip/{project}.
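
A complete ~/.config/memories/env using the default prefix templates would look like this (key value is a placeholder):

MEMORIES_URL=http://localhost:8900
MEMORIES_API_KEY=your-api-key-here
MEMORIES_SOURCE_PREFIXES="claude-code/{project},learning/{project},wip/{project}"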

Target only Claude, Cursor, or Codex:

./integrations/claude-code/install.sh --claude
./integrations/claude-code/install.sh --cursor
./integrations/claude-code/install.sh --codex

Target only OpenClaw:

./integrations/claude-code/install.sh --openclaw

LLM-assisted setup: Feed integrations/QUICKSTART-LLM.md to your AI assistant and it will configure everything automatically.

Extraction providers

| Provider | Cost | AUDN | Speed |
| --- | --- | --- | --- |
| Anthropic (recommended) | ~$0.001/turn | Full (Add/Update/Delete/Noop/Conflict) | ~1-2s |
| OpenAI | ~$0.001/turn | Full | ~1-2s |
| ChatGPT Subscription | Free (uses your subscription) | Full | ~1-2s |
| Ollama | Free | Full | ~5s |
| Skip | Free | None | N/A |

Extraction is optional. Without it, retrieval still works.

By default, automatic write hooks do not store new memories when extraction is disabled. If you want a degraded automatic-write mode, set EXTRACT_FALLBACK_ADD=true. This enables a strict heuristic + novelty-check fallback that writes at most a small number of high-confidence facts when extraction is disabled or the configured provider fails at runtime (for example, rate limits or timeouts).

AUDN in plain English

AUDN is the memory decision loop:

  • ADD: store a genuinely new fact
  • UPDATE: refine an existing memory that is close but outdated/incomplete
  • DELETE: remove a stale/conflicting memory
  • NOOP: ignore non-useful or duplicate facts
  • CONFLICT: flag when two memories directly contradict each other

Why it matters:

  • cleaner memory store over time (less duplicate/stale data)
  • better retrieval quality in later sessions
  • less "memory drift" when decisions change

Recent extraction changes:

  • Signal keyword filter removed — extraction now fires unconditionally on every response
  • Extraction window widened to 4 message pairs / 8K chars (up from previous defaults)
  • Assertive injection framing: recalled memories include "IMPORTANT: MUST be considered" prefix to strengthen recall adherence

Cost vs quality

  • Anthropic/OpenAI extraction: small usage cost (typically around ~$0.001/turn), full AUDN quality.
  • ChatGPT Subscription extraction: no additional API cost (uses your existing subscription), full AUDN quality.
  • Ollama extraction: no API cost, full AUDN quality (with JSON format constraint).
  • Retrieval only (EXTRACT_PROVIDER unset): no extraction model cost.
  • Optional fallback writes (EXTRACT_FALLBACK_ADD=true): add-only, heuristic extraction path (no AUDN update/delete) used when extraction is disabled or provider calls fail at runtime.

Cost control knobs

Use these to keep extraction spend bounded:

  • MAX_EXTRACT_MESSAGE_CHARS: hard cap on transcript size per request
  • EXTRACT_MAX_FACTS: limits facts considered from each extraction
  • EXTRACT_MAX_FACT_CHARS: caps per-fact payload size
  • EXTRACT_SIMILAR_TEXT_CHARS and EXTRACT_SIMILAR_PER_FACT: limit context passed into AUDN

Async extraction API

POST /memory/extract is async-first. It enqueues work and returns 202 with a job_id. Poll GET /memory/extract/{job_id} for queued, running, completed, or failed. If the queue is full, the API returns 429 with a Retry-After header. When extraction is disabled and EXTRACT_FALLBACK_ADD=true, /memory/extract runs an immediate fallback add path and still returns a job object. When extraction is configured but fails at runtime, the queued worker also falls back to add-only mode when EXTRACT_FALLBACK_ADD=true.
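
For example (payload fields from the Extraction section of the API Reference; JOB_ID is the job_id returned by the 202 response):

# Enqueue an extraction job — returns 202 with a job_id
curl -X POST http://localhost:8900/memory/extract \
  -H "X-API-Key: $MEMORIES_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": "user: we decided to use Prisma for the ORM", "source": "proj"}'

# Poll until status is completed or failed
curl -H "X-API-Key: $MEMORIES_API_KEY" http://localhost:8900/memory/extract/JOB_ID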

Docker image targets (core / extract)

The Dockerfile publishes two runtime targets:

  • core (default): search/add/list endpoints, no Anthropic/OpenAI SDKs
  • extract: includes Anthropic/OpenAI SDKs for /memory/extract

Build both images directly:

docker build --target core -t memories:core .
docker build --target extract -t memories:extract .

Use compose with either target:

# Default (core target)
docker compose up -d --build memories

# Extraction-ready target
MEMORIES_IMAGE_TARGET=extract docker compose up -d --build memories

By default, images do not bake model weights. On first run, the service downloads them into MODEL_CACHE_DIR (/data/model-cache in Docker), so later restarts reuse the volume cache.

If you want a fully preloaded image (faster first boot, larger pull), set PRELOAD_MODEL=true:

docker build --target core --build-arg PRELOAD_MODEL=true -t memories:core .
docker build --target extract --build-arg PRELOAD_MODEL=true -t memories:extract .

Ollama uses HTTP directly and does not need the extra SDKs, so core is enough for Ollama extraction.
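
Assuming the extraction env passthrough described under Docker Compose guardrails, enabling Ollama extraction on the default core image might look like this (the model name is illustrative):

EXTRACT_PROVIDER=ollama EXTRACT_MODEL=llama3.1 docker compose up -d memories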

Extraction environment variables

| Variable | Default | Description |
| --- | --- | --- |
| EXTRACT_PROVIDER | (none) | anthropic, openai, chatgpt-subscription, ollama, or empty to disable |
| EXTRACT_MODEL | (per provider) | Model override |
| ANTHROPIC_API_KEY | (none) | Required for Anthropic provider (standard key or sk-ant-oat01- OAuth token) |
| OPENAI_API_KEY | (none) | Required for OpenAI provider |
| CHATGPT_REFRESH_TOKEN | (none) | Required for ChatGPT Subscription provider (from python -m memories auth chatgpt) |
| CHATGPT_CLIENT_ID | (none) | Required for ChatGPT Subscription provider |
| OLLAMA_URL | http://host.docker.internal:11434 | Ollama server URL (on Linux, use http://localhost:11434) |
| EXTRACT_FALLBACK_ADD | false | Enable add-only fallback writes when extraction is disabled or provider calls fail at runtime |
| EXTRACT_FALLBACK_MAX_FACTS | 1 | Max fallback facts to store per extract request |
| EXTRACT_FALLBACK_MIN_FACT_CHARS | 24 | Minimum candidate fact length for fallback |
| EXTRACT_FALLBACK_MAX_FACT_CHARS | 280 | Maximum candidate fact length for fallback |
| EXTRACT_FALLBACK_NOVELTY_THRESHOLD | 0.88 | Novelty threshold used by fallback add mode |
| EXTRACT_QUEUE_MAX | EXTRACT_MAX_INFLIGHT * 20 | Maximum queued extraction jobs before backpressure (429) |
| EXTRACT_JOB_RETENTION_SEC | 300 | How long completed/failed extraction jobs stay queryable |
| EXTRACT_JOBS_MAX | 200 | Hard cap on stored extraction job records (finished jobs evicted first) |
| EXTRACT_MAX_FACTS | 30 | Maximum facts kept from a single extraction |
| EXTRACT_MAX_FACT_CHARS | 500 | Max length per extracted fact |
| EXTRACT_SIMILAR_TEXT_CHARS | 280 | Max similar-memory text length passed into AUDN |
| EXTRACT_SIMILAR_PER_FACT | 5 | Similar memories included per fact during AUDN |
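
A minimal extraction setup in the repo .env, for example with Anthropic (the key is a placeholder):

EXTRACT_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-your-key-here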

Burst memory behavior

Extraction can create short-lived allocation spikes (large transcripts, large LLM JSON payloads, concurrent requests).

Mitigations built in:

  • /memory/extract request size limit (MAX_EXTRACT_MESSAGE_CHARS)
  • bounded in-flight extraction (EXTRACT_MAX_INFLIGHT)
  • post-extract + periodic memory reclamation (MEMORY_TRIM_ENABLED, MEMORY_TRIM_COOLDOWN_SEC, MEMORY_TRIM_PERIODIC_SEC)
  • optional auto-reload controller for the embedder runtime (EMBEDDER_AUTO_RELOAD_*)
  • bounded AUDN payload sizes (EXTRACT_MAX_FACTS, EXTRACT_MAX_FACT_CHARS, EXTRACT_SIMILAR_TEXT_CHARS)

Observability:

  • /metrics includes embedder_reload.auto and embedder_reload.manual counters/state
  • manual reload endpoint: POST /maintenance/embedder/reload

Reference benchmark: docs/benchmarks/2026-02-17-memory-reclamation.md

Uninstall

./integrations/claude-code/install.sh --uninstall

Then optionally remove MEMORIES_* from ~/.config/memories/env and EXTRACT_* from repo .env.


Backup & Recovery

Memories has three layers of backup protection:

1. Auto-backup (built-in)

The service automatically saves a snapshot after every write operation. The 10 most recent auto-backups are kept in the Docker volume under data/backups/.

# List backups
curl -H "X-API-Key: $MEMORIES_API_KEY" http://localhost:8900/backups

# Create manual backup
curl -X POST -H "X-API-Key: $MEMORIES_API_KEY" http://localhost:8900/backup?prefix=manual

# Restore from backup
curl -X POST -H "X-API-Key: $MEMORIES_API_KEY" http://localhost:8900/restore \
  -H "Content-Type: application/json" \
  -d '{"backup_name": "manual_20260214_120000"}'

2. Scheduled local snapshots (cron)

A cron job creates timestamped copies of the Memories index every 30 minutes. Snapshots are stored outside the Docker volume (default: ~/backups/memories/) with 30-day retention.

# Install the cron job
./scripts/install-cron.sh install

# Check status
./scripts/install-cron.sh status

# Run a backup manually
./scripts/backup.sh

# Dry run (no changes)
./scripts/backup.sh --test

Environment variables (all optional, sensible defaults):

| Variable | Default | Description |
| --- | --- | --- |
| MEMORIES_URL | http://localhost:8900 | Service URL |
| MEMORIES_API_KEY | (empty) | API key if auth is enabled |
| MEMORIES_DATA_DIR | ./data (relative to repo) | Docker volume data path |
| BACKUP_DIR | ~/backups/memories | Where to store snapshots |
| RETENTION_DAYS | 30 | Days to keep local snapshots |

3. Off-site backup to Google Drive (optional)

If you set GDRIVE_ACCOUNT, each backup automatically uploads the latest snapshot to Google Drive as a compressed tar.gz. Uploads are throttled to once per hour. 7-day retention on Drive.

Prerequisites:

  1. Install gog CLI
  2. Authenticate: gog auth add your-email@gmail.com --services drive
  3. Set env var in your shell profile:
export GDRIVE_ACCOUNT="your-email@gmail.com"

Environment variables (all optional):

| Variable | Default | Description |
| --- | --- | --- |
| GDRIVE_ACCOUNT | (none) | Google account email. Required to enable GDrive. |
| GDRIVE_FOLDER_NAME | memories-backups | Folder name on Drive |
| UPLOAD_INTERVAL_MIN | 55 | Minimum minutes between uploads |
| GDRIVE_RETENTION_DAYS | 7 | Days to keep backups on Drive |

Manual usage:

# Setup (create Drive folder + test auth)
./scripts/backup-gdrive.sh --setup

# Upload now (skip throttle)
./scripts/backup-gdrive.sh --force

# Dry run
./scripts/backup-gdrive.sh --test

# Only clean up old backups on Drive
./scripts/backup-gdrive.sh --cleanup

Alternative: S3-compatible cloud sync

For S3/MinIO/R2 backends, build with cloud sync enabled:

ENABLE_CLOUD_SYNC=true docker compose up -d --build memories

See CLOUD_SYNC_README.md for configuration details.


Project Structure

memories/
  app.py                  # FastAPI REST API
  memory_engine.py        # Memories engine (search, chunking, BM25, backups)
  onnx_embedder.py        # ONNX Runtime embedder (replaces PyTorch)
  llm_provider.py         # LLM provider abstraction (Anthropic/OpenAI/ChatGPT Subscription/Ollama)
  llm_extract.py          # Extraction pipeline with AUDN
  chatgpt_oauth.py        # ChatGPT OAuth2+PKCE token exchange helpers
  key_store.py            # SQLite-backed API key store (SHA-256 hashing)
  event_bus.py            # Event-driven architecture (SSE, webhooks)
  audit_log.py            # Append-only audit trail
  qdrant_store.py         # Qdrant vector store adapter
  usage_tracker.py        # Search quality and extraction metrics
  auth_context.py         # Request-scoped role and prefix enforcement
  memories_auth.py        # CLI auth tool (python -m memories auth chatgpt/status)
  __main__.py             # Entry point for python -m memories
  Dockerfile              # Multi-stage Docker build (core/extract targets)
  pyproject.toml          # Python dependencies (uv)
  uv.lock                 # Locked dependency resolutions
  docker-compose.snippet.yml
  docs/
    api.md                # Complete REST API reference
    architecture.md       # System architecture and runtime flows
    decisions.md          # Key design decisions and tradeoffs
    deployment.md         # Self-hosted deployment guide
    api-coverage.md       # API/MCP/CLI coverage matrix
    benchmarks/           # Reproducible benchmark notes
  mcp-server/
    index.js              # MCP server (wraps REST API as tools)
    package.json
  scripts/
    backup.sh             # Cron backup (local snapshots)
    backup-gdrive.sh      # Optional Google Drive upload
    install-cron.sh       # Cron job installer
  webui/
    index.html            # Memory browser entry page (/ui)
    styles.css            # UI styling
    app.js                # Browser-side pagination/filter logic
  integrations/
    claude-code/
      install.sh          # Auto-detect installer (Claude/Codex/Cursor/OpenClaw)
      hooks/              # Claude Code 12-hook scripts + hooks.json
        _lib.sh               # Shared hook utilities (logging, health check)
        memory-rehydrate.sh   # PostCompact rehydration hook
        memory-observe.sh     # PostToolUse observability hook
        memory-guard.sh       # PreToolUse MEMORY.md write guard
        memory-subagent-capture.sh  # SubagentStop extraction hook
        memory-config-guard.sh      # ConfigChange settings watchdog
        response-hints.json   # Response hint patterns
    codex/
      memory-codex-notify.sh # Legacy Codex notify hook (compatibility/manual fallback)
    claude-code.md        # Claude Code guide
    openclaw-skill.md     # OpenClaw SKILL.md
    QUICKSTART-LLM.md     # LLM-friendly setup guide
  .agents/
    plugins/
      marketplace.json    # Repo-local Codex plugin catalog
  plugins/
    memories/
      .codex-plugin/
        plugin.json       # Repo-local Codex plugin manifest
      skills/
        memories/
          SKILL.md        # Shared Memories discipline skill for Codex plugin
        setup/
          SKILL.md        # Codex bootstrap skill that runs the canonical installer
  tests/
    test_memory_engine.py # Memory engine tests
    test_llm_provider.py  # LLM provider tests (incl. ChatGPT Subscription)
    test_chatgpt_oauth.py # OAuth PKCE + token exchange tests
    test_memories_auth.py # CLI auth tool tests
    test_llm_extract.py   # Extraction pipeline tests
    test_extract_api.py   # API endpoint tests
    test_web_ui.py        # Web UI route/static tests
  skills/
    memories/
      SKILL.md            # Claude Code skill for memory discipline
  eval/
    benchmarks.py         # Benchmark suite runner
    scenarios/benchmark/  # 6 benchmark scenarios
    __main__.py           # CLI entrypoint (python -m eval)
    models.py             # Pydantic data models (Scenario, EvalReport, etc.)
    loader.py             # YAML scenario loader
    scorer.py             # Deterministic rubric scorer
    judge.py              # LLM-as-judge for non-deterministic rubrics
    memories_client.py    # Memories API client for eval runner
    cc_executor.py        # Claude Code executor with project isolation
    runner.py             # Orchestrates with/without-memory runs
    reporter.py           # JSON reporter and summary formatter
    config.yaml           # Default eval configuration
    scenarios/            # YAML test scenarios by category
    results/              # JSON eval reports (.gitignored)
    tests/                # 82 tests covering all eval components
  data/                   # .gitignored — persistent index + backups

Efficacy Eval

Memories includes a built-in eval harness that measures how much Memories improves AI assistant performance. It runs controlled A/B tests: each scenario executes via Claude Code (claude -p) both with and without Memories, then scores the outputs against deterministic rubrics.

# Start the isolated eval stack in OrbStack/Docker
docker compose -f docker-compose.eval.yml up -d --build

# Verify the eval instance is healthy (separate from the main service on :8900)
curl http://localhost:8901/health

# Run all scenarios (via wrapper script)
./eval/run.sh

# Or directly via Python
python -m eval

# Run a specific category
python -m eval --category coding

# Run a single scenario
python -m eval --scenario coding-001 -v

The eval defaults target http://localhost:8901, which is the isolated instance from docker-compose.eval.yml. ./eval/run.sh intentionally ignores your normal MEMORIES_URL from ~/.config/memories/env so it does not accidentally hit the main service. Override the wrapper with EVAL_MEMORIES_URL=http://host:port ./eval/run.sh ..., or use MEMORIES_URL=http://host:port python -m eval ... for direct Python runs.

Results

| Category | With Memory | Without Memory | Delta |
| --- | --- | --- | --- |
| Coding | 1.00 | 0.00 | +1.00 |
| Recall | 1.00 | 0.20 | +0.80 |
| Compounding | 1.00 | 0.27 | +0.73 |
| Overall | 1.00 | 0.14 | +0.86 |

11 scenarios across 3 categories. Each scenario uses fictional project context ("Voltis") with arbitrary, non-derivable facts — values like hvt_client, vtctl deploy-gate, VTX_LEGACY_DSN, port 7443, and 73% that Claude cannot guess from naming patterns or training data.

What it measures

  • Coding tasks (4 scenarios) — Does the agent apply project-specific tools and conventions?
  • Knowledge recall (4 scenarios) — Can the agent recall exact config values and decisions?
  • Compounding value (3 scenarios) — Can the agent synthesize multiple memories to diagnose problems?

How it works

  1. Purges stale auto-memory from prior eval runs (~/.claude/projects/cc_eval*)
  2. Clears eval memories, creates an isolated temp project (no CLAUDE.md, no .claude/)
  3. Runs the prompt without Memories via claude -p --strict-mcp-config (empty MCP) → scores against rubrics
  4. Seeds scenario memories, runs the prompt with Memories via claude -p --strict-mcp-config (Memories MCP only) → scores again
  5. Computes efficacy delta = score_with - score_without
  6. Aggregates across categories with configurable weights

Isolation strategy

  • --strict-mcp-config ensures Claude loads only the MCP config provided (or none), ignoring global settings
  • Fresh temp directories per run — no CLAUDE.md, no .claude/, no conversation history
  • Auto-memory cleanup removes ~/.claude/projects/cc_eval* dirs at startup and after each run
  • Scenario memories cleared before each run via Memories API

Results are saved as JSON in eval/results/ and printed as a human-readable summary.

See the design doc for full details.


Performance

| Metric | Value |
| --- | --- |
| Docker image size | ~430MB core / ~436MB extract (no baked model cache by default) |
| Search latency | <50ms |
| Add latency | ~100ms (includes backup) |
| Model loading | Cold boot downloads model once; warm boots reuse /data/model-cache |
| Memory footprint | ~180-260MB baseline; higher during extraction bursts |
| Index size | ~1.5KB per memory |

Uses ONNX Runtime for inference instead of PyTorch — same model (all-MiniLM-L6-v2), same embeddings, 68% smaller image.

Tested on Mac mini M4 Pro, 16GB RAM.


Development

# Install dependencies
uv sync                              # core only
uv sync --extra extract              # with extraction (Anthropic SDK)
uv sync --extra cloud                # with cloud sync (boto3)

# Run tests
uv run pytest -q

# Local dev server
uv run uvicorn app:app --reload

# Docker
docker build --target core -t memories:core .
docker build --target extract -t memories:extract .

When changing memory/index behavior: add or update tests, validate backup/restore still works, validate extraction if touching extraction paths, update README and/or docs/architecture.md.


Roadmap

  • Auto-rebuild on file changes (watch mode)
  • Multi-index support (different projects)
  • Memory tagging system
  • Search filters by date/type (source filter exists)
  • Scheduled index rebuilds via cron

Release Checklist

  • No hardcoded credentials in docs/examples
  • Public docs avoid product-specific assumptions unless the file is intentionally integration-specific
  • Benchmarks describe workload profile and caveats
  • Versioned behavior changes documented in README
