DiffMem: Git-Based Differential Memory for AI Agents

License: MIT · Python 3.11+ · Production version

DiffMem is a lightweight, git-based memory backend for AI agents and conversational systems. It uses Markdown files for human-readable storage, Git for tracking temporal evolution through differentials, and a git-native retrieval agent that explores the repository via shell commands (grep, git log, git diff, git blame) to build targeted context. No vector databases, no embeddings, no BM25 — just git and an LLM.

At its core, DiffMem treats memory as a versioned repository: the "current state" of knowledge is stored in editable files, while historical changes are preserved in Git's commit graph. This separation allows agents to query and search against a compact, up-to-date surface without the overhead of historical data, while enabling deep dives into evolution when needed.

Live in Production

DiffMem powers Annabelle, a simulated intelligence that maintains persistent memory across thousands of conversations on WhatsApp and Messenger.

In production, DiffMem enables Annabelle to:

  • Reference details from conversations weeks ago
  • Track the evolution of relationships over time
  • Build structured understanding of each person she talks to
  • Consolidate memories automatically as conversations grow

See how DiffMem processes a novel chapter by chapter

Roadmap

  • Make the indexing strategy inherited from the PoC more robust; it is currently more memory-intensive than it needs to be.
  • A parameterized way to set context caps on retrieval.
  • Sometimes an entity becomes a catch-all and the writer agent keeps overloading it.
  • Retrieval history, so that we can build a "linked entities" model to support wikification.
  • PDF export.
  • Research PoC: Visual retrieval for context compression.

Why Git for AI Memory?

Traditional memory systems for AI agents often rely on databases, vector stores, or graph structures. These work well for certain scales but can become bloated or inefficient when dealing with long-term, evolving personal knowledge. DiffMem takes a different path by leveraging Git's strengths:

  • Current-State Focus: Memory files store only the "now" view of information (e.g., current relationships, facts, or timelines). This reduces the surface area for queries and searches, making operations faster and more token-efficient in LLM contexts. Historical states are not loaded by default — they live in Git's history, accessible on-demand.

  • Differential Intelligence: Git diffs and logs provide a natural way to track how memories evolve. Agents can ask "How has this fact changed over time?" without scanning entire histories, pulling only relevant commits.

  • Durability and Portability: Plaintext Markdown ensures memories are human-readable and tool-agnostic. Git's distributed nature means your data is backup-friendly and not locked into proprietary formats.

  • Efficiency for Agents: By separating "surface" (current files) from "depth" (git history), agents can be selective — load the now for quick responses, dive into diffs for analytical tasks. This keeps context windows lean while enabling rich temporal reasoning.

This approach shines for long-horizon AI systems where memories accumulate over years: it scales without sprawl, maintains auditability, and allows "smart forgetting" through pruning while preserving reconstructability.
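The surface/depth split can be illustrated without a repository. The sketch below is not DiffMem code: it uses Python's standard `difflib` as a stand-in for `git diff`, and the two file versions are invented for the example. It shows how an agent interested in "what changed" only needs the delta, not both full versions.

```python
import difflib

# Two hypothetical versions of one memory file. In DiffMem the older version
# would live in git history and only the newer one would sit on disk.
previous = """# Mom
- Works at the library
- Lives in Portland
"""
current = """# Mom
- Started a new job at a bookstore
- Lives in Portland
"""


def changed_lines(old: str, new: str) -> list[str]:
    """Return only the removed ('- ') and added ('+ ') lines."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # ndiff also emits unchanged ('  ') and hint ('? ') lines; drop those.
    return [line for line in diff if line[:2] in ("- ", "+ ")]


delta = changed_lines(previous, current)
for line in delta:
    print(line)
```

The unchanged "Lives in Portland" line never reaches the caller, which is the token-saving property the bullets above describe.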

How It Works

DiffMem ships as a small FastAPI service. Key components:

  • Writer Agent (writer_agent): Analyzes conversation transcripts, identifies/creates entities, and stages updates in Git's working tree. Commits are explicit and atomic.

  • Retrieval Agent (retrieval_agent): A multi-turn LLM agent with a single run(command="...") tool that explores the memory repository via sandboxed shell commands. It reads index.md, probes git history for temporal patterns, and outputs a structured retrieval plan (file sections, git diffs, commit logs) that gets resolved into context.

  • API Layer (api.py + server.py): HTTP endpoints for onboarding users, processing sessions, and retrieving context. Also importable as a Python library.
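The README does not spell out how the retrieval agent's shell sandbox is enforced. As a rough illustration only, one common approach is an allowlist check applied before any command runs. Everything here is an assumption: the names `is_allowed`, `ALLOWED_PROGRAMS`, and `ALLOWED_GIT_SUBCOMMANDS`, and the exact set of permitted programs beyond the grep, git log, git diff, and git blame mentioned above.

```python
import shlex

# Hypothetical allowlist; the real sandbox rules may differ.
ALLOWED_PROGRAMS = {"grep", "cat", "ls", "head"}
ALLOWED_GIT_SUBCOMMANDS = {"log", "diff", "blame", "show"}
# Conservative: rejects chaining, pipes, redirects, and substitution outright.
FORBIDDEN_CHARS = set(";&|<>`$\n")


def is_allowed(command: str) -> bool:
    """Accept only a single read-only command from the allowlist."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes
    if not argv:
        return False
    if argv[0] == "git":
        return len(argv) > 1 and argv[1] in ALLOWED_GIT_SUBCOMMANDS
    return argv[0] in ALLOWED_PROGRAMS


checks = {
    "grep -rn mom memories/": is_allowed("grep -rn mom memories/"),
    "git log --oneline -- index.md": is_allowed("git log --oneline -- index.md"),
    "git push origin main": is_allowed("git push origin main"),
    "cat index.md; rm -rf .": is_allowed("cat index.md; rm -rf ."),
}
```

Rejecting `|` also blocks grep alternation patterns, so a real sandbox would likely need a finer-grained parser; the sketch errs on the side of refusing.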

Each user gets an isolated orphan branch (user/{user_id}) inside a single local storage repo, checked out into a per-user worktree when active. Branches share no history with each other — it's strict isolation without per-user repos.
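A minimal sketch of the naming scheme described above, assuming user IDs are restricted to a safe character set. The validation rule and the helper names `user_branch` and `user_worktree` are illustrative, not taken from the codebase.

```python
import re
from pathlib import PurePosixPath

# Assumed rule for safe IDs; the service's actual validation may differ.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def user_branch(user_id: str) -> str:
    """Map a user ID to its isolated branch name, user/{user_id}."""
    if not _SAFE_ID.fullmatch(user_id):
        raise ValueError(f"unsafe user id: {user_id!r}")
    return f"user/{user_id}"


def user_worktree(worktree_root: str, user_id: str) -> PurePosixPath:
    """Map a user ID to the directory where its worktree is mounted."""
    user_branch(user_id)  # reuse the validation above
    return PurePosixPath(worktree_root) / user_id


branch = user_branch("alex")
worktree = user_worktree("/data/worktrees", "alex")
```

Validating the ID before it reaches a branch name or a path keeps a value such as `../other-user` from escaping the per-user directory.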

Storage architecture

The service has two pluggable concerns:

  • Storage backend — where the repo and worktrees live. Default is local (a mounted disk). This is a hard requirement of the retrieval agent, which shells out to grep/git log on a real directory.
  • Backup backend — an optional bidirectional mirror. Options: none (default; rely on volume snapshots) or github (mirror user branches to a private GitHub repo you own). Pushes run on a scheduler and are never in the request hot path. Pulls happen at worktree mount time (first request per user after a restart), keeping the local volume in sync with any edits made from other machines.

This separation means self-hosters can run DiffMem with zero external dependencies, and users who want an offsite mirror can opt in with two env vars.

Self-Hosting

DiffMem is designed to be deployed on a single small Linux box with a mounted volume. It's I/O-bound, not compute-bound — an e2-small / 1 vCPU VPS is plenty for thousands of conversations.

One-click deploy with Coolify

Coolify is an open-source, self-hostable Heroku/Vercel alternative. It's the easiest way to run DiffMem.

  1. In Coolify, create a new Docker Compose resource.
  2. Point it at this repository: https://github.com/Growth-Kinetics/DiffMem.
  3. Set the compose file path to docker-compose.yml (default).
  4. In the Environment Variables tab, set OPENROUTER_API_KEY to your key from openrouter.ai/keys.
  5. (Optional) Attach a domain — Coolify handles TLS via Let's Encrypt automatically.
  6. Click Deploy.

Coolify will build the image, provision a named volume at /data (persists across deployments), run the healthcheck, and route traffic through its built-in Traefik reverse proxy. No TLS certs, no nginx configs, no open ports on the host.

DiffMem listens on PORT, defaulting to 8000. If Coolify asks for the service port or proxy target, use the same value you set for PORT.

Leave REQUIRE_AUTH=false (the default) if you're only calling DiffMem from another service on the same Coolify instance. Set REQUIRE_AUTH=true + API_KEY=<long-random-string> if you expose the domain publicly.

Production deployment with Hatchet

For durable, observable, per-user-serialized task execution with Hatchet on a Hetzner Cloud VPS, see docs/deployment-hatchet.md.

The production compose file is deploy/docker-compose.hatchet.yml — it runs two services from the same image: diffmem-api (HTTP server) and diffmem-worker (Hatchet consumer).

Plain Docker Compose

On any Linux box with Docker:

git clone https://github.com/Growth-Kinetics/DiffMem.git
cd DiffMem
cp .env.example .env
# Edit .env and set OPENROUTER_API_KEY
docker compose up -d

The service listens on http://localhost:8000. All state lives in the diffmem_data named volume — back it up with docker run --rm -v diffmem_data:/data -v $(pwd):/backup alpine tar czf /backup/diffmem-$(date +%F).tar.gz /data.

As a Python library

from diffmem import DiffMemory

memory = DiffMemory("/path/to/worktree", "alex", "your-openrouter-key")
memory.process_and_commit_session("Had coffee with mom today...", "session-123")
context = memory.get_context([{"role": "user", "content": "Tell me about mom"}])

Configuration

Everything is configured via environment variables. OPENROUTER_API_KEY is always required, and the table below also marks DEFAULT_MODEL as required; see .env.example for the full list with defaults.

| Variable | Default | Purpose |
| --- | --- | --- |
| OPENROUTER_API_KEY | (required) | Your OpenRouter key |
| DEFAULT_MODEL | (required) | Shared LLM for writer, onboarding, and retrieval agents |
| RETRIEVAL_MODEL | (unset) | Optional retrieval-only model override; uses DEFAULT_MODEL when unset |
| REQUIRE_AUTH | false | Enable bearer-token auth (set true for public deployments) |
| API_KEY | (unset) | Shared bearer token when REQUIRE_AUTH=true |
| ALLOWED_ORIGINS | * | CORS origins, comma-separated |
| BACKUP_BACKEND | none | none or github |
| BACKUP_INTERVAL_MINUTES | 30 | Backup cadence (0 disables periodic backups) |
| GITHUB_REPO_URL | (unset) | Private repo for the github backup backend |
| GITHUB_TOKEN | (unset) | PAT with repo scope, for github backup |
| STORAGE_PATH | /data/storage | Where the central git repo lives |
| WORKTREE_ROOT | /data/worktrees | Where per-user worktrees are mounted |
| DIFFMEM_ONTOLOGY | personal | Entity taxonomy: personal, corporate, or absolute path to a custom ontology dir |
| EXECUTOR | inline | Task executor: inline (default) or hatchet (durable, observable) |
| HATCHET_CLIENT_TOKEN | (unset) | Hatchet Cloud API token; required when EXECUTOR=hatchet |
| HATCHET_NAMESPACE | diffmem | Namespace prefix for Hatchet workflow and worker names |
| HATCHET_WORKER_SLOTS | 10 | In-process concurrency slots per worker replica |

Enabling GitHub backup (optional)

Want an offsite mirror without paying for external storage? Create a private GitHub repo (e.g. yourname/my-diffmem-backup), generate a Personal Access Token with repo scope, then set:

BACKUP_BACKEND=github
GITHUB_REPO_URL=https://github.com/yourname/my-diffmem-backup
GITHUB_TOKEN=ghp_...

Cold-start restore: on a brand-new deployment with an empty /data volume, DiffMem fetches any existing user/* branches from the remote so you don't start from scratch (useful for migrations and disaster recovery). Once the volume has user branches, startup-time restores are skipped — the mounted volume is the source of truth.

Per-commit backup: a post-commit hook fires a webhook that pushes the user's branch to GitHub in the background. Push failures never block the request — the periodic backup (BACKUP_INTERVAL_MINUTES) catches up on the next tick.

Remote pull at mount time: when the service starts and a user's worktree is first accessed, DiffMem fetches their branch from GitHub and fast-forwards the local branch before serving any reads. This means edits made to memory files from another machine (pushed to GitHub) are visible as soon as the service restarts. Pull failures are non-fatal — the service continues with its local state.

Credentials: the token is passed to git via GIT_ASKPASS at call time, never written into .git/config on the volume.
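For readers unfamiliar with GIT_ASKPASS, the sketch below shows the general pattern: git invokes the helper program with the prompt as its first argument and reads the answer from stdout, so the secret can stay in the process environment. This is an illustration, not DiffMem's implementation; the helper name `askpass_env`, the `DIFFMEM_GIT_TOKEN` variable, and the `x-access-token` username are all assumptions.

```python
import os
import stat
import tempfile

# The helper script contains no secret; it only echoes an environment value.
ASKPASS_SCRIPT = """#!/bin/sh
case "$1" in
  Username*) echo "x-access-token" ;;
  *) echo "$DIFFMEM_GIT_TOKEN" ;;
esac
"""


def askpass_env(token: str, script_dir: str) -> dict[str, str]:
    """Build an environment for a git subprocess that needs the token."""
    script = os.path.join(script_dir, "askpass.sh")
    with open(script, "w") as fh:
        fh.write(ASKPASS_SCRIPT)
    os.chmod(script, stat.S_IRWXU)  # owner-only read/write/execute
    env = dict(os.environ)
    env["GIT_ASKPASS"] = script
    env["DIFFMEM_GIT_TOKEN"] = token  # hypothetical variable name
    env["GIT_TERMINAL_PROMPT"] = "0"  # fail instead of prompting on a TTY
    return env


with tempfile.TemporaryDirectory() as tmp:
    env = askpass_env("example-token", tmp)
    with open(env["GIT_ASKPASS"]) as fh:
        script_text = fh.read()
    script_mode = os.stat(env["GIT_ASKPASS"]).st_mode
    # A caller would now run e.g. subprocess.run(["git", "push", ...], env=env).
```

Because the token lives only in the child process environment, nothing sensitive lands in .git/config or in the script file on the volume.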

Migrating from earlier DiffMem versions

If you're upgrading from a pre-0.4 deployment (where GitHub was the primary datastore), three behaviors change:

  • GitHub is now a backup, not a database. The mounted /data volume is the source of truth. Snapshot it like any other stateful service.
  • Writes no longer block on GitHub. POST /memory/{user_id}/process-and-commit returns as soon as the local commit lands; the push runs in the background. Expect noticeably faster API responses.
  • Default storage paths moved from /app/storage and /app/worktrees to /data/storage and /data/worktrees. Existing deployments that set these env vars explicitly are unaffected; deployments relying on the old defaults should either rebind the volume or set STORAGE_PATH / WORKTREE_ROOT to the old paths.

To preserve backwards compatibility, setting GITHUB_REPO_URL + GITHUB_TOKEN without an explicit BACKUP_BACKEND automatically enables the GitHub backup backend.
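The fallback rule can be written as a small decision function. This is a sketch of the behaviour as described, with a hypothetical function name; the handling of unknown values is an assumption.

```python
from typing import Mapping


def resolve_backup_backend(env: Mapping[str, str]) -> str:
    """Pick the backup backend, honouring the backwards-compatible fallback."""
    explicit = env.get("BACKUP_BACKEND")
    if explicit:
        if explicit not in ("none", "github"):
            raise ValueError(f"unknown BACKUP_BACKEND: {explicit!r}")
        return explicit  # an explicit setting always wins
    if env.get("GITHUB_REPO_URL") and env.get("GITHUB_TOKEN"):
        return "github"  # legacy deployments keep their mirror
    return "none"


legacy = resolve_backup_backend(
    {"GITHUB_REPO_URL": "https://github.com/yourname/my-diffmem-backup",
     "GITHUB_TOKEN": "ghp_example"}
)
opted_out = resolve_backup_backend(
    {"BACKUP_BACKEND": "none",
     "GITHUB_REPO_URL": "https://github.com/yourname/my-diffmem-backup",
     "GITHUB_TOKEN": "ghp_example"}
)
fresh = resolve_backup_backend({})
```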

API

Full interactive docs live at http://<your-host>/docs (Swagger UI) once the server is running.

The endpoints you'll actually use:

  • POST /memory/{user_id}/onboard — create a new user
  • POST /memory/{user_id}/process-and-commit — ingest a session transcript and commit
  • POST /memory/{user_id}/context — retrieve context for a conversation
  • GET /memory/{user_id}/jobs/{job_id} — poll a queued/running write or consolidate job

Write endpoints accept ?sync=true|false to override the default response mode and an optional callback_url in the body for webhook-style completion. Under EXECUTOR=inline (default) the default is synchronous (block-until-done; preserves pre-executor contract). Under EXECUTOR=hatchet the default is async (returns {job_id, status} in <500ms; poll the /jobs/{job_id} endpoint or supply callback_url).
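When running in async mode, a client typically polls the job endpoint until the job finishes. The loop below is a minimal sketch under stated assumptions: the terminal status names `completed` and `failed` are guesses (the README only mentions queued and running), and `fetch_status` stands in for whatever HTTP call you make to GET /memory/{user_id}/jobs/{job_id}.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}  # assumed names


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout}s")
        sleep(interval)


# Offline demo: canned responses replace the HTTP call, and sleep is a no-op.
responses = iter([
    {"job_id": "j1", "status": "queued"},
    {"job_id": "j1", "status": "running"},
    {"job_id": "j1", "status": "completed"},
])
result = wait_for_job(lambda: next(responses), sleep=lambda _seconds: None)
```

The default timeout is set above the 600s upper bound quoted for writes; supplying callback_url avoids polling altogether.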

Example:

curl -X POST "http://localhost:8000/memory/alex/onboard" \
  -H "Content-Type: application/json" \
  -d '{"user_info": "Alex is a software engineer from Seattle.", "session_id": "onboard-001"}'

curl -X POST "http://localhost:8000/memory/alex/process-and-commit" \
  -H "Content-Type: application/json" \
  -d '{"memory_input": "Had coffee with mom today. She mentioned her new job.", "session_id": "s-001"}'

curl -X POST "http://localhost:8000/memory/alex/context" \
  -H "Content-Type: application/json" \
  -d '{"conversation": [{"role": "user", "content": "Tell me about mom"}], "max_tokens": 15000}'

If REQUIRE_AUTH=true, add -H "Authorization: Bearer $API_KEY" to every request.

Consolidation

DiffMem ships with an out-of-band consolidator agent that repairs three failure modes the writer agent accumulates at scale: duplicate entities, an overgrown user entity, and missing interlinking. It exposes three tools — dedupe, redistribute, link — invokable independently or chained.

Every consolidation commit is prefixed with consolidate(dedupe):, consolidate(redistribute):, or consolidate(link): so retrieval agents and humans can tell repair commits apart from session-formation commits.

Endpoints:

  • POST /memory/{user_id}/consolidate — run any subset of tools on demand.
  • POST /memory/{user_id}/process-commit-and-consolidate — ingest a session, commit, then consolidate in one HTTP call.

Examples:

# Full chain (dedupe → redistribute → link), defaults.
curl -X POST "http://localhost:8000/memory/alex/consolidate" \
  -H "Content-Type: application/json" -d '{}'

# Just dedupe, with custom soft cap and link window for any tools that use them.
curl -X POST "http://localhost:8000/memory/alex/consolidate" \
  -H "Content-Type: application/json" \
  -d '{"tools": ["dedupe"], "window": 5, "soft_cap_tokens": 24000}'

# Ingest a session, commit, then consolidate — single call.
curl -X POST "http://localhost:8000/memory/alex/process-commit-and-consolidate" \
  -H "Content-Type: application/json" \
  -d '{
    "memory_input": "Met Maya again today...",
    "session_id": "s-042",
    "consolidate_tools": ["dedupe", "link"]
  }'

What each tool does:

  • dedupe — prefilters duplicate candidates (name similarity + overlapping related_entities / hard_cues), asks an LLM to confirm at high confidence, then merges the lower-strength file into the higher-strength one. The loser's filename stem is preserved as an alias in the survivor's SEMANTIC INDEX so the writer agent recognizes it on future sessions.
  • redistribute — scans for entities exceeding soft_cap_tokens (default 32 000, len(content)//4 heuristic), then either (a) moves attributed sections to their real subject's file (e.g. content about a colleague living in the user entity → the colleague's memories/people/*.md) or (b) extracts orphan themes into new memories/contexts/{slug}.md files. Prefers smaller target entities (balancing rule).
  • link — mines git log over the last window commits (default 3) for file co-occurrence, then asks an LLM to weave Obsidian-style wikilinks ([[memories/people/maya|Maya]]) inline in the prose. Idempotent: existing wikilinks are not duplicated. Opens the memory folder for navigation as an Obsidian vault.
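Two of the mechanisms above are simple enough to sketch directly: the `len(content)//4` token estimate behind redistribute, and the file co-occurrence count behind link. The function names and the sample commit data are illustrative; in practice the per-commit file lists would come from `git log`, which this sketch does not run.

```python
from collections import Counter
from itertools import combinations

SOFT_CAP_TOKENS = 32_000  # documented default


def estimate_tokens(content: str) -> int:
    """Rough token count: four characters per token."""
    return len(content) // 4


def exceeds_soft_cap(content: str, soft_cap: int = SOFT_CAP_TOKENS) -> bool:
    return estimate_tokens(content) > soft_cap


def co_occurrence(commits: list[list[str]], window: int = 3) -> Counter:
    """Count how often two files change together in the newest `window` commits.

    `commits` is ordered newest first; each entry lists the files it touched.
    """
    pairs: Counter = Counter()
    for files in commits[:window]:
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs


# Invented history, newest first.
commits = [
    ["alex.md", "memories/people/maya.md"],
    ["memories/people/maya.md", "alex.md", "timeline/2025-03.md"],
    ["alex.md"],
    ["alex.md", "memories/people/mom.md"],  # outside the default window of 3
]
pairs = co_occurrence(commits)
```

Pairs with a high count are natural candidates for a wikilink between the two files.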

A per-user lockfile (<worktree>/.diffmem/consolidator.lock) prevents concurrent consolidator / writer runs.
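One conventional way to implement such a lockfile is exclusive file creation, which fails atomically if the file already exists. The sketch below assumes that approach; the context manager, the `LockHeld` exception, and writing the PID into the file are illustrative choices, not confirmed details of DiffMem.

```python
import os
import tempfile
from contextlib import contextmanager


class LockHeld(RuntimeError):
    """Raised when another run already holds the lock."""


@contextmanager
def consolidator_lock(worktree: str):
    lock_dir = os.path.join(worktree, ".diffmem")
    os.makedirs(lock_dir, exist_ok=True)
    path = os.path.join(lock_dir, "consolidator.lock")
    try:
        # O_EXCL makes creation fail if the file exists, so only one caller wins.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeld(path) from None
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(str(os.getpid()))
        yield path
    finally:
        os.remove(path)


with tempfile.TemporaryDirectory() as worktree:
    with consolidator_lock(worktree) as lock_path:
        held_during = os.path.exists(lock_path)
        try:
            with consolidator_lock(worktree):
                second_acquired = True
        except LockHeld:
            second_acquired = False
    released_after = not os.path.exists(lock_path)
```

A crashed process would leave a stale lock behind, so a production version needs some recovery rule (for example checking whether the recorded PID is still alive).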

Repository layout

Each user's memory is organized as:

<worktree_root>/{user_id}/
├── {user_id}.md              # User's own profile
├── index.md                  # Auto-generated keyword index
├── memories/
│   ├── people/               # Per-person profiles
│   └── contexts/             # Thematic contexts (health, work, ...)
└── timeline/
    └── YYYY-MM.md            # Monthly timeline entries

See repo_guide.md in the repo root for the full memory schema (this file is copied into each user's worktree as repo_guide.md so the writer agent can reference it).
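For code that needs to locate files in this layout, the paths can be derived mechanically. A small sketch with hypothetical helper names, relative to a user's worktree:

```python
from datetime import date
from pathlib import PurePosixPath


def timeline_file(day: date) -> PurePosixPath:
    """Monthly timeline file for a given day, e.g. timeline/2025-03.md."""
    return PurePosixPath("timeline") / f"{day:%Y-%m}.md"


def person_file(slug: str) -> PurePosixPath:
    """Profile file for a person, e.g. memories/people/maya.md."""
    return PurePosixPath("memories") / "people" / f"{slug}.md"


march = timeline_file(date(2025, 3, 9))
maya = person_file("maya")
```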

Status

DiffMem is production software. It runs Annabelle's memory across thousands of conversations and has been through several iterations of hardening:

  • v0.5 — pluggable ontologies via DIFFMEM_ONTOLOGY env var. Two built-in profiles (personal, corporate); custom ontologies via absolute path; community contribution path in ontologies/. All agent scanning (writer, consolidator, retrieval) is now ontology-aware.
  • v0.4 — pluggable task executor (inline thread pool or Hatchet for durable/observable execution), per-user write serialization enforced server-side, out-of-band consolidation tools (dedupe, redistribute, link), bidirectional GitHub sync.
  • v0.3 — retrieval agent with sandboxed shell commands, git-native temporal reasoning, fallback to baseline on agent failure.
  • v0.2 — async write pipeline, thread pool to keep the event loop free, Railway/Docker hardening.

Known limitations:

  • Write operations take 60–600s (LLM + git I/O). Under the default inline executor the HTTP response blocks until completion; pass ?sync=false to get a job_id back immediately and poll GET /memory/{user_id}/jobs/{job_id}.
  • Retrieval quality is model-dependent. GPT-4o-class models produce materially better entity linking and temporal reasoning than smaller models.
  • Writer agent can default to updating the user entity on every session. Run the consolidator periodically (POST /memory/{user_id}/consolidate) to redistribute overgrown entities.
  • Prompt tuning is ongoing — contributions welcome.

Future Vision

DiffMem points to a future where AI memory is as versioned and collaborative as code:

  • Agent-Driven Pruning: LLMs that "forget" low-strength memories by archiving to git branches, mimicking neural plasticity.
  • Collaborative Memories: Multi-agent systems sharing repos, with merge requests for "memory reconciliation."
  • Temporal Agents: Specialized models that query git logs to answer "how did I change?"
  • Multi-Provider Retrieval: Swap between OpenRouter, Cerebras, or any OpenAI-compatible provider.
  • Open-Source Ecosystem: Plugins for voice input, mobile sync, or integration with tools like Obsidian.

DiffMem is built and maintained by Growth Kinetics. We'd love collaborations, PRs, or honest feedback.

Contributing

Fork, experiment, PR. We're especially interested in:

  • Alternative storage / backup backends (S3, GCS, plain rsync).
  • Retrieval strategy improvements.
  • Real-world integrations.

License: MIT. © 2025 Growth Kinetics


FAQ

Key Features

| Feature | Description |
| --- | --- |
| Git-Based | Memory as versioned repository |
| Markdown Storage | Human-readable memory files |
| Native Retrieval | grep, git log, git diff |
| Temporal Evolution | Track changes over time |
| No Embeddings | Pure git + LLM approach |
| Production Ready | Powers Annabelle AI |

Installation

pip install diffmem

Requirements

| Requirement | Version |
| --- | --- |
| Python | 3.11+ |
| Git | Any version |
| LLM API | OpenRouter API key (OPENROUTER_API_KEY) |

