tinyMem gives small and medium language models (7B–13B) reliable long-term memory in complex codebases. It sits between you and the LLM, injecting verified context and capturing validated facts, all locally, without model retraining or cloud dependencies.
- What tinyMem Is (and Isn't)
- Purpose
- Key Features
- Quick Start
- Installation
- Usage
- tinyTasks
- Integration
- Architecture
- Token Economics
- Configuration
- Development
- Benchmarks
- Contributing
- License
- A deterministic, evidence-gated memory system for LLMs in long-lived codebases
- Lexical recall engine (FTS5) with CoVe filtering for noise reduction
- Truth state authority enforcement preventing hallucinated facts
- Memory governance layer that decides what is known, recalled, and trusted
- ❌ An autonomous agent or execution engine
- ❌ A repair/retry loop system
- ❌ A semantic/vector search system
- ❌ A task execution framework
Core Principle: tinyMem governs memory, not behavior. It decides what is known, never what is done.
tinyMem records and evaluates evidence but never executes commands to gather evidence.
Clear boundary:
- Agents execute - Run tests, build code, verify behavior
- tinyMem records - Stores evidence results (exit codes, file existence, grep matches)
- tinyMem evaluates - Gates fact promotion based on evidence validity
Example:
- Agent: Runs `go test ./...` and gets exit code 0
- Agent: Submits evidence `cmd_exit0::go test ./...` with the memory
- tinyMem: Verifies the evidence format and gates fact promotion
- tinyMem: DOES NOT re-run the command itself
This keeps tinyMem as pure memory governance, never execution.
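The evidence string above follows a `<type>::<payload>` shape. As an illustrative sketch (not tinyMem's actual parser), a client could validate a token before submitting it:

```python
import re

# Evidence types listed in this README; the parsing logic below is an
# illustrative sketch, not tinyMem's real implementation.
EVIDENCE_TYPES = {"file_exists", "grep_hit", "cmd_exit0", "test_pass"}

def parse_evidence(token: str):
    """Split an evidence token of the form '<type>::<payload>'."""
    m = re.match(r"^(\w+)::(.+)$", token)
    if not m or m.group(1) not in EVIDENCE_TYPES:
        return None
    return m.group(1), m.group(2)
```

A well-formed token like `cmd_exit0::go test ./...` splits into its type and payload; anything with an unknown type is rejected.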
If you've ever used an AI for a large project, you know it eventually starts to "forget." It forgets which database you chose, it forgets the naming conventions you agreed on, and it starts making things up (hallucinating).
tinyMem is a "Hard Drive for your AI's Brain."
tinyMem was initially built to solve a specific problem: improving the reliability of small, locally-hosted LLMs (7B–13B). These models often suffer from "context drift" where they lose track of project decisions over long sessions.
As the project grew, we realized that memory alone wasn't enough. Reliability requires Truth Discipline. This led to the expansion of tinyMem into what it is today: a comprehensive Control Protocol that mandates evidence-based validation and strict execution phases for any agent touching a repository.
- No more repeating yourself: "Remember, we use Go for the backend."
- No more AI hallucinations: If the AI isn't sure, it checks its memory.
- Total Privacy: Your project data never leaves your machine to "train" a model.
Get up and running in seconds.
Go to your project root and initialize the memory database:
```sh
cd /path/to/your/project
tinymem health
```

Start the server (choose one mode):

Option A: Proxy Mode (for generic LLM clients)

```sh
tinymem proxy
# Then point your client (e.g., OpenAI SDK) to http://localhost:8080/v1
```

Option B: MCP Mode (for Claude Desktop, Cursor, VS Code)

```sh
tinymem mcp
# Configure your IDE to run this command
```

See the Quick Start Guide for Beginners for a detailed walkthrough.
Download from the Releases Page.
macOS / Linux:

```sh
os="$(uname -s | tr '[:upper:]' '[:lower:]')"
arch="$(uname -m)"
case "$arch" in
  x86_64|amd64) arch="amd64" ;;
  aarch64|arm64) arch="arm64" ;;
  *) echo "Unsupported arch: $arch" >&2; exit 1 ;;
esac
curl -L "https://github.com/daverage/tinyMem/releases/latest/download/tinymem-${os}-${arch}" -o tinymem
chmod +x tinymem
sudo mv tinymem /usr/local/bin/
```

Windows:

Download `tinymem-windows-amd64.exe`, rename it to `tinymem.exe`, and add it to your system PATH.
Requires Go 1.25.6+.
```sh
git clone https://github.com/daverage/tinyMem.git
cd tinyMem
./build/build.sh        # Build only
# or
./build/build.sh patch  # Release (patch version bump)
```

Cross-Compilation (on Mac): To build Windows or Linux binaries on a Mac, you need C cross-compilers:

- For Windows (Intel/AMD): `brew install mingw-w64`
- For Windows (ARM64): `brew install zig`
- For Linux: `brew install FiloSottile/musl-cross/musl-cross` (static) or `brew install zig`

Cross-Compilation (on Windows): To build macOS or Linux binaries on Windows, you need Zig:

```sh
winget install zig.zig
```

Tip: Zig is the recommended way to enable cross-compilation for all platforms with a single tool, regardless of whether you are on Mac or Windows.
Use the GitHub Container Registry image. Replace OWNER with your GitHub username or org (for this repo, daverage).
```sh
docker pull ghcr.io/OWNER/tinymem:latest
docker run --rm ghcr.io/OWNER/tinymem:latest health
```

The tinyMem CLI is your primary way to interact with the system from your terminal.
| Command | What it is | Why use it? | Example |
|---|---|---|---|
| `health` | System Check | To make sure tinyMem is installed correctly and can talk to its database. | `tinymem health` |
| `stats` | Memory Overview | To see how many memories you've stored and how your tasks are progressing. | `tinymem stats` |
| `dashboard` | Visual Status | To get a quick, beautiful summary of your project's memory "health." | `tinymem dashboard` |
| `query` | Search | To find specific information you or the AI saved previously. | `tinymem query "API"` |
| `recent` | Recent History | To see the last few things tinyMem learned or recorded. | `tinymem recent` |
| `write` | Manual Note | To tell the AI something important that it should never forget. | `tinymem write --type decision --summary "Use Go 1.25"` |
| `run` | Command Wrapper | To run a script or tool (like `make` or `npm test`) while "reminding" it of project context. | `tinymem run make build` |
| `proxy` / `mcp` | Server Modes | To start the "brain" that connects tinyMem to your IDE or AI client. | `tinymem mcp` |
| `doctor` | Diagnostics | To fix the system if it stops working or has configuration issues. | `tinymem doctor` |
| `init` | Project Bootstrap | Creates `.tinyMem`, writes the config, and installs the correct agent contract for your model size. | `tinymem init` |
| `update` | Refresh | Re-runs migrations and downloads whichever agent contract matches your configuration. | `tinymem update` |
Think of writing memories as "tagging" reality for the AI.
```sh
# Record a decision so the AI doesn't suggest an alternative later
tinymem write --type decision --summary "Switching to REST" --detail "GraphQL was too complex for this scale."

# Add a simple note for yourself or the AI
tinymem write --type note --summary "The database password is in the vault, not .env"
```

| Type | Evidence Required? | Truth State | Recall Tier |
|---|---|---|---|
| Fact | ✅ Yes | Verified | Always |
| Decision | ✅ Yes (Confirmation) | Asserted | Contextual |
| Constraint | ✅ Yes | Asserted | Always |
| Claim | ❌ No | Tentative | Contextual |
| Plan | ❌ No | Tentative | Opportunistic |

Evidence types supported: `file_exists`, `grep_hit`, `cmd_exit0`, `test_pass`.
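Since tinyMem only records and evaluates evidence, gathering it is the agent's job. A rough sketch of how an agent might produce three of these evidence types (the helper names are hypothetical, not part of tinyMem):

```python
import os
import subprocess

def evidence_file_exists(path: str):
    """Return a file_exists evidence token if the path is real."""
    return f"file_exists::{path}" if os.path.exists(path) else None

def evidence_cmd_exit0(cmd: list):
    """Run a command; return a cmd_exit0 token only on exit code 0."""
    result = subprocess.run(cmd, capture_output=True)
    return f"cmd_exit0::{' '.join(cmd)}" if result.returncode == 0 else None

def evidence_grep_hit(pattern: str, text: str):
    """Return a grep_hit token if the pattern appears in the text."""
    return f"grep_hit::{pattern}" if pattern in text else None
```

The agent submits whichever tokens it obtained alongside the memory; tinyMem then gates promotion on their validity without re-running anything.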
tinyTasks: a file-authoritative task ledger enforced by tinyMem
tinyTasks is a built-in task management system that lives alongside your code in tinyTasks.md.
What tinyTasks Is:
- File-authoritative - `tinyTasks.md` is the single source of truth
- Human-authored - Only humans create and define tasks
- Intent ledger - Grounds what work is authorized
- Enforcement anchor - STRICT mode refuses work without tasks
What tinyMem Does With tinyTasks:
- Reads task state to verify human intent
- Enforces authority (refuses work without tasks)
- Guards against false completion claims
What tinyMem Does NOT Do:
- Execute tasks
- Update task status automatically
- Drive task completion
- Create task entries
tinyTasks exists to ground authority, not to drive execution.
Agents may read tinyTasks for intent, update tasks after completing work, and use tasks as execution checkpoints. tinyMem validates that task state exists and refuses STRICT work without tasks, but never autonomously manages or completes tasks.
Intercepts standard OpenAI-compatible requests.
```sh
export OPENAI_API_BASE_URL=http://localhost:8080/v1
# Your existing scripts now use tinyMem automatically
```

While proxying, tinyMem reports recall activity back to the client so that downstream UIs or agents can show "memory checked" indicators:

- Streaming responses append an SSE event of type `tinymem.memory_status` once the upstream LLM finishes. The payload includes `recall_count`, `recall_status` (none/injected/failed), and a timestamp.
- Non-streaming responses carry the same data via two headers: `X-TinyMem-Recall-Status` and `X-TinyMem-Recall-Count`. Agents or dashboards that read those fields can show whether recall was applied or skipped.
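A streaming client can watch for that event in the raw SSE text. A minimal parsing sketch (the sample payload is illustrative; only the field names come from this README):

```python
import json

def extract_memory_status(sse_text: str):
    """Scan raw SSE lines for a tinymem.memory_status event payload."""
    lines = sse_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "event: tinymem.memory_status":
            for follow in lines[i + 1:]:
                if follow.startswith("data: "):
                    return json.loads(follow[len("data: "):])
    return None

# Illustrative sample stream; real payloads come from the proxy.
sample = (
    "event: tinymem.memory_status\n"
    'data: {"recall_count": 3, "recall_status": "injected", '
    '"timestamp": "2026-01-01T00:00:00Z"}\n'
    "\n"
)
status = extract_memory_status(sample)
```

A UI could then surface `recall_status` next to the response so users see when memory was actually consulted.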
Compatible with Claude Desktop, Cursor, and other MCP clients.
Claude Desktop Configuration (claude_desktop_config.json):
```json
{
  "mcpServers": {
    "tinymem": {
      "command": "/absolute/path/to/tinymem",
      "args": ["mcp"]
    }
  }
}
```

When tinyMem is running in MCP mode, your AI agent (like Claude or Gemini) gains these "superpowers":

- `memory_query`: Search the past. The AI uses this to find facts, decisions, or notes related to its current task.
- `memory_recent`: Get up to speed. The AI uses this when it first starts, to see what has happened recently in the project.
- `memory_write`: Learn something new. The AI uses this to save a new fact or decision it just discovered or made. Facts require "Evidence" (like checking if a file exists).
- `memory_ralph`: Self-Repair. This is the "Nuclear Option." The AI uses this to try to fix a bug autonomously by running tests, reading errors, and retrying until it works.
- `memory_stats` & `memory_health`: System Check. The AI uses these to check whether its memory is working correctly and how much it has learned.
- `memory_doctor`: Self-Diagnosis. If the AI feels "confused" or senses memory issues, it can run this to identify problems.
CRITICAL: If you are building an AI agent, you MUST include the appropriate directive in its system prompt to ensure it uses tinyMem correctly.
Quick Setup: Run tinymem init once to bootstrap .tinyMem, create config, and install the correct agent contract for your model size. Use tinymem update later to rerun migrations and refresh the contract (it will download the small or large version that your configuration points to).
- Claude: `docs/agents/CLAUDE.md`
- Gemini: `docs/agents/GEMINI.md`
- Qwen: `docs/agents/QWEN.md`
- Other (Large LLMs): `docs/agents/AGENT_CONTRACT.md`
- Other (Tiny LLMs): `docs/agents/AGENT_CONTRACT_SMALL.md`
Detailed integration guides for various tools and ecosystems can be found in the examples/ directory:
- Claude Integration (Desktop & CLI)
- Aider Integration
- GitHub Copilot
- Local LLM Setup (Ollama, LM Studio)
- IDE Configuration (Cursor, VS Code, Zed)
```mermaid
flowchart TD
    User[LLM Client / IDE] <-->|Request/Response| Proxy[TinyMem Proxy / MCP]

    subgraph "1. Recall Phase"
        Proxy --> Recall[Recall Engine]
        Recall -->|FTS5 Lexical| DB[(SQLite)]
        Recall -->|CoVe Filter| Tiers{Recall Tiers}
        Tiers -->|Always/Contextual| Context[Context Injection]
    end

    subgraph "2. Extraction Phase"
        LLM[LLM Backend] -->|Stream| Proxy
        Proxy --> Extractor[Extractor]
        Extractor -->|Parse| CoVe{CoVe Filter}
        CoVe -->|High Conf| Evidence{Evidence Check}
        Evidence -->|Verified| DB
    end

    Context --> LLM
```
```
.
├── .tinyMem/    # Project-scoped storage (DB, logs, config)
├── assets/      # Logos and icons
├── build/       # Build scripts
├── cmd/         # Application entry points
├── docs/        # Documentation & Agent Contracts
├── internal/    # Core logic (Memory, Evidence, Recall)
└── README.md    # This file
```
tinyMem provides built-in tools to help you understand your project's memory state and health.
- Dashboard: Run `tinymem dashboard` to see a visual summary of memories, tasks, and CoVe performance.
- Doctor: Run `tinymem doctor` to perform a comprehensive diagnostic check of the database, configuration, and connectivity.
- Stats: Run `tinymem stats` for a detailed terminal breakdown of memory types and task completion rates.
These savings are empirically measured under identical workloads, not theoretical; see the Benchmarks section below for the enforcement-backed numbers.
tinyMem uses more tokens per minute but significantly fewer tokens per task compared to standard agents.
| Feature | Token Impact | Why? |
|---|---|---|
| Recall Engine | 📉 Saves | Replaces "Read All Files" with targeted context snippets. |
| CoVe Filtering | 📉 Saves | Reduces noise and improves recall precision, avoiding irrelevant context. |
| Context Reset | 📉 Saves | Prevents chat history from snowballing by starting iterations fresh. |
| Truth Discipline | 📉 Saves | Stops expensive "hallucination rabbit holes" before they start. |
The Verdict: tinyMem acts as a "Sniper Rifle" for context. By ensuring the few tokens sent are the correct ones, it avoids the massive waste of re-reading files and debugging hallucinated code.
Zero-config by default. Override in .tinyMem/config.toml:
```toml
[recall]
max_items = 10             # Maximum memories to recall per query

[cove]
enabled = true             # Chain-of-Verification (extraction + recall filtering)
confidence_threshold = 0.6

[execution]
mode = "STRICT"            # PASSIVE, GUARDED, or STRICT (default: STRICT)

[logging]
level = "info"             # "debug", "info", "warn", "error", "off"
file = "tinymem.log"       # Relative to .tinyMem/logs/
```

For quick overrides, you can use environment variables:

- `TINYMEM_LOG_LEVEL=debug`
- `TINYMEM_LLM_API_KEY=sk-...`
- `TINYMEM_PROXY_PORT=8080`
See Configuration Docs for details.
```sh
# Run tests
go test ./...

# Build
./build/build.sh
```

See Task Management for how we track work.
tinyMem is designed to be provable, not aspirational. Its core claims are backed by automated, adversarial benchmarks that measure enforcement, memory stability, and token usage under identical conditions.
- Runs: 40 identical scenarios per mode
- Models: Local LLMs (7Bβ13B class)
- Temperature: 0 (deterministic)
- Scenarios:
- Forbidden task mutation
- Fact promotion without evidence
- Noisy / ambiguous memory extraction
- Comparison:
- Baseline (no memory governance)
- tinyMem (full enforcement enabled)
All measurements are derived from enforced outcomes, not model claims.
tinyMem treats blocking forbidden actions as success.
Across 40 runs:
- Violations: 0
- Forbidden actions blocked: 100%
- False success claims: reduced by ~66%
This means:
- The model may attempt unsafe or incorrect actions
- tinyMem consistently detects and prevents them
- No forbidden task edits or fact promotions slipped through
Enforcement failures are the only failure condition. None were observed.
This directly addresses:
- hallucinated facts
- silent task corruption
- "looks right but is wrong" behavior
Without governance, models routinely:
- re-assert previously rejected decisions
- contradict earlier facts
- invent new "truths" under pressure
tinyMem prevents this structurally by:
- Requiring evidence for fact promotion
- Persisting verified facts across runs
- Refusing contradictory durable writes
In benchmarks:
- Baseline runs produced frequent unverified success claims
- tinyMem downgraded or blocked these automatically
- Verified facts remained stable across all runs
This is not prompt discipline. It is enforced state.
tinyMem reduces token usage per completed task, even though it performs additional checks.
Across identical workloads:
- Total tokens (baseline): ~32k
- Total tokens (tinyMem): ~18k
- Reduction: ~44%
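The quoted reduction follows directly from the two totals above:

```python
# Token totals reported in the benchmark section of this README.
baseline_tokens = 32_000
tinymem_tokens = 18_000

# (32k - 18k) / 32k = 0.4375, i.e. roughly a 44% reduction per task.
reduction = (baseline_tokens - tinymem_tokens) / baseline_tokens
```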
Why this happens:
- Targeted recall replaces "read everything"
- CoVe filtering removes irrelevant context
- Enforcement stops hallucination-driven retries
- Context resets prevent runaway conversations
The result is fewer tokens wasted on:
- re-reading files
- debugging imaginary bugs
- correcting false assumptions
tinyMem does not claim to:
- make models smarter
- increase raw success rates
- eliminate hallucinations at generation time
It does guarantee:
- hallucinations cannot become durable truth
- unsafe actions are blocked, not trusted
- memory remains consistent across time
tinyMem is benchmarked on enforcement, not persuasion.
Tests measure whether forbidden actions are reliably blocked, whether hallucinated facts are prevented from becoming durable, and whether task and memory boundaries hold under repeated runs. Agent compliance is measured separately and never treated as authority.
Full methodology and results: BENCHMARK.md
- Evidence-Based Truth: Typed memories (`fact`, `claim`, `decision`, etc.). Only verified claims become facts.
- Chain-of-Verification (CoVe): LLM-based quality filter to reduce hallucinations before storage and improve recall relevance (enabled by default). See docs/COVE.md for details.
- FTS5 Lexical Recall: Fast, deterministic full-text search across memory summaries and details using SQLite's FTS5 extension.
- Automatic Database Maintenance: Self-healing database with automatic compaction (PRAGMA optimize + incremental vacuum) and optional retention policies to prevent unbounded growth.
- Local & Private: Runs as a single binary. Data lives in `.tinyMem/`.
- Zero Configuration: Works out of the box.
- Dual Mode: Works as an HTTP Proxy or a Model Context Protocol (MCP) server.
- Mode Enforcement: PASSIVE, GUARDED, and STRICT execution modes with authority boundaries.
- Recall Tiers: Prioritizes `Always` (facts) > `Contextual` (decisions) > `Opportunistic` (notes).
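The FTS5 lexical recall described in the features list can be approximated with SQLite directly. A minimal sketch, assuming only that summaries and details live in an FTS5 virtual table (the schema here is illustrative, not tinyMem's actual one):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative schema; tinyMem's real tables may differ.
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(summary, detail)")
conn.executemany(
    "INSERT INTO memories VALUES (?, ?)",
    [
        ("Use Go for the backend", "Decided early in the project"),
        ("Switching to REST", "GraphQL was too complex for this scale"),
        ("Database password is in the vault", "Not in .env"),
    ],
)

# Lexical recall: deterministic full-text match across both columns,
# ranked by FTS5 relevance. No embeddings or vector search involved.
rows = conn.execute(
    "SELECT summary FROM memories WHERE memories MATCH ? ORDER BY rank",
    ("REST",),
).fetchall()
```

Because matching is lexical and deterministic, the same query against the same database always returns the same memories, which is what makes recall auditable.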
We value truth and reliability.
- Truth Discipline: No shortcuts on verification.
- Streaming: No buffering allowed.
- Tests: Must pass `go test ./...`.
See CONTRIBUTING.md.
MIT © 2026 Andrzej Marczewski