cogni-mem is a local-first Python agent memory framework for developers building AI agents, coding agents, and multi-agent systems that gives agents a structured, multi-channel long-term memory with adaptive lifecycle governance.
cogni-mem (GitHub: cogniweave-memory) is a local-first Python agent memory framework that models agent memory as a set of specialized channels — Key, Semantic, Episodic, Perceptual, Experience, and Sensory Buffer — rather than a single flat vector store. Each channel has its own storage backend, retrieval strategy, retention policy, and decay curve, so the agent's memory behaves more like human memory and less like a database with embeddings.
The framework runs entirely on local infrastructure: JSON for key-value memory, SQLite for metadata, and local Qdrant path mode for vector search. No external services are required to get started. Optional integrations for Neo4j (graph-enhanced semantic memory) and MiniMax (OpenAI-compatible LLM provider) are available as install extras.
At its core, cogni-mem is not just a memory store — it is a complete memory-aware agent execution loop: task understanding → sensory buffering → multi-channel parallel retrieval → cross-channel fusion → context orchestration → execution → writeback decision → classified long-term storage → memory lifecycle governance → adaptive feedback tuning.
Most memory solutions for LLM agents are essentially "RAG with a vector database" — store everything, embed everything, retrieve by similarity. This works for document search but breaks down for agent memory:
| Problem with plain RAG | How cogni-mem handles it |
|---|---|
| One-size-fits-all retrieval | Task-aware routing selects relevant memory channels per task type (coding, planning, QA, dialogue) |
| No concept of memory importance | Multi-dimensional scoring (importance, novelty, confidence, consistency, reusability) drives retention and retrieval priority |
| Storage grows unbounded | Memory lifecycle governance with per-channel retention profiles, forgetting, summarization, demotion, and archiving |
| Retrieved context is flat | Cross-channel fusion normalizes and merges results from semantic, episodic, perceptual, and experience memory |
| No feedback loop | Adaptive policy updates adjust retrieval bias, channel weights, and write thresholds based on execution outcomes |
| Everything is a vector | Key memory (explicit facts/rules) uses JSON key-value storage, not vector search — faster and deterministic |
cogni-mem is closer to MemGPT in ambition (both aim for memory-aware agents) but follows a different design philosophy: MemGPT relies on a single simulated memory hierarchy managed by an LLM orchestrator, while cogni-mem uses engineered, deterministic channels with explicit routing, scoring, and lifecycle policies — making it more predictable, debuggable, and suitable for production agent workflows. Unlike Mem0, which focuses on user-level memory for personalization, cogni-mem is designed for task-level agent reasoning across multiple memory types simultaneously. Compared to LangMem, cogni-mem provides a ready-to-use runtime with built-in retrieval, writeback, and forgetting, rather than a set of memory primitives that you assemble yourself.
- Multi-channel memory architecture — Key, Semantic, Episodic, Perceptual, Experience, Sensory Buffer
- Local-first runtime — JSON + SQLite + local Qdrant path mode, zero external dependencies
- Memory-aware agent loop — retrieval, context building, execution, and writeback in one cycle
- RAG retrieval pipeline — query expansion, HyDE, multi-query expansion, and cross-channel fusion
- Memory lifecycle governance — per-channel retention profiles, forgetting, summarization, demotion, and archiving
- Adaptive feedback loop — execution outcomes automatically tune retrieval bias and channel weights
- Tool Registry — inject memory tools (search, forget, lifecycle, ingestion) into the agent loop
- Optional Neo4j graph — entity linking and graph traversal for semantic memory (
pip install cogni-mem[graph]) - Optional MiniMax provider — OpenAI-compatible LLM integration (
pip install cogni-mem[minimax]) - Task-aware routing — automatically selects memory channels based on task type (coding, QA, planning, dialogue)
from cogniweave_full import (
CalculatorTool,
Config,
LLMFactory,
MemoryAgent,
MemoryForgetTool,
MemoryLifecycleTool,
MemoryManager,
MemorySearchTool,
OfflineIngestionTool,
ToolRegistry,
)
# Configure with mock LLM for zero-cost local testing
config = Config(
llm_provider="mock",
enable_hyde=False,
enable_mqe=False,
enable_qdrant=False,
enable_neo4j=False,
)
llm = LLMFactory.create(config=config)
registry = ToolRegistry()
manager = MemoryManager(
llm=llm,
tool_registry=registry,
base_dir="./runtime_demo",
config=config,
)
# Register memory tools into the agent loop
registry.register_tool(CalculatorTool())
registry.register_tool(MemorySearchTool(manager))
registry.register_tool(MemoryForgetTool(manager))
registry.register_tool(MemoryLifecycleTool(manager))
registry.register_tool(OfflineIngestionTool(manager))
# Create a memory-aware agent
agent = MemoryAgent(
name="assistant",
llm=llm,
memory_manager=manager,
user_id="demo_user",
session_id="demo_session",
system_prompt="You are a memory-aware assistant.",
)
# The agent remembers across turns
print(agent.run("Remember that future answers should start with the conclusion."))
print(agent.run("What did we agree on earlier?"))For a full working example with ReAct agent mode, see demo.py.
User Input
│
▼
┌─────────────────────────────────────────────────┐
│ TaskModalityRouter │
│ Classifies task → selects relevant channels │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ MemoryRAGPipeline │
│ Query expansion → parallel channel retrieval │
│ → scoring → cross-channel fusion → normalize │
└─────────────────────────────────────────────────┘
│
├── Key Memory (JSON, explicit facts/rules)
├── Semantic Memory (SQLite + Qdrant + optional Neo4j)
├── Episodic Memory (SQLite + Qdrant)
├── Perceptual Memory (SQLite + Qdrant)
└── Experience Memory (SQLite + Qdrant)
│
▼
┌─────────────────────────────────────────────────┐
│ ContextOrchestrator │
│ Builds prompt context from retrieved items │
│ + system prompt + tool schemas + dialogue │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Agent Execution (LLM + Tool Calls) │
│ MemoryAgent / ReActMemoryAgent │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ PostRunMemoryRouter │
│ Decides what to write, where, and at what level │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Consolidation + AsyncWriteBack │
│ Candidate extraction → record creation → write │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Memory Lifecycle (ForgetScheduler) │
│ Retention scoring → demote / summarize / │
│ archive / delete per channel │
└─────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────┐
│ Feedback Loop (PolicyUpdater) │
│ Execution outcomes → adjust retrieval bias, │
│ channel weights, write thresholds │
└─────────────────────────────────────────────────┘
Default local runtime:
- Key memory → JSON file
- Metadata → SQLite (one database per channel)
- Vector memory → local Qdrant path mode (no server needed)
Optional services:
- Set
COGNIWEAVE_ENABLE_QDRANT=trueto connect to a remote Qdrant instance - Set
COGNIWEAVE_ENABLE_NEO4J=trueand installcogni-mem[graph]for graph-enhanced semantic memory
# Base install (local runtime only)
pip install cogni-mem
# With MiniMax OpenAI-compatible provider
pip install "cogni-mem[minimax]"
# With Neo4j graph enhancement
pip install "cogni-mem[graph]"
# With everything
pip install "cogni-mem[all]"
# Pin a specific version
pip install cogni-mem==0.1.0
# Install from GitHub
pip install "git+https://github.com/luckly06/cogniweave-memory.git@v0.1.0"Requirements: Python >= 3.11
Environment variables (see .env.example):
| Variable | Description |
|---|---|
COGNIWEAVE_LLM_PROVIDER |
LLM provider (e.g., minimax, mock) |
MINIMAX_API_KEY |
API key for MiniMax provider |
MINIMAX_BASE_URL |
Base URL for MiniMax-compatible endpoint |
MINIMAX_MODEL |
Model name for MiniMax provider |
COGNIWEAVE_ENABLE_QDRANT |
Set to true for remote Qdrant |
COGNIWEAVE_ENABLE_NEO4J |
Set to true for Neo4j graph |
Mem0 is primarily a user-level memory layer for personalization — it remembers user preferences across sessions. cogni-mem is a task-level agent memory framework that manages multiple memory channels (semantic, episodic, perceptual, experience) simultaneously, with built-in retrieval, writeback, and lifecycle governance. If you need an agent that remembers how to solve problems, not just what the user likes, cogni-mem is the better fit.
MemGPT simulates a memory hierarchy (core memory / archival memory) managed by an LLM that decides what to move between tiers. cogni-mem takes a different approach: engineered, deterministic memory channels with explicit routing, scoring, and retention policies. This makes cogni-mem more predictable, easier to debug, and less dependent on the LLM for memory management decisions.
Yes. The default configuration uses a mock LLM and local storage (JSON + SQLite + local Qdrant path mode), so the entire agent memory framework runs offline. When you're ready to use a real LLM, configure the MiniMax provider or bring your own OpenAI-compatible endpoint.
- Local (default): JSON for key memory, SQLite for metadata, Qdrant path mode for vector search
- Remote Qdrant: set
COGNIWEAVE_ENABLE_QDRANT=true - Neo4j graph: install
cogni-mem[graph]and setCOGNIWEAVE_ENABLE_NEO4J=true
Yes. The framework includes LLMFactory with a pluggable provider interface. The built-in MiniMax provider is OpenAI-compatible, so any provider that exposes an OpenAI-compatible API (including local models via Ollama or vLLM) can be used by configuring the base URL and model name.
No. cogni-mem has a memory lifecycle system with per-channel retention profiles. Each channel has configurable thresholds for retention scoring, summarization, archiving, and deletion. A background scheduler periodically evaluates memories and applies the appropriate action based on importance, recency, reuse frequency, and capacity pressure.
cogni-mem is at v0.1.0 (alpha). The core architecture — multi-channel memory, RAG pipeline, agent loop, lifecycle governance, and feedback system — is implemented and functional. The 0.1.x release line focuses on packaging, dependency management, and documentation. Production use is encouraged for evaluation and prototyping; expect API changes in future releases.
cogni-mem is the PyPI package name (pip install cogni-mem). cogniweave-memory is the GitHub repository name (github.com/luckly06/cogniweave-memory). The import namespace is cogniweave_full. They refer to the same project.
MIT License. See LICENSE for details.