An agentic long-term memory system for language models that enables AI assistants to store, retrieve, and intelligently process conversational context across sessions.
Live Demo: https://deep-memory.vercel.app/
Deep Memory extends language models beyond single-session context windows by providing persistent memory capabilities. Unlike traditional memory systems that compress information ahead-of-time (AOT), Deep Memory follows a just-in-time (JIT) compilation approach: it maintains complete conversation history in a searchable page-store while performing intensive "deep research" at query time to retrieve and synthesize exactly the information needed.
This design avoids the information loss inherent in pre-compressed memory systems, enabling high-fidelity retrieval and task-adaptive context generation.
Deep Memory implements a dual-agent framework:
The Memory Agent processes each conversation into two complementary forms:
Lightweight Memory
- Generates concise abstracts highlighting key information from each session
- Enables fast preliminary searches across conversation history
Complete Page-Store
- Preserves full conversation content with contextual headers
- Ensures accurate retrieval without information loss
Storage Implementation
- Four Pinecone indexes for multi-modal search
- Dense indexes using semantic embeddings (text-embed-v2)
- Sparse indexes using keyword matching (pinecone-sparse-english-v0)
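The dual-store layout can be pictured as a rough sketch; the index names, record fields, and the toy sparse vector below are illustrative stand-ins, not the project's actual schema:

```python
from dataclasses import dataclass

# Hypothetical names for the four indexes: dense and sparse variants
# of both the lightweight memo store and the complete page-store.
INDEXES = ["memo-dense", "memo-sparse", "page-dense", "page-sparse"]

@dataclass
class MemoRecord:
    """Lightweight abstract used for fast preliminary search."""
    session_id: str
    abstract: str

@dataclass
class PageRecord:
    """Full conversation page with a contextual header, preserved losslessly."""
    session_id: str
    header: str
    content: str

def to_sparse(text: str) -> dict[str, int]:
    """Toy keyword 'sparse vector': term -> frequency (the real system
    uses pinecone-sparse-english-v0, not this)."""
    counts: dict[str, int] = {}
    for tok in text.lower().split():
        counts[tok] = counts.get(tok, 0) + 1
    return counts
```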
At query time, deep research proceeds through three phases:
Planning
- Analyzes queries and existing memory to determine information needs
- Selects optimal search strategies (keyword, vector, hybrid, or image)
Searching
- Executes planned searches across the page-store
- Retrieves relevant conversations using multiple search modalities
Integration
- Synthesizes retrieved information into coherent factual summaries
- Filters and combines facts from multiple sources
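The hybrid strategy can be pictured as a weighted fusion of a semantic score and a keyword score. This is a minimal toy sketch of that idea; the real system uses Pinecone similarity search with AI reranking, not these hand-rolled scores:

```python
import math

def dense_score(q: list[float], d: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0

def sparse_score(query: str, doc: str) -> float:
    """Fraction of query terms that appear in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def hybrid_score(q_vec, d_vec, query: str, doc: str, alpha: float = 0.5) -> float:
    """Fuse both signals; alpha is the weight given to the dense score."""
    return alpha * dense_score(q_vec, d_vec) + (1 - alpha) * sparse_score(query, doc)
```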
When Research is enabled, the system performs intensive just-in-time retrieval:
- Planning Phase: The Planning Agent examines your query along with existing memory abstracts to identify what specific information is needed and determine which search tools are most appropriate (keyword for exact matches, vector for semantic similarity, hybrid for comprehensive coverage, image for visual analysis).
- Search Phase: The Research Agent executes searches across the page-store using keyword searches for exact entities, vector searches for conceptually related content, hybrid searches combining both approaches with AI reranking, and image searches for query-specific analysis of stored images.
- Integration Phase: The Integrate Agent synthesizes all retrieved information by filtering out irrelevant content, combining facts from multiple sources, and producing a coherent factual summary focused on your query.
- Response Generation: The final integrated context is injected into the AI prompt, enabling informed, accurate responses grounded in your complete conversation history.
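The phases above can be sketched as a tiny end-to-end loop, with plain functions standing in for the LLM-backed agents (the page contents and term-length heuristic are purely illustrative):

```python
# Minimal stand-in for the page-store.
PAGES = [
    "User said their favorite editor is Vim.",
    "User asked about deploying Flask on Vercel.",
]

def plan(query: str) -> list[str]:
    """Planning phase: derive search terms from the query (stand-in
    for the Planning Agent's strategy selection)."""
    return [t for t in query.lower().split() if len(t) > 3]

def search(terms: list[str], pages: list[str]) -> list[str]:
    """Search phase: keyword scan over the page-store (stand-in for
    the Research Agent's multi-modal searches)."""
    return [p for p in pages if any(t in p.lower() for t in terms)]

def integrate(hits: list[str]) -> str:
    """Integration phase: merge hits into one context string (stand-in
    for the Integrate Agent's synthesis)."""
    return " ".join(hits) if hits else "No relevant memory found."

def deep_research(query: str) -> str:
    """Plan -> search -> integrate; the result is injected into the prompt."""
    return integrate(search(plan(query), PAGES))
```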
When Memorize is enabled, conversations are preserved for future retrieval:
- Processing: Memory Agent receives the conversation and any attached images
- Image Analysis: Visual content is analyzed using GPT-4 Vision
- Abstract Generation: AI creates concise summaries highlighting key information
- Dual Storage: Abstracts stored in memo indexes (fast lightweight search) and full content stored in page indexes (complete detail preservation)
- Embedding: Content is converted to both dense (semantic) and sparse (keyword) vectors
- Indexing: All forms are uploaded to Pinecone for future deep research queries
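The Memorize path can be sketched like so, with a placeholder `summarize` standing in for the Memory Agent's LLM call and plain dicts standing in for the memo and page indexes:

```python
# In-memory stand-ins for the Pinecone memo and page indexes.
memo_index: dict[str, str] = {}   # session_id -> abstract (fast search)
page_index: dict[str, str] = {}   # session_id -> full content (no loss)

def summarize(conversation: str, max_words: int = 12) -> str:
    """Placeholder abstract: truncate to the first few words. The real
    system generates an AI summary of key information."""
    return " ".join(conversation.split()[:max_words])

def memorize(session_id: str, conversation: str) -> None:
    """Dual storage: lightweight abstract plus complete page content."""
    memo_index[session_id] = summarize(conversation)
    page_index[session_id] = conversation  # full detail preserved
```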
- Next.js 13
- React 18
- TypeScript
- Tailwind CSS + shadcn/ui components
- Python 3.12
- Flask
- OpenAI API
- Pinecone SDK
- Redis
- Tenacity
- Planning Agent: Analyzes queries and formulates search strategies
- Research Agent: Executes multi-tool searches with parallel retrieval
- Integrate Agent: Synthesizes evidence into coherent summaries
- Memory Agent: Processes and stores conversations with dual indexing
- Node.js 18+ and pnpm
- Python 3.12
- uv (Python package manager)
- Redis server
- OpenAI API key
- Pinecone API key
Clone the repository:

```bash
git clone https://github.com/jkatyan/deep-memory.git
cd deep-memory
```

Install frontend dependencies:

```bash
pnpm install
```

Install backend dependencies:

```bash
uv sync
```

Start the development server:

```bash
pnpm dev
```

This will start:
- Next.js frontend on http://localhost:3000
- Flask backend on http://127.0.0.1:5328
- Open http://localhost:3000 in your browser
- Enter your OpenAI and Pinecone API keys (stored in Redis with 1-hour expiry)
- Start chatting with the AI assistant
- Toggle Research to retrieve relevant context from past conversations
- Toggle Memorize to store current conversation for future retrieval
Configure the following in your deployment environment:
```bash
REDIS_URL=redis://localhost:6379/0  # Redis connection string
```

This implementation is inspired by the General Agentic Memory (GAM) framework introduced in:
"General Agentic Memory Via Deep Research"
B.Y. Yan, Chaofan Li, Hongjin Qian, Shuqi Lu, and Zheng Liu (2025)
arXiv:2511.18423
