# Long-term Memory via RAG
Inspired by Dreamfinder's memory system.
Currently Gremlin has a 10-message sliding window and a 30-minute conversation TTL. Important context just... evaporates. This adds persistent long-term memory with semantic retrieval.
## How it works
- Embedding pipeline — As conversations happen (or during the dream cycle), important messages/decisions/context get embedded via Voyage AI and stored in the DB with their vector representations.
- Memory retriever — On each new message, retrieve the top-N semantically similar memories using cosine similarity and inject them into the system prompt context window.
- Memory consolidator — Periodically merge/deduplicate similar memories to avoid bloat.
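The consolidator step could be sketched as a greedy near-duplicate filter. This is an illustrative sketch, not Gremlin's actual code; `similarity` stands in for whatever scoring function is used (e.g. cosine over the stored embeddings):

```python
def consolidate(memories, similarity, threshold=0.95):
    """Greedy dedup: keep a memory only if it is not a near-duplicate
    of one already kept. `similarity` is any (a, b) -> float in [0, 1]."""
    kept = []
    for m in memories:
        if all(similarity(m, k) < threshold for k in kept):
            kept.append(m)
    return kept
```

Running this periodically (e.g. during the dream cycle) keeps the memories table from accumulating dozens of slightly different phrasings of the same fact.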
## What gets remembered
- Decisions made in chat
- Task assignments and outcomes
- Team preferences and conventions
- Recurring topics and project context
- Anything the dream cycle explicitly files as worth remembering
## DB additions needed
- `memories` table: `id`, `chat_id`, `content`, `embedding` (blob/JSON), `created_at`, `source` (conversation/dream), similarity-score metadata
- Index on chat_id for fast retrieval
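A minimal sketch of the table above, assuming embeddings are stored as JSON-encoded float lists (the blob option works too; JSON is just simpler to show). Column types and function names here are assumptions, not the actual Gremlin schema:

```python
import json
import sqlite3
import time

def init_memory_db(path=":memory:"):
    """Create the memories table and the chat_id index if absent."""
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS memories (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            chat_id    TEXT NOT NULL,
            content    TEXT NOT NULL,
            embedding  TEXT NOT NULL,  -- JSON-encoded float list
            created_at INTEGER NOT NULL,
            source     TEXT NOT NULL CHECK (source IN ('conversation', 'dream'))
        )
    """)
    db.execute(
        "CREATE INDEX IF NOT EXISTS idx_memories_chat ON memories (chat_id)"
    )
    return db

def store_memory(db, chat_id, content, embedding, source="conversation"):
    """Persist one memory with its embedding vector."""
    db.execute(
        "INSERT INTO memories (chat_id, content, embedding, created_at, source)"
        " VALUES (?, ?, ?, ?, ?)",
        (chat_id, content, json.dumps(embedding), int(time.time()), source),
    )
    db.commit()
```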
## System prompt injection
Retrieved memories are injected as a `## Long-term Memory` section near the top of the system prompt, formatted as a concise bullet list, with the top ~5 most relevant memories per message.
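The injection step amounts to a small formatter. A sketch (function name and signature are illustrative):

```python
def build_memory_section(memories, max_items=5):
    """Format retrieved memories as a markdown section for the system prompt.
    Returns an empty string when there is nothing to inject."""
    if not memories:
        return ""
    lines = ["## Long-term Memory"]
    lines += [f"- {m}" for m in memories[:max_items]]
    return "\n".join(lines)
```

Returning an empty string for the no-memories case keeps the system prompt clean on cold starts instead of emitting an empty heading.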
## Dependencies
- Voyage AI API key (`VOYAGE_API_KEY` env var) — or swap for any embeddings provider (OpenAI, local, etc.)
- Vector similarity search (cosine, done in-process over SQLite blob storage — no need for a vector DB at this scale)
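The in-process search can be a brute-force cosine scan over a chat's rows; at this scale a linear pass is fast enough that a dedicated vector DB would be overkill. A sketch, assuming embeddings are stored as JSON-encoded float lists in a `memories` table (table layout and names are assumptions):

```python
import json
import math
import sqlite3

def search_memories(db, chat_id, query_vec, top_n=5):
    """Return the top_n memory texts most cosine-similar to query_vec."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    rows = db.execute(
        "SELECT content, embedding FROM memories WHERE chat_id = ?",
        (chat_id,),
    ).fetchall()
    scored = [(cosine(query_vec, json.loads(emb)), text) for text, emb in rows]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:top_n]]
```

The `chat_id` index keeps the scan scoped to one conversation's memories rather than the whole table.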
## Why this is good
Right now every conversation starts cold. With RAG memory, Gremlin actually knows the team over time — remembers that @Thinkerer prefers tasks in a specific format, that the auth system has a known quirk, that sprint planning always happens Tuesday. It gets smarter the longer it runs.
Reference implementation: `lib/src/memory/` in Dreamfinder (Dart). Uses Voyage AI embeddings + cosine similarity retrieval.