# Long-term Memory via RAG
Inspired by Dreamfinder's memory system.
Currently Gremlin has a 10-message sliding window and a 30-minute conversation TTL. Important context just... evaporates. This adds persistent long-term memory with semantic retrieval.
## How it works
- Embedding pipeline — As conversations happen (or during the dream cycle), important messages/decisions/context get embedded via Voyage AI and stored in the DB with their vector representations.
- Memory retriever — On each new message, retrieve the top-N semantically similar memories using cosine similarity and inject them into the system prompt context window.
- Memory consolidator — Periodically merge/deduplicate similar memories to avoid bloat.
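The consolidator step could be sketched as a greedy near-duplicate filter. This is an illustrative sketch, not Gremlin's actual code; `similarity` stands in for whatever scoring function is used (e.g. cosine over the stored embeddings):

```python
def consolidate(memories, similarity, threshold=0.95):
    """Greedy dedup: keep a memory only if it is not a near-duplicate
    of one already kept. `similarity` is any (a, b) -> float in [0, 1]."""
    kept = []
    for m in memories:
        if all(similarity(m, k) < threshold for k in kept):
            kept.append(m)
    return kept
```

Running this periodically (e.g. during the dream cycle) keeps the memories table from accumulating dozens of slightly different phrasings of the same fact.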
## What gets remembered
- Decisions made in chat
- Task assignments and outcomes
- Team preferences and conventions
- Recurring topics and project context
- Anything the dream cycle explicitly files as worth remembering
## DB additions needed
- `memories` table: `id`, `chat_id`, `content`, `embedding` (blob/JSON), `created_at`, `source` (conversation/dream), similarity-score metadata
- Index on chat_id for fast retrieval
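A minimal sketch of the table above, assuming embeddings are stored as JSON-encoded float lists (the blob option works too; JSON is just simpler to show). Column types and function names here are assumptions, not the actual Gremlin schema:

```python
import json
import sqlite3
import time

def init_memory_db(path=":memory:"):
    """Create the memories table and the chat_id index if absent."""
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS memories (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            chat_id    TEXT NOT NULL,
            content    TEXT NOT NULL,
            embedding  TEXT NOT NULL,  -- JSON-encoded float list
            created_at INTEGER NOT NULL,
            source     TEXT NOT NULL CHECK (source IN ('conversation', 'dream'))
        )
    """)
    db.execute(
        "CREATE INDEX IF NOT EXISTS idx_memories_chat ON memories (chat_id)"
    )
    return db

def store_memory(db, chat_id, content, embedding, source="conversation"):
    """Persist one memory with its embedding vector."""
    db.execute(
        "INSERT INTO memories (chat_id, content, embedding, created_at, source)"
        " VALUES (?, ?, ?, ?, ?)",
        (chat_id, content, json.dumps(embedding), int(time.time()), source),
    )
    db.commit()
```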
## System prompt injection
Retrieved memories are injected as a `## Long-term Memory` section near the top of the system prompt, formatted as a concise bullet list, with the top ~5 most relevant memories per message.
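The injection step amounts to a small formatter. A sketch (function name and signature are illustrative):

```python
def build_memory_section(memories, max_items=5):
    """Format retrieved memories as a markdown section for the system prompt.
    Returns an empty string when there is nothing to inject."""
    if not memories:
        return ""
    lines = ["## Long-term Memory"]
    lines += [f"- {m}" for m in memories[:max_items]]
    return "\n".join(lines)
```

Returning an empty string for the no-memories case keeps the system prompt clean on cold starts instead of emitting an empty heading.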
## Dependencies
- Voyage AI API key (`VOYAGE_API_KEY` env var) — or swap for any embeddings provider (OpenAI, local, etc.)
- Vector similarity search (cosine, done in-process over SQLite blob storage — no need for a vector DB at this scale)
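The in-process search can be a brute-force cosine scan over a chat's rows; at this scale a linear pass is fast enough that a dedicated vector DB would be overkill. A sketch, assuming embeddings are stored as JSON-encoded float lists in a `memories` table (table layout and names are assumptions):

```python
import json
import math
import sqlite3

def search_memories(db, chat_id, query_vec, top_n=5):
    """Return the top_n memory texts most cosine-similar to query_vec."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    rows = db.execute(
        "SELECT content, embedding FROM memories WHERE chat_id = ?",
        (chat_id,),
    ).fetchall()
    scored = [(cosine(query_vec, json.loads(emb)), text) for text, emb in rows]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [text for _, text in scored[:top_n]]
```

The `chat_id` index keeps the scan scoped to one conversation's memories rather than the whole table.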
## Why this is good
Right now every conversation starts cold. With RAG memory, Gremlin actually knows the team over time — remembers that @Thinkerer prefers tasks in a specific format, that the auth system has a known quirk, that sprint planning always happens Tuesday. It gets smarter the longer it runs.
Reference implementation: `lib/src/memory/` in Dreamfinder (Dart). Uses Voyage AI embeddings + cosine similarity retrieval.