69 changes: 69 additions & 0 deletions demo/gist-memory/README.md
@@ -0,0 +1,69 @@
# Gist Memory Demo — Buried Preference Recall

This demo shows how purpose-directed gisting enables an AI agent to recall user preferences that were briefly mentioned during an unrelated conversation — a task that generic summarization and standard RAG both fail at.

## The Problem

When a user buries a preference in a long, topically unrelated conversation, generic approaches fail:
- **Topic-preserving summarization** discards the preference as low-salience noise
- **Standard RAG** dilutes the preference signal in full-passage embeddings dominated by the conversation's main topic

Purpose-directed gisting solves this by compressing conversations specifically to foreground user attributes.
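The contrast can be sketched as two compression objectives. This is illustrative only; the actual Kaizen gist prompt lives in `kaizen/llm/gist/gist.py` and may differ:

```python
# Illustrative contrast of compression objectives (not the real Kaizen prompt).
GENERIC_SUMMARY_PROMPT = "Summarize the following conversation:\n\n{transcript}"

# A purpose-directed prompt names the target attributes up front, so the
# compression keeps them regardless of how little of the transcript they occupy.
GIST_PROMPT = (
    "Compress the conversation below into a short gist. Always preserve:\n"
    "- stated user preferences (tools, languages, workflows)\n"
    "- the user's domain and working context\n"
    "Drop topical detail that does not describe the user.\n\n"
    "{transcript}"
)

def build_gist_prompt(transcript: str) -> str:
    return GIST_PROMPT.format(transcript=transcript)
```

A generic summarizer optimizes for topical coverage; the gist prompt optimizes for the attributes the memory system exists to recall.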

## Setup

### Option A: Kaizen Lite (Claude Code Plugin)

```bash
# Install the plugin
claude --plugin-dir /path/to/kaizen/platform-integrations/claude/plugins/kaizen-lite
```

### Option B: Full Kaizen (MCP Server)

```bash
# Start the MCP server
uv run fastmcp run kaizen/frontend/mcp/mcp_server.py --transport sse --port 8201
```

## Demo Script

### Session 1: Preference Embedding

Have a multi-turn conversation about an unrelated technical topic. Bury a preference in one of the messages.

See [session1_script.md](session1_script.md) for the full conversation script.

**Key message (message 5 of 12):**
> "That makes sense about the CNI plugin architecture. By the way, I strongly prefer Python over R for all my data analysis work — I find pandas much more intuitive than tidyverse. Anyway, back to the networking question — how does Cilium handle network policy enforcement?"

The preference ("Python over R", "pandas over tidyverse") is <5% of the total conversation content.

**At end of session:**
- **Lite path:** Run `/kaizen:gist`
- **MCP path:** Call `store_gist` with the conversation JSON
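The conversation JSON for the MCP path can be built like this. The `role`/`content` message shape matches what `store_gists` reads in `kaizen_client.py`; the exact wire format expected by your MCP client may differ:

```python
import json

# Messages in the familiar role/content shape.
messages = [
    {"role": "user", "content": "How does the CNI plugin architecture work?"},
    {"role": "assistant", "content": "[CNI explanation...]"},
    {"role": "user", "content": "By the way, I strongly prefer Python over R..."},
]

# Payload fields follow the curl example in session1_script.md.
payload = {
    "conversation_data": json.dumps(messages),
    "conversation_id": "demo-session-1",
}
```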

**Expected gist output:**
```text
user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking (Cilium, CNI plugins)
```
Comment on lines +47 to +49
⚠️ Potential issue | 🟡 Minor

Add a language specifier to the fenced code block.

The code block is missing a language identifier, which triggers a markdownlint warning (MD040). Since this shows plain-text output, use `text` or `plaintext`.

📝 Suggested fix
-```
+```text
 user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking (Cilium, CNI plugins)


Note how the gist foregrounds the Python/pandas preference despite it being a tiny fraction of the conversation.

### Session 2: Preference Recall

Start a new session and ask:

> "I need to start a new data analysis project working with network telemetry data. What language and tools would you recommend I use?"

**With gist memory:** Claude recommends Python and pandas, citing your stated preference.

**Without gist memory:** Claude gives a generic recommendation (likely mentioning both Python and R, or asking about your preference).

See [session2_script.md](session2_script.md) for the verification prompts.

## What to Look For

1. **Gist content:** Does the gist capture the Python/pandas preference despite it being buried?
2. **Recall accuracy:** In Session 2, does the agent correctly apply the preference?
3. **A/B contrast:** Run Session 2 without gist memory to see the failure mode.
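The checklist above can be turned into a quick sanity check on the gist text. This helper and its phrase lists are hypothetical, not part of Kaizen:

```python
# Hypothetical scoring helper for the demo's checklist; the phrase lists
# below are assumptions, not anything Kaizen ships.
def check_gist(gist: str) -> dict[str, bool]:
    g = gist.lower()
    return {
        "preference_captured": "python" in g and "pandas" in g,
        "context_captured": "kubernetes" in g or "cilium" in g,
    }

example = (
    "user prefers Python over R for data analysis; finds pandas more "
    "intuitive than tidyverse; works with Kubernetes networking"
)
print(check_gist(example))
```

Running both A/B conditions through a check like this makes the contrast concrete instead of impressionistic.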
62 changes: 62 additions & 0 deletions demo/gist-memory/session1_script.md
@@ -0,0 +1,62 @@
# Session 1: Preference Embedding

Use these messages in order. The buried preference is in **Message 5**.

---
Comment on lines +1 to +5
⚠️ Potential issue | 🟡 Minor

Fix inconsistent message number reference.

Line 3 states the buried preference is in "Message 4", but the actual buried preference appears in "Message 5 (User)" at line 19.

📝 Proposed fix
 # Session 1: Preference Embedding
 
-Use these messages in order. The buried preference is in **Message 4**.
+Use these messages in order. The buried preference is in **Message 5**.
 
 ---


**Message 1 (User):**
> I'm trying to understand how Kubernetes handles pod-to-pod networking across nodes. Can you explain how the CNI plugin architecture works?

**Message 2 (Assistant):**
> [Detailed explanation of CNI plugin architecture, pod networking, veth pairs, bridge interfaces, etc.]

**Message 3 (User):**
> That's helpful. How does this differ between Calico and Cilium? I've heard Cilium uses eBPF instead of iptables.

**Message 4 (Assistant):**
> [Explanation comparing Calico's iptables-based approach vs Cilium's eBPF dataplane, performance characteristics, etc.]

**Message 5 (User) — THE BURIED PREFERENCE:**
> That makes sense about the CNI plugin architecture. By the way, I strongly prefer Python over R for all my data analysis work — I find pandas much more intuitive than tidyverse. Anyway, back to the networking question — how does Cilium handle network policy enforcement at the kernel level?

**Message 6 (Assistant):**
> [Explanation of Cilium's eBPF-based network policy enforcement, kernel-level packet filtering, etc.]

**Message 7 (User):**
> What about service mesh integration? Does Cilium replace the need for something like Istio?

**Message 8 (Assistant):**
> [Discussion of Cilium service mesh capabilities vs Istio, sidecar-free model, etc.]

**Message 9 (User):**
> I'm also curious about network observability. What tools do you recommend for monitoring pod-to-pod traffic patterns in a large cluster?

**Message 10 (Assistant):**
> [Recommendations for Hubble, Pixie, Grafana with Cilium metrics, etc.]

**Message 11 (User):**
> Great, this has been really helpful. One last question — how do I troubleshoot DNS resolution failures in pods? I've been seeing intermittent CoreDNS timeouts.

**Message 12 (Assistant):**
> [DNS troubleshooting guidance for CoreDNS, ndots settings, etc.]

---

## After the conversation

**Kaizen Lite:** Run `/kaizen:gist`

**Full Kaizen (MCP):**
```bash
# Store the conversation as a gist
curl -X POST http://localhost:8201/tools/store_gist \
-H "Content-Type: application/json" \
-d '{"conversation_data": "<JSON of messages above>", "conversation_id": "demo-session-1"}'
```

## Expected Gist Output

The gist should surface the buried preference:
```text
user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking; troubleshooting CoreDNS; large cluster environment
```
Comment on lines +57 to +62
⚠️ Potential issue | 🟡 Minor

Add language specifier to the expected output code block.

The fenced code block is missing a language specifier, which triggers a Markdown lint warning.

📝 Proposed fix
 ## Expected Gist Output
 
 The gist should surface the buried preference:
-```
+```text
 user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking; troubleshooting CoreDNS; large cluster environment


<!-- This is an auto-generated comment by CodeRabbit -->

45 changes: 45 additions & 0 deletions demo/gist-memory/session2_script.md
@@ -0,0 +1,45 @@
# Session 2: Preference Recall Verification

Start a **new session** (no conversation history from Session 1). The gist from Session 1 should be automatically injected via the recall hook.

---

## Primary Verification Prompt

> I need to start a new data analysis project working with network telemetry data. What language and tools would you recommend I use?

### Expected Response WITH Gist Memory

The agent should recommend **Python and pandas**, referencing your known preference. Example:

> "Based on your preference for Python and pandas, I'd recommend using Python with pandas for the data analysis..."

### Expected Response WITHOUT Gist Memory

The agent gives a **generic recommendation** — likely mentioning both Python and R as options, or asking about your preference:

> "For network telemetry data analysis, popular options include Python (with pandas/numpy) or R (with tidyverse). Which do you prefer?"

---

## Additional Verification Prompts

These test whether the gist captured other signals:

**Prompt 2:**
> What's my background — do you know what kind of infrastructure I work with?

Expected (with gist): Mentions Kubernetes, container networking, cluster operations.

**Prompt 3:**
> If I need to do some quick data wrangling, which library should I reach for?

Expected (with gist): Recommends pandas specifically (not tidyverse or dplyr).

---

## Running the A/B Comparison

1. **With gist memory:** Ensure the gist entity from Session 1 exists in `.kaizen/entities/gist/` (Lite) or in the MCP backend
2. **Without gist memory:** Temporarily rename/remove the gist entity, or use a clean project directory
3. Run each verification prompt in both conditions and compare responses
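The toggle in step 2 can be scripted. This sketch assumes the `.kaizen/entities/gist/` layout from step 1 and simply parks the directory under a `.disabled` name; it demonstrates against a scratch directory rather than a real project:

```python
from pathlib import Path
import tempfile

# Sketch of the A/B toggle for Kaizen Lite (assumed layout: .kaizen/entities/gist/).
def set_gist_memory(root: Path, enabled: bool) -> None:
    live = root / ".kaizen" / "entities" / "gist"
    parked = live.with_name("gist.disabled")
    if enabled and parked.exists():
        parked.rename(live)
    elif not enabled and live.exists():
        live.rename(parked)

# Demonstrate in a throwaway directory, not a real project:
root = Path(tempfile.mkdtemp())
(root / ".kaizen" / "entities" / "gist").mkdir(parents=True)
set_gist_memory(root, enabled=False)  # "without gist memory" condition
set_gist_memory(root, enabled=True)   # "with gist memory" condition
```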
2 changes: 2 additions & 0 deletions kaizen/config/kaizen.py
@@ -8,6 +8,8 @@ class KaizenConfig(BaseSettings):
namespace_id: str = "kaizen"
settings: BaseSettings | None = None
clustering_threshold: float = 0.80
gist_context_budget: int = 64000
gist_trigger_interval: int = 5


# to reload settings call kaizen_config.__init__()
1 change: 1 addition & 0 deletions kaizen/config/llm.py
@@ -25,6 +25,7 @@ class LLMSettings(BaseSettings):
tips_model: str = Field(default_factory=_default_model_name)
conflict_resolution_model: str = Field(default_factory=_default_model_name)
fact_extraction_model: str = Field(default_factory=_default_model_name)
gist_model: str = Field(default_factory=_default_model_name)
categorization_mode: Literal["predefined", "dynamic", "hybrid"] = "predefined"
allow_dynamic_categories: bool = False
confirm_new_categories: bool = False
134 changes: 134 additions & 0 deletions kaizen/frontend/client/kaizen_client.py
@@ -1,9 +1,11 @@
import logging
import uuid
from typing import Any

from kaizen.backend.base import BaseEntityBackend
from kaizen.config.kaizen import KaizenConfig
from kaizen.llm.fact_extraction.fact_extraction import ExtractedFact, extract_facts_from_messages
from kaizen.llm.gist.gist import generate_gist
from kaizen.schema.conflict_resolution import EntityUpdate
from kaizen.schema.core import Entity, Namespace, RecordedEntity
from kaizen.schema.exceptions import NamespaceAlreadyExistsException, NamespaceNotFoundException
@@ -295,3 +297,135 @@ def retrieve_user_facts(
)

return categorized_preferences

# ── Gist memory ──────────────────────────────────────────────────

def store_gists(
self,
namespace_id: str,
messages: list[dict],
conversation_id: str | None = None,
metadata: dict[str, Any] | None = None,
) -> list[EntityUpdate]:
"""Generate purpose-directed gists from conversation messages and store them.

Implements rolling consolidation: deletes any existing gists for the same
conversation_id before storing new ones, so the latest gist always reflects
the full session.

Note: the search-delete-insert sequence is not atomic. Concurrent calls with
the same conversation_id can interleave and leave duplicate gists; this is
accepted for the current single-user MCP context.
"""
if not messages:
return []

conversation_id = conversation_id or str(uuid.uuid4())
self.ensure_namespace(namespace_id)

# Delete existing gists for this conversation (rolling replacement)
existing = self.search_entities(
namespace_id=namespace_id,
query=None,
filters={"type": "gist", "metadata.conversation_id": conversation_id},
limit=100,
)
for entity in existing:
try:
self.delete_entity_by_id(namespace_id, entity.id)
except Exception:
logger.warning("Failed to delete old gist %s during rolling replacement", entity.id, exc_info=True)

# Generate gists
result = generate_gist(messages, conversation_id=conversation_id)

if not result.gists:
return []

# Store gist entities
base_metadata: dict[str, Any] = dict(metadata or {})
base_metadata["conversation_id"] = conversation_id
base_metadata["message_count"] = result.message_count

gist_entities = []
for i, gist_text in enumerate(result.gists):
gist_metadata = dict(base_metadata)
gist_metadata["chunk_index"] = i
gist_metadata["chunk_count"] = result.chunk_count
gist_entities.append(Entity(type="gist", content=gist_text, metadata=gist_metadata))

updates = self.update_entities(namespace_id, gist_entities, enable_conflict_resolution=False)

# Store original messages as gist_source for durable retrieval
source_entities = []
for i, msg in enumerate(messages):
content = msg.get("content", "")
if isinstance(content, list):
content = str(content)
source_entities.append(
Entity(
type="gist_source",
content=content,
metadata={
"conversation_id": conversation_id,
"message_index": i,
"role": msg.get("role", "unknown"),
},
)
)

if source_entities:
# Delete existing sources for this conversation first
existing_sources = self.search_entities(
namespace_id=namespace_id,
query=None,
filters={"type": "gist_source", "metadata.conversation_id": conversation_id},
limit=1000,
)
for entity in existing_sources:
try:
self.delete_entity_by_id(namespace_id, entity.id)
except Exception:
logger.warning("Failed to delete old gist_source %s", entity.id, exc_info=True)

self.update_entities(namespace_id, source_entities, enable_conflict_resolution=False)

return updates
Comment on lines +303 to +389
⚠️ Potential issue | 🟡 Minor


Document the non-atomic nature of the rolling replacement logic in store_gists.

The search-delete-insert sequence is not atomic. While the backend uses per-operation locking, concurrent calls with the same conversation_id can interleave, potentially resulting in duplicate gists. For example:

1. Thread A searches and finds gist X
2. Thread B searches and finds gist X
3. Thread A deletes X, inserts Y
4. Thread B deletes nothing (X already gone), inserts Z

Result: both Y and Z coexist for the same conversation.

This is likely acceptable for the current single-user MCP tool context, but should be documented in the docstring or with a code comment explaining the assumption and acceptable degradation mode.



def retrieve_gists(
self,
namespace_id: str,
query: str,
limit: int = 10,
) -> list[RecordedEntity]:
"""Retrieve gists relevant to a query via semantic search."""
if not self.namespace_exists(namespace_id):
return []
return self.search_entities(
namespace_id=namespace_id,
query=query,
filters={"type": "gist"},
limit=limit,
)

def retrieve_gist_with_source(
self,
namespace_id: str,
query: str,
limit: int = 3,
) -> list[dict[str, Any]]:
"""Retrieve gists with their original source messages.

Returns a list of dicts, each with 'gist' (RecordedEntity) and
'source_messages' (list[RecordedEntity]) keys.
"""
gists = self.retrieve_gists(namespace_id, query=query, limit=limit)
results = []
for gist in gists:
conversation_id = (gist.metadata or {}).get("conversation_id")
source_messages: list[RecordedEntity] = []
if conversation_id:
source_messages = self.search_entities(
namespace_id=namespace_id,
query=None,
filters={"type": "gist_source", "metadata.conversation_id": conversation_id},
limit=100,
)
results.append({"gist": gist, "source_messages": source_messages})
return results
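A consumer of `retrieve_gist_with_source` might flatten the results into recall context for injection. The entity stand-in and formatting below are illustrative; only the `gist`/`source_messages` dict shape comes from the method above:

```python
from dataclasses import dataclass

# Minimal stand-in for RecordedEntity; fields beyond `content` are omitted
# because this sketch only shows the formatting step.
@dataclass
class FakeEntity:
    content: str

def format_recall_context(results: list[dict]) -> str:
    # Each result dict carries a "gist" entity and its "source_messages".
    lines = []
    for r in results:
        lines.append(f"- {r['gist'].content}")
        for src in r["source_messages"][:2]:  # cap quoted sources per gist
            lines.append(f"    source: {src.content}")
    return "\n".join(lines)

results = [{
    "gist": FakeEntity("user prefers Python over R for data analysis"),
    "source_messages": [FakeEntity("...I strongly prefer Python over R...")],
}]
print(format_recall_context(results))
```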