feat: add purpose-directed gist memory for user personalization #102
@@ -0,0 +1,69 @@
# Gist Memory Demo — Buried Preference Recall

This demo shows how purpose-directed gisting enables an AI agent to recall user preferences that were briefly mentioned during an unrelated conversation — a task that generic summarization and standard RAG both fail at.

## The Problem

When a user buries a preference in a long, topically unrelated conversation, generic approaches fail:

- **Topic-preserving summarization** discards the preference as low-salience noise
- **Standard RAG** dilutes the preference signal in full-passage embeddings dominated by the conversation's main topic
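To see why full-passage retrieval struggles, here is a toy sketch of the dilution effect. It uses bag-of-words vectors and made-up token streams (not a real embedding model, and not Kaizen's retrieval code) purely to illustrate how a buried preference's similarity score drops when the surrounding topic dominates the vector:

```python
# Toy illustration of embedding dilution: not a real embedding model,
# just bag-of-words cosine similarity over invented token streams.
import math
from collections import Counter


def embed(text: str) -> Counter:
    """A stand-in 'embedding': token counts from whitespace splitting."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


query = "python pandas data analysis preference"

# Full conversation: mostly networking vocabulary, preference buried at the end.
full_passage = (
    "kubernetes cni plugin networking cilium ebpf iptables policy "
    "enforcement kernel mesh istio observability hubble dns coredns " * 3
    + "prefers python pandas for data analysis"
)

# Purpose-directed gist: preference foregrounded, topic compressed away.
gist = "user prefers python over r for data analysis pandas over tidyverse"

# The gist scores much higher than the full passage for the same query.
print("full passage:", cosine(embed(query), embed(full_passage)))
print("gist:        ", cosine(embed(query), embed(gist)))
```

The exact numbers are meaningless, but the ordering is the point: the networking tokens dominate the full-passage vector, so the preference's contribution to the similarity score shrinks, while the gist keeps it front and center.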
Purpose-directed gisting solves this by compressing conversations specifically to foreground user attributes.

## Setup

### Option A: Kaizen Lite (Claude Code Plugin)

```bash
# Install the plugin
claude --plugin-dir /path/to/kaizen/platform-integrations/claude/plugins/kaizen-lite
```

### Option B: Full Kaizen (MCP Server)

```bash
# Start the MCP server
uv run fastmcp run kaizen/frontend/mcp/mcp_server.py --transport sse --port 8201
```

## Demo Script

### Session 1: Preference Embedding

Have a multi-turn conversation about an unrelated technical topic. Bury a preference in one of the messages.

See [session1_script.md](session1_script.md) for the full conversation script.

**Key message (message 5 of 12):**
> "That makes sense about the CNI plugin architecture. By the way, I strongly prefer Python over R for all my data analysis work — I find pandas much more intuitive than tidyverse. Anyway, back to the networking question — how does Cilium handle network policy enforcement?"

The preference ("Python over R", "pandas over tidyverse") is <5% of the total conversation content.

**At end of session:**

- **Lite path:** Run `/kaizen:gist`
- **MCP path:** Call `store_gist` with the conversation JSON
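For the MCP path, the conversation JSON can be assembled like this. This is a sketch: the payload shape (`conversation_data` as a JSON string plus `conversation_id`) is taken from the curl example in session1_script.md, and the message texts here are abbreviated placeholders:

```python
import json

# Abbreviated messages from Session 1; the real script has 12 turns.
messages = [
    {"role": "user", "content": "Can you explain how the CNI plugin architecture works?"},
    {"role": "assistant", "content": "[Detailed CNI explanation ...]"},
    {"role": "user", "content": "By the way, I strongly prefer Python over R "
                                "for all my data analysis work ..."},
]

# Payload shape assumed from the session1_script.md curl example:
# conversation_data is the message list serialized as a JSON string.
payload = {
    "conversation_data": json.dumps(messages),
    "conversation_id": "demo-session-1",
}

print(json.dumps(payload)[:80])
```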
**Expected gist output:**

```text
user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking (Cilium, CNI plugins)
```

Note how the gist foregrounds the Python/pandas preference despite it being a tiny fraction of the conversation.

### Session 2: Preference Recall

Start a new session and ask:

> "I need to start a new data analysis project working with network telemetry data. What language and tools would you recommend I use?"

**With gist memory:** Claude recommends Python and pandas, citing your stated preference.

**Without gist memory:** Claude gives a generic recommendation (likely mentioning both Python and R, or asking about your preference).

See [session2_script.md](session2_script.md) for the verification prompts.

## What to Look For

1. **Gist content:** Does the gist capture the Python/pandas preference despite it being buried?
2. **Recall accuracy:** In Session 2, does the agent correctly apply the preference?
3. **A/B contrast:** Run Session 2 without gist memory to see the failure mode.
@@ -0,0 +1,62 @@
# Session 1: Preference Embedding

Use these messages in order. The buried preference is in **Message 5**.

---
**Message 1 (User):**
> I'm trying to understand how Kubernetes handles pod-to-pod networking across nodes. Can you explain how the CNI plugin architecture works?

**Message 2 (Assistant):**
> [Detailed explanation of CNI plugin architecture, pod networking, veth pairs, bridge interfaces, etc.]

**Message 3 (User):**
> That's helpful. How does this differ between Calico and Cilium? I've heard Cilium uses eBPF instead of iptables.

**Message 4 (Assistant):**
> [Explanation comparing Calico's iptables-based approach vs Cilium's eBPF dataplane, performance characteristics, etc.]

**Message 5 (User) — THE BURIED PREFERENCE:**
> That makes sense about the CNI plugin architecture. By the way, I strongly prefer Python over R for all my data analysis work — I find pandas much more intuitive than tidyverse. Anyway, back to the networking question — how does Cilium handle network policy enforcement at the kernel level?

**Message 6 (Assistant):**
> [Explanation of Cilium's eBPF-based network policy enforcement, kernel-level packet filtering, etc.]

**Message 7 (User):**
> What about service mesh integration? Does Cilium replace the need for something like Istio?

**Message 8 (Assistant):**
> [Discussion of Cilium service mesh capabilities vs Istio, sidecar-free model, etc.]

**Message 9 (User):**
> I'm also curious about network observability. What tools do you recommend for monitoring pod-to-pod traffic patterns in a large cluster?

**Message 10 (Assistant):**
> [Recommendations for Hubble, Pixie, Grafana with Cilium metrics, etc.]

**Message 11 (User):**
> Great, this has been really helpful. One last question — how do I troubleshoot DNS resolution failures in pods? I've been seeing intermittent CoreDNS timeouts.

**Message 12 (Assistant):**
> [DNS troubleshooting guidance for CoreDNS, ndots settings, etc.]

---

## After the conversation

**Kaizen Lite:** Run `/kaizen:gist`

**Full Kaizen (MCP):**

```bash
# Store the conversation as a gist
curl -X POST http://localhost:8201/tools/store_gist \
  -H "Content-Type: application/json" \
  -d '{"conversation_data": "<JSON of messages above>", "conversation_id": "demo-session-1"}'
```

## Expected Gist Output

The gist should surface the buried preference:

```text
user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking; troubleshooting CoreDNS; large cluster environment
```
@@ -0,0 +1,45 @@
# Session 2: Preference Recall Verification

Start a **new session** (no conversation history from Session 1). The gist from Session 1 should be automatically injected via the recall hook.

---

## Primary Verification Prompt

> I need to start a new data analysis project working with network telemetry data. What language and tools would you recommend I use?

### Expected Response WITH Gist Memory

The agent should recommend **Python and pandas**, referencing your known preference. Example:

> "Based on your preference for Python and pandas, I'd recommend using Python with pandas for the data analysis..."

### Expected Response WITHOUT Gist Memory

The agent gives a **generic recommendation** — likely mentioning both Python and R as options, or asking about your preference:

> "For network telemetry data analysis, popular options include Python (with pandas/numpy) or R (with tidyverse). Which do you prefer?"

---

## Additional Verification Prompts

These test whether the gist captured other signals:

**Prompt 2:**
> What's my background — do you know what kind of infrastructure I work with?

Expected (with gist): Mentions Kubernetes, container networking, cluster operations.

**Prompt 3:**
> If I need to do some quick data wrangling, which library should I reach for?

Expected (with gist): Recommends pandas specifically (not tidyverse or dplyr).

---

## Running the A/B Comparison

1. **With gist memory:** Ensure the gist entity from Session 1 exists in `.kaizen/entities/gist/` (Lite) or in the MCP backend
2. **Without gist memory:** Temporarily rename/remove the gist entity, or use a clean project directory
3. Run each verification prompt in both conditions and compare responses
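For the Lite path, step 2 can be scripted by renaming the gist directory aside and back. This is a sketch: the `.kaizen/entities/gist/` location comes from step 1 above, while the `gist.disabled` name and the helper itself are conveniences invented here:

```python
import tempfile
from pathlib import Path


def toggle_gist_memory(project_root: str, enable: bool) -> None:
    """Hide (or restore) the Lite-path gist store for the A/B comparison.

    Renames .kaizen/entities/gist/ to gist.disabled and back; the
    'gist.disabled' name is an arbitrary choice for this sketch.
    """
    entities = Path(project_root) / ".kaizen" / "entities"
    gist_dir, disabled = entities / "gist", entities / "gist.disabled"
    if not enable and gist_dir.exists():
        gist_dir.rename(disabled)   # "without memory" condition
    elif enable and disabled.exists():
        disabled.rename(gist_dir)   # "with memory" condition


# Demo against a throwaway directory structure
root = tempfile.mkdtemp()
(Path(root) / ".kaizen" / "entities" / "gist").mkdir(parents=True)
toggle_gist_memory(root, enable=False)
print((Path(root) / ".kaizen" / "entities" / "gist.disabled").exists())  # True
```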
@@ -1,9 +1,11 @@
import logging
import uuid
from typing import Any

from kaizen.backend.base import BaseEntityBackend
from kaizen.config.kaizen import KaizenConfig
from kaizen.llm.fact_extraction.fact_extraction import ExtractedFact, extract_facts_from_messages
from kaizen.llm.gist.gist import generate_gist
from kaizen.schema.conflict_resolution import EntityUpdate
from kaizen.schema.core import Entity, Namespace, RecordedEntity
from kaizen.schema.exceptions import NamespaceAlreadyExistsException, NamespaceNotFoundException

# Module-level logger used by the gist methods below
logger = logging.getLogger(__name__)
@@ -295,3 +297,135 @@ def retrieve_user_facts(
        )

        return categorized_preferences

    # ── Gist memory ──────────────────────────────────────────────────
    def store_gists(
        self,
        namespace_id: str,
        messages: list[dict],
        conversation_id: str | None = None,
        metadata: dict[str, Any] | None = None,
    ) -> list[EntityUpdate]:
        """Generate purpose-directed gists from conversation messages and store them.

        Implements rolling consolidation: deletes any existing gists for the same
        conversation_id before storing new ones, so the latest gist always reflects
        the full session.

        Note: the search-delete-insert sequence is not atomic. Concurrent calls
        with the same conversation_id can interleave and leave duplicate or
        partially replaced gists. This is acceptable for the current single-user
        usage, where calls for a given conversation are sequential.
        """
        if not messages:
            return []

        conversation_id = conversation_id or str(uuid.uuid4())
        self.ensure_namespace(namespace_id)

        # Delete existing gists for this conversation (rolling replacement)
        existing = self.search_entities(
            namespace_id=namespace_id,
            query=None,
            filters={"type": "gist", "metadata.conversation_id": conversation_id},
            limit=100,
        )
        for entity in existing:
            try:
                self.delete_entity_by_id(namespace_id, entity.id)
            except Exception:
                logger.warning(
                    "Failed to delete old gist %s during rolling replacement",
                    entity.id,
                    exc_info=True,
                )

        # Generate gists
        result = generate_gist(messages, conversation_id=conversation_id)

        if not result.gists:
            return []

        # Store gist entities
        base_metadata: dict[str, Any] = dict(metadata or {})
        base_metadata["conversation_id"] = conversation_id
        base_metadata["message_count"] = result.message_count

        gist_entities = []
        for i, gist_text in enumerate(result.gists):
            gist_metadata = dict(base_metadata)
            gist_metadata["chunk_index"] = i
            gist_metadata["chunk_count"] = result.chunk_count
            gist_entities.append(Entity(type="gist", content=gist_text, metadata=gist_metadata))

        updates = self.update_entities(namespace_id, gist_entities, enable_conflict_resolution=False)

        # Store original messages as gist_source for durable retrieval
        source_entities = []
        for i, msg in enumerate(messages):
            content = msg.get("content", "")
            if isinstance(content, list):
                content = str(content)
            source_entities.append(
                Entity(
                    type="gist_source",
                    content=content,
                    metadata={
                        "conversation_id": conversation_id,
                        "message_index": i,
                        "role": msg.get("role", "unknown"),
                    },
                )
            )

        if source_entities:
            # Delete existing sources for this conversation first
            existing_sources = self.search_entities(
                namespace_id=namespace_id,
                query=None,
                filters={"type": "gist_source", "metadata.conversation_id": conversation_id},
                limit=1000,
            )
            for entity in existing_sources:
                try:
                    self.delete_entity_by_id(namespace_id, entity.id)
                except Exception:
                    logger.warning("Failed to delete old gist_source %s", entity.id, exc_info=True)

            self.update_entities(namespace_id, source_entities, enable_conflict_resolution=False)

        return updates
    def retrieve_gists(
        self,
        namespace_id: str,
        query: str,
        limit: int = 10,
    ) -> list[RecordedEntity]:
        """Retrieve gists relevant to a query via semantic search."""
        if not self.namespace_exists(namespace_id):
            return []
        return self.search_entities(
            namespace_id=namespace_id,
            query=query,
            filters={"type": "gist"},
            limit=limit,
        )
    def retrieve_gist_with_source(
        self,
        namespace_id: str,
        query: str,
        limit: int = 3,
    ) -> list[dict[str, Any]]:
        """Retrieve gists with their original source messages.

        Returns a list of dicts, each with 'gist' (RecordedEntity) and
        'source_messages' (list[RecordedEntity]) keys.
        """
        gists = self.retrieve_gists(namespace_id, query=query, limit=limit)
        results = []
        for gist in gists:
            conversation_id = (gist.metadata or {}).get("conversation_id")
            source_messages: list[RecordedEntity] = []
            if conversation_id:
                source_messages = self.search_entities(
                    namespace_id=namespace_id,
                    query=None,
                    filters={"type": "gist_source", "metadata.conversation_id": conversation_id},
                    limit=100,
                )
            results.append({"gist": gist, "source_messages": source_messages})
        return results
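A caller might flatten `retrieve_gist_with_source` results into a context block for prompt injection. The helper below is hypothetical (not part of this PR); it relies only on the shapes the docstring promises, namely a 'gist' entity and 'source_messages' entities, each assumed to expose `.content` and `.metadata`:

```python
# Hypothetical consumer of retrieve_gist_with_source() output; the
# .content/.metadata attribute access is an assumption about the
# RecordedEntity shape, inferred from how store_gists builds entities.
def format_gist_context(results: list[dict]) -> str:
    """Render gists plus their source messages as an indented text block."""
    lines = []
    for item in results:
        lines.append(f"- {item['gist'].content}")
        for src in item["source_messages"]:
            role = (src.metadata or {}).get("role", "unknown")
            lines.append(f"    [{role}] {src.content}")
    return "\n".join(lines)
```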