diff --git a/demo/gist-memory/README.md b/demo/gist-memory/README.md new file mode 100644 index 0000000..f218426 --- /dev/null +++ b/demo/gist-memory/README.md @@ -0,0 +1,69 @@ +# Gist Memory Demo — Buried Preference Recall + +This demo shows how purpose-directed gisting enables an AI agent to recall user preferences that were briefly mentioned during an unrelated conversation — a task that generic summarization and standard RAG both fail at. + +## The Problem + +When a user buries a preference in a long, topically unrelated conversation, generic approaches fail: +- **Topic-preserving summarization** discards the preference as low-salience noise +- **Standard RAG** dilutes the preference signal in full-passage embeddings dominated by the conversation's main topic + +Purpose-directed gisting solves this by compressing conversations specifically to foreground user attributes. + +## Setup + +### Option A: Kaizen Lite (Claude Code Plugin) + +```bash +# Install the plugin +claude --plugin-dir /path/to/kaizen/platform-integrations/claude/plugins/kaizen-lite +``` + +### Option B: Full Kaizen (MCP Server) + +```bash +# Start the MCP server +uv run fastmcp run kaizen/frontend/mcp/mcp_server.py --transport sse --port 8201 +``` + +## Demo Script + +### Session 1: Preference Embedding + +Have a multi-turn conversation about an unrelated technical topic. Bury a preference in one of the messages. + +See [session1_script.md](session1_script.md) for the full conversation script. + +**Key message (message 5 of 12):** +> "That makes sense about the CNI plugin architecture. By the way, I strongly prefer Python over R for all my data analysis work — I find pandas much more intuitive than tidyverse. Anyway, back to the networking question — how does Cilium handle network policy enforcement?" + +The preference ("Python over R", "pandas over tidyverse") is <5% of the total conversation content. 
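The "<5%" figure above can be sanity-checked with a rough character count — a sketch, assuming an average of ~800 characters per message across the 12-message script:

```python
# Rough sanity check of the "<5%" figure: length of the buried
# preference relative to the whole conversation (character-based,
# assuming ~800 characters per message across 12 messages).
preference = (
    "By the way, I strongly prefer Python over R for all my data "
    "analysis work — I find pandas much more intuitive than tidyverse."
)
total_chars = 12 * 800  # assumed average message length
fraction = len(preference) / total_chars
print(f"preference is {fraction:.1%} of the conversation")
```

Even with much shorter messages, the preference stays well under the 5% mark.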
+ +**At end of session:** +- **Lite path:** Run `/kaizen:gist` +- **MCP path:** Call `store_gist` with the conversation JSON + +**Expected gist output:** +``` +user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking (Cilium, CNI plugins) +``` + +Note how the gist foregrounds the Python/pandas preference despite it being a tiny fraction of the conversation. + +### Session 2: Preference Recall + +Start a new session and ask: + +> "I need to start a new data analysis project working with network telemetry data. What language and tools would you recommend I use?" + +**With gist memory:** Claude recommends Python and pandas, citing your stated preference. + +**Without gist memory:** Claude gives a generic recommendation (likely mentioning both Python and R, or asking about your preference). + +See [session2_script.md](session2_script.md) for the verification prompts. + +## What to Look For + +1. **Gist content:** Does the gist capture the Python/pandas preference despite it being buried? +2. **Recall accuracy:** In Session 2, does the agent correctly apply the preference? +3. **A/B contrast:** Run Session 2 without gist memory to see the failure mode. diff --git a/demo/gist-memory/session1_script.md b/demo/gist-memory/session1_script.md new file mode 100644 index 0000000..966bb2a --- /dev/null +++ b/demo/gist-memory/session1_script.md @@ -0,0 +1,62 @@ +# Session 1: Preference Embedding + +Use these messages in order. The buried preference is in **Message 5**. + +--- + +**Message 1 (User):** +> I'm trying to understand how Kubernetes handles pod-to-pod networking across nodes. Can you explain how the CNI plugin architecture works? + +**Message 2 (Assistant):** +> [Detailed explanation of CNI plugin architecture, pod networking, veth pairs, bridge interfaces, etc.] + +**Message 3 (User):** +> That's helpful. How does this differ between Calico and Cilium? I've heard Cilium uses eBPF instead of iptables. 
+ +**Message 4 (Assistant):** +> [Explanation comparing Calico's iptables-based approach vs Cilium's eBPF dataplane, performance characteristics, etc.] + +**Message 5 (User) — THE BURIED PREFERENCE:** +> That makes sense about the CNI plugin architecture. By the way, I strongly prefer Python over R for all my data analysis work — I find pandas much more intuitive than tidyverse. Anyway, back to the networking question — how does Cilium handle network policy enforcement at the kernel level? + +**Message 6 (Assistant):** +> [Explanation of Cilium's eBPF-based network policy enforcement, kernel-level packet filtering, etc.] + +**Message 7 (User):** +> What about service mesh integration? Does Cilium replace the need for something like Istio? + +**Message 8 (Assistant):** +> [Discussion of Cilium service mesh capabilities vs Istio, sidecar-free model, etc.] + +**Message 9 (User):** +> I'm also curious about network observability. What tools do you recommend for monitoring pod-to-pod traffic patterns in a large cluster? + +**Message 10 (Assistant):** +> [Recommendations for Hubble, Pixie, Grafana with Cilium metrics, etc.] + +**Message 11 (User):** +> Great, this has been really helpful. One last question — how do I troubleshoot DNS resolution failures in pods? I've been seeing intermittent CoreDNS timeouts. + +**Message 12 (Assistant):** +> [DNS troubleshooting guidance for CoreDNS, ndots settings, etc.] 
+ +--- + +## After the conversation + +**Kaizen Lite:** Run `/kaizen:gist` + +**Full Kaizen (MCP):** +```bash +# Store the conversation as a gist +curl -X POST http://localhost:8201/tools/store_gist \ + -H "Content-Type: application/json" \ + -d '{"conversation_data": "", "conversation_id": "demo-session-1"}' +``` + +## Expected Gist Output + +The gist should surface the buried preference: +``` +user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking; troubleshooting CoreDNS; large cluster environment +``` diff --git a/demo/gist-memory/session2_script.md b/demo/gist-memory/session2_script.md new file mode 100644 index 0000000..89fe7e3 --- /dev/null +++ b/demo/gist-memory/session2_script.md @@ -0,0 +1,45 @@ +# Session 2: Preference Recall Verification + +Start a **new session** (no conversation history from Session 1). The gist from Session 1 should be automatically injected via the recall hook. + +--- + +## Primary Verification Prompt + +> I need to start a new data analysis project working with network telemetry data. What language and tools would you recommend I use? + +### Expected Response WITH Gist Memory + +The agent should recommend **Python and pandas**, referencing your known preference. Example: + +> "Based on your preference for Python and pandas, I'd recommend using Python with pandas for the data analysis..." + +### Expected Response WITHOUT Gist Memory + +The agent gives a **generic recommendation** — likely mentioning both Python and R as options, or asking about your preference: + +> "For network telemetry data analysis, popular options include Python (with pandas/numpy) or R (with tidyverse). Which do you prefer?" + +--- + +## Additional Verification Prompts + +These test whether the gist captured other signals: + +**Prompt 2:** +> What's my background — do you know what kind of infrastructure I work with? 
+ +Expected (with gist): Mentions Kubernetes, container networking, cluster operations. + +**Prompt 3:** +> If I need to do some quick data wrangling, which library should I reach for? + +Expected (with gist): Recommends pandas specifically (not tidyverse or dplyr). + +--- + +## Running the A/B Comparison + +1. **With gist memory:** Ensure the gist entity from Session 1 exists in `.kaizen/entities/gist/` (Lite) or in the MCP backend +2. **Without gist memory:** Temporarily rename/remove the gist entity, or use a clean project directory +3. Run each verification prompt in both conditions and compare responses diff --git a/kaizen/config/kaizen.py b/kaizen/config/kaizen.py index a52c513..ab14d88 100644 --- a/kaizen/config/kaizen.py +++ b/kaizen/config/kaizen.py @@ -8,6 +8,8 @@ class KaizenConfig(BaseSettings): namespace_id: str = "kaizen" settings: BaseSettings | None = None clustering_threshold: float = 0.80 + gist_context_budget: int = 64000 + gist_trigger_interval: int = 5 # to reload settings call kaizen_config.__init__() diff --git a/kaizen/config/llm.py b/kaizen/config/llm.py index c30c3e1..0396d9c 100644 --- a/kaizen/config/llm.py +++ b/kaizen/config/llm.py @@ -25,6 +25,7 @@ class LLMSettings(BaseSettings): tips_model: str = Field(default_factory=_default_model_name) conflict_resolution_model: str = Field(default_factory=_default_model_name) fact_extraction_model: str = Field(default_factory=_default_model_name) + gist_model: str = Field(default_factory=_default_model_name) categorization_mode: Literal["predefined", "dynamic", "hybrid"] = "predefined" allow_dynamic_categories: bool = False confirm_new_categories: bool = False diff --git a/kaizen/frontend/client/kaizen_client.py b/kaizen/frontend/client/kaizen_client.py index 4302536..43e1704 100644 --- a/kaizen/frontend/client/kaizen_client.py +++ b/kaizen/frontend/client/kaizen_client.py @@ -1,9 +1,11 @@ import logging +import uuid from typing import Any from kaizen.backend.base import BaseEntityBackend from 
kaizen.config.kaizen import KaizenConfig from kaizen.llm.fact_extraction.fact_extraction import ExtractedFact, extract_facts_from_messages +from kaizen.llm.gist.gist import generate_gist from kaizen.schema.conflict_resolution import EntityUpdate from kaizen.schema.core import Entity, Namespace, RecordedEntity from kaizen.schema.exceptions import NamespaceAlreadyExistsException, NamespaceNotFoundException @@ -295,3 +297,135 @@ def retrieve_user_facts( ) return categorized_preferences + + # ── Gist memory ────────────────────────────────────────────────── + + def store_gists( + self, + namespace_id: str, + messages: list[dict], + conversation_id: str | None = None, + metadata: dict[str, Any] | None = None, + ) -> list[EntityUpdate]: + """Generate purpose-directed gists from conversation messages and store them. + + Implements rolling consolidation: deletes any existing gists for the same + conversation_id before storing new ones, so the latest gist always reflects + the full session. + """ + if not messages: + return [] + + conversation_id = conversation_id or str(uuid.uuid4()) + self.ensure_namespace(namespace_id) + + # Delete existing gists for this conversation (rolling replacement) + existing = self.search_entities( + namespace_id=namespace_id, + query=None, + filters={"type": "gist", "metadata.conversation_id": conversation_id}, + limit=100, + ) + for entity in existing: + try: + self.delete_entity_by_id(namespace_id, entity.id) + except Exception: + logger.warning("Failed to delete old gist %s during rolling replacement", entity.id, exc_info=True) + + # Generate gists + result = generate_gist(messages, conversation_id=conversation_id) + + if not result.gists: + return [] + + # Store gist entities + base_metadata: dict[str, Any] = dict(metadata or {}) + base_metadata["conversation_id"] = conversation_id + base_metadata["message_count"] = result.message_count + + gist_entities = [] + for i, gist_text in enumerate(result.gists): + gist_metadata = 
dict(base_metadata) + gist_metadata["chunk_index"] = i + gist_metadata["chunk_count"] = result.chunk_count + gist_entities.append(Entity(type="gist", content=gist_text, metadata=gist_metadata)) + + updates = self.update_entities(namespace_id, gist_entities, enable_conflict_resolution=False) + + # Store original messages as gist_source for durable retrieval + source_entities = [] + for i, msg in enumerate(messages): + content = msg.get("content", "") + if isinstance(content, list): + content = str(content) + source_entities.append( + Entity( + type="gist_source", + content=content, + metadata={ + "conversation_id": conversation_id, + "message_index": i, + "role": msg.get("role", "unknown"), + }, + ) + ) + + if source_entities: + # Delete existing sources for this conversation first + existing_sources = self.search_entities( + namespace_id=namespace_id, + query=None, + filters={"type": "gist_source", "metadata.conversation_id": conversation_id}, + limit=1000, + ) + for entity in existing_sources: + try: + self.delete_entity_by_id(namespace_id, entity.id) + except Exception: + logger.warning("Failed to delete old gist_source %s", entity.id, exc_info=True) + + self.update_entities(namespace_id, source_entities, enable_conflict_resolution=False) + + return updates + + def retrieve_gists( + self, + namespace_id: str, + query: str, + limit: int = 10, + ) -> list[RecordedEntity]: + """Retrieve gists relevant to a query via semantic search.""" + if not self.namespace_exists(namespace_id): + return [] + return self.search_entities( + namespace_id=namespace_id, + query=query, + filters={"type": "gist"}, + limit=limit, + ) + + def retrieve_gist_with_source( + self, + namespace_id: str, + query: str, + limit: int = 3, + ) -> list[dict[str, Any]]: + """Retrieve gists with their original source messages. + + Returns a list of dicts, each with 'gist' (RecordedEntity) and + 'source_messages' (list[RecordedEntity]) keys. 
+ """ + gists = self.retrieve_gists(namespace_id, query=query, limit=limit) + results = [] + for gist in gists: + conversation_id = (gist.metadata or {}).get("conversation_id") + source_messages: list[RecordedEntity] = [] + if conversation_id: + source_messages = self.search_entities( + namespace_id=namespace_id, + query=None, + filters={"type": "gist_source", "metadata.conversation_id": conversation_id}, + limit=100, + ) + results.append({"gist": gist, "source_messages": source_messages}) + return results diff --git a/kaizen/frontend/mcp/mcp_server.py b/kaizen/frontend/mcp/mcp_server.py index fdbd65f..4c5b2ab 100644 --- a/kaizen/frontend/mcp/mcp_server.py +++ b/kaizen/frontend/mcp/mcp_server.py @@ -242,3 +242,71 @@ def delete_entity(entity_id: str) -> str: except KaizenException as e: logger.exception(f"Error deleting entity {entity_id}: {str(e)}") return json.dumps({"success": False, "error": str(e)}) + + +@mcp.tool() +def store_gist(conversation_data: str, conversation_id: str | None = None) -> str: + """ + Generate purpose-directed gists from a conversation and store them. + Gists are compressed representations optimized for answering questions about the user. + Uses rolling consolidation: re-calling with the same conversation_id replaces previous gists. + + Args: + conversation_data: A JSON formatted list of conversation messages (each with 'role' and 'content'). + conversation_id: Optional identifier for the conversation. If not provided, a UUID is generated. + + Returns: + JSON string with stored gist details. 
+ """ + logger.info("Storing gist for conversation") + try: + messages = json.loads(conversation_data) + conversation_id = conversation_id or str(uuid.uuid4()) + + updates = get_client().store_gists( + namespace_id=kaizen_config.namespace_id, + messages=messages, + conversation_id=conversation_id, + ) + + return json.dumps({ + "success": True, + "conversation_id": conversation_id, + "gists_stored": len(updates), + "gists": [{"id": u.id, "content": u.content} for u in updates], + }) + except Exception as e: + logger.exception(f"Error storing gist: {e}") + return json.dumps({"success": False, "error": str(e)}) + + +@mcp.tool() +def get_gists(query: str, limit: int = 10) -> str: + """ + Retrieve stored conversation gists relevant to a query. + Gists are purpose-directed compressed representations of past conversations, + optimized for answering questions about the user. + + Args: + query: A description or question to search for relevant gists. + limit: Maximum number of gists to return. Defaults to 10. + + Returns: + Formatted string with relevant gists. + """ + logger.info(f"Getting gists for query: {query}") + results = get_client().retrieve_gists( + namespace_id=kaizen_config.namespace_id, + query=query, + limit=limit, + ) + + if not results: + return "No relevant gists found." + + response_lines = [f"# Conversation Gists for: {query}\n"] + for i, entity in enumerate(results, 1): + conversation_id = (entity.metadata or {}).get("conversation_id", "unknown") + response_lines.append(f"{i}. 
[conversation:{conversation_id}] {entity.content}") + + return "\n".join(response_lines) diff --git a/kaizen/llm/gist/__init__.py b/kaizen/llm/gist/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/kaizen/llm/gist/gist.py b/kaizen/llm/gist/gist.py new file mode 100644 index 0000000..e1f4d7e --- /dev/null +++ b/kaizen/llm/gist/gist.py @@ -0,0 +1,138 @@ +import json +import logging +from pathlib import Path + +import litellm +from jinja2 import Template +from litellm import completion, get_supported_openai_params, supports_response_schema + +from kaizen.config.kaizen import kaizen_config +from kaizen.config.llm import llm_settings +from kaizen.schema.gist import GistResponse, GistResult +from kaizen.utils.utils import clean_llm_response + +logger = logging.getLogger(__name__) + +_PROMPT_TEMPLATE = Template((Path(__file__).parent / "prompts/generate_gist.jinja2").read_text()) + + +def _estimate_tokens(text: str) -> int: + """Rough token estimate: ~4 chars per token.""" + return len(text) // 4 + + +def _chunk_messages(messages: list[dict], context_budget: int) -> list[list[dict]]: + """Split messages into chunks that fit within the context budget. + + Returns a list of message chunks. Most sessions will produce a single chunk. 
+ """ + chunks: list[list[dict]] = [] + current_chunk: list[dict] = [] + current_tokens = 0 + + # Reserve tokens for prompt template + response + available_tokens = context_budget - 2000 + + for message in messages: + content = message.get("content", "") + if isinstance(content, list): + content = str(content) + msg_tokens = _estimate_tokens(str(content)) + + if current_chunk and (current_tokens + msg_tokens) > available_tokens: + chunks.append(current_chunk) + current_chunk = [] + current_tokens = 0 + + current_chunk.append(message) + current_tokens += msg_tokens + + if current_chunk: + chunks.append(current_chunk) + + return chunks + + +def _generate_single_gist(messages: list[dict], constrained_decoding_supported: bool) -> str | None: + """Generate a gist for a single chunk of messages. Returns the gist string or None on failure.""" + prompt = _PROMPT_TEMPLATE.render( + messages=messages, + constrained_decoding_supported=constrained_decoding_supported, + ) + + last_error = None + for _ in range(3): + try: + if constrained_decoding_supported: + litellm.enable_json_schema_validation = True + raw = ( + completion( + model=llm_settings.gist_model, + messages=[{"role": "user", "content": prompt}], + response_format=GistResponse, + custom_llm_provider=llm_settings.custom_llm_provider, + ) + .choices[0] + .message.content + ) + else: + litellm.enable_json_schema_validation = False + raw = ( + completion( + model=llm_settings.gist_model, + messages=[{"role": "user", "content": prompt}], + custom_llm_provider=llm_settings.custom_llm_provider, + ) + .choices[0] + .message.content + ) + raw = clean_llm_response(raw) + + if not raw: + logger.warning("LLM returned empty response for gist generation.") + return None + + parsed = GistResponse.model_validate(json.loads(raw)) + return parsed.gist + except Exception as exc: + last_error = exc + continue + + logger.warning(f"Failed to generate gist after 3 attempts: {last_error}") + return None + + +def generate_gist(messages: 
list[dict], conversation_id: str | None = None) -> GistResult: + """Generate purpose-directed gists from conversation messages. + + Messages are chunked based on the context budget. Each chunk produces one gist. + Most sessions fit in a single chunk, producing one consolidated gist. + """ + if not messages: + return GistResult(gists=[], conversation_id=conversation_id, message_count=0, chunk_count=0) + + supported_params = get_supported_openai_params( + model=llm_settings.gist_model, + custom_llm_provider=llm_settings.custom_llm_provider, + ) + supports_response_format = supported_params and "response_format" in supported_params + response_schema_enabled = supports_response_schema( + model=llm_settings.gist_model, + custom_llm_provider=llm_settings.custom_llm_provider, + ) + constrained_decoding_supported = supports_response_format and response_schema_enabled + + chunks = _chunk_messages(messages, kaizen_config.gist_context_budget) + gists: list[str] = [] + + for chunk in chunks: + gist = _generate_single_gist(chunk, constrained_decoding_supported) + if gist and gist != "no user signal": + gists.append(gist) + + return GistResult( + gists=gists, + conversation_id=conversation_id, + message_count=len(messages), + chunk_count=len(chunks), + ) diff --git a/kaizen/llm/gist/prompts/generate_gist.jinja2 b/kaizen/llm/gist/prompts/generate_gist.jinja2 new file mode 100644 index 0000000..fd5a708 --- /dev/null +++ b/kaizen/llm/gist/prompts/generate_gist.jinja2 @@ -0,0 +1,18 @@ +You are given messages from a conversation between a user and an AI agent. +Please create a gist of the conversation in a way that can be used to answer questions about the user. +The gist will be stored in a vector database and used to answer questions about the user. +Therefore, it can contain phrases and keywords and does not have to have complete sentences. +The gist is not intended to be read by humans. +You do not have to explain your reasoning. Just give me the gist. 
+If there is nothing notable about the user in the conversation, return "no user signal". +If you are not able to shorten the conversation, just give me the original messages. + +{% if not constrained_decoding_supported %} +Respond with a JSON object with a single key "gist" containing the gist string. +Do not include any other text or explanation outside the JSON. +{% endif %} + +Conversation: +{% for message in messages %} +{{ message.role }}: {{ message.content }} +{% endfor %} diff --git a/kaizen/schema/gist.py b/kaizen/schema/gist.py new file mode 100644 index 0000000..a4bd15f --- /dev/null +++ b/kaizen/schema/gist.py @@ -0,0 +1,19 @@ +from dataclasses import dataclass + +from pydantic import BaseModel, Field + + +class GistResponse(BaseModel): + """LLM response schema for gist generation.""" + + gist: str = Field(description="Purpose-directed gist of the conversation") + + +@dataclass(frozen=True) +class GistResult: + """Result from generate_gist(), containing one gist per chunk.""" + + gists: list[str] + conversation_id: str | None = None + message_count: int = 0 + chunk_count: int = 0 diff --git a/platform-integrations/claude/plugins/kaizen-lite/skills/gist/SKILL.md b/platform-integrations/claude/plugins/kaizen-lite/skills/gist/SKILL.md new file mode 100644 index 0000000..e28686f --- /dev/null +++ b/platform-integrations/claude/plugins/kaizen-lite/skills/gist/SKILL.md @@ -0,0 +1,89 @@ +--- +name: gist +description: Generate a purpose-directed gist of the current conversation optimized for remembering user preferences and attributes across sessions. +context: fork +--- + +# Gist Memory + +## Overview + +This skill generates a **purpose-directed gist** of the current conversation — a compressed representation optimized for answering questions about the user (preferences, behaviors, habits, attributes). 
Unlike generic summarization, purpose-directed gisting foregrounds user-relevant signal and discards topical noise, making it dramatically more effective for personalization in future sessions. + +The gist will be stored as an entity and automatically injected into future sessions via the recall hook. + +## Workflow + +### Step 1: Walk Through Conversation Messages + +Review all messages in the current conversation from start to finish. Collect all user and assistant messages as a list. + +### Step 2: Generate the Gist + +Create a gist of the conversation following these specific instructions: + +**You are creating a gist that will be stored in a vector database and used to answer questions about the user.** + +Therefore: +- **Focus on what the conversation reveals about the user** — their preferences, behaviors, habits, expertise, opinions, constraints, and attributes +- **It can contain phrases and keywords** — does not need complete sentences +- **It is not intended to be read by humans** — optimize for machine retrieval +- **Discard topical noise** — if the user discussed Kubernetes for 20 messages but mentioned preferring Python in one sentence, the Python preference is higher signal for the gist than the Kubernetes discussion +- If there is nothing notable about the user in the conversation, output "no user signal" and stop + +**Example:** A conversation about network routing where the user mentions "By the way, I strongly prefer Python over R for data analysis" should produce a gist like: +``` +user prefers Python over R for data analysis; mentioned during networking discussion +``` +NOT a summary of the networking discussion. 
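The gist produced in this step is ultimately wrapped in the entity JSON that Step 3 pipes into `save_entities.py`. A sketch of building that payload — the field names come from this skill's format, but the helper itself is hypothetical:

```python
import json

def build_gist_payload(gist: str) -> str:
    # Hypothetical helper: wraps a gist string in the entity JSON
    # format that save_entities.py consumes (see Step 3).
    return json.dumps({
        "entities": [
            {
                "content": gist,
                "type": "gist",
                "rationale": "Purpose-directed gist for personalization",
                "trigger": "When answering questions about the user's preferences or attributes",
            }
        ]
    })

payload = build_gist_payload(
    "user prefers Python over R for data analysis; mentioned during networking discussion"
)
print(payload)
```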
+ +### Step 3: Save the Gist + +Output the gist as a JSON entity and save it using the save_entities.py script: + +```bash +echo '' | python3 ${CLAUDE_PLUGIN_ROOT}/skills/learn/scripts/save_entities.py +``` + +The JSON format: +```json +{ + "entities": [ + { + "content": "", + "type": "gist", + "rationale": "Purpose-directed gist for personalization", + "trigger": "When answering questions about the user's preferences or attributes" + } + ] +} +``` + +### Step 4: Confirm + +Tell the user what was captured in the gist. Be brief — just list the user-relevant signals that were preserved. + +## Examples + +### Good Gist (purpose-directed) +Conversation: 20 messages about Kubernetes pod networking, one mention of preferring dark mode in IDEs +``` +user prefers dark mode in IDEs; works with Kubernetes networking; container orchestration context +``` + +### Bad Gist (topic-preserving summary) +``` +Discussion covered Kubernetes pod networking including CNI plugins, service mesh patterns, and ingress configuration. The user asked about Calico vs Cilium performance benchmarks. +``` +This is a topic summary, not a user-attribute gist. It would fail to surface the dark mode preference in future sessions. + +### Good Gist (multiple signals) +``` +user: senior backend engineer; prefers Go over Rust for systems work; uses Neovim; dislikes ORMs; team of 5; shipping deadline March 30 +``` + +### No-Signal Case +Conversation: User asks "What time is it?" and gets an answer. 
+``` +no user signal +``` diff --git a/platform-integrations/claude/plugins/kaizen-lite/skills/recall/scripts/retrieve_entities.py b/platform-integrations/claude/plugins/kaizen-lite/skills/recall/scripts/retrieve_entities.py index 9e2c2f6..e63b57d 100644 --- a/platform-integrations/claude/plugins/kaizen-lite/skills/recall/scripts/retrieve_entities.py +++ b/platform-integrations/claude/plugins/kaizen-lite/skills/recall/scripts/retrieve_entities.py @@ -37,24 +37,45 @@ def log(message): def format_entities(entities): """Format all entities for Claude to review.""" - header = """## Entities for this task + # Separate gists from other entities + gists = [e for e in entities if e.get("type") == "gist"] + other = [e for e in entities if e.get("type") != "gist"] + + sections = [] + + if other: + header = """## Entities for this task Review these entities and apply any relevant ones: """ - items = [] - for e in entities: - content = e.get("content") - if not content: - continue - item = f"- **[{e.get('type', 'general')}]** {content}" - if e.get("rationale"): - item += f"\n - _Rationale: {e['rationale']}_" - if e.get("trigger"): - item += f"\n - _When: {e['trigger']}_" - items.append(item) - - return header + "\n".join(items) + items = [] + for e in other: + content = e.get("content") + if not content: + continue + item = f"- **[{e.get('type', 'general')}]** {content}" + if e.get("rationale"): + item += f"\n - _Rationale: {e['rationale']}_" + if e.get("trigger"): + item += f"\n - _When: {e['trigger']}_" + items.append(item) + sections.append(header + "\n".join(items)) + + if gists: + gist_header = """## Conversation Gists + +These are gists from prior conversations, optimized for recalling user preferences and attributes: + +""" + gist_items = [] + for g in gists: + content = g.get("content") + if content: + gist_items.append(f"- {content}") + sections.append(gist_header + "\n".join(gist_items)) + + return "\n\n".join(sections) def main(): diff --git 
a/tests/unit/test_gist.py b/tests/unit/test_gist.py new file mode 100644 index 0000000..7ba000a --- /dev/null +++ b/tests/unit/test_gist.py @@ -0,0 +1,144 @@ +"""Tests for gist generation.""" + +import json +from unittest.mock import MagicMock, patch + +import pytest + +from kaizen.llm.gist.gist import _chunk_messages, _estimate_tokens, generate_gist +from kaizen.schema.gist import GistResult + + +@pytest.mark.unit +class TestEstimateTokens: + def test_empty_string(self): + assert _estimate_tokens("") == 0 + + def test_short_string(self): + assert _estimate_tokens("hello world") == 2 # 11 chars // 4 + + def test_long_string(self): + text = "a" * 1000 + assert _estimate_tokens(text) == 250 + + +@pytest.mark.unit +class TestChunkMessages: + def test_single_chunk_when_within_budget(self): + messages = [ + {"role": "user", "content": "Hello"}, + {"role": "assistant", "content": "Hi there"}, + ] + chunks = _chunk_messages(messages, context_budget=64000) + assert len(chunks) == 1 + assert chunks[0] == messages + + def test_splits_when_exceeds_budget(self): + # Each message ~250 tokens (1000 chars), budget 3000 tokens + # Available = 3000 - 2000 (reserved) = 1000 tokens, so ~4 messages per chunk + messages = [{"role": "user", "content": "x" * 1000} for _ in range(10)] + chunks = _chunk_messages(messages, context_budget=3000) + assert len(chunks) > 1 + # All messages accounted for + total = sum(len(chunk) for chunk in chunks) + assert total == 10 + + def test_empty_messages(self): + chunks = _chunk_messages([], context_budget=64000) + assert chunks == [] + + def test_single_large_message_gets_own_chunk(self): + # One huge message that exceeds budget on its own + messages = [ + {"role": "user", "content": "x" * 300000}, # ~75k tokens + {"role": "user", "content": "small"}, + ] + chunks = _chunk_messages(messages, context_budget=64000) + # The large message gets its own chunk, the small one gets another + assert len(chunks) == 2 + + +@pytest.mark.unit +class 
TestGenerateGist: + @patch("kaizen.llm.gist.gist.get_supported_openai_params") + @patch("kaizen.llm.gist.gist.supports_response_schema") + @patch("kaizen.llm.gist.gist.completion") + def test_generates_gist_from_messages(self, mock_completion, mock_supports, mock_params): + mock_params.return_value = ["response_format"] + mock_supports.return_value = True + + mock_response = MagicMock() + mock_response.choices[0].message.content = json.dumps({"gist": "user prefers Python for data analysis"}) + mock_completion.return_value = mock_response + + messages = [ + {"role": "user", "content": "I really prefer Python over R for data work."}, + {"role": "assistant", "content": "Got it, Python it is."}, + ] + result = generate_gist(messages, conversation_id="test-123") + + assert isinstance(result, GistResult) + assert len(result.gists) == 1 + assert "Python" in result.gists[0] + assert result.conversation_id == "test-123" + assert result.message_count == 2 + assert result.chunk_count == 1 + + @patch("kaizen.llm.gist.gist.get_supported_openai_params") + @patch("kaizen.llm.gist.gist.supports_response_schema") + @patch("kaizen.llm.gist.gist.completion") + def test_returns_empty_on_no_user_signal(self, mock_completion, mock_supports, mock_params): + mock_params.return_value = ["response_format"] + mock_supports.return_value = True + + mock_response = MagicMock() + mock_response.choices[0].message.content = json.dumps({"gist": "no user signal"}) + mock_completion.return_value = mock_response + + messages = [ + {"role": "user", "content": "What time is it?"}, + {"role": "assistant", "content": "It's 3pm."}, + ] + result = generate_gist(messages) + + assert result.gists == [] + + def test_empty_messages_returns_empty(self): + result = generate_gist([]) + assert result.gists == [] + assert result.message_count == 0 + + @patch("kaizen.llm.gist.gist.get_supported_openai_params") + @patch("kaizen.llm.gist.gist.supports_response_schema") + @patch("kaizen.llm.gist.gist.completion") + def 
test_retries_on_parse_failure(self, mock_completion, mock_supports, mock_params): + mock_params.return_value = ["response_format"] + mock_supports.return_value = True + + # First two calls fail, third succeeds + bad_response = MagicMock() + bad_response.choices[0].message.content = "not json" + + good_response = MagicMock() + good_response.choices[0].message.content = json.dumps({"gist": "user likes cats"}) + + mock_completion.side_effect = [bad_response, bad_response, good_response] + + result = generate_gist([{"role": "user", "content": "I love cats"}]) + assert len(result.gists) == 1 + assert "cats" in result.gists[0] + + @patch("kaizen.llm.gist.gist.get_supported_openai_params") + @patch("kaizen.llm.gist.gist.supports_response_schema") + @patch("kaizen.llm.gist.gist.completion") + def test_fallback_without_constrained_decoding(self, mock_completion, mock_supports, mock_params): + mock_params.return_value = [] # No response_format support + mock_supports.return_value = False + + mock_response = MagicMock() + mock_response.choices[0].message.content = json.dumps({"gist": "user is a backend engineer"}) + mock_completion.return_value = mock_response + + result = generate_gist([{"role": "user", "content": "I work on backend systems"}]) + assert len(result.gists) == 1 + assert "backend" in result.gists[0]