feat: add purpose-directed gist memory for user personalization #102

jayaramkr wants to merge 2 commits into AgentToolkit:main
Conversation
Implement Innovation 1 & 2 from the gist memory disclosure: storage-optimized and use-optimized gisting for conversation memory.

Core module (kaizen/llm/gist/):
- generate_gist() with rolling cumulative chunking (context budget: 64k)
- Purpose-directed Jinja2 prompt optimized for extracting user attributes
- 3-retry pattern with constrained decoding support
- "no user signal" filtering for low-signal conversations

Schema & config:
- GistResponse (Pydantic) + GistResult (frozen dataclass)
- KAIZEN_GIST_MODEL, KAIZEN_GIST_CONTEXT_BUDGET, KAIZEN_GIST_TRIGGER_INTERVAL

KaizenClient methods:
- store_gists(): rolling consolidation (delete-and-replace per conversation_id)
- retrieve_gists(): semantic search over gist entities
- retrieve_gist_with_source(): gists paired with original messages

MCP server tools:
- store_gist: generate and store gists from conversation JSON
- get_gists: retrieve relevant gists by semantic query

Claude Code plugin (Kaizen Lite):
- /kaizen:gist skill for inline gist generation
- Recall hook extended to inject gists in a separate section

Demo scenario (demo/gist-memory/):
- Buried preference recall test (Section 9.1 probe from disclosure)
- Session 1: preference embedded in unrelated K8s conversation
- Session 2: preference recall verification prompts
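The "rolling cumulative chunking" under a context budget described above can be sketched roughly as follows. This is an illustrative assumption, not the actual `generate_gist()` internals: the ~4-characters-per-token estimate and the function shape are made up for the example.

```python
# Illustrative sketch of chunking messages under a context budget.
# Token counting here (len // 4) is a crude assumption for the example.
def chunk_messages(messages: list[dict], context_budget: int = 64_000) -> list[list[dict]]:
    chunks: list[list[dict]] = []
    current: list[dict] = []
    used = 0
    for msg in messages:
        # Crude token estimate: ~4 characters per token (an assumption).
        tokens = max(1, len(str(msg.get("content", ""))) // 4)
        if current and used + tokens > context_budget:
            chunks.append(current)  # close the current chunk at the budget
            current, used = [], 0
        current.append(msg)
        used += tokens
    if current:
        chunks.append(current)
    return chunks

msgs = [{"role": "user", "content": "x" * 400} for _ in range(10)]
print([len(c) for c in chunk_messages(msgs, context_budget=300)])  # [3, 3, 3, 1]
```

Each resulting chunk would then be gisted independently, which is why the review below worries about per-chunk fallback behavior.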
📝 Walkthrough

This pull request introduces a "gist memory" feature that extracts and stores user preferences buried in multi-turn conversations, enabling later recall via semantic search. The implementation spans configuration, client APIs, LLM-based generation, MCP tools, plugin integration, documentation, and tests.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~28 minutes
🚥 Pre-merge checks: 2 passed, 1 failed (1 warning)
Actionable comments posted: 6
🧹 Nitpick comments (1)
kaizen/llm/gist/prompts/generate_gist.jinja2 (1)
7-8: Consider the fallback behavior for unshortened conversations. Line 8 instructs the LLM to return the original messages if it cannot shorten the conversation. For chunked conversations near the context-budget limit, this could produce gists nearly as large as the input, defeating the purpose of gisting and causing storage bloat.
Consider whether a different fallback (e.g., a minimal metadata-only response, or explicitly filtering out such chunks) would better serve the storage-optimization goal.
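One possible downstream guard, independent of the prompt wording, would reject gists that are not meaningfully shorter than their source chunk. The helper name and the 0.8 ratio below are hypothetical, not part of the PR:

```python
def keep_gist(gist_text: str, chunk_text: str, min_ratio: float = 0.8) -> bool:
    """Reject gists whose length is at least min_ratio of the original chunk."""
    return len(gist_text) < min_ratio * len(chunk_text)

# A genuinely condensed gist passes; an echo of the original chunk does not.
assert keep_gist("user prefers Python over R", "a long kubernetes conversation " * 30)
assert not keep_gist("identical text", "identical text")
```

A filter like this would catch the "return the original messages" fallback regardless of how the template is ultimately worded.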
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@kaizen/llm/gist/prompts/generate_gist.jinja2` around lines 7 - 8, The current fallback in the generate_gist.jinja2 prompt ("If you are not able to shorten the conversation, just give me the original messages.") can produce gists as large as the input; change the fallback to return a minimal metadata-only response or explicitly mark/drop such chunks to avoid storage bloat. Update the template (generate_gist.jinja2) to replace the "just give me the original messages" instruction with a clear alternative such as "if you cannot shorten the conversation, return only a minimal metadata placeholder (e.g., 'unshortened_chunk' plus participant IDs and timestamps) or mark the chunk to be skipped" so downstream code that consumes the prompt can either store the small metadata placeholder or drop the chunk rather than storing the full original messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@demo/gist-memory/README.md`:
- Around line 47-49: The fenced code block in the README.md currently has no
language tag which triggers MD040; update the block that contains "user prefers
Python over R for data analysis; finds pandas more intuitive than tidyverse;
works with Kubernetes networking (Cilium, CNI plugins)" by adding a language
specifier (e.g., change the opening ``` to ```text or ```plaintext) so the
markdown linter treats it as plain text and the MD040 warning is resolved.
In `@demo/gist-memory/session1_script.md`:
- Around line 1-5: The header text incorrectly references "Message 4" as
containing the buried preference; update that reference to the correct message
number ("Message 5" or "Message 5 (User)") in the "Session 1: Preference
Embedding" block so the line that currently reads the buried preference is in
Message 4 matches the actual buried preference location (Message 5 (User) at
line 19).
- Around line 57-62: The fenced code block under "Expected Gist Output" lacks a
language specifier; update the block fence that wraps the expected gist (the
triple-backtick block containing "user prefers Python over R...") to include a
language tag (e.g., change ``` to ```text or ```txt) so the Markdown linter no
longer flags it.
In `@kaizen/frontend/client/kaizen_client.py`:
- Around line 303-389: The rolling replacement in store_gists is not atomic: the
search/delete/insert sequence (search_entities -> delete_entity_by_id ->
update_entities) can interleave across concurrent calls using the same
conversation_id and produce duplicate gist entities; update the store_gists
docstring (and add an inline comment above the delete+insert block) to
explicitly state this non-atomic behavior, give the concurrency
example/acceptable degradation, and note the assumption that the current
single-user MCP context accepts possible duplicate gists rather than
implementing locking or transactional semantics.
In `@kaizen/frontend/mcp/mcp_server.py`:
- Around line 272-277: The return JSON block containing conversation_id and
gists uses formatting that fails Ruff; fix by reformatting the function
containing that return (the block referencing conversation_id, updates, and the
list comprehension {"id": u.id, "content": u.content} for u in updates) to
comply with Ruff style—either run `ruff format
kaizen/frontend/mcp/mcp_server.py` or apply the equivalent formatting changes so
the return dict and list comprehension are properly spaced and wrapped.
In `@kaizen/llm/gist/gist.py`:
- Around line 114-123: The variable constrained_decoding_supported can be
non-boolean because supported_params may be a list; update the computation to
yield a strict bool before passing to _generate_single_gist: explicitly compute
supports_response_format by checking that supported_params is truthy and that
"response_format" is in it, compute response_schema_enabled via
supports_response_schema, then set constrained_decoding_supported =
bool(supports_response_format and response_schema_enabled) (or use an explicit
isinstance/boolean check) so the value is always a bool when used by
_generate_single_gist; refer to get_supported_openai_params, supported_params,
supports_response_format, supports_response_schema, response_schema_enabled, and
constrained_decoding_supported to locate the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8b964908-1475-4200-ab42-57238d49f8af
📒 Files selected for processing (14)
- demo/gist-memory/README.md
- demo/gist-memory/session1_script.md
- demo/gist-memory/session2_script.md
- kaizen/config/kaizen.py
- kaizen/config/llm.py
- kaizen/frontend/client/kaizen_client.py
- kaizen/frontend/mcp/mcp_server.py
- kaizen/llm/gist/__init__.py
- kaizen/llm/gist/gist.py
- kaizen/llm/gist/prompts/generate_gist.jinja2
- kaizen/schema/gist.py
- platform-integrations/claude/plugins/kaizen-lite/skills/gist/SKILL.md
- platform-integrations/claude/plugins/kaizen-lite/skills/recall/scripts/retrieve_entities.py
- tests/unit/test_gist.py
```
user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking (Cilium, CNI plugins)
```
Add a language specifier to the fenced code block.
The code block is missing a language identifier, which triggers a markdownlint warning (MD040). Since this shows plain text output, use text or plaintext.
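For context, MD040 only fires on an opening fence with no info string; a fence such as ```` ```text ```` passes. A toy checker (not markdownlint itself, just an illustration of the rule) looks like this:

```python
# Tiny checker in the spirit of markdownlint's MD040: flag opening fences
# that carry no language info string.
def bare_fences(markdown: str) -> list[int]:
    flagged: list[int] = []
    in_fence = False
    for i, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("```"):
            # Only an *opening* fence that is exactly "```" is a violation.
            if not in_fence and stripped == "```":
                flagged.append(i)
            in_fence = not in_fence
    return flagged

print(bare_fences("intro\n```\nplain text\n```\n"))  # [2]
print(bare_fences("```text\nplain text\n```\n"))     # []
```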
📝 Suggested fix

````diff
-```
+```text
 user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking (Cilium, CNI plugins)
````
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 47-47: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
```
# Session 1: Preference Embedding

Use these messages in order. The buried preference is in **Message 4**.

---
```
Fix inconsistent message number reference.
Line 3 states the buried preference is in "Message 4", but the actual buried preference appears in "Message 5 (User)" at line 19.
📝 Proposed fix

```diff
 # Session 1: Preference Embedding
-Use these messages in order. The buried preference is in **Message 4**.
+Use these messages in order. The buried preference is in **Message 5**.
 ---
```
````
## Expected Gist Output

The gist should surface the buried preference:
```
user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking; troubleshooting CoreDNS; large cluster environment
```
````
Add language specifier to the expected output code block.
The fenced code block is missing a language specifier, which triggers a Markdown lint warning.
📝 Proposed fix
````diff
 ## Expected Gist Output
 The gist should surface the buried preference:
-```
+```text
 user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking; troubleshooting CoreDNS; large cluster environment
````
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 60-60: Fenced code blocks should have a language specified (MD040, fenced-code-language)
```python
def store_gists(
    self,
    namespace_id: str,
    messages: list[dict],
    conversation_id: str | None = None,
    metadata: dict[str, Any] | None = None,
) -> list[EntityUpdate]:
    """Generate purpose-directed gists from conversation messages and store them.

    Implements rolling consolidation: deletes any existing gists for the same
    conversation_id before storing new ones, so the latest gist always reflects
    the full session.
    """
    if not messages:
        return []

    conversation_id = conversation_id or str(uuid.uuid4())
    self.ensure_namespace(namespace_id)

    # Delete existing gists for this conversation (rolling replacement)
    existing = self.search_entities(
        namespace_id=namespace_id,
        query=None,
        filters={"type": "gist", "metadata.conversation_id": conversation_id},
        limit=100,
    )
    for entity in existing:
        try:
            self.delete_entity_by_id(namespace_id, entity.id)
        except Exception:
            logger.warning("Failed to delete old gist %s during rolling replacement", entity.id, exc_info=True)

    # Generate gists
    result = generate_gist(messages, conversation_id=conversation_id)

    if not result.gists:
        return []

    # Store gist entities
    base_metadata: dict[str, Any] = dict(metadata or {})
    base_metadata["conversation_id"] = conversation_id
    base_metadata["message_count"] = result.message_count

    gist_entities = []
    for i, gist_text in enumerate(result.gists):
        gist_metadata = dict(base_metadata)
        gist_metadata["chunk_index"] = i
        gist_metadata["chunk_count"] = result.chunk_count
        gist_entities.append(Entity(type="gist", content=gist_text, metadata=gist_metadata))

    updates = self.update_entities(namespace_id, gist_entities, enable_conflict_resolution=False)

    # Store original messages as gist_source for durable retrieval
    source_entities = []
    for i, msg in enumerate(messages):
        content = msg.get("content", "")
        if isinstance(content, list):
            content = str(content)
        source_entities.append(
            Entity(
                type="gist_source",
                content=content,
                metadata={
                    "conversation_id": conversation_id,
                    "message_index": i,
                    "role": msg.get("role", "unknown"),
                },
            )
        )

    if source_entities:
        # Delete existing sources for this conversation first
        existing_sources = self.search_entities(
            namespace_id=namespace_id,
            query=None,
            filters={"type": "gist_source", "metadata.conversation_id": conversation_id},
            limit=1000,
        )
        for entity in existing_sources:
            try:
                self.delete_entity_by_id(namespace_id, entity.id)
            except Exception:
                logger.warning("Failed to delete old gist_source %s", entity.id, exc_info=True)

        self.update_entities(namespace_id, source_entities, enable_conflict_resolution=False)

    return updates
```
🧩 Analysis chain

🏁 Scripts executed:

```shell
# Check if there are any other callers of store_gists that might invoke it concurrently
rg -n "store_gists" --type=py -C2

# Inspect the MCP server call site
cat -n kaizen/frontend/mcp/mcp_server.py | sed -n '260,280p'

# Check for any async/concurrent patterns in the MCP server
rg -n "async|concurrent|thread|lock|pool" kaizen/frontend/mcp/mcp_server.py

# Check the broader context of how the MCP server handles requests
head -50 kaizen/frontend/mcp/mcp_server.py

# Check the backend implementation for any locking mechanisms
rg -n "def delete_entity_by_id|def search_entities|def update_entities" kaizen/frontend/client/kaizen_client.py -A 5

# Check if there's a storage backend with its own locking
find . -name "*.py" -type f | xargs grep -l "delete_entity_by_id" | head -5

# Check the filesystem backend for delete and search implementations
rg -n "def delete_entity_by_id|def search_entities|def update_entities|lock|Lock" kaizen/backend/filesystem.py -B 2 -A 8

# Get size of filesystem backend to decide how much to read
wc -l kaizen/backend/filesystem.py
```
Document the non-atomic nature of the rolling replacement logic in store_gists.
The search-delete-insert sequence is not atomic. While the backend uses per-operation locking, concurrent calls with the same conversation_id can interleave, potentially resulting in duplicate gists. For example:
- Thread A searches and finds gist X
- Thread B searches and finds gist X
- Thread A deletes X, inserts Y
- Thread B deletes nothing (X already gone), inserts Z
Result: both Y and Z coexist for the same conversation.
This is likely acceptable for the current single-user MCP tool context, but should be documented in the docstring or with a code comment explaining the assumption and acceptable degradation mode.
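The interleaving above can be reproduced deterministically with a toy in-memory store. All names here are illustrative, not the real backend API; the barrier forces both threads past the search step before either deletes, which is exactly the window per-operation locking leaves open:

```python
import threading

# Toy model of the non-atomic search -> delete -> insert sequence.
store: dict[str, str] = {}             # entity_id -> conversation_id
store_lock = threading.Lock()          # per-operation lock, as in the backend
counter = [0]

def search(conversation_id: str) -> list[str]:
    with store_lock:
        return [eid for eid, cid in store.items() if cid == conversation_id]

def delete(entity_id: str) -> None:
    with store_lock:
        store.pop(entity_id, None)

def insert(conversation_id: str) -> None:
    with store_lock:
        counter[0] += 1
        store[f"gist-{counter[0]}"] = conversation_id

def rolling_replace(conversation_id: str, barrier: threading.Barrier) -> None:
    existing = search(conversation_id)  # both threads see the same old gist X
    barrier.wait()                      # force both past the search step
    for eid in existing:
        delete(eid)                     # second delete is a no-op
    insert(conversation_id)             # each thread inserts its own gist

insert("conv-1")  # pre-existing gist X
barrier = threading.Barrier(2)
threads = [threading.Thread(target=rolling_replace, args=("conv-1", barrier)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(search("conv-1")))  # 2 -- an atomic replacement would leave 1
```

Each individual operation is locked, yet the composite sequence still leaves two gists for the conversation, matching the degradation mode the comment asks to document.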
```python
return json.dumps({
    "success": True,
    "conversation_id": conversation_id,
    "gists_stored": len(updates),
    "gists": [{"id": u.id, "content": u.content} for u in updates],
})
```
Ruff formatting check failed.
The pipeline indicates this file needs reformatting. Run `ruff format kaizen/frontend/mcp/mcp_server.py` to fix.
```python
supported_params = get_supported_openai_params(
    model=llm_settings.gist_model,
    custom_llm_provider=llm_settings.custom_llm_provider,
)
supports_response_format = supported_params and "response_format" in supported_params
response_schema_enabled = supports_response_schema(
    model=llm_settings.gist_model,
    custom_llm_provider=llm_settings.custom_llm_provider,
)
constrained_decoding_supported = supports_response_format and response_schema_enabled
```
Fix type narrowing for `constrained_decoding_supported`.
The pipeline reports a mypy error: `constrained_decoding_supported` has type `list[Any] | bool | None` but `_generate_single_gist` expects `bool`. The `and` expression doesn't guarantee a boolean result.
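The underlying Python semantics: `and` returns one of its operands rather than a strict bool, which is why mypy widens the type. A minimal illustration:

```python
# When the left operand is falsy, `and` returns it unchanged.
supported_params = None
result = supported_params and "response_format" in supported_params
print(result)  # None, not False

# When the left operand is truthy, the right operand's value is returned.
supported_params = ["response_format", "temperature"]
result = supported_params and "response_format" in supported_params
print(result)  # True

# Wrapping in bool() yields a strict bool in every case.
assert bool(None and True) is False
assert bool(["response_format"] and True) is True
```

So when `get_supported_openai_params` returns `None` or a list, the unwrapped expression can be `None` or `list`, never narrowed to `bool`.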
🐛 Proposed fix

```diff
 supported_params = get_supported_openai_params(
     model=llm_settings.gist_model,
     custom_llm_provider=llm_settings.custom_llm_provider,
 )
-supports_response_format = supported_params and "response_format" in supported_params
+supports_response_format = bool(supported_params and "response_format" in supported_params)
 response_schema_enabled = supports_response_schema(
     model=llm_settings.gist_model,
     custom_llm_provider=llm_settings.custom_llm_provider,
 )
-constrained_decoding_supported = supports_response_format and response_schema_enabled
+constrained_decoding_supported = bool(supports_response_format and response_schema_enabled)
```
---

can we hold off merging this for now?