feat: add purpose-directed gist memory for user personalization#102

Open
jayaramkr wants to merge 2 commits into AgentToolkit:main from jayaramkr:feat/gist-memory

Conversation

@jayaramkr
Collaborator

jayaramkr commented Mar 23, 2026

Implement Innovation 1 & 2 from the gist memory disclosure: storage-optimized and use-optimized gisting for conversation memory.

Core module (kaizen/llm/gist/):

  • generate_gist() with rolling cumulative chunking (context budget: 64k)
  • Purpose-directed Jinja2 prompt optimized for extracting user attributes
  • 3-retry pattern with constrained decoding support
  • "no user signal" filtering for low-signal conversations
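
The chunking step above can be sketched roughly as follows; `estimate_tokens` and `chunk_messages` are illustrative stand-ins (the actual kaizen implementation and its token estimator may differ):

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic of ~4 characters per token; an assumption,
    # not kaizen's actual estimator.
    return max(1, len(text) // 4)


def chunk_messages(messages: list[dict], context_budget: int = 64_000) -> list[list[dict]]:
    """Split a conversation into consecutive chunks that each fit the token budget."""
    chunks: list[list[dict]] = []
    current: list[dict] = []
    used = 0
    for msg in messages:
        cost = estimate_tokens(str(msg.get("content", "")))
        if current and used + cost > context_budget:
            chunks.append(current)  # flush the full chunk
            current, used = [], 0
        current.append(msg)
        used += cost
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be gisted in turn, with earlier gists carried forward to give the "rolling cumulative" effect.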

Schema & config:

  • GistResponse (Pydantic) + GistResult (frozen dataclass)
  • KAIZEN_GIST_MODEL, KAIZEN_GIST_CONTEXT_BUDGET, KAIZEN_GIST_TRIGGER_INTERVAL
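
A minimal sketch of how these environment variables might map to settings; the dataclass below is an illustrative stand-in (per the walkthrough, the real fields live in `KaizenConfig` and `LLMSettings`), with defaults taken from the PR description:

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GistSettings:
    # Defaults mirror the PR walkthrough (64000-token budget, trigger interval 5);
    # the class itself is hypothetical, not kaizen's actual config object.
    model: str = field(default_factory=lambda: os.getenv("KAIZEN_GIST_MODEL", ""))
    context_budget: int = field(
        default_factory=lambda: int(os.getenv("KAIZEN_GIST_CONTEXT_BUDGET", "64000"))
    )
    trigger_interval: int = field(
        default_factory=lambda: int(os.getenv("KAIZEN_GIST_TRIGGER_INTERVAL", "5"))
    )
```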

KaizenClient methods:

  • store_gists(): rolling consolidation (delete-and-replace per conversation_id)
  • retrieve_gists(): semantic search over gist entities
  • retrieve_gist_with_source(): gists paired with original messages
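
The delete-and-replace semantics of `store_gists()` can be modeled with a toy in-memory store (a hypothetical class; the real client persists entities and uses embedding-based semantic search rather than substring matching):

```python
class InMemoryGistStore:
    """Toy model of rolling consolidation: one set of gists per conversation_id."""

    def __init__(self) -> None:
        self.entities: list[dict] = []

    def store_gists(self, conversation_id: str, gists: list[str]) -> None:
        # Rolling replacement: drop prior gists for this conversation, then insert.
        self.entities = [
            e for e in self.entities if e["conversation_id"] != conversation_id
        ]
        self.entities.extend(
            {"conversation_id": conversation_id, "content": g} for g in gists
        )

    def retrieve_gists(self, query: str) -> list[str]:
        # Stand-in for semantic search: naive case-insensitive substring match.
        return [
            e["content"] for e in self.entities if query.lower() in e["content"].lower()
        ]
```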

MCP server tools:

  • store_gist: generate and store gists from conversation JSON
  • get_gists: retrieve relevant gists by semantic query
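
The JSON shape the `store_gist` tool returns (field names follow the `mcp_server.py` excerpt quoted later in the review) can be built like this; the literal values are placeholders:

```python
import json

# Placeholder values; field names follow the store_gist return in mcp_server.py.
updates = [{"id": "ent-1", "content": "user prefers Python over R for data analysis"}]
response = json.dumps(
    {
        "success": True,
        "conversation_id": "conv-123",
        "gists_stored": len(updates),
        "gists": updates,
    }
)
```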

Claude Code plugin (Kaizen Lite):

  • /kaizen:gist skill for inline gist generation
  • Recall hook extended to inject gists in a separate section

Demo scenario (demo/gist-memory/):

  • Buried preference recall test (Section 9.1 probe from disclosure)
  • Session 1: preference embedded in unrelated K8s conversation
  • Session 2: preference recall verification prompts

Summary by CodeRabbit

  • New Features

    • Gist Memory: capture and store user preferences and signals from conversations
    • New MCP tools for storing and retrieving conversation gists
    • Claude plugin skill for automated gist generation and extraction
  • Documentation

    • Complete Gist Memory demo guide with setup instructions and verification workflows
    • Gist skill specification for Claude plugins
  • Configuration

    • Added configurable gist context budget and trigger interval settings
  • Tests

    • Comprehensive unit tests for gist generation functionality

JAYARAM RADHAKRISHNAN and others added 2 commits March 23, 2026 12:49
Implement Innovation 1 & 2 from the gist memory disclosure:
storage-optimized and use-optimized gisting for conversation memory.

Core module (kaizen/llm/gist/):
- generate_gist() with rolling cumulative chunking (context budget: 64k)
- Purpose-directed Jinja2 prompt optimized for extracting user attributes
- 3-retry pattern with constrained decoding support
- "no user signal" filtering for low-signal conversations

Schema & config:
- GistResponse (Pydantic) + GistResult (frozen dataclass)
- KAIZEN_GIST_MODEL, KAIZEN_GIST_CONTEXT_BUDGET, KAIZEN_GIST_TRIGGER_INTERVAL

KaizenClient methods:
- store_gists(): rolling consolidation (delete-and-replace per conversation_id)
- retrieve_gists(): semantic search over gist entities
- retrieve_gist_with_source(): gists paired with original messages

MCP server tools:
- store_gist: generate and store gists from conversation JSON
- get_gists: retrieve relevant gists by semantic query

Claude Code plugin (Kaizen Lite):
- /kaizen:gist skill for inline gist generation
- Recall hook extended to inject gists in a separate section

Demo scenario (demo/gist-memory/):
- Buried preference recall test (Section 9.1 probe from disclosure)
- Session 1: preference embedded in unrelated K8s conversation
- Session 2: preference recall verification prompts
@coderabbitai

coderabbitai bot commented Mar 23, 2026

📝 Walkthrough

This pull request introduces a "gist memory" feature that extracts and stores user preferences buried in multi-turn conversations, enabling later recall via semantic search. The implementation spans configuration, client APIs, LLM-based generation, MCP tools, plugin integration, documentation, and tests.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Configuration | `kaizen/config/kaizen.py`, `kaizen/config/llm.py` | Added `gist_context_budget` (64000 tokens) and `gist_trigger_interval` (5) settings to `KaizenConfig`, plus a `gist_model` field to `LLMSettings` to configure the model used for gist generation. |
| Core Gist Generation | `kaizen/llm/gist/gist.py`, `kaizen/schema/gist.py`, `kaizen/llm/gist/prompts/generate_gist.jinja2` | Implemented `generate_gist()` with token estimation, message chunking, and LLM-based extraction of user signals; added the `GistResponse` Pydantic model and `GistResult` dataclass for structured output; created a Jinja2 prompt template instructing models to produce vector-database-ready gists. |
| Client API Layer | `kaizen/frontend/client/kaizen_client.py` | Added `store_gists()` (persists gist and source entities with rolling replacement), `retrieve_gists()` (semantic search), and `retrieve_gist_with_source()` (returns gists with associated source messages) to `KaizenClient`. |
| MCP Server Integration | `kaizen/frontend/mcp/mcp_server.py` | Added `store_gist` and `get_gists` FastMCP tools wrapping the client layer, enabling conversation gist persistence and retrieval via the MCP protocol. |
| Plugin Integration | `platform-integrations/claude/plugins/kaizen-lite/skills/gist/SKILL.md`, `platform-integrations/claude/plugins/kaizen-lite/skills/recall/scripts/retrieve_entities.py` | Introduced the "gist" skill definition for the Claude plugin; updated `retrieve_entities.py` to format gist entities in a separate "Conversation Gists" section during recall. |
| Demo & Documentation | `demo/gist-memory/README.md`, `demo/gist-memory/session1_script.md`, `demo/gist-memory/session2_script.md` | Added the "Buried Preference Recall" demo scenario with setup instructions for both Lite and Full Kaizen paths, a two-session workflow (preference embedding, then verification), and expected gist output examples. |
| Tests | `tests/unit/test_gist.py` | Comprehensive unit tests validating token estimation, message chunking, gist generation with and without JSON-schema support, retry logic, and empty-input handling. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Possibly related PRs

  • move agent integrations into new folder #95: Modifies platform-integrations/claude/plugins/kaizen-lite/skills/recall/scripts/retrieve_entities.py (same file updated here) with related changes to format_entities behavior for entity categorization.

Suggested reviewers

  • visahak
  • illeatmyhat
  • vinodmut

Poem

🐰 A gist of wisdom, tucked away deep,
In conversation's weaving, secrets we keep.
Buried preferences now resurface with care,
Vector dreams help memories find their lair!
─ Hoppy Revision

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 48.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a gist memory feature for user personalization across the codebase. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (1)
kaizen/llm/gist/prompts/generate_gist.jinja2 (1)

7-8: Consider the fallback behavior for unshortened conversations.

Line 8 instructs the LLM to return original messages if it cannot shorten the conversation. For chunked conversations near the context budget limit, this could result in gists that are nearly as large as the input, potentially defeating the purpose of gisting and causing storage bloat.

Consider whether a different fallback (e.g., a minimal metadata-only response, or explicitly filtering out such chunks) would better serve the storage-optimization goal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@kaizen/llm/gist/prompts/generate_gist.jinja2` around lines 7 - 8, The current
fallback in the generate_gist.jinja2 prompt ("If you are not able to shorten the
conversation, just give me the original messages.") can produce gists as large
as the input; change the fallback to return a minimal metadata-only response or
explicitly mark/drop such chunks to avoid storage bloat. Update the template
(generate_gist.jinja2) to replace the "just give me the original messages"
instruction with a clear alternative such as "if you cannot shorten the
conversation, return only a minimal metadata placeholder (e.g.,
'unshortened_chunk' plus participant IDs and timestamps) or mark the chunk to be
skipped" so downstream code that consumes the prompt can either store the small
metadata placeholder or drop the chunk rather than storing the full original
messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@demo/gist-memory/README.md`:
- Around line 47-49: The fenced code block in the README.md currently has no
language tag which triggers MD040; update the block that contains "user prefers
Python over R for data analysis; finds pandas more intuitive than tidyverse;
works with Kubernetes networking (Cilium, CNI plugins)" by adding a language
specifier (e.g., change the opening ``` to ```text or ```plaintext) so the
markdown linter treats it as plain text and the MD040 warning is resolved.

In `@demo/gist-memory/session1_script.md`:
- Around line 1-5: The header text incorrectly references "Message 4" as
containing the buried preference; update that reference to the correct message
number ("Message 5" or "Message 5 (User)") in the "Session 1: Preference
Embedding" block so the line that currently reads the buried preference is in
Message 4 matches the actual buried preference location (Message 5 (User) at
line 19).
- Around line 57-62: The fenced code block under "Expected Gist Output" lacks a
language specifier; update the block fence that wraps the expected gist (the
triple-backtick block containing "user prefers Python over R...") to include a
language tag (e.g., change ``` to ```text or ```txt) so the Markdown linter no
longer flags it.

In `@kaizen/frontend/client/kaizen_client.py`:
- Around line 303-389: The rolling replacement in store_gists is not atomic: the
search/delete/insert sequence (search_entities -> delete_entity_by_id ->
update_entities) can interleave across concurrent calls using the same
conversation_id and produce duplicate gist entities; update the store_gists
docstring (and add an inline comment above the delete+insert block) to
explicitly state this non-atomic behavior, give the concurrency
example/acceptable degradation, and note the assumption that the current
single-user MCP context accepts possible duplicate gists rather than
implementing locking or transactional semantics.

In `@kaizen/frontend/mcp/mcp_server.py`:
- Around line 272-277: The return JSON block containing conversation_id and
gists uses formatting that fails Ruff; fix by reformatting the function
containing that return (the block referencing conversation_id, updates, and the
list comprehension {"id": u.id, "content": u.content} for u in updates) to
comply with Ruff style—either run `ruff format
kaizen/frontend/mcp/mcp_server.py` or apply the equivalent formatting changes so
the return dict and list comprehension are properly spaced and wrapped.

In `@kaizen/llm/gist/gist.py`:
- Around line 114-123: The variable constrained_decoding_supported can be
non-boolean because supported_params may be a list; update the computation to
yield a strict bool before passing to _generate_single_gist: explicitly compute
supports_response_format by checking that supported_params is truthy and that
"response_format" is in it, compute response_schema_enabled via
supports_response_schema, then set constrained_decoding_supported =
bool(supports_response_format and response_schema_enabled) (or use an explicit
isinstance/boolean check) so the value is always a bool when used by
_generate_single_gist; refer to get_supported_openai_params, supported_params,
supports_response_format, supports_response_schema, response_schema_enabled, and
constrained_decoding_supported to locate the change.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8b964908-1475-4200-ab42-57238d49f8af

📥 Commits

Reviewing files that changed from the base of the PR and between 6bfdbbb and 6cf4714.

📒 Files selected for processing (14)
  • demo/gist-memory/README.md
  • demo/gist-memory/session1_script.md
  • demo/gist-memory/session2_script.md
  • kaizen/config/kaizen.py
  • kaizen/config/llm.py
  • kaizen/frontend/client/kaizen_client.py
  • kaizen/frontend/mcp/mcp_server.py
  • kaizen/llm/gist/__init__.py
  • kaizen/llm/gist/gist.py
  • kaizen/llm/gist/prompts/generate_gist.jinja2
  • kaizen/schema/gist.py
  • platform-integrations/claude/plugins/kaizen-lite/skills/gist/SKILL.md
  • platform-integrations/claude/plugins/kaizen-lite/skills/recall/scripts/retrieve_entities.py
  • tests/unit/test_gist.py

Comment on lines +47 to +49
```
user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking (Cilium, CNI plugins)
```

⚠️ Potential issue | 🟡 Minor

Add a language specifier to the fenced code block.

The code block is missing a language identifier, which triggers a markdownlint warning (MD040). Since this shows plain text output, use text or plaintext.

📝 Suggested fix
-```
+```text
 user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking (Cilium, CNI plugins)
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 47-47: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


Comment on lines +1 to +5
# Session 1: Preference Embedding

Use these messages in order. The buried preference is in **Message 4**.

---

⚠️ Potential issue | 🟡 Minor

Fix inconsistent message number reference.

Line 3 states the buried preference is in "Message 4", but the actual buried preference appears in "Message 5 (User)" at line 19.

📝 Proposed fix
 # Session 1: Preference Embedding
 
-Use these messages in order. The buried preference is in **Message 4**.
+Use these messages in order. The buried preference is in **Message 5**.
 
 ---

Comment on lines +57 to +62
## Expected Gist Output

The gist should surface the buried preference:
```
user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking; troubleshooting CoreDNS; large cluster environment
```

⚠️ Potential issue | 🟡 Minor

Add language specifier to the expected output code block.

The fenced code block is missing a language specifier, which triggers a Markdown lint warning.

📝 Proposed fix
 ## Expected Gist Output
 
 The gist should surface the buried preference:
-```
+```text
 user prefers Python over R for data analysis; finds pandas more intuitive than tidyverse; works with Kubernetes networking; troubleshooting CoreDNS; large cluster environment
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 60-60: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


Comment on lines +303 to +389
def store_gists(
    self,
    namespace_id: str,
    messages: list[dict],
    conversation_id: str | None = None,
    metadata: dict[str, Any] | None = None,
) -> list[EntityUpdate]:
    """Generate purpose-directed gists from conversation messages and store them.

    Implements rolling consolidation: deletes any existing gists for the same
    conversation_id before storing new ones, so the latest gist always reflects
    the full session.
    """
    if not messages:
        return []

    conversation_id = conversation_id or str(uuid.uuid4())
    self.ensure_namespace(namespace_id)

    # Delete existing gists for this conversation (rolling replacement)
    existing = self.search_entities(
        namespace_id=namespace_id,
        query=None,
        filters={"type": "gist", "metadata.conversation_id": conversation_id},
        limit=100,
    )
    for entity in existing:
        try:
            self.delete_entity_by_id(namespace_id, entity.id)
        except Exception:
            logger.warning("Failed to delete old gist %s during rolling replacement", entity.id, exc_info=True)

    # Generate gists
    result = generate_gist(messages, conversation_id=conversation_id)

    if not result.gists:
        return []

    # Store gist entities
    base_metadata: dict[str, Any] = dict(metadata or {})
    base_metadata["conversation_id"] = conversation_id
    base_metadata["message_count"] = result.message_count

    gist_entities = []
    for i, gist_text in enumerate(result.gists):
        gist_metadata = dict(base_metadata)
        gist_metadata["chunk_index"] = i
        gist_metadata["chunk_count"] = result.chunk_count
        gist_entities.append(Entity(type="gist", content=gist_text, metadata=gist_metadata))

    updates = self.update_entities(namespace_id, gist_entities, enable_conflict_resolution=False)

    # Store original messages as gist_source for durable retrieval
    source_entities = []
    for i, msg in enumerate(messages):
        content = msg.get("content", "")
        if isinstance(content, list):
            content = str(content)
        source_entities.append(
            Entity(
                type="gist_source",
                content=content,
                metadata={
                    "conversation_id": conversation_id,
                    "message_index": i,
                    "role": msg.get("role", "unknown"),
                },
            )
        )

    if source_entities:
        # Delete existing sources for this conversation first
        existing_sources = self.search_entities(
            namespace_id=namespace_id,
            query=None,
            filters={"type": "gist_source", "metadata.conversation_id": conversation_id},
            limit=1000,
        )
        for entity in existing_sources:
            try:
                self.delete_entity_by_id(namespace_id, entity.id)
            except Exception:
                logger.warning("Failed to delete old gist_source %s", entity.id, exc_info=True)

        self.update_entities(namespace_id, source_entities, enable_conflict_resolution=False)

    return updates

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if there are any other callers of store_gists that might invoke it concurrently
rg -n "store_gists" --type=py -C2

Repository: AgentToolkit/kaizen

Length of output: 802


🏁 Script executed:

cat -n kaizen/frontend/mcp/mcp_server.py | sed -n '260,280p'

Repository: AgentToolkit/kaizen

Length of output: 949


🏁 Script executed:

# Also check if there are any async/concurrent patterns in the MCP server
rg -n "async|concurrent|thread|lock|pool" kaizen/frontend/mcp/mcp_server.py

Repository: AgentToolkit/kaizen

Length of output: 152


🏁 Script executed:

# Check the broader context of how the MCP server handles requests
head -50 kaizen/frontend/mcp/mcp_server.py

Repository: AgentToolkit/kaizen

Length of output: 1696


🏁 Script executed:

# Check the backend implementation for any locking mechanisms
rg -n "def delete_entity_by_id|def search_entities|def update_entities" kaizen/frontend/client/kaizen_client.py -A 5

Repository: AgentToolkit/kaizen

Length of output: 1085


🏁 Script executed:

# Check if there's a storage backend with its own locking
find . -name "*.py" -type f | xargs grep -l "delete_entity_by_id" | head -5

Repository: AgentToolkit/kaizen

Length of output: 194


🏁 Script executed:

# Check the filesystem backend for delete and search implementations
rg -n "def delete_entity_by_id|def search_entities|def update_entities|lock|Lock" kaizen/backend/filesystem.py -B 2 -A 8

Repository: AgentToolkit/kaizen

Length of output: 5080


🏁 Script executed:

# Get size of filesystem backend to decide how much to read
wc -l kaizen/backend/filesystem.py

Repository: AgentToolkit/kaizen

Length of output: 95


Document the non-atomic nature of the rolling replacement logic in store_gists.

The search-delete-insert sequence is not atomic. While the backend uses per-operation locking, concurrent calls with the same conversation_id can interleave, potentially resulting in duplicate gists. For example:

  1. Thread A searches and finds gist X
  2. Thread B searches and finds gist X
  3. Thread A deletes X, inserts Y
  4. Thread B deletes nothing (X already gone), inserts Z
    Result: both Y and Z coexist for the same conversation.

This is likely acceptable for the current single-user MCP tool context, but should be documented in the docstring or with a code comment explaining the assumption and acceptable degradation mode.
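
If stronger guarantees were ever needed, one optional mitigation (not part of this PR, and more than the review asks for) would be a per-conversation lock around the search/delete/insert sequence; a minimal sketch:

```python
import threading
from collections import defaultdict

# Hypothetical guard: serialize the rolling-replacement sequence per conversation_id.
_conversation_locks: defaultdict[str, threading.Lock] = defaultdict(threading.Lock)


def with_conversation_lock(conversation_id: str) -> threading.Lock:
    """Return the lock guarding a conversation's search/delete/insert sequence."""
    return _conversation_locks[conversation_id]
```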


Comment on lines +272 to +277
return json.dumps({
    "success": True,
    "conversation_id": conversation_id,
    "gists_stored": len(updates),
    "gists": [{"id": u.id, "content": u.content} for u in updates],
})

⚠️ Potential issue | 🟡 Minor

Ruff formatting check failed.

The pipeline indicates this file needs reformatting. Run ruff format kaizen/frontend/mcp/mcp_server.py to fix.


Comment on lines +114 to +123
supported_params = get_supported_openai_params(
    model=llm_settings.gist_model,
    custom_llm_provider=llm_settings.custom_llm_provider,
)
supports_response_format = supported_params and "response_format" in supported_params
response_schema_enabled = supports_response_schema(
    model=llm_settings.gist_model,
    custom_llm_provider=llm_settings.custom_llm_provider,
)
constrained_decoding_supported = supports_response_format and response_schema_enabled

⚠️ Potential issue | 🟠 Major

Fix type narrowing for constrained_decoding_supported.

The pipeline reports a mypy error: constrained_decoding_supported has type list[Any] | bool | None but _generate_single_gist expects bool. The and expression doesn't guarantee a boolean result.

🐛 Proposed fix
     supported_params = get_supported_openai_params(
         model=llm_settings.gist_model,
         custom_llm_provider=llm_settings.custom_llm_provider,
     )
-    supports_response_format = supported_params and "response_format" in supported_params
+    supports_response_format = bool(supported_params and "response_format" in supported_params)
     response_schema_enabled = supports_response_schema(
         model=llm_settings.gist_model,
         custom_llm_provider=llm_settings.custom_llm_provider,
     )
-    constrained_decoding_supported = supports_response_format and response_schema_enabled
+    constrained_decoding_supported = bool(supports_response_format and response_schema_enabled)

@visahak
Collaborator

visahak commented Mar 23, 2026

can we hold off merging this for now?
