
feat: adaptive intelligence — background review, memory, skills#8508

Open
michaelneale wants to merge 54 commits into main from micn/goose-memory-learning

Conversation


@michaelneale michaelneale commented Apr 13, 2026

Summary

Hermes-agent is pretty cool (hermes itself is in some ways a nicer openclaw, and also has some RL-trained models on llama, but underneath it all is a pretty cool agent harness with a ton of good things in it).

Inspired by that, this brings over some ideas that seem relevant to goose: maintaining ongoing user preferences and recent memory, curating skills (adding and adapting them automatically), and more efficient context use.

This ships as a platform extension that changes goose's behavior only when it is enabled.
Basically: the more you use it, the more personal and smart goose gets.

Core Features

  • Background memory review — after N user turns, autonomously reviews conversation for memory-worthy facts (user prefs, env details, corrections)
  • Background skill review — after N tool iterations (complex work), checks if a skill should be created/updated
  • Pre-compression flush — before context compaction, saves anything worth keeping permanently
  • Cross-session pattern detection — queries past sessions to notice recurring themes
  • MEMORY.md + USER.md format: stores knowledge in an easily readable way
  • create_skill + patch_skill tools — agent can create/update its own procedural skills
  • Knowledge management guidance in system prompt

Context Management

  • Lower compaction threshold to 50%
  • Proactive tool output pruning
  • A more structured compaction template
  • Head+tail truncation for large tool results (shell, ACP file reads, automation scripts)

Tests

  • 30+ adaptive_memory unit tests (add/replace/remove, budgets, security, duplicates, section detection)
  • 10 knowledge_review tests (tool filtering, prompt selection, trigger thresholds)
  • Skill test cleanup with RAII guards

Signed-off-by: Michael Neale <michael.neale@gmail.com>
Tool results between 100K-200K chars now get head+tail truncated in-place
(40% head, 60% tail) instead of passing through untouched. Results above
200K still get file-offloaded as before.
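A minimal sketch of the head/tail split described above (function and notice text are illustrative, not the PR's actual code); it operates on characters rather than bytes so multibyte UTF-8 is never split mid-codepoint:

```rust
/// Keep 40% of the budget from the head and 60% from the tail,
/// with a truncation notice in between.
fn head_tail_truncate(text: &str, max_chars: usize) -> String {
    let chars: Vec<char> = text.chars().collect();
    if chars.len() <= max_chars {
        return text.to_string();
    }
    let head_len = max_chars * 40 / 100;
    let tail_len = max_chars * 60 / 100;
    let head: String = chars[..head_len].iter().collect();
    let tail: String = chars[chars.len() - tail_len..].iter().collect();
    let dropped = chars.len() - head_len - tail_len;
    format!("{head}\n[... {dropped} characters truncated ...]\n{tail}")
}
```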

Signed-off-by: Michael Neale <michael.neale@gmail.com>
Change truncate_output to keep 20 lines from the head (40%) and 30
lines from the tail (60%) with a truncation notice in the middle,
instead of only keeping the last 50 lines. The notice shows how many
lines were dropped and where the full output was saved.

This gives better context when output is truncated, since the beginning
of command output often contains important information (e.g., the first
error in a build log).

Signed-off-by: Michael Neale <michael.neale@gmail.com>
When a file read returns more than 200,000 characters (after applying
line limits from the ACP server), return a helpful message with the
character count, line count, and a preview of the first 50 lines,
instead of the full content. This prevents overwhelming the agent
context with very large files and guides the agent to use line/limit
parameters for targeted reads.

Signed-off-by: Michael Neale <michael.neale@gmail.com>
Add head+tail truncation (40% head, 60% tail) to automation_script
tool output. When stdout or stderr exceeds 50,000 characters, the
middle is replaced with a truncation notice preserving the beginning
and end of the output.

Signed-off-by: Michael Neale <michael.neale@gmail.com>
- Add user_profile category with priority rendering in system prompt
- Add size budgets: 1375 chars for user_profile, 2200 chars for other memories
- Budget enforcement at render time (truncates with capacity warnings)
- Budget warnings at write time (warns agent when over budget)
- New replace_memory tool for atomic find-and-replace updates
- Updated instructions for proactive memory saving (no more 'ask before saving')
- Tests for user_profile rendering, budget enforcement, and replace logic
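A sketch of the write-time budget check (limits mirror the numbers above; the function names are illustrative, not the PR's actual API). Counting chars rather than bytes keeps the budget fair for non-ASCII content:

```rust
const USER_PROFILE_BUDGET: usize = 1375;
const MEMORY_CATEGORY_BUDGET: usize = 2200;

fn budget_for(category: &str) -> usize {
    if category == "user_profile" { USER_PROFILE_BUDGET } else { MEMORY_CATEGORY_BUDGET }
}

/// Returns a warning when the category would be over budget after the add.
fn check_budget(category: &str, existing: &str, addition: &str) -> Option<String> {
    let total = existing.chars().count() + addition.chars().count();
    let budget = budget_for(category);
    (total > budget).then(|| format!(
        "warning: '{category}' is at {total}/{budget} chars; consider replacing stale entries"
    ))
}
```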

Signed-off-by: Michael Neale <michael.neale@gmail.com>
Skills can now be created from experience and patched when found outdated.
- create_skill: validates name, requires YAML frontmatter, writes to config dir
- patch_skill: find-and-replace in existing skills, rejects builtins and ambiguous matches
- Tests for creation, validation, patching, and error cases
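A sketch of the frontmatter validation (illustrative; the real implementation parses the skill content fully): require an opening and closing `---` fence plus `name:` and `description:` keys, rather than only checking that the content starts with `---`:

```rust
fn validate_frontmatter(content: &str) -> Result<(), String> {
    let rest = content
        .strip_prefix("---\n")
        .ok_or("skill content must start with YAML frontmatter (---)")?;
    let (front, _body) = rest
        .split_once("\n---")
        .ok_or("frontmatter is missing its closing --- delimiter")?;
    for key in ["name:", "description:"] {
        if !front.lines().any(|l| l.trim_start().starts_with(key)) {
            return Err(format!("frontmatter is missing required key '{key}'"));
        }
    }
    Ok(())
}
```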

Signed-off-by: Michael Neale <michael.neale@gmail.com>
Tells the agent how to use memory and skills proactively:
- Save user corrections and preferences to user_profile
- Save environment facts to relevant categories
- Don't save task progress or temporary state
- Create skills after complex work, patch skills when found outdated

Signed-off-by: Michael Neale <michael.neale@gmail.com>
…tion in computercontroller

Signed-off-by: Michael Neale <michael.neale@gmail.com>
- Sanitize memory content: collapse consecutive newlines to prevent
  \n\n delimiter corruption in remember and replace_memory
- create_skill: use OpenOptions::create_new for atomic existence check
  (eliminates TOCTOU race)
- Wrap skill filesystem ops in spawn_blocking to avoid blocking Tokio
- Fix byte vs char inconsistency in ACP file read size check
- Rename GLOBAL_MEMORY_BUDGET to MEMORY_CATEGORY_BUDGET to clarify
  per-category semantics
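The `create_new` pattern for the TOCTOU fix looks like this (a minimal sketch, not the PR's actual code): creation fails with `AlreadyExists` instead of clobbering, so no separate `exists()` check is needed:

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Atomic existence check: create_new(true) errors if the file exists,
/// eliminating the check-then-write race of a separate exists() test.
fn create_skill_file(path: &Path, content: &str) -> std::io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(path)?;
    file.write_all(content.as_bytes())
}
```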

Signed-off-by: Michael Neale <michael.neale@gmail.com>
@michaelneale michaelneale self-assigned this Apr 13, 2026
Signed-off-by: Michael Neale <michael.neale@gmail.com>
Reads ~/.config/goose/SOUL.md at prompt assembly time and prepends it
to the system prompt. This is a user-editable file (not agent-written)
that lets users customize the agent's personality, tone, and style.

Mirrors Hermes's ~/.hermes/SOUL.md concept. File is optional — if
missing or empty, the system prompt is unchanged. Content is sanitized
for malicious Unicode tags.
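The optional-prepend shape can be sketched as follows (path, separator, and function name illustrative; the real code also runs the Unicode sanitizer):

```rust
use std::path::Path;

/// Prepend SOUL.md to the system prompt if the file exists and is
/// non-empty; otherwise return the prompt unchanged.
fn prepend_soul(soul_path: &Path, system_prompt: &str) -> String {
    match std::fs::read_to_string(soul_path) {
        Ok(soul) if !soul.trim().is_empty() => format!("{}\n\n{}", soul.trim(), system_prompt),
        _ => system_prompt.to_string(),
    }
}
```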

Signed-off-by: Michael Neale <michael.neale@gmail.com>
Complete rewrite of the memory MCP server to match Hermes conventions:
- Single 'memory' tool with action param (add/replace/remove) + target (memory/user)
- § delimiter between entries (matches Hermes format)
- MEMORY.md for agent notes, USER.md for user profile
- Hard budget rejection: adds that exceed limit are REJECTED (not warned)
- Security scanning: blocks prompt injection patterns and invisible unicode
- Atomic file writes via tempfile + rename (no corruption on crash)
- Duplicate detection: rejects exact duplicate entries
- Frozen snapshot in system prompt with usage indicators
- 10 tests covering CRUD, budgets, security, and format
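The § file format and the atomic-write pattern above can be sketched like this (illustrative names; entries are joined with § on its own line, and writes go through a temp file plus rename so a crash mid-write cannot corrupt MEMORY.md):

```rust
fn serialize_entries(entries: &[String]) -> String {
    entries.join("\n§\n")
}

fn parse_entries(content: &str) -> Vec<String> {
    content
        .split("\n§\n")
        .map(|e| e.trim().to_string())
        .filter(|e| !e.is_empty())
        .collect()
}

fn atomic_write(path: &std::path::Path, content: &str) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    std::fs::write(&tmp, content)?;
    std::fs::rename(&tmp, path) // rename is atomic on the same filesystem
}
```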

Signed-off-by: Michael Neale <michael.neale@gmail.com>
Two autonomous memory extraction mechanisms, matching Hermes:

Background review (knowledge_review.rs):
- Spawns after every 5+ turn reply as a background tokio task
- Takes conversation snapshot, finds memory tools via extension_manager
- Calls complete_fast() with review prompt asking for durable facts
- Mini agent loop: model calls memory tools, up to 8 calls
- Runs silently — user never sees it

Pre-compaction flush (wired into compact_messages):
- Before ANY compaction (auto, recovery, or manual), gives model one
  cheap API call with only memory tools
- Prompt: 'session being compressed, save anything worth remembering'
- Catches all three compaction paths via single insertion point
- Falls back gracefully if flush fails

Also fixes clippy: UTF-8 safe string previews, removes unnecessary u32 cast.
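The capped mini agent loop has roughly this shape (a synchronous sketch for clarity; in the PR this runs inside a spawned tokio task, and `review_once` stands in for the real provider call):

```rust
const MAX_REVIEW_TOOL_CALLS: usize = 8;

/// Run review rounds until the model stops requesting tool calls or the
/// cap is reached; `review_once` returns how many tool calls a round made.
fn run_review<F>(snapshot: &[String], mut review_once: F) -> usize
where
    F: FnMut(&[String]) -> usize,
{
    let mut calls = 0;
    while calls < MAX_REVIEW_TOOL_CALLS {
        let requested = review_once(snapshot);
        if requested == 0 {
            break; // model had nothing more to save
        }
        // Clamp so a single multi-tool response cannot blow past the cap.
        calls += requested.min(MAX_REVIEW_TOOL_CALLS - calls);
    }
    calls
}
```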

Signed-off-by: Michael Neale <michael.neale@gmail.com>
…ured compaction template

Three compression improvements matching Hermes:

1. Lower compaction threshold from 0.8 to 0.5 — compress early, not as
   an emergency. Creates headroom for knowledge management features.

2. Proactive tool result pruning — before calling the LLM summarizer,
   replaces old tool results >200 chars with stub placeholders. Protects
   the most recent 20 messages. This is free (no API cost) and dramatically
   reduces what the summarizer has to process.

3. Structured compaction template — replaces freeform 'remove verbose parts'
   prompt with Hermes-style structured sections: Goal, Constraints &
   Preferences, Progress (Done/In Progress/Blocked), Key Decisions,
   Relevant Files, Next Steps, Critical Context.
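Item 2 can be sketched as follows (a message here is reduced to `(is_tool_result, content)` for illustration; the real type is richer):

```rust
/// Replace long tool outputs in older messages with a stub, protecting
/// the most recent `protect` messages from pruning.
fn prune_tool_results(messages: &mut Vec<(bool, String)>, protect: usize, max_len: usize) {
    let cutoff = messages.len().saturating_sub(protect);
    for (is_tool_result, content) in messages.iter_mut().take(cutoff) {
        if *is_tool_result && content.chars().count() > max_len {
            *content = "[Old tool output cleared to save context space]".to_string();
        }
    }
}
```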

Signed-off-by: Michael Neale <michael.neale@gmail.com>
Two separate triggers for background review, matching Hermes:
- Memory review: fires after 5+ user turns (unchanged)
- Skill review: fires after 10+ tool iterations in a single reply
  (complex work that required many tool calls = likely worth capturing)

When both trigger at once, sends a combined prompt that asks for both
memory AND skill extraction in one call (avoids double API cost).

Skill review prompt asks specifically: 'Was a non-trivial approach used
that required trial and error, or changing course, or did the user
expect a different method?' — focused on learnable struggle, not routine.

Review agent gets both memory + skill tools (create_skill, patch_skill,
load_skill) so it can create new skills or update existing ones.

Signed-off-by: Michael Neale <michael.neale@gmail.com>
External skills from ~/.agents/, .claude/, .agents/ etc are now
read-only to the agent. patch_skill only works on skills in
~/.config/goose/skills/ (where create_skill writes). This prevents
the agent from modifying user-curated or community-installed skills.
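The boundary check can be sketched like this (illustrative; canonicalizing both sides before `starts_with` closes the symlink bypass that a plain prefix check allows, which a later commit in this PR addresses):

```rust
use std::path::Path;

/// True only if the skill resolves to a path inside the goose-managed
/// skills directory after symlinks are resolved.
fn is_goose_managed(skill_path: &Path, goose_skills_dir: &Path) -> bool {
    match (skill_path.canonicalize(), goose_skills_dir.canonicalize()) {
        (Ok(skill), Ok(dir)) => skill.starts_with(&dir),
        _ => false, // treat unresolvable paths as not patchable
    }
}
```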

Signed-off-by: Michael Neale <michael.neale@gmail.com>
… old MCP memory

The Hermes-style adaptive memory (USER.md, MEMORY.md, § delimiters,
hard budgets, security scanning) is now a platform extension called
'adaptive_memory' — runs in-process like skills, default-enabled,
with unprefixed 'memory' tool.

The original MCP memory server is restored unchanged. Users who had
the 'memory' builtin extension configured keep the old category-based
system. The new adaptive_memory extension runs alongside it — both
can coexist since they use different tool names (the old one has
remember_memory/retrieve_memories/etc, the new one has a single
'memory' tool with action param).

This means:
- No breaking change for existing memory users
- New adaptive learning is available by default for everyone
- Old MCP memory can still be enabled for project-scoped/categorized storage
- Background review and flush use the new platform extension's tool

Signed-off-by: Michael Neale <michael.neale@gmail.com>
Background review now queries the SQLite session history via
search_chat_history before running the extraction. Recent sessions
(up to 3, excluding current) are summarized and appended to the
review prompt as context.

This lets the review agent notice cross-session patterns:
- 'user corrected me about X in multiple sessions'
- 'this tool quirk keeps coming up'
- 'user always prefers Y approach'

Uses the existing ChatHistorySearch infrastructure — no new
dependencies or schema changes needed.

Signed-off-by: Michael Neale <michael.neale@gmail.com>
Fixes:
- UTF-8 safe truncation in build_session_context (was panicking on
  multibyte chars with String::truncate)
- char_count now uses .chars().count() not .len() (budget enforcement
  was counting bytes, not characters — wrong for non-ASCII)
- File locking via fs2: exclusive lock on writes, shared lock on reads
  in mutation paths. Prevents lost updates from concurrent background
  review + flush + foreground writes.
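The two UTF-8 fixes boil down to this pattern (minimal sketch): `String::truncate` takes a byte index and panics mid-codepoint, so truncate on a char boundary instead, and count chars rather than bytes for budgets:

```rust
/// Truncate to at most `max_chars` characters without splitting a codepoint.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // already within the limit
    }
}

fn char_count(s: &str) -> usize {
    s.chars().count() // .len() would count bytes, penalizing non-ASCII
}
```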

Tests (18 new):
- Security scanning: prompt injection, role hijack, exfiltration,
  invisible unicode, normal content allowed
- File format: § delimiter, read/write roundtrip, empty file, missing file
- Budget: char-not-byte counting, limit values, rejection when full
- Duplicate detection
- Filename/target mapping
- Render block: empty, user profile format, memory format, usage display
- Success text formatting

Signed-off-by: Michael Neale <michael.neale@gmail.com>
- Extract filter_review_tools() for testability, add 10 unit tests:
  tool filtering (memory-only, skill-only, combined, excludes unrelated,
  empty scope), prompt selection logic, trigger thresholds, review limits

- Fix skill tests: use _test- prefixed names with TestSkillGuard (RAII
  cleanup) so tests don't leave artifacts in real config dir on panic

Signed-off-by: Michael Neale <michael.neale@gmail.com>
@michaelneale michaelneale changed the title feat: adaptive agent P0 — memory categories, tool result caps, skill creation feat: Hermes-style adaptive intelligence — background review, memory, skills Apr 14, 2026
Signed-off-by: Michael Neale <michael.neale@gmail.com>
- Simplify knowledge_review threshold tests to use assert_eq! instead
  of assert! on constant expressions (clippy::assertions_on_constants)
- Re-record all_platform_extensions snapshot without local Block skills
  so it matches CI environment (only goose-doc-guide)

Signed-off-by: Michael Neale <michael.neale@gmail.com>
@michaelneale michaelneale changed the title feat: Hermes-style adaptive intelligence — background review, memory, skills feat: adaptive intelligence — background review, memory, skills Apr 16, 2026
@michaelneale michaelneale marked this pull request as ready for review April 16, 2026 01:42
Signed-off-by: Michael Neale <michael.neale@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f2cb6c8fef

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +154 to +156
messages_to_compact,
extension_manager.is_some(),
)

P1: Gate pruning behind adaptive-memory enablement

compact_messages now sets prune_tool_results from extension_manager.is_some(), but both runtime call sites always pass Some(...) (crates/goose/src/agents/agent.rs:1145 and crates/goose/src/agents/execute_commands.rs:104), so proactive tool-output pruning is effectively always on even when Adaptive Memory is disabled. This makes compaction behavior diverge from the feature gating intent and can silently remove historical tool output in sessions that never enabled the extension.


Collaborator Author


probably valid problem yes...

Comment on lines +341 to +343
for tool_request in &tool_requests {
tool_calls_made += 1;


P2: Enforce tool-call cap inside review request loop

MAX_REVIEW_TOOL_CALLS is checked only at the top of the outer loop, so a single model response containing many tool requests can execute well past the configured cap before the next iteration check runs. In the background reviewer this can cause unexpected extra writes/cost (e.g., multiple memory/skill mutations) despite the configured limit.


Comment on lines +359 to +361
let goose_skills_dir = Paths::config_dir().join("skills");
if !skill.path.starts_with(&goose_skills_dir) {
return Ok(CallToolResult::error(vec![Content::text(

P2: Canonicalize skill path before goose-dir restriction

The patch_skill restriction uses skill.path.starts_with(Paths::config_dir()/skills) on a non-canonical path, which is bypassable via symlinked directories under ~/.config/goose/skills. In that case the prefix check passes, but the later write follows the symlink and can modify files outside goose-managed skills, defeating the safety boundary this change is trying to enforce.


Signed-off-by: Michael Neale <michael.neale@gmail.com>
… per response, canonicalize skill paths

Signed-off-by: Michael Neale <michael.neale@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d386ea410


Comment on lines +161 to +163
.search_chat_history(
"*", // broad search
Some(5),

P2: Use a real wildcard when searching prior sessions

build_session_context passes "*" into search_chat_history, but the search layer tokenizes words and wraps each token as LIKE '%token%', so this becomes '%*%' (literal asterisk) instead of a broad match. In normal chats this returns little or nothing, so cross-session context is effectively missing and the background reviewer cannot detect recurring patterns as intended.


Comment thread crates/goose/src/agents/agent.rs Outdated
Comment on lines +1831 to +1835
if let Ok(provider) = self.provider().await {
// Always include memory when skill review fires — complex work
// often surfaces environment facts worth remembering too.
let review_memory = true;
super::knowledge_review::spawn_background_review(

P2: Gate background review when adaptive memory is disabled

This always enables memory review (review_memory = true) whenever skill thresholds fire, without checking whether the adaptive-memory extension is enabled. In sessions where adaptive memory is off, complex replies can still trigger hidden review completions (the skill filter still admits load_skill), causing extra model/tool cost even though the feature is supposed to be inactive.


…bled, fix wildcard session search

Signed-off-by: Michael Neale <michael.neale@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 41f2d91a69


Comment thread crates/goose/src/context_mgmt/mod.rs Outdated
Comment on lines +357 to +359
return MessageContent::text(
"[Old tool output cleared to save context space]",
);

P1: Keep pruned tool outputs removable during compaction retries

When adaptive-memory pruning is enabled, this rewrites long ToolResponse items into plain text stubs. The retry path later calls filter_tool_responses(...), which only removes messages that still contain MessageContent::ToolResponse, so these rewritten entries can no longer be dropped even at the 100% removal attempt. In tool-heavy sessions this can leave too much history in the summarization prompt and cause compaction to keep failing with context-length errors instead of recovering.


Comment on lines +268 to +271
if !content.starts_with("---") {
return Ok(CallToolResult::error(vec![Content::text(
"Skill content must start with YAML frontmatter (---\\nname: ...\\ndescription: ...\\n---)",
)]));

P2: Reject malformed SKILL.md frontmatter in create_skill

This validation only checks that content starts with ---, so create_skill can report success for malformed frontmatter (e.g., missing description or closing delimiter). Those files are later skipped by skill discovery because parse_frontmatter/SkillMetadata parsing fails, which means the tool says the skill was created but it is not actually loadable or usable.


…emove stubs

Signed-off-by: Michael Neale <michael.neale@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cbbe3732e


Comment on lines +1834 to +1838
if adaptive_memory_enabled && (should_review_memory || should_review_skills) {
if should_review_memory {
self.turns_since_memory_review.store(0, std::sync::atomic::Ordering::Relaxed);
}
if let Ok(provider) = self.provider().await {

P1: Skip hidden reviews for context-owning providers

The new background-review path runs for any provider when Adaptive Memory is enabled, but it does not exclude providers that keep their own session history (e.g. Claude Code / Gemini CLI via Provider::manages_own_context). In those environments, spawn_background_review(...) uses the same session_id, so the hidden memory/skill review turns are appended to the provider’s internal thread and can change subsequent user-visible responses. Guard this branch with !provider.manages_own_context() (or use an isolated review session id) so autonomous review does not pollute the active conversation state.


Comment on lines +392 to +394
let new_content = content.replacen(old_text.as_str(), &new_text, 1);
std::fs::write(&skill_path, &new_content)
.map_err(|e| format!("Failed to write skill: {}", e))?;

P2: Sanitize patched skill text before writing to disk

patch_skill writes new_text directly into SKILL.md without any content safety checks, so a malicious prompt can persist prompt-injection or secret-exfiltration instructions into long-lived skills that are later loaded into model context. This commit already added threat scanning for adaptive memory entries, but the new skill mutation path bypasses equivalent validation; apply the same scanner (or a skill-specific one) before persisting generated skill content.


Signed-off-by: Michael Neale <michael.neale@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f723fe8ff3


let user_msg_count = conversation
.messages()
.iter()
.filter(|m| matches!(m.role, rmcp::model::Role::User) && m.is_agent_visible())

P2: Count only real user turns before memory flush

flush_memories_before_compaction currently counts every agent-visible Role::User message, but tool results are also stored as user-role messages (with_tool_response). In tool-heavy sessions, a single human turn can exceed this threshold and trigger hidden flush reviews much earlier than intended, adding avoidable model/tool cost and weakening the “skip small chats” guard. Count only user text turns (or otherwise exclude tool-response user messages) for this gate.


Comment on lines +337 to +340
if name.is_empty() || old_text.is_empty() {
return Ok(CallToolResult::error(vec![Content::text(
"Missing required parameters: name and old_text",
)]));

P2: Require new_text in patch_skill validation

handle_patch_skill validates name and old_text but not new_text, so a malformed tool call missing new_text falls back to "" and replacen silently deletes the matched section while reporting success. That can unintentionally corrupt SKILL.md content; missing replacement text should be rejected before patching.


Comment on lines +393 to +394
std::fs::write(&skill_path, &new_content)
.map_err(|e| format!("Failed to write skill: {}", e))?;

P2: Block SKILL.md symlinks in patch_skill writes

The directory check canonicalizes skill.path, but the write path is still skill.path.join("SKILL.md") and std::fs::write follows symlinks. If SKILL.md is a symlink, patching can modify a file outside goose-managed skills despite the earlier boundary check. Reject symlinked skill files (or use no-follow open semantics) before writing.


…ent, count real user turns, require new_text, block symlinked SKILL.md

Signed-off-by: Michael Neale <michael.neale@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 09561a4927


Comment on lines +269 to +270
if parse_skill_content(content, PathBuf::new()).is_none() {
return Ok(CallToolResult::error(vec![Content::text(

P2: Enforce frontmatter name match in create_skill

handle_create_skill validates that content parses, but it never checks that the parsed frontmatter name matches the name argument used for the directory path. A tool call like name="foo" with frontmatter name: bar will report success for foo, but discovery/loading keys off bar, which can make the created skill effectively unreachable under the returned name and can also create duplicate-name collisions in later discovery/patch flows.


Comment on lines +352 to +353
let skills = discover_skills(working_dir);
let skill = skills.iter().find(|s| s.name == name);

P2: Prefer goose-managed skill when resolving patch target

patch_skill resolves by first name match from discover_skills, but discovery order prioritizes local skill dirs before ~/.config/goose/skills. If a local skill shares the same name as a goose-managed skill, this picks the local one and then immediately rejects it as “externally installed,” so the goose-managed skill cannot be patched at all. This breaks adaptive updates for common skill names in repos that define local skills.


Comment on lines +220 to +222
matches!(m.role, rmcp::model::Role::User)
&& m.is_agent_visible()
&& m.content

P2: Count only user-visible turns before memory flush

The pre-compaction flush threshold currently counts any agent-visible Role::User text message. Compaction summaries are also stored as user-role, agent-only text messages, so repeated compactions can satisfy the >= 3 gate even when the user has not sent three real turns, triggering extra hidden flush model calls and cost. The threshold should key off user-visible human turns (or an equivalent origin check) instead.


…, prefer goose-managed skill for patch, count user-visible turns only

Signed-off-by: Michael Neale <michael.neale@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c5b1f5a9b4


Comment on lines +523 to +525
// Rollback: delete the skill directory
let skill_dir = Paths::config_dir().join("skills").join(skill_name);
if let Err(e) = std::fs::remove_dir_all(&skill_dir) {

P1: Roll back blocked patches without deleting whole skill

When adversary review blocks a skill write, this rollback path unconditionally calls remove_dir_all on the skill directory. That is safe for a brand-new create_skill, but for patch_skill it deletes the entire pre-existing skill (including supporting files) rather than reverting just the attempted edit, so a false-positive BLOCK can permanently erase valid skills. Persist the pre-patch content and restore it on BLOCK, and only delete directories for newly created skills.

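A sketch of the snapshot-and-restore approach for patches, assuming a `review_blocked` callback standing in for the adversary review:

```rust
use std::fs;
use std::path::Path;

/// Before applying a patch, snapshot the current SKILL.md; if review returns
/// BLOCK, restore the snapshot instead of deleting the whole skill directory.
/// Only brand-new create_skill attempts should ever remove the directory.
/// Returns Ok(true) if the patch was kept, Ok(false) if it was rolled back.
fn apply_patch_with_rollback(
    skill_path: &Path,
    new_content: &str,
    review_blocked: impl Fn(&str) -> bool,
) -> std::io::Result<bool> {
    let snapshot = fs::read_to_string(skill_path)?; // pre-patch content
    fs::write(skill_path, new_content)?;
    if review_blocked(new_content) {
        fs::write(skill_path, &snapshot)?; // revert just this edit
        return Ok(false);
    }
    Ok(true)
}
```

A false-positive BLOCK then costs only the attempted edit, never the pre-existing skill or its supporting files.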

Comment on lines +78 to +82
if let (Some(ext_mgr), Some(wd)) = (extension_manager, working_dir) {
if let Err(e) = crate::agents::knowledge_review::flush_memories_before_compaction(
provider,
ext_mgr,
conversation,

P2: Skip flush reviews for context-owning providers

compact_messages now always attempts a pre-compaction memory flush whenever these optional args are present, and /compact now always passes them. The flush path issues hidden provider.complete(...) calls using the live session_id; for providers that manage their own conversation state, those hidden turns can be appended to the real thread and influence later user-visible replies. Gate this flush behind !provider.manages_own_context() (or use an isolated review session id).

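The suggested gate is a one-line predicate; sketched here with an assumed `manages_own_context` method on the provider trait (the name appears in this review thread, but the trait slice below is a stand-in):

```rust
/// Assumed slice of the provider trait: whether the provider keeps its own
/// conversation state keyed by session id (e.g. CLI-backed providers).
trait Provider {
    fn manages_own_context(&self) -> bool;
}

/// Only run the hidden pre-compaction flush when the provider is stateless;
/// otherwise the flush's hidden turns could leak into the real thread.
fn should_run_flush(provider: &dyn Provider, flush_args_present: bool) -> bool {
    flush_args_present && !provider.manages_own_context()
}

struct StatelessProvider;
impl Provider for StatelessProvider {
    fn manages_own_context(&self) -> bool { false }
}

struct CliBackedProvider;
impl Provider for CliBackedProvider {
    fn manages_own_context(&self) -> bool { true }
}
```

The alternative mentioned in the comment, an isolated review session id, avoids skipping the flush entirely but only works if the provider actually honors the session id it is given.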

* main: (37 commits)
  polish: refine sidebar activity indicators, add placeholder token, and tidy search field (#8606)
  feat: add /edit command to cli for on-demand prompt editing (#8566)
  docs(mcp): add Rendex MCP Server extension tutorial (#8541)
  Lifei/delete tauri backend acp (#8582)
  chore: set goose binaries as executable in package.json (#8589)
  feat: add Novita AI as declarative provider (#8432)
  feat: add Kimi Code provider with OAuth device flow authentication (#8466)
  fix: chat loading-state model placeholder (#8431)
  fix: expand tool calls by default when Response Style is Detailed (#8478)
  fix: create logs dir before writing llm request log (#8522)
  fix: enable token usage tracking and configurable stream timeout for Ollama provider (#8493)
  fix tauri-plugin-dialog version constraint to match other plugins (#8542)
  call goose serve from tauri frontend via goose-acp client (#8549)
  failed the script when bundle:default fails and cleanup "alpha"  (#8580)
  pass globally unique conversation identifier as sessionId in databricks api call (#8576)
  fix: use sqlx chrono decode for thread timestamps instead of manual parsing (#8575)
  docs: remove stale gemini-acp references (#8572)
  show individual untracked files in git changes widget (#8574)
  fix: update publishing flow to include new sdk dir (#8573)
  fix: remove double border on content in chat (#8545)
  ...

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review


Reviewed commit: ca215b0831


.flat_map(|m| m.content.iter())
.filter(|c| matches!(c, MessageContent::ToolRequest(_)))
.count()
.saturating_sub(pre_turn_tool_count);

P2: Track skill-review calls without relying on total message count

Compute final_tool_count from a per-turn counter instead of total_tool_requests - pre_turn_tool_count. In this same method, recovery compaction can replace conversation with a summarized history, which may drop earlier tool-request messages; after that, saturating_sub(pre_turn_tool_count) can collapse to 0 even when this turn executed many tools. When that happens, should_review_skills stays false and the adaptive skill-review loop is silently skipped in exactly the long/complex turns where it is most needed.

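The per-turn counter the comment proposes is immune to history rewrites because it never looks at the conversation at all. A minimal sketch (type and method names are illustrative, not goose's actual ones):

```rust
/// Count this turn's tool calls with a dedicated counter instead of diffing
/// message totals, so recovery compaction (which can drop earlier
/// tool-request messages) cannot collapse the turn's count to zero.
#[derive(Default)]
struct TurnStats {
    tool_calls_this_turn: usize,
}

impl TurnStats {
    /// Called once per executed tool request in the current turn.
    fn record_tool_call(&mut self) {
        self.tool_calls_this_turn += 1;
    }

    /// Complex turns (many tool calls) trigger the adaptive skill review.
    fn should_review_skills(&self, threshold: usize) -> bool {
        self.tool_calls_this_turn >= threshold
    }
}
```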

)
.await;

if adaptive_memory_enabled && (should_review_memory || should_review_skills) {

P2: Guard background reviews when the turn was cancelled

This branch can still run after the reply loop exits due to cancellation (the is_token_cancelled(...) checks happen earlier in the function), so a user-interrupted turn may still increment review counters and spawn hidden memory/skill review model calls. That incurs unexpected post-cancel cost and can mutate persistent memory/skills even though the user explicitly stopped execution. Add a cancellation guard before incrementing counters or spawning spawn_background_review.

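The guard amounts to checking the token one more time before any post-turn side effects. Sketched with a minimal atomic token standing in for goose's cancellation token:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Minimal cancellation token (stand-in for goose's real token type).
struct CancelToken(AtomicBool);

impl CancelToken {
    fn new() -> Self { CancelToken(AtomicBool::new(false)) }
    fn cancel(&self) { self.0.store(true, Ordering::SeqCst) }
    fn is_cancelled(&self) -> bool { self.0.load(Ordering::SeqCst) }
}

/// Gate background reviews on the token: a user-interrupted turn should not
/// increment counters or spawn hidden review model calls.
/// Returns whether a background review should be spawned.
fn maybe_spawn_review(token: &CancelToken, adaptive_enabled: bool, review_due: bool) -> bool {
    if token.is_cancelled() {
        return false; // user stopped the turn; skip all post-cancel work
    }
    adaptive_enabled && review_due
}
```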


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review


Reviewed commit: 185c11a9aa


Comment on lines +432 to +434
let new_content = content.replacen(old_text.as_str(), &new_text, 1);
std::fs::write(&skill_path, &new_content)
.map_err(|e| format!("Failed to write skill: {}", e))?;

P1: Validate patched skill content before persisting

handle_patch_skill writes the replaced SKILL.md text immediately, but never re-validates that the result still has valid skill frontmatter. If a patch accidentally alters/removes delimiters or required fields, the tool reports success while discover_skills/load_skill later stop seeing that skill because parsing fails. Please parse/validate the post-patch content (and enforce expected metadata like name consistency) before writing it to disk.

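A post-patch validation pass could look like the following sketch. It assumes the SKILL.md format uses a `---`-delimited frontmatter block with a `name:` field; the real parser in goose should be reused instead of this hand-rolled check:

```rust
/// Verify that patched SKILL.md content still has a terminated frontmatter
/// block whose `name:` field matches the skill being patched. Run this
/// before writing the replaced text to disk.
fn validate_patched_skill(content: &str, expected_name: &str) -> Result<(), String> {
    let mut parts = content.splitn(3, "---");
    parts.next(); // any text before the opening delimiter
    let header = parts.next().ok_or("missing frontmatter".to_string())?;
    if parts.next().is_none() {
        return Err("unterminated frontmatter".to_string());
    }
    let name = header
        .lines()
        .find_map(|l| l.strip_prefix("name:"))
        .map(str::trim)
        .ok_or("frontmatter missing name field".to_string())?;
    if name != expected_name {
        return Err(format!("patch changed skill name: {} != {}", name, expected_name));
    }
    Ok(())
}
```

With this in place, a patch that mangles the delimiters fails loudly at patch time instead of silently making the skill invisible to discover_skills later.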

Comment on lines +578 to +579
} else if let Err(e) = std::fs::remove_dir_all(&skill_dir) {
warn!("Failed to rollback blocked skill '{}': {}", skill_name, e);

P2: Preserve existing skill dirs on blocked create rollback

On a blocked create_skill, rollback always calls remove_dir_all on the skill directory. Because creation only requires SKILL.md to be new, this can delete pre-existing files if the directory already existed (for example supporting assets left there before SKILL.md was created). Rollback should only delete artifacts created by the current attempt (or only delete the new SKILL.md), otherwise a false-positive block can erase unrelated data.

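One way to scope the rollback: remember whether create_skill itself made the directory, and otherwise remove only the new SKILL.md. A sketch, with the `dir_was_created` flag assumed to be recorded at creation time:

```rust
use std::fs;
use std::path::Path;

/// On a blocked create, remove only what this attempt created: the new
/// SKILL.md always, and the directory itself only if this attempt made it
/// (a pre-existing directory may hold unrelated supporting assets).
fn rollback_blocked_create(skill_dir: &Path, dir_was_created: bool) -> std::io::Result<()> {
    let skill_md = skill_dir.join("SKILL.md");
    if skill_md.exists() {
        fs::remove_file(&skill_md)?;
    }
    if dir_was_created {
        fs::remove_dir_all(skill_dir)?;
    }
    Ok(())
}
```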

…g docs

- Add session_summary.rs: format_conversation, truncate_around_matches,
  parallel LLM summarization (ported from Hermes session_search_tool.py)
- chatrecall search mode now summarizes by default (summarize param, default true),
  falls back to raw results if provider unavailable
- Replace build_session_context 200-char snippet dumping with full conversation
  formatting and intelligent truncation (8KB budget)
- Remove dead code: write_entries, read_entries_locked, head_tail_truncate
- Add dedup to read_entries via HashSet
- Remove planning docs (HERMES_LEARNING_AGENT.md, gap analysis, P0 plan)

Signed-off-by: Michael Neale <michael.neale@gmail.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review


Reviewed commit: 08a27dee3c


Comment on lines +179 to +181
model_config,
session_id,
SUMMARIZE_SYSTEM_PROMPT,

P1: Avoid passing historical IDs to provider summaries

summarize_single_session calls provider.complete with the matched past session’s session_id, so chat-recall summarization runs inside those historical provider threads instead of an isolated context. On providers that manage their own context by session id (e.g., Claude Code/Gemini CLI), this can append hidden recap prompts/results to old conversations and change what users see when they later resume those sessions. Use an isolated review session id (or skip LLM summarization when manages_own_context() is true).


let results = session_manager
.search_chat_history(
"*",
Some(5),

P2: Increase review-context search window beyond five rows

build_review_context asks for only Some(5) results, but search_chat_history applies that as a SQL message-row limit (ORDER BY m.timestamp DESC LIMIT ?), not a session limit. In practice, a single busy recent session can consume all five rows, so the “cross-session” context often collapses to one session and misses recurring patterns from other recent sessions. Use a larger row window or a session-level limit to keep this feature effective.

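A session-level limit can be layered on top of a larger row window: fetch more rows, then admit at most N distinct sessions. A sketch with a hypothetical `Hit` row shape:

```rust
use std::collections::HashSet;

/// A search-hit row (assumed shape): which session it came from.
struct Hit {
    session_id: String,
    snippet: String,
}

/// Collapse row-limited search results to at most `max_sessions` distinct
/// sessions, so one busy recent session cannot consume the whole window.
/// In practice the SQL row LIMIT would also be raised so enough sessions
/// appear in the candidate rows.
fn take_distinct_sessions(hits: Vec<Hit>, max_sessions: usize) -> Vec<Hit> {
    let mut seen: HashSet<String> = HashSet::new();
    hits.into_iter()
        .filter(|h| {
            if seen.contains(&h.session_id) {
                true // keep further rows from already-admitted sessions
            } else if seen.len() < max_sessions {
                seen.insert(h.session_id.clone());
                true
            } else {
                false
            }
        })
        .collect()
}
```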

Signed-off-by: Michael Neale <michael.neale@gmail.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review


Reviewed commit: 3d5497d3b9


Comment on lines +225 to +226
let user_entries = read_entries("USER.md");
let memory_entries = read_entries("MEMORY.md");

P1: Sanitize persisted memory before building system instructions

AdaptiveMemoryClient::new loads USER.md/MEMORY.md directly via read_entries, but threat scanning is only enforced on the memory tool write path. If those files are modified through any other route (manual edit, shell/write-file tools, migrated state), injected directives are reintroduced into prompt context on startup and persist across sessions. Filter or reject unsafe entries when reading persisted memory, not just when writing through the tool.

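The fix is to run the same threat scan on the read path that the tool already runs on writes. A sketch, where `looks_injected` is a deliberately crude stand-in for goose's real scanner:

```rust
/// Crude stand-in for goose's threat scanner: flags entries that look like
/// injected directives. The real write-path scanner should be reused here.
fn looks_injected(entry: &str) -> bool {
    let lower = entry.to_lowercase();
    ["ignore previous instructions", "disregard the system prompt"]
        .iter()
        .any(|pattern| lower.contains(pattern))
}

/// Filter persisted USER.md / MEMORY.md entries at load time, so directives
/// injected via manual edits, shell/write-file tools, or migrated state do
/// not re-enter the prompt context on startup.
fn read_entries_sanitized(raw: Vec<String>) -> Vec<String> {
    raw.into_iter().filter(|e| !looks_injected(e)).collect()
}
```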

Comment on lines +303 to +306
let mut file = std::fs::OpenOptions::new()
.write(true)
.create_new(true)
.open(&skill_path)

P2: Reject symlinked target paths in create_skill writes

create_skill writes SKILL.md using OpenOptions on skill_path without canonicalizing or rejecting symlinked directories/files first. If ~/.config/goose/skills/<name> is a symlink, this follows the link and writes outside goose-managed skills, which bypasses the safety boundary this feature is trying to enforce. Add a canonical-path/no-symlink check before opening the file.

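One way to enforce that check before opening the file: walk from the target path back up to the skills root and refuse if any existing component is a symlink (`std::fs::symlink_metadata` does not follow links). A sketch under that assumption:

```rust
use std::io;
use std::path::Path;

/// Refuse to write through symlinks: every existing path component from the
/// target back up to the skills root must be a real file or directory.
fn reject_symlinks(skills_root: &Path, skill_path: &Path) -> io::Result<()> {
    let mut current = skill_path.to_path_buf();
    loop {
        // symlink_metadata does not follow links, so symlinks are visible.
        if let Ok(meta) = std::fs::symlink_metadata(&current) {
            if meta.file_type().is_symlink() {
                return Err(io::Error::new(
                    io::ErrorKind::PermissionDenied,
                    format!("{} is a symlink", current.display()),
                ));
            }
        }
        if current == skills_root {
            break;
        }
        match current.parent() {
            Some(parent) => current = parent.to_path_buf(),
            None => break,
        }
    }
    Ok(())
}
```

Called just before the `OpenOptions` write, this keeps a symlinked `~/.config/goose/skills/<name>` from redirecting the write outside the managed tree.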

Signed-off-by: Michael Neale <michael.neale@gmail.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review


Reviewed commit: 429f99f665


Comment on lines +72 to +73
MessageContent::Thinking(t) => {
parts.push(format!("[THINKING]: {}", t.thinking));

P1: Omit thinking traces from recall transcripts

format_conversation now emits MessageContent::Thinking verbatim, and the new chatrecall search flow defaults to summarize=true, so these transcripts are sent to a model and returned to users as session summaries. In thinking-enabled sessions this can leak or paraphrase hidden chain-of-thought that was never user-visible in the original chat, which is a privacy/safety regression in the new recall pipeline. Exclude thinking/redacted-thinking (and ideally other non-user-visible content) from user-facing summarization inputs.

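The exclusion is a filter in the transcript formatter. A sketch with an assumed subset of the content enum (the real `MessageContent` has more variants):

```rust
/// Assumed subset of goose's message-content variants.
enum MessageContent {
    Text(String),
    Thinking(String),
    RedactedThinking(Vec<u8>),
}

/// Build a recall transcript from user-visible content only. Chain-of-thought
/// was never shown in the original chat, so it must not reach the
/// summarization model or the user-facing session summaries.
fn format_for_recall(contents: &[MessageContent]) -> String {
    contents
        .iter()
        .filter_map(|c| match c {
            MessageContent::Text(t) => Some(t.as_str()),
            MessageContent::Thinking(_) | MessageContent::RedactedThinking(_) => None,
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```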

- Omit Thinking/RedactedThinking from format_conversation() so
  chain-of-thought never leaks into summarization input
- Use ephemeral session ID (summary_{id}) for provider.complete()
  so providers that track context by session ID don't pollute
  historical sessions

Signed-off-by: Michael Neale <michael.neale@gmail.com>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review


Reviewed commit: e843871542


Comment on lines +180 to +184
let ephemeral_id = format!("summary_{}", session_id);

match provider
.complete(
model_config,

P1: Skip recall summarization for context-owning providers

This path still issues hidden provider.complete(...) calls during chat-recall summarization for providers that keep their own conversation state, so the summarizer can mutate the active thread and affect later user-visible replies. Fresh evidence: GeminiCliProvider::stream ignores _session_id and always resumes a shared CLI session (-r from cli_session_id), which means the summary_{session_id} value here does not isolate these calls. Please gate this summarization call behind !provider.manages_own_context() (or run it through an isolated provider/session).

