feat: adaptive intelligence — background review, memory, skills #8508

michaelneale wants to merge 54 commits into main
Conversation
Signed-off-by: Michael Neale <michael.neale@gmail.com>
Tool results between 100K and 200K chars now get head+tail truncated in place (40% head, 60% tail) instead of passing through untouched. Results above 200K still get file-offloaded as before.
Change truncate_output to keep 20 lines from the head (40%) and 30 lines from the tail (60%) with a truncation notice in the middle, instead of only keeping the last 50 lines. The notice shows how many lines were dropped and where the full output was saved. This gives better context when output is truncated, since the beginning of command output often contains important information (e.g., the first error in a build log).
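A minimal sketch of this head+tail shape (the constants match the commit message, but the function name is illustrative and the real notice also records where the full output was saved):

```rust
// Keep the first 20 lines and last 30 lines of output, with a notice
// in the middle stating how many lines were dropped.
const HEAD_LINES: usize = 20;
const TAIL_LINES: usize = 30;

fn head_tail_truncate_lines(output: &str) -> String {
    let lines: Vec<&str> = output.lines().collect();
    let keep = HEAD_LINES + TAIL_LINES;
    if lines.len() <= keep {
        return output.to_string(); // nothing to drop
    }
    let dropped = lines.len() - keep;
    let mut result: Vec<String> =
        lines[..HEAD_LINES].iter().map(|s| s.to_string()).collect();
    result.push(format!("... [{} lines truncated] ...", dropped));
    result.extend(lines[lines.len() - TAIL_LINES..].iter().map(|s| s.to_string()));
    result.join("\n")
}

fn main() {
    let big: String = (0..100).map(|i| format!("line {}\n", i)).collect();
    let out = head_tail_truncate_lines(&big);
    println!("{}", out.lines().count()); // 20 head + 1 notice + 30 tail = 51
}
```

The head-biased split matters because, as the commit notes, the first error in a build log usually appears near the top of the output.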
When a file read returns more than 200,000 characters (after applying line limits from the ACP server), return a helpful message with the character count, line count, and a preview of the first 50 lines, instead of the full content. This prevents overwhelming the agent context with very large files and guides the agent to use line/limit parameters for targeted reads.
Add head+tail truncation (40% head, 60% tail) to automation_script tool output. When stdout or stderr exceeds 50,000 characters, the middle is replaced with a truncation notice preserving the beginning and end of the output.
- Add user_profile category with priority rendering in system prompt
- Add size budgets: 1375 chars for user_profile, 2200 chars for other memories
- Budget enforcement at render time (truncates with capacity warnings)
- Budget warnings at write time (warns agent when over budget)
- New replace_memory tool for atomic find-and-replace updates
- Updated instructions for proactive memory saving (no more "ask before saving")
- Tests for user_profile rendering, budget enforcement, and replace logic
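The render-time budget enforcement could look roughly like this; the 1375-char figure comes from the commit message, while the function name and warning text are assumptions:

```rust
// Render memory entries until the category's character budget is hit,
// then cut off with a capacity warning instead of overflowing the prompt.
const USER_PROFILE_BUDGET: usize = 1375; // budget value from the commit message

fn render_with_budget(entries: &[&str], budget: usize) -> String {
    let mut out = String::new();
    for entry in entries {
        // +1 accounts for the newline separator between rendered entries
        if out.chars().count() + entry.chars().count() + 1 > budget {
            out.push_str("[memory at capacity — consider replacing stale entries]\n");
            break;
        }
        out.push_str(entry);
        out.push('\n');
    }
    out
}

fn main() {
    let entries = ["prefers rebase over merge", "uses fish shell"];
    println!("{}", render_with_budget(&entries, USER_PROFILE_BUDGET));
}
```

Counting with `chars().count()` rather than byte length keeps the budget correct for non-ASCII content, which a later commit in this PR fixes explicitly.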
Skills can now be created from experience and patched when found outdated.
- create_skill: validates name, requires YAML frontmatter, writes to config dir
- patch_skill: find-and-replace in existing skills, rejects builtins and ambiguous matches
- Tests for creation, validation, patching, and error cases
Tells the agent how to use memory and skills proactively:
- Save user corrections and preferences to user_profile
- Save environment facts to relevant categories
- Don't save task progress or temporary state
- Create skills after complex work, patch skills when found outdated
…tion in computercontroller
- Sanitize memory content: collapse consecutive newlines to prevent \n\n delimiter corruption in remember and replace_memory
- create_skill: use OpenOptions::create_new for atomic existence check (eliminates TOCTOU race)
- Wrap skill filesystem ops in spawn_blocking to avoid blocking Tokio
- Fix byte vs char inconsistency in ACP file read size check
- Rename GLOBAL_MEMORY_BUDGET to MEMORY_CATEGORY_BUDGET to clarify per-category semantics
Reads ~/.config/goose/SOUL.md at prompt assembly time and prepends it to the system prompt. This is a user-editable file (not agent-written) that lets users customize the agent's personality, tone, and style. Mirrors Hermes's ~/.hermes/SOUL.md concept. File is optional — if missing or empty, the system prompt is unchanged. Content is sanitized for malicious Unicode tags.
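A minimal sketch of the optional-file behavior (the function name is an assumption, and the real code additionally strips malicious Unicode tag characters):

```rust
// If SOUL.md exists and is non-empty, prepend it to the system prompt;
// otherwise the prompt is returned unchanged.
use std::path::Path;

fn assemble_prompt(soul_path: &Path, base_prompt: &str) -> String {
    match std::fs::read_to_string(soul_path) {
        Ok(soul) if !soul.trim().is_empty() => {
            format!("{}\n\n{}", soul.trim(), base_prompt)
        }
        _ => base_prompt.to_string(), // missing or empty file: no change
    }
}

fn main() {
    let prompt = assemble_prompt(Path::new("/nonexistent/SOUL.md"), "You are goose.");
    println!("{}", prompt);
}
```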
Complete rewrite of the memory MCP server to match Hermes conventions:
- Single 'memory' tool with action param (add/replace/remove) + target (memory/user)
- § delimiter between entries (matches Hermes format)
- MEMORY.md for agent notes, USER.md for user profile
- Hard budget rejection: adds that exceed limit are REJECTED (not warned)
- Security scanning: blocks prompt injection patterns and invisible unicode
- Atomic file writes via tempfile + rename (no corruption on crash)
- Duplicate detection: rejects exact duplicate entries
- Frozen snapshot in system prompt with usage indicators
- 10 tests covering CRUD, budgets, security, and format
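The add path with hard rejection might be sketched like this; the § delimiter, duplicate rejection, and hard budget are from the commit message, while the error strings and exact budget value here are assumptions:

```rust
// Add an entry to a §-delimited memory file. Exact duplicates and
// over-budget adds are rejected outright rather than warned about.
const HARD_BUDGET_CHARS: usize = 2200; // illustrative budget value

fn add_entry(existing: &str, new_entry: &str) -> Result<String, String> {
    let entries: Vec<&str> = existing
        .split('§')
        .map(str::trim)
        .filter(|e| !e.is_empty())
        .collect();
    if entries.contains(&new_entry.trim()) {
        return Err("duplicate entry rejected".to_string());
    }
    let total = existing.chars().count() + new_entry.chars().count() + 1;
    if total > HARD_BUDGET_CHARS {
        return Err("over budget: add rejected".to_string()); // hard rejection
    }
    if existing.trim().is_empty() {
        Ok(new_entry.trim().to_string())
    } else {
        Ok(format!("{}§{}", existing.trim(), new_entry.trim()))
    }
}

fn main() {
    let file = add_entry("", "prefers tabs").unwrap();
    println!("{:?}", add_entry(&file, "prefers tabs")); // duplicate → Err
}
```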
Two autonomous memory extraction mechanisms, matching Hermes:

Background review (knowledge_review.rs):
- Spawns after every 5+ turn reply as a background tokio task
- Takes a conversation snapshot, finds memory tools via extension_manager
- Calls complete_fast() with a review prompt asking for durable facts
- Mini agent loop: model calls memory tools, up to 8 calls
- Runs silently — user never sees it

Pre-compaction flush (wired into compact_messages):
- Before ANY compaction (auto, recovery, or manual), gives the model one cheap API call with only memory tools
- Prompt: 'session being compressed, save anything worth remembering'
- Catches all three compaction paths via a single insertion point
- Falls back gracefully if the flush fails

Also fixes clippy: UTF-8-safe string previews, removes an unnecessary u32 cast.
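The capped mini loop can be sketched with the model and tool plumbing mocked out; the cap of 8 is from the commit message, everything else (names, shapes) is illustrative. Checking the cap per tool request, not just per model response, keeps a single multi-tool response from blowing past the limit:

```rust
// Mock review loop: each element of `responses` is one model turn's list
// of tool requests; an empty list means the model produced a final answer.
const MAX_REVIEW_TOOL_CALLS: usize = 8;

fn run_review_loop(responses: &[Vec<&str>]) -> usize {
    let mut calls_made = 0;
    for tool_requests in responses {
        if tool_requests.is_empty() {
            break; // final answer, no more tool calls
        }
        for _request in tool_requests {
            if calls_made >= MAX_REVIEW_TOOL_CALLS {
                return calls_made; // cap enforced inside the inner loop
            }
            calls_made += 1; // execute the memory/skill tool here
        }
    }
    calls_made
}

fn main() {
    // one response containing 20 tool requests still stops at the cap
    println!("{}", run_review_loop(&[vec!["memory"; 20]]));
}
```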
…ured compaction template

Three compression improvements matching Hermes:
1. Lower the compaction threshold from 0.8 to 0.5 — compress early, not as an emergency. Creates headroom for knowledge management features.
2. Proactive tool result pruning — before calling the LLM summarizer, replace old tool results >200 chars with stub placeholders. Protects the most recent 20 messages. This is free (no API cost) and dramatically reduces what the summarizer has to process.
3. Structured compaction template — replaces the freeform 'remove verbose parts' prompt with Hermes-style structured sections: Goal, Constraints & Preferences, Progress (Done/In Progress/Blocked), Key Decisions, Relevant Files, Next Steps, Critical Context.
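Point 2 (proactive pruning) might look roughly like this, with messages reduced to plain strings; the 200-char threshold, 20-message protection window, and stub text match the PR, the rest is assumed:

```rust
// Replace long, old tool results with a stub before summarization.
// Only messages outside the most recent PROTECTED_RECENT window are touched.
const PRUNE_THRESHOLD: usize = 200;
const PROTECTED_RECENT: usize = 20;

fn prune_old_tool_results(messages: &mut Vec<String>) {
    let cutoff = messages.len().saturating_sub(PROTECTED_RECENT);
    for msg in messages.iter_mut().take(cutoff) {
        if msg.chars().count() > PRUNE_THRESHOLD {
            *msg = "[Old tool output cleared to save context space]".to_string();
        }
    }
}

fn main() {
    let mut msgs: Vec<String> = (0..30).map(|_| "x".repeat(300)).collect();
    prune_old_tool_results(&mut msgs);
    // 30 messages, last 20 protected: the first 10 get stubbed
    println!("{}", msgs.iter().filter(|m| m.starts_with('[')).count());
}
```

This is free in the sense the commit describes: no API call happens, yet the summarizer's input shrinks substantially in tool-heavy sessions.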
Two separate triggers for background review, matching Hermes:
- Memory review: fires after 5+ user turns (unchanged)
- Skill review: fires after 10+ tool iterations in a single reply (complex work that required many tool calls is likely worth capturing)

When both trigger at once, sends a combined prompt that asks for both memory AND skill extraction in one call (avoids double API cost). The skill review prompt asks specifically: 'Was a non-trivial approach used that required trial and error, or changing course, or did the user expect a different method?' — focused on learnable struggle, not routine. The review agent gets both memory and skill tools (create_skill, patch_skill, load_skill) so it can create new skills or update existing ones.
External skills from ~/.agents/, .claude/, .agents/, etc. are now read-only to the agent. patch_skill only works on skills in ~/.config/goose/skills/ (where create_skill writes). This prevents the agent from modifying user-curated or community-installed skills.
… old MCP memory

The Hermes-style adaptive memory (USER.md, MEMORY.md, § delimiters, hard budgets, security scanning) is now a platform extension called 'adaptive_memory' — it runs in-process like skills, is default-enabled, and exposes an unprefixed 'memory' tool. The original MCP memory server is restored unchanged. Users who had the 'memory' builtin extension configured keep the old category-based system. The new adaptive_memory extension runs alongside it — both can coexist since they use different tool names (the old one has remember_memory/retrieve_memories/etc., the new one has a single 'memory' tool with an action param).

This means:
- No breaking change for existing memory users
- New adaptive learning is available by default for everyone
- Old MCP memory can still be enabled for project-scoped/categorized storage
- Background review and flush use the new platform extension's tool
Background review now queries the SQLite session history via search_chat_history before running the extraction. Recent sessions (up to 3, excluding the current one) are summarized and appended to the review prompt as context. This lets the review agent notice cross-session patterns:
- 'user corrected me about X in multiple sessions'
- 'this tool quirk keeps coming up'
- 'user always prefers Y approach'

Uses the existing ChatHistorySearch infrastructure — no new dependencies or schema changes needed.
Fixes:
- UTF-8-safe truncation in build_session_context (was panicking on multibyte chars with String::truncate)
- char_count now uses .chars().count(), not .len() (budget enforcement was counting bytes, not characters — wrong for non-ASCII)
- File locking via fs2: exclusive lock on writes, shared lock on reads in mutation paths. Prevents lost updates from concurrent background review + flush + foreground writes.

Tests (18 new):
- Security scanning: prompt injection, role hijack, exfiltration, invisible unicode, normal content allowed
- File format: § delimiter, read/write roundtrip, empty file, missing file
- Budget: char-not-byte counting, limit values, rejection when full
- Duplicate detection
- Filename/target mapping
- Render block: empty, user profile format, memory format, usage display
- Success text formatting
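The UTF-8-safe truncation fix amounts to respecting char boundaries: `String::truncate` (and byte slicing) panics on an index that falls inside a multibyte character. A hedged sketch, with an illustrative name:

```rust
// Truncate to at most `max_bytes`, walking back to the nearest char
// boundary so multibyte characters are never split (which would panic).
fn truncate_utf8_safe(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1; // step back to the previous char boundary
    }
    &s[..end]
}

fn main() {
    // "é" is 2 bytes in UTF-8; a naive &s[..3] here would panic
    println!("{}", truncate_utf8_safe("ééé", 3)); // keeps one full char
}
```

The same distinction drives the char_count fix above: `.len()` is bytes, `.chars().count()` is characters, and only the latter matches a character-denominated budget.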
- Extract filter_review_tools() for testability; add 10 unit tests: tool filtering (memory-only, skill-only, combined, excludes unrelated, empty scope), prompt selection logic, trigger thresholds, review limits
- Fix skill tests: use _test- prefixed names with TestSkillGuard (RAII cleanup) so tests don't leave artifacts in the real config dir on panic
- Simplify knowledge_review threshold tests to use assert_eq! instead of assert! on constant expressions (clippy::assertions_on_constants)
- Re-record the all_platform_extensions snapshot without local Block skills so it matches the CI environment (only goose-doc-guide)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f2cb6c8fef
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```rust
    messages_to_compact,
    extension_manager.is_some(),
)
```
Gate pruning behind adaptive-memory enablement
compact_messages now sets prune_tool_results from extension_manager.is_some(), but both runtime call sites always pass Some(...) (crates/goose/src/agents/agent.rs:1145 and crates/goose/src/agents/execute_commands.rs:104), so proactive tool-output pruning is effectively always on even when Adaptive Memory is disabled. This makes compaction behavior diverge from the feature gating intent and can silently remove historical tool output in sessions that never enabled the extension.
probably valid problem yes...
```rust
for tool_request in &tool_requests {
    tool_calls_made += 1;
```
Enforce tool-call cap inside review request loop
MAX_REVIEW_TOOL_CALLS is checked only at the top of the outer loop, so a single model response containing many tool requests can execute well past the configured cap before the next iteration check runs. In the background reviewer this can cause unexpected extra writes/cost (e.g., multiple memory/skill mutations) despite the configured limit.
```rust
let goose_skills_dir = Paths::config_dir().join("skills");
if !skill.path.starts_with(&goose_skills_dir) {
    return Ok(CallToolResult::error(vec![Content::text(
```
Canonicalize skill path before goose-dir restriction
The patch_skill restriction uses skill.path.starts_with(Paths::config_dir()/skills) on a non-canonical path, which is bypassable via symlinked directories under ~/.config/goose/skills. In that case the prefix check passes, but the later write follows the symlink and can modify files outside goose-managed skills, defeating the safety boundary this change is trying to enforce.
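A symlink-safe variant of this check would canonicalize both sides before the prefix test, so a symlink under the skills directory that points elsewhere fails the boundary check. A hedged sketch, with the goose-specific `Paths::config_dir()` replaced by a generic root for illustration:

```rust
// Prefix check on canonicalized paths: canonicalize() resolves symlinks
// and `..` components, so a symlinked escape no longer passes.
use std::path::Path;

fn is_inside(root: &Path, candidate: &Path) -> std::io::Result<bool> {
    let root = root.canonicalize()?;
    let candidate = candidate.canonicalize()?;
    Ok(candidate.starts_with(&root))
}

fn main() {
    let tmp = std::env::temp_dir();
    // the parent of the temp dir is outside it, so this reports false
    println!("{:?}", is_inside(&tmp, &tmp.join("..")));
}
```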
… per response, canonicalize skill paths
💡 Codex Review
Reviewed commit: 0d386ea410
```rust
.search_chat_history(
    "*", // broad search
    Some(5),
```
Use a real wildcard when searching prior sessions
build_session_context passes "*" into search_chat_history, but the search layer tokenizes words and wraps each token as LIKE '%token%', so this becomes '%*%' (literal asterisk) instead of a broad match. In normal chats this returns little or nothing, so cross-session context is effectively missing and the background reviewer cannot detect recurring patterns as intended.
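The failure mode is easy to reproduce with substring matching standing in for `LIKE '%token%'`; this toy sketch is illustrative only:

```rust
// contains() stands in for SQL LIKE '%token%': the token is matched
// literally, so "*" only matches text that contains an actual asterisk.
fn matches_token(haystack: &str, token: &str) -> bool {
    haystack.contains(token)
}

fn main() {
    // '*' is not a wildcard here; normal chat text never matches it
    println!("{}", matches_token("user asked about rebasing", "*")); // false
}
```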
```rust
if let Ok(provider) = self.provider().await {
    // Always include memory when skill review fires — complex work
    // often surfaces environment facts worth remembering too.
    let review_memory = true;
    super::knowledge_review::spawn_background_review(
```
Gate background review when adaptive memory is disabled
This always enables memory review (review_memory = true) whenever skill thresholds fire, without checking whether the adaptive-memory extension is enabled. In sessions where adaptive memory is off, complex replies can still trigger hidden review completions (the skill filter still admits load_skill), causing extra model/tool cost even though the feature is supposed to be inactive.
…bled, fix wildcard session search
💡 Codex Review
Reviewed commit: 41f2d91a69
```rust
return MessageContent::text(
    "[Old tool output cleared to save context space]",
);
```
Keep pruned tool outputs removable during compaction retries
When adaptive-memory pruning is enabled, this rewrites long ToolResponse items into plain text stubs. The retry path later calls filter_tool_responses(...), which only removes messages that still contain MessageContent::ToolResponse, so these rewritten entries can no longer be dropped even at the 100% removal attempt. In tool-heavy sessions this can leave too much history in the summarization prompt and cause compaction to keep failing with context-length errors instead of recovering.
```rust
if !content.starts_with("---") {
    return Ok(CallToolResult::error(vec![Content::text(
        "Skill content must start with YAML frontmatter (---\\nname: ...\\ndescription: ...\\n---)",
    )]));
```
Reject malformed SKILL.md frontmatter in create_skill
This validation only checks that content starts with ---, so create_skill can report success for malformed frontmatter (e.g., missing description or closing delimiter). Those files are later skipped by skill discovery because parse_frontmatter/SkillMetadata parsing fails, which means the tool says the skill was created but it is not actually loadable or usable.
…emove stubs
💡 Codex Review
Reviewed commit: 7cbbe3732e
```rust
if adaptive_memory_enabled && (should_review_memory || should_review_skills) {
    if should_review_memory {
        self.turns_since_memory_review.store(0, std::sync::atomic::Ordering::Relaxed);
    }
    if let Ok(provider) = self.provider().await {
```
Skip hidden reviews for context-owning providers
The new background-review path runs for any provider when Adaptive Memory is enabled, but it does not exclude providers that keep their own session history (e.g. Claude Code / Gemini CLI via Provider::manages_own_context). In those environments, spawn_background_review(...) uses the same session_id, so the hidden memory/skill review turns are appended to the provider’s internal thread and can change subsequent user-visible responses. Guard this branch with !provider.manages_own_context() (or use an isolated review session id) so autonomous review does not pollute the active conversation state.
```rust
let new_content = content.replacen(old_text.as_str(), &new_text, 1);
std::fs::write(&skill_path, &new_content)
    .map_err(|e| format!("Failed to write skill: {}", e))?;
```
Sanitize patched skill text before writing to disk
patch_skill writes new_text directly into SKILL.md without any content safety checks, so a malicious prompt can persist prompt-injection or secret-exfiltration instructions into long-lived skills that are later loaded into model context. This commit already added threat scanning for adaptive memory entries, but the new skill mutation path bypasses equivalent validation; apply the same scanner (or a skill-specific one) before persisting generated skill content.
💡 Codex Review
Reviewed commit: f723fe8ff3
```rust
let user_msg_count = conversation
    .messages()
    .iter()
    .filter(|m| matches!(m.role, rmcp::model::Role::User) && m.is_agent_visible())
```
Count only real user turns before memory flush
flush_memories_before_compaction currently counts every agent-visible Role::User message, but tool results are also stored as user-role messages (with_tool_response). In tool-heavy sessions, a single human turn can exceed this threshold and trigger hidden flush reviews much earlier than intended, adding avoidable model/tool cost and weakening the “skip small chats” guard. Count only user text turns (or otherwise exclude tool-response user messages) for this gate.
```rust
if name.is_empty() || old_text.is_empty() {
    return Ok(CallToolResult::error(vec![Content::text(
        "Missing required parameters: name and old_text",
    )]));
```
Require new_text in patch_skill validation
handle_patch_skill validates name and old_text but not new_text, so a malformed tool call missing new_text falls back to "" and replacen silently deletes the matched section while reporting success. That can unintentionally corrupt SKILL.md content; missing replacement text should be rejected before patching.
```rust
std::fs::write(&skill_path, &new_content)
    .map_err(|e| format!("Failed to write skill: {}", e))?;
```
Block SKILL.md symlinks in patch_skill writes
The directory check canonicalizes skill.path, but the write path is still skill.path.join("SKILL.md") and std::fs::write follows symlinks. If SKILL.md is a symlink, patching can modify a file outside goose-managed skills despite the earlier boundary check. Reject symlinked skill files (or use no-follow open semantics) before writing.
…ent, count real user turns, require new_text, block symlinked SKILL.md
💡 Codex Review
Reviewed commit: 09561a4927
```rust
if parse_skill_content(content, PathBuf::new()).is_none() {
    return Ok(CallToolResult::error(vec![Content::text(
```
Enforce frontmatter name match in create_skill
handle_create_skill validates that content parses, but it never checks that the parsed frontmatter name matches the name argument used for the directory path. A tool call like name="foo" with frontmatter name: bar will report success for foo, but discovery/loading keys off bar, which can make the created skill effectively unreachable under the returned name and can also create duplicate-name collisions in later discovery/patch flows.
```rust
let skills = discover_skills(working_dir);
let skill = skills.iter().find(|s| s.name == name);
```
Prefer goose-managed skill when resolving patch target
patch_skill resolves by first name match from discover_skills, but discovery order prioritizes local skill dirs before ~/.config/goose/skills. If a local skill shares the same name as a goose-managed skill, this picks the local one and then immediately rejects it as “externally installed,” so the goose-managed skill cannot be patched at all. This breaks adaptive updates for common skill names in repos that define local skills.
```rust
matches!(m.role, rmcp::model::Role::User)
    && m.is_agent_visible()
    && m.content
```
Count only user-visible turns before memory flush
The pre-compaction flush threshold currently counts any agent-visible Role::User text message. Compaction summaries are also stored as user-role, agent-only text messages, so repeated compactions can satisfy the >= 3 gate even when the user has not sent three real turns, triggering extra hidden flush model calls and cost. The threshold should key off user-visible human turns (or an equivalent origin check) instead.
…, prefer goose-managed skill for patch, count user-visible turns only
💡 Codex Review
Reviewed commit: c5b1f5a9b4
```rust
// Rollback: delete the skill directory
let skill_dir = Paths::config_dir().join("skills").join(skill_name);
if let Err(e) = std::fs::remove_dir_all(&skill_dir) {
```
Roll back blocked patches without deleting whole skill
When adversary review blocks a skill write, this rollback path unconditionally calls remove_dir_all on the skill directory. That is safe for a brand-new create_skill, but for patch_skill it deletes the entire pre-existing skill (including supporting files) rather than reverting just the attempted edit, so a false-positive BLOCK can permanently erase valid skills. Persist the pre-patch content and restore it on BLOCK, and only delete directories for newly created skills.
```rust
if let (Some(ext_mgr), Some(wd)) = (extension_manager, working_dir) {
    if let Err(e) = crate::agents::knowledge_review::flush_memories_before_compaction(
        provider,
        ext_mgr,
        conversation,
```
Skip flush reviews for context-owning providers
compact_messages now always attempts a pre-compaction memory flush whenever these optional args are present, and /compact now always passes them. The flush path issues hidden provider.complete(...) calls using the live session_id; for providers that manage their own conversation state, those hidden turns can be appended to the real thread and influence later user-visible replies. Gate this flush behind !provider.manages_own_context() (or use an isolated review session id).
* main: (37 commits)
- polish: refine sidebar activity indicators, add placeholder token, and tidy search field (#8606)
- feat: add /edit command to cli for on-demand prompt editing (#8566)
- docs(mcp): add Rendex MCP Server extension tutorial (#8541)
- Lifei/delete tauri backend acp (#8582)
- chore: set goose binaries as executable in package.json (#8589)
- feat: add Novita AI as declarative provider (#8432)
- feat: add Kimi Code provider with OAuth device flow authentication (#8466)
- fix: chat loading-state model placeholder (#8431)
- fix: expand tool calls by default when Response Style is Detailed (#8478)
- fix: create logs dir before writing llm request log (#8522)
- fix: enable token usage tracking and configurable stream timeout for Ollama provider (#8493)
- fix tauri-plugin-dialog version constraint to match other plugins (#8542)
- call goose serve from tauri frontend via goose-acp client (#8549)
- failed the script when bundle:default fails and cleanup "alpha" (#8580)
- pass globally unique conversation identifier as sessionId in databricks api call (#8576)
- fix: use sqlx chrono decode for thread timestamps instead of manual parsing (#8575)
- docs: remove stale gemini-acp references (#8572)
- show individual untracked files in git changes widget (#8574)
- fix: update publishing flow to include new sdk dir (#8573)
- fix: remove double border on content in chat (#8545)
- ...
💡 Codex Review
Reviewed commit: ca215b0831
```rust
.flat_map(|m| m.content.iter())
.filter(|c| matches!(c, MessageContent::ToolRequest(_)))
.count()
.saturating_sub(pre_turn_tool_count);
```
Track skill-review calls without relying on total message count
Compute final_tool_count from a per-turn counter instead of total_tool_requests - pre_turn_tool_count. In this same method, recovery compaction can replace conversation with a summarized history, which may drop earlier tool-request messages; after that, saturating_sub(pre_turn_tool_count) can collapse to 0 even when this turn executed many tools. When that happens, should_review_skills stays false and the adaptive skill-review loop is silently skipped in exactly the long/complex turns where it is most needed.
```rust
    )
    .await;

if adaptive_memory_enabled && (should_review_memory || should_review_skills) {
```
Guard background reviews when the turn was cancelled
This branch can still run after the reply loop exits due to cancellation (is_token_cancelled(...) is checked earlier in the function), so a user-interrupted turn may still increment review counters and spawn hidden memory/skill review model calls. That causes unexpected post-cancel cost and can mutate persistent memory/skills even though the user explicitly stopped execution. Add a cancellation guard before incrementing counters or spawning spawn_background_review.
💡 Codex Review
Reviewed commit: 185c11a9aa
```rust
let new_content = content.replacen(old_text.as_str(), &new_text, 1);
std::fs::write(&skill_path, &new_content)
    .map_err(|e| format!("Failed to write skill: {}", e))?;
```
Validate patched skill content before persisting
handle_patch_skill writes the replaced SKILL.md text immediately, but never re-validates that the result still has valid skill frontmatter. If a patch accidentally alters/removes delimiters or required fields, the tool reports success while discover_skills/load_skill later stop seeing that skill because parsing fails. Please parse/validate the post-patch content (and enforce expected metadata like name consistency) before writing it to disk.
    } else if let Err(e) = std::fs::remove_dir_all(&skill_dir) {
        warn!("Failed to rollback blocked skill '{}': {}", skill_name, e);
Preserve existing skill dirs on blocked create rollback
On a blocked create_skill, rollback always calls remove_dir_all on the skill directory. Because creation only requires SKILL.md to be new, this can delete pre-existing files if the directory already existed (for example supporting assets left there before SKILL.md was created). Rollback should only delete artifacts created by the current attempt (or only delete the new SKILL.md), otherwise a false-positive block can erase unrelated data.
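One possible narrow rollback, sketched under the assumption that this attempt only ever created SKILL.md itself (directory layout and names taken from the diff above):

```rust
use std::fs;
use std::path::Path;

// Hypothetical rollback that deletes only the SKILL.md written by the
// blocked attempt. The directory is removed only if it ends up empty,
// so pre-existing supporting assets survive a false-positive block.
fn rollback_blocked_skill(skill_dir: &Path) -> std::io::Result<()> {
    let skill_md = skill_dir.join("SKILL.md");
    if skill_md.exists() {
        fs::remove_file(&skill_md)?;
    }
    // remove_dir (unlike remove_dir_all) fails on non-empty directories;
    // that failure is fine, since leftover files were not ours to delete.
    let _ = fs::remove_dir(skill_dir);
    Ok(())
}
```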
…g docs
- Add session_summary.rs: format_conversation, truncate_around_matches, parallel LLM summarization (ported from Hermes session_search_tool.py)
- chatrecall search mode now summarizes by default (summarize param, default true), falls back to raw results if provider unavailable
- Replace build_session_context 200-char snippet dumping with full conversation formatting and intelligent truncation (8KB budget)
- Remove dead code: write_entries, read_entries_locked, head_tail_truncate
- Add dedup to read_entries via HashSet
- Remove planning docs (HERMES_LEARNING_AGENT.md, gap analysis, P0 plan)
Signed-off-by: Michael Neale <michael.neale@gmail.com>
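The head+tail truncation scheme the commits describe (keep 40% of the budget from the start and 60% from the end, with a notice in between) can be sketched roughly like this; the ratios come from the commit messages, while the function name and notice wording are assumptions:

```rust
// Sketch of 40/60 head+tail truncation for oversized tool output.
// The start of output often holds the first error, the end holds the
// final state, so both are kept and only the middle is dropped.
fn head_tail_truncate(text: &str, budget: usize) -> String {
    if text.len() <= budget {
        return text.to_string();
    }
    let head_len = budget * 2 / 5; // 40% of the budget
    let tail_len = budget - head_len; // remaining 60%
    // Char-based slicing keeps this sketch boundary-safe for non-ASCII.
    let head: String = text.chars().take(head_len).collect();
    let tail: String = text
        .chars()
        .rev()
        .take(tail_len)
        .collect::<Vec<_>>()
        .into_iter()
        .rev()
        .collect();
    format!(
        "{head}\n... [{} chars truncated] ...\n{tail}",
        text.len() - head_len - tail_len
    )
}
```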
💡 Codex Review
Reviewed commit: 08a27dee3c
            model_config,
            session_id,
            SUMMARIZE_SYSTEM_PROMPT,
Avoid passing historical IDs to provider summaries
summarize_single_session calls provider.complete with the matched past session’s session_id, so chat-recall summarization runs inside those historical provider threads instead of an isolated context. On providers that manage their own context by session id (e.g., Claude Code/Gemini CLI), this can append hidden recap prompts/results to old conversations and change what users see when they later resume those sessions. Use an isolated review session id (or skip LLM summarization when manages_own_context() is true).
    let results = session_manager
        .search_chat_history(
            "*",
            Some(5),
Increase review-context search window beyond five rows
build_review_context asks for only Some(5) results, but search_chat_history applies that as a SQL message-row limit (ORDER BY m.timestamp DESC LIMIT ?), not a session limit. In practice, a single busy recent session can consume all five rows, so the “cross-session” context often collapses to one session and misses recurring patterns from other recent sessions. Use a larger row window or a session-level limit to keep this feature effective.
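One way to turn a row-level window into a session-level one, sketched with a stand-in row type (the real rows come back from the SQL query; the tuple shape and helper name here are assumptions):

```rust
use std::collections::HashSet;

// Hypothetical post-filter over rows ordered newest-first: keep rows
// until `n` distinct session ids have been collected, so one busy
// session cannot consume the entire cross-session window.
// Rows are (session_id, message_text) pairs.
fn take_recent_sessions(rows: &[(String, String)], n: usize) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut kept = Vec::new();
    for (session_id, text) in rows {
        if seen.len() >= n && !seen.contains(session_id) {
            break; // already have n sessions; stop at the first older one
        }
        seen.insert(session_id.clone());
        kept.push(text.clone());
    }
    kept
}
```

In practice this means fetching a larger raw row window (say, 100) and trimming it down to the desired number of sessions afterwards.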
Signed-off-by: Michael Neale <michael.neale@gmail.com>
💡 Codex Review
Reviewed commit: 3d5497d3b9
    let user_entries = read_entries("USER.md");
    let memory_entries = read_entries("MEMORY.md");
Sanitize persisted memory before building system instructions
AdaptiveMemoryClient::new loads USER.md/MEMORY.md directly via read_entries, but threat scanning is only enforced on the memory tool write path. If those files are modified through any other route (manual edit, shell/write-file tools, migrated state), injected directives are reintroduced into prompt context on startup and persist across sessions. Filter or reject unsafe entries when reading persisted memory, not just when writing through the tool.
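A sketch of applying the scan on the read path as well; `looks_like_injection` is a hypothetical stand-in for whatever scanner the memory tool's write path already uses:

```rust
// Hypothetical read-path sanitizer: entries loaded from USER.md/MEMORY.md
// pass through the same kind of threat check enforced at write time, so
// directives injected via manual edits or other tools never reach the
// system prompt.
fn looks_like_injection(entry: &str) -> bool {
    let lowered = entry.to_lowercase();
    ["ignore previous instructions", "disregard your system prompt"]
        .iter()
        .any(|pat| lowered.contains(pat))
}

fn sanitize_entries(entries: Vec<String>) -> Vec<String> {
    entries
        .into_iter()
        .filter(|e| !looks_like_injection(e))
        .collect()
}
```

The two patterns above are illustrative only; the real scanner presumably has a richer rule set.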
    let mut file = std::fs::OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(&skill_path)
Reject symlinked target paths in create_skill writes
create_skill writes SKILL.md using OpenOptions on skill_path without canonicalizing or rejecting symlinked directories/files first. If ~/.config/goose/skills/<name> is a symlink, this follows the link and writes outside goose-managed skills, which bypasses the safety boundary this feature is trying to enforce. Add a canonical-path/no-symlink check before opening the file.
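A minimal version of the suggested check, assuming the skill directory path is known before the `OpenOptions` open (the helper name is hypothetical); `symlink_metadata` does not follow links, so a symlinked directory or file is caught before any write:

```rust
use std::io;
use std::path::Path;

// Hypothetical pre-write guard for create_skill: refuse to proceed if the
// skill directory or its SKILL.md is a symlink, so writes cannot escape
// the goose-managed skills tree.
fn reject_symlinks(skill_dir: &Path) -> io::Result<()> {
    for candidate in [skill_dir.to_path_buf(), skill_dir.join("SKILL.md")] {
        if let Ok(meta) = std::fs::symlink_metadata(&candidate) {
            if meta.file_type().is_symlink() {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidInput,
                    format!("{} is a symlink; refusing to write", candidate.display()),
                ));
            }
        }
    }
    Ok(())
}
```

Canonicalizing the parent and checking it is still under the skills root would harden this further against symlinked ancestors.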
Signed-off-by: Michael Neale <michael.neale@gmail.com>
💡 Codex Review
Reviewed commit: 429f99f665
    MessageContent::Thinking(t) => {
        parts.push(format!("[THINKING]: {}", t.thinking));
Omit thinking traces from recall transcripts
format_conversation now emits MessageContent::Thinking verbatim, and the new chatrecall search flow defaults to summarize=true, so these transcripts are sent to a model and returned to users as session summaries. In thinking-enabled sessions this can leak or paraphrase hidden chain-of-thought that was never user-visible in the original chat, which is a privacy/safety regression in the new recall pipeline. Exclude thinking/redacted-thinking (and ideally other non-user-visible content) from user-facing summarization inputs.
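The fix can be sketched with a stand-in for goose's real MessageContent enum (variant shapes here are simplified assumptions):

```rust
// Simplified stand-in enum and a recall formatter that drops all
// non-user-visible content before the transcript reaches the summarizer.
enum MessageContent {
    Text(String),
    Thinking(String),
    RedactedThinking,
}

fn format_for_recall(contents: &[MessageContent]) -> String {
    let mut parts = Vec::new();
    for c in contents {
        match c {
            MessageContent::Text(t) => parts.push(t.clone()),
            // Chain-of-thought (redacted or not) never enters the
            // summarization input.
            MessageContent::Thinking(_) | MessageContent::RedactedThinking => {}
        }
    }
    parts.join("\n")
}
```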
- Omit Thinking/RedactedThinking from format_conversation() so
chain-of-thought never leaks into summarization input
- Use ephemeral session ID (summary_{id}) for provider.complete()
so providers that track context by session ID don't pollute
historical sessions
Signed-off-by: Michael Neale <michael.neale@gmail.com>
💡 Codex Review
Reviewed commit: e843871542
    let ephemeral_id = format!("summary_{}", session_id);

    match provider
        .complete(
            model_config,
Skip recall summarization for context-owning providers
This path still issues hidden provider.complete(...) calls during chat-recall summarization for providers that keep their own conversation state, so the summarizer can mutate the active thread and affect later user-visible replies. Fresh evidence: GeminiCliProvider::stream ignores _session_id and always resumes a shared CLI session (-r from cli_session_id), which means the summary_{session_id} value here does not isolate these calls. Please gate this summarization call behind !provider.manages_own_context() (or run it through an isolated provider/session).
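The suggested gate can be sketched with a minimal stand-in trait (goose's real provider interface differs; `manages_own_context` is the one method the comment names):

```rust
// Stand-in provider trait: summarization is skipped entirely for
// providers that keep their own conversation state, since an ephemeral
// session id cannot isolate calls on e.g. a CLI provider that always
// resumes one shared session.
trait Provider {
    fn manages_own_context(&self) -> bool;
    fn complete(&self, prompt: &str) -> String;
}

fn summarize_or_fallback(provider: &dyn Provider, transcript: &str) -> Option<String> {
    if provider.manages_own_context() {
        // Fall back to raw search results rather than risk mutating
        // the provider's active thread.
        return None;
    }
    Some(provider.complete(transcript))
}
```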
Summary
Hermes-agent is pretty cool (Hermes itself is in some ways a nicer openclaw, and it also ships some RL-trained models on Llama, but underneath it all is a solid agent harness with a ton of good things in it).
Inspired by that, this brings over some ideas that seem relevant to goose: maintaining ongoing user preferences and recent memory, curating skills (adding and adapting them automatically), and more efficient context use.
This shows up as a platform extension that changes goose only when it is turned on.
Basically, the more you use it, the more personal and smart it gets.
Core Features
Context Management
Tests