Agent capability gaps: long-task continuity, permissions, tool reliability, docs #4
Merged
…liability, docs

CLI dispatch & harness hygiene:
- unknown subcommand errors with a "did you mean" suggestion instead of burning an LLM call
- doctor exits non-zero on failures; --ask-for-approval rejects unknown values
- env-var-ignored notice respects --quiet/log level; config-write failures are a clean line, not a Node stack
- skill-candidate triviality gate (skip all-readonly / clarifying-question runs)
- JSONL session-file count cap

Long-horizon tasks:
- task-frame no longer latches paused_resumable after a worked-around tool error (false-incomplete + suppressed skill learning)
- max_turns truncation is surfaced to the human CLI with how-to-continue
- fork keys use sub-second precision to avoid same-second collisions

Permissions & safety:
- device mutations are never blanket-trusted by "always" (next device command re-prompts)
- headless auto-approve leaves a stderr audit line; broad trustedTools wildcard is warned at set time

Tools & providers:
- ros2 tools honor ROS_DOMAIN_ID; board exec timeout legibility; ros2 sample window is configurable
- pi-ai provider threads prompt-cache token usage so cache metrics are observable
- base URL validated at set time; resume of a missing session key errors instead of faking "Resuming"
- runtime prompt forbids spawning a desktop GUI to "open a terminal" and reasserts no unverified success claims

Interaction & docs:
- session pickers show a title derived from the first message; in-TUI /help lists /resume
- root README rewritten to current capabilities; agent README command lists corrected
- 13 new red-before-green specs; readme-accuracy spec pins doc claims to live source

Verified: npm run verify green (boundaries + hygiene + build + typecheck + lint + test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
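The commit message says unknown subcommands now fail with a "did you mean" suggestion (rather than being forwarded to the model) and that --ask-for-approval rejects unknown values. A minimal TypeScript sketch of that kind of local check follows, assuming a Levenshtein edit-distance match against a fixed candidate list. Apart from doctor, the subcommand names are hypothetical, as are the approval values, the threshold of 2 edits, and every function name; none of this is taken from Moss's source.

```typescript
// Sketch only. KNOWN_SUBCOMMANDS (apart from "doctor"), APPROVAL_MODES,
// the distance threshold, and every function name here are hypothetical;
// they are not taken from Moss's source.
const KNOWN_SUBCOMMANDS = ["doctor", "resume", "config", "chat"] as const;
const APPROVAL_MODES = ["never", "on-request", "always"] as const;
type ApprovalMode = (typeof APPROVAL_MODES)[number];

/** Levenshtein edit distance, computed with a single rolling DP row. */
function editDistance(a: string, b: string): number {
  // row[j] holds dp[i][j]: the distance between a[0..i) and b[0..j).
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // dp[i-1][0], the diagonal neighbour for j = 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // dp[i-1][j], saved before it is overwritten
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      // deletion, insertion, substitution (or match)
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

/** Closest candidate within maxDistance edits, or undefined if none is close. */
function suggest(input: string, candidates: readonly string[], maxDistance = 2): string | undefined {
  let best: string | undefined;
  let bestDistance = Infinity;
  for (const candidate of candidates) {
    const d = editDistance(input, candidate);
    if (d < bestDistance) {
      best = candidate;
      bestDistance = d;
    }
  }
  return bestDistance <= maxDistance ? best : undefined;
}

/** Error text for an unknown subcommand, produced locally with no LLM call. */
function unknownSubcommandMessage(input: string): string {
  const hint = suggest(input, KNOWN_SUBCOMMANDS);
  return hint
    ? `unknown subcommand "${input}". Did you mean "${hint}"?`
    : `unknown subcommand "${input}".`;
}

/** Validates an --ask-for-approval value instead of silently accepting it. */
function parseApprovalMode(value: string): ApprovalMode {
  const modes: readonly string[] = APPROVAL_MODES;
  if (modes.includes(value)) return value as ApprovalMode;
  const hint = suggest(value, APPROVAL_MODES);
  throw new Error(
    `--ask-for-approval: unknown value "${value}". ` +
      (hint ? `Did you mean "${hint}"?` : `Expected one of: ${APPROVAL_MODES.join(", ")}.`),
  );
}

// prints: unknown subcommand "docter". Did you mean "doctor"?
console.log(unknownSubcommandMessage("docter"));
```

Capping the suggestion at a small edit distance keeps the hint from pointing at an unrelated command when the input is far from every candidate; in that case the approval-mode error lists the accepted values instead.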
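The commit message lists a JSONL session-file count cap without describing the retention policy. One minimal way to enforce such a cap is sketched below in TypeScript, assuming "keep the N most recently modified .jsonl files and delete the rest". The SessionFile shape, the filesOverCap name, and the newest-first policy are hypothetical, not Moss's actual session store.

```typescript
// Hypothetical shape: Moss's real session-store API is not shown in this PR.
interface SessionFile {
  name: string; // file name, e.g. "abc123.jsonl"
  mtimeMs: number; // last-modified time in ms since the epoch
}

/**
 * Returns the names of session files to delete so that at most `cap`
 * .jsonl files remain, keeping the most recently modified ones.
 * Non-.jsonl files are ignored and never selected for deletion.
 */
function filesOverCap(files: readonly SessionFile[], cap: number): string[] {
  if (!Number.isInteger(cap) || cap < 0) {
    throw new RangeError(`cap must be a non-negative integer, got ${cap}`);
  }
  const sessions = files
    .filter((f) => f.name.endsWith(".jsonl")) // filter copies, so sort won't mutate the input
    // Newest first; ties broken by name so the result is deterministic.
    .sort((a, b) => b.mtimeMs - a.mtimeMs || a.name.localeCompare(b.name));
  return sessions.slice(cap).map((f) => f.name);
}

const example: SessionFile[] = [
  { name: "a.jsonl", mtimeMs: 1 },
  { name: "b.jsonl", mtimeMs: 3 },
  { name: "c.jsonl", mtimeMs: 2 },
  { name: "notes.txt", mtimeMs: 0 },
];
// prints: [ 'a.jsonl' ]
console.log(filesOverCap(example, 2));
```

Keeping the selection pure makes it easy to pin in a spec; a caller would then delete each returned file (for example with Node's fs.promises.unlink) after resolving it against the sessions directory.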
Summary
Closes a broad set of usability and harness capability gaps found by using Moss end-to-end and then fixed with red-before-green tests. Long-horizon multi-session task continuity was verified hands-on: resume carries full context across separate processes.
What changed
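- CLI dispatch & harness hygiene: unknown subcommands get a "did you mean" suggestion; doctor exits non-zero on failure; --ask-for-approval validation; env-var notice respects log level; clean config-write errors; skill-candidate triviality gate; session-file count cap.
- Long-horizon tasks: task-frame no longer latches paused_resumable after a worked-around tool error; max_turns truncation surfaced with how-to-continue; fork-key sub-second collision fixed.
- Permissions & safety: device mutations are never blanket-trusted by "always"; headless auto-approve leaves a stderr audit line; broad trustedTools wildcard warned at set time.
- Tools & providers: ros2 tools honor ROS_DOMAIN_ID; board exec timeout legibility; configurable ros2 sample window; pi-ai prompt-cache token usage observable; base-URL validation; resume of a missing session key errors honestly; runtime prompt forbids spawning a desktop GUI to "open a terminal" and reasserts no unverified success claims.
- Interaction & docs: session pickers show a title derived from the first message; in-TUI /help lists /resume; root README rewritten to current capabilities; agent README corrected; 13 new specs plus a readme-accuracy spec pinning doc claims to source.

Verification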
npm run verify is green (boundaries + hygiene + build + typecheck + lint + test; agent suite: 166 files). smoke:moss-cli is blocked only by the local, gitignored gateway-secret guard, not by a regression.

🤖 Generated with Claude Code