
Agent capability gaps: long-task continuity, permissions, tool reliability, docs #4

Merged
QiaolongLi1201 merged 2 commits into main from codex/agent-capability-gap-fixes
Jun 13, 2026


@QiaolongLi1201

Collaborator

Summary

Closes a broad set of usability and harness capability gaps found by using Moss end-to-end, each fixed test-first (red before green). Long-horizon, multi-session task continuity was verified hands-on: resume carries full context across separate processes.

What changed

  • CLI/harness: unknown-subcommand guard (no billable typo), doctor non-zero on failure, --ask-for-approval validation, env-var notice respects log level, clean config-write errors, skill-candidate triviality gate, session-file count cap.
  • Long-horizon tasks: task-frame no longer false-reports paused_resumable after a worked-around tool error; max_turns truncation surfaced with how-to-continue; fork-key sub-second collision fixed.
  • Permissions/safety: device mutations never blanket-trusted by "always"; headless auto-approve audit line; broad trustedTools wildcard warned at set time.
  • Tools/providers: ros2 ROS_DOMAIN_ID; board exec timeout legibility; ros2 sample window; pi-ai prompt-cache token usage observable; base-URL validation; honest resume-missing-key; runtime prompt forbids GUI "open a terminal" + reasserts no unverified success.
  • Interaction/docs: session pickers show a title; in-TUI /help lists /resume; root README rewritten to current capabilities; agent README corrected; 13 new specs + a readme-accuracy spec pinning doc claims to source.

Verification

npm run verify green (boundaries + hygiene + build + typecheck + lint + test; agent suite 166 files). smoke:moss-cli is blocked only by the local gitignored gateway-secret guard, not a regression.

🤖 Generated with Claude Code

d-robotics and others added 2 commits June 13, 2026 00:59
…liability, docs

CLI dispatch & harness hygiene:
- unknown subcommand errors with a "did you mean" suggestion instead of burning an LLM call
- doctor exits non-zero on failures; --ask-for-approval rejects unknown values
- env-var-ignored notice respects --quiet/log level; config-write failures are a clean line, not a Node stack
- skill-candidate triviality gate (skip all-readonly / clarifying-question runs)
- JSONL session-file count cap
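
For illustration, the unknown-subcommand guard could be sketched roughly as below. This is not the code from this PR: `editDistance`, `dispatchSubcommand`, the `Dispatch` type, the command list, and the two-edit threshold are all hypothetical. The idea is only that an unrecognized first argument is rejected locally, with a nearest-match hint, before anything is sent to the model.

```typescript
// Hypothetical sketch: all names and the distance threshold are assumptions.
function editDistance(a: string, b: string): number {
  // Levenshtein distance using a single rolling row.
  const row: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prevDiag = row[0]; // dp[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // dp[i-1][j]
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, prevDiag + cost);
      prevDiag = above;
    }
  }
  return row[b.length];
}

type Dispatch =
  | { kind: "run"; command: string }
  | { kind: "unknown"; input: string; suggestion?: string };

function dispatchSubcommand(input: string, known: readonly string[]): Dispatch {
  if (known.includes(input)) return { kind: "run", command: input };
  let best: string | undefined;
  let bestDistance = 3; // assumed threshold: only suggest within 2 edits
  for (const candidate of known) {
    const d = editDistance(input, candidate);
    if (d < bestDistance) {
      best = candidate;
      bestDistance = d;
    }
  }
  // Unknown input never falls through to an LLM call.
  return best === undefined
    ? { kind: "unknown", input }
    : { kind: "unknown", input, suggestion: best };
}
```

A caller would print the suggestion and exit non-zero on `kind: "unknown"`, so a typo costs nothing.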

Long-horizon tasks:
- task-frame no longer latches paused_resumable after a worked-around tool error (false-incomplete + suppressed skill learning)
- max_turns truncation is surfaced to the human CLI with how-to-continue
- fork keys use sub-second precision to avoid same-second collisions
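
The fork-key collision can be illustrated with a small sketch. The key format here is an assumption (the real layout is not shown in this PR), and `forkKeySeconds` and `forkKeyMillis` are hypothetical names. It shows only why whole-second timestamps collide and how sub-second precision separates two forks made in the same second.

```typescript
// Hypothetical key format; the real fork-key layout is not shown in this PR.
function forkKeySeconds(parent: string, at: Date): string {
  // Truncating to whole seconds: two forks within one second get the same key.
  return `${parent}-fork-${Math.floor(at.getTime() / 1000)}`;
}

function forkKeyMillis(parent: string, at: Date): string {
  // toISOString() carries millisecond precision, e.g. 2026-06-13T00:59:00.100Z.
  return `${parent}-fork-${at.toISOString()}`;
}

const first = new Date(Date.UTC(2026, 5, 13, 0, 59, 0, 100));
const second = new Date(Date.UTC(2026, 5, 13, 0, 59, 0, 400));

const collides = forkKeySeconds("s1", first) === forkKeySeconds("s1", second);
const distinct = forkKeyMillis("s1", first) !== forkKeyMillis("s1", second);
```

Millisecond precision narrows the window but does not rule out a collision; a counter or random suffix would be needed for a hard guarantee.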

Permissions & safety:
- device mutations are never blanket-trusted by "always" (next device command re-prompts)
- headless auto-approve leaves a stderr audit line; broad trustedTools wildcard is warned at set time
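
A minimal sketch of the "always" rule, assuming a per-tool trust set and a `mutatesDevice` flag on each call. `ApprovalPolicy`, `ToolCall`, and the tool names are hypothetical, not identifiers from the Moss source. The point is that an "always" answer is remembered for ordinary tools but never for device mutations, so the next device command prompts again.

```typescript
// Hypothetical sketch: type, class, and tool names are assumptions.
type ToolCall = { tool: string; mutatesDevice: boolean };

class ApprovalPolicy {
  private readonly alwaysTrusted = new Set<string>();

  // Record an "always" answer. Device mutations are deliberately not remembered.
  recordAlways(call: ToolCall): void {
    if (!call.mutatesDevice) this.alwaysTrusted.add(call.tool);
  }

  // Device mutations always prompt; other tools prompt until trusted.
  needsPrompt(call: ToolCall): boolean {
    if (call.mutatesDevice) return true;
    return !this.alwaysTrusted.has(call.tool);
  }
}

const policy = new ApprovalPolicy();
const flash: ToolCall = { tool: "board_exec", mutatesDevice: true };
const read: ToolCall = { tool: "read_file", mutatesDevice: false };
policy.recordAlways(flash);
policy.recordAlways(read);
```

Checking `mutatesDevice` in both methods keeps the guarantee even if a trusted entry were added some other way, for example by a broad wildcard.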

Tools & providers:
- ros2 tools honor ROS_DOMAIN_ID; board exec timeout legibility; ros2 sample window is configurable
- pi-ai provider threads prompt-cache token usage so cache metrics are observable
- base URL validated at set time; resume of a missing session key errors instead of faking "Resuming"
- runtime prompt forbids spawning a desktop GUI to "open a terminal" and reasserts no unverified success claims
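
Set-time base-URL validation might look roughly like the following. `validateBaseUrl` and its error messages are hypothetical; the sketch relies only on the standard WHATWG `URL` constructor and assumes that `http` and `https` are the only accepted schemes.

```typescript
// Hypothetical sketch: function name, messages, and accepted schemes are assumptions.
function validateBaseUrl(raw: string): string {
  let parsed: URL;
  try {
    parsed = new URL(raw); // throws TypeError on unparseable input
  } catch {
    throw new Error(`Invalid base URL: ${raw}`);
  }
  // "localhost:8080" parses with protocol "localhost:", so check the scheme explicitly.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Base URL must use http or https: ${raw}`);
  }
  return parsed.href;
}
```

Rejecting the value when it is set means a malformed URL is reported at once instead of surfacing later as an opaque request failure.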

Interaction & docs:
- session pickers show a title derived from the first message; in-TUI /help lists /resume
- root README rewritten to current capabilities; agent README command lists corrected
- 13 new red-before-green specs; readme-accuracy spec pins doc claims to live source
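
Deriving a picker title from the first message could be as simple as the sketch below. `sessionTitle`, the 60-character default, and the placeholder text are assumptions for illustration, not the behaviour pinned by the specs in this PR.

```typescript
// Hypothetical sketch: name, default length, and placeholder are assumptions.
function sessionTitle(firstMessage: string, maxLength = 60): string {
  // Collapse newlines and runs of whitespace so the title fits on one line.
  const collapsed = firstMessage.replace(/\s+/g, " ").trim();
  if (collapsed.length === 0) return "(untitled session)";
  if (collapsed.length <= maxLength) return collapsed;
  // Reserve one character for the ellipsis so the result never exceeds maxLength.
  return collapsed.slice(0, maxLength - 1).trim() + "…";
}
```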

Verified: npm run verify green (boundaries + hygiene + build + typecheck + lint + test).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@QiaolongLi1201 QiaolongLi1201 merged commit 89b376d into main Jun 13, 2026
3 checks passed