444. **No broad-cwd safety guard for `--resume`: `claw --resume latest` from `/` attempts to `mkdir /.claw/sessions/<fingerprint>/` and is stopped only by the read-only root filesystem; from any writable broad directory (`/tmp`, `/var/tmp`, `$HOME` itself) it silently creates `.claw/sessions/<fingerprint>/` droppings; the exit code is 0 (success) on the read-only-filesystem error path.** Dogfooded 2026-05-11 by Jobdori on `b2048856` in response to Clawhip pinpoint nudge at `1503373639884607629`. Reproduction: `cd / && claw --resume latest --output-format json` returns `{"error":"failed to restore session: Read-only file system (os error 30)","hint":null,"kind":"session_load_failed","type":"error"}` with exit code **0**. The OS permission denial is the only thing preventing claw from creating `/.claw/sessions/<fingerprint>/` in the root filesystem. Compare with `cd /tmp && claw --resume latest --output-format json`, which silently creates a `/tmp/.claw/sessions/<fingerprint>/` partition (confirmed by `ls /tmp/.claw` showing a directory from a prior dogfood session at `13:31`; the May 11 11:00 pinpoint #435 dropping is still there 10+ hours later, despite the cleanup documented in #435). Same dogfood session: `cd $HOME && claw --resume latest` would silently create `~/.claw/sessions/<fingerprint>/` (the user's home claw config dir). The shorthand prompt path has a broad-cwd guard (`claw is running from a very broad directory (/). The agent can read and search everything under this path. Use --allow-broad-cwd to proceed anyway`), but the guard does NOT fire on `--resume`, `--status`, or `claw status` invocations. This is an inconsistent safety surface: the dangerous path (LLM prompt with full tool access) has a guard, but the session-management paths that create filesystem artifacts in broad locations have none.
**Three sibling findings in the same probe:**

(a) **Exit code 0 on filesystem error.** The `session_load_failed` envelope returns exit code 0 even though the read-only-filesystem error from the `/.claw` creation path is an unrecoverable failure. This is the same exit-parity bug as #422/#435.

(b) **Stale filesystem droppings.** `/tmp/.claw/` from a 13:31 dogfood session at HEAD `6c0c305a` is still present at 21:30 (about 8 hours later, 6+ HEADs later). The "deferred cleanup" or "lazy creation" fix prescribed in #435 hasn't landed.

(c) **Broad-cwd guard never fires on resume.** The existing guard on the `run` path (visible in `claw --help` as "Use --allow-broad-cwd to proceed anyway") never fires on `--resume`. Either both paths should be guarded, or the guard should be promoted to a global pre-check.

**Required fix shape:**

(a) Extend the broad-cwd guard to `--resume`, `claw status`, `claw doctor`, and every command that may create filesystem artifacts. `cd / && claw --resume latest` must fail fast with `kind:"broad_cwd_blocked"` before any filesystem operation.

(b) `cd $HOME && claw` should warn that the workspace is the user's home directory and require `--allow-broad-cwd`. An LLM with full filesystem access in `$HOME` has the same blast radius as in `/`.

(c) Exit code 1 for `session_load_failed`, regardless of the underlying cause.

(d) Deliver #435's "defer fingerprint directory creation to first successful save" fix, so a failed `--resume` leaves no filesystem droppings.

(e) Clean up `/tmp/.claw/`-style scratch-dir artifacts via `claw doctor --cleanup` or a similar opt-in mechanism.

(f) Regression test: a failed `--resume` creates no directories under cwd.

**Why this matters:** users running claw in CI/cron jobs from system directories silently accumulate `.claw/sessions/<fingerprint>/` artifacts in /tmp, /var, /opt, $HOME, etc. Running as root from `/` with a writable root filesystem would silently pollute it. The broad-cwd guard exists but covers only one entry point.
Cross-references:
- #427: that entry stated the broad-cwd guard also fires on resume. It does not, so that note was inaccurate.
- #428: the default `permission_mode` is `danger-full-access`, which compounds this. Full access plus no broad-cwd guard means a serious blast radius.
- #435: filesystem side effects on failed resume.
- #422: exit-code parity.

Source: Jobdori live dogfood, `b2048856`, 2026-05-11.
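
A minimal sketch of fix shapes (a) and (b), assuming the guard becomes a pure pre-check that every subcommand calls before touching the filesystem. The names `is_broad_cwd`, `broad_cwd_precheck`, and `GuardError`, and the `BROAD_DIRS` list, are hypothetical and are not claw's actual code. Only the `broad_cwd_blocked` kind and the `--allow-broad-cwd` flag come from this entry.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical typed error behind the proposed `kind:"broad_cwd_blocked"` envelope.
#[derive(Debug, PartialEq)]
pub enum GuardError {
    BroadCwdBlocked { cwd: PathBuf },
}

impl GuardError {
    pub fn kind(&self) -> &'static str {
        match self {
            GuardError::BroadCwdBlocked { .. } => "broad_cwd_blocked",
        }
    }
}

/// Illustrative list of directories considered too broad; an assumption, not claw's policy.
const BROAD_DIRS: &[&str] = &["/", "/tmp", "/var", "/var/tmp", "/opt"];

/// True if `cwd` is a system-wide directory or the user's home directory itself.
pub fn is_broad_cwd(cwd: &Path, home: Option<&Path>) -> bool {
    home.map_or(false, |h| h == cwd) || BROAD_DIRS.iter().any(|d| Path::new(d) == cwd)
}

/// Global pre-check run by every subcommand (`--resume`, `status`, `doctor`, prompt)
/// before any filesystem operation; `--allow-broad-cwd` opts out.
pub fn broad_cwd_precheck(
    cwd: &Path,
    home: Option<&Path>,
    allow_broad_cwd: bool,
) -> Result<(), GuardError> {
    if !allow_broad_cwd && is_broad_cwd(cwd, home) {
        return Err(GuardError::BroadCwdBlocked { cwd: cwd.to_path_buf() });
    }
    Ok(())
}
```

The check compares paths without touching disk, so it can run before any `mkdir`. A real implementation would likely canonicalize `cwd` and the list entries first. For example, on macOS `/tmp` is a symlink to `/private/tmp`.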
445. **Skill name-vs-directory mismatch is silently accepted: `.claw/skills/wrong-name/SKILL.md` with frontmatter `name: actually-different-name` loads as "actually-different-name" without any warning; users who reference the skill by directory name (`claw skills run wrong-name`) get `skill_not_found` while `skills list` shows it under the frontmatter name; sibling: loose `.md` files at the skills-dir root and subdirs without `SKILL.md` are silently dropped.** Dogfooded 2026-05-11 by Jobdori on `9e1eafd0` in response to Clawhip pinpoint nudge at `1503381189539528897`.

Reproduction: create `.claw/skills/wrong-name/SKILL.md` with frontmatter `---\nname: actually-different-name\ndescription: Skill where dir name and frontmatter name disagree\n---`. Run `claw skills list --output-format json` → the skill is listed with `name: "actually-different-name"` (the frontmatter value), with no warning about the dir-vs-name mismatch. Users who type `claw skills run wrong-name` (the dirname they know from `ls`) get a `skill_not_found` error, while `claw skills run actually-different-name` works. The two names are decoupled, and nothing surfaces the relationship between them.

**Three sibling bugs in the same probe (two silent drops, one missing filter):**

(a) **Subdir without SKILL.md silently skipped.** `.claw/skills/no-skill-md/` containing only `README.md` (no `SKILL.md`) is silently skipped from `skills list`. There is no `invalid_skills:[{path, reason:"missing_SKILL.md"}]` array and no warning; it is simply absent from the output.

(b) **Loose `.md` at the skills-dir root silently dropped.** `.claw/skills/loose-skill.md` (not inside a per-skill subdirectory) is silently ignored. Discovery only walks `.claw/skills/*/SKILL.md`, so there is no support for flat `.claw/skills/<name>.md`.

(c) **Workspace and user skills merged without a per-source filter.** `skills list` returns 74 entries, including all `~/.claw/skills/*` user-home skills alongside the project skills.
There's no `--scope workspace` flag to limit output to just project-local skills; automation has to filter by `source.id == "project_claw"` post-hoc. **Required fix shape:** (a) when SKILL.md frontmatter `name` differs from the parent directory name, emit a `skills_metadata_drift:[{dir_name, frontmatter_name, path}]` array OR enforce `name = dir_name` as a hard rule; if neither, at minimum a stderr warning on each invocation; (b) skill subdirectories without `SKILL.md` should surface as `invalid_skills:[{path, reason}]` in `skills list --output-format json` (same pattern as #440 MCP servers, #441 hooks, #442 agents); (c) support loose `.md` files at skills-dir root OR document explicitly that only subdirectories with `SKILL.md` are discovered; (d) add `--scope workspace|user|all` flag to `skills list` for filtering; (e) regression test: dir/frontmatter mismatch triggers a deterministic warning or error; subdirs without SKILL.md show in invalid array. **Why this matters:** skill discovery is a security-relevant surface — a user's `claw skills run X` could end up running a different skill than they thought if dir-name and frontmatter-name diverge. The silent drops mean users can't tell why their skill files aren't recognized, leading to "I copied the example and it doesn't work" forum questions. Cross-references #440 (MCP all-or-nothing), #441 (hooks all-or-nothing), #442 (agents need TOML, .md dropped), #431 (skills install raw OS error). Source: Jobdori live dogfood, `9e1eafd0`, 2026-05-11.
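
One way to stop the silent drops is to classify every entry under `.claw/skills/` instead of filtering it out. This sketch uses a hypothetical `SkillEntry` type. Its `Drift` and `Invalid` variants would feed the proposed `skills_metadata_drift` and `invalid_skills` arrays. The frontmatter parser is deliberately naive: it does a line-based `name:` lookup and no full YAML parsing. The `missing_frontmatter_name` reason code is invented for illustration.

```rust
/// Outcome of inspecting one subdirectory of `.claw/skills/` (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum SkillEntry {
    Valid { name: String },
    /// Frontmatter `name` disagrees with the directory name.
    Drift { dir_name: String, frontmatter_name: String },
    /// Reported in `invalid_skills` instead of being silently dropped.
    Invalid { path: String, reason: &'static str },
}

/// Extracts `name:` from a `---`-delimited frontmatter block.
/// Deliberately minimal: no quoting, no nested YAML.
pub fn frontmatter_name(skill_md: &str) -> Option<String> {
    let mut lines = skill_md.lines();
    if lines.next()?.trim() != "---" {
        return None;
    }
    for line in lines {
        let line = line.trim();
        if line == "---" {
            break; // end of frontmatter without a `name:` key
        }
        if let Some(value) = line.strip_prefix("name:") {
            return Some(value.trim().to_string());
        }
    }
    None
}

/// `skill_md` is `None` when the directory contains no `SKILL.md`.
pub fn classify(dir_name: &str, skill_md: Option<&str>) -> SkillEntry {
    let path = format!(".claw/skills/{dir_name}");
    let Some(text) = skill_md else {
        return SkillEntry::Invalid { path, reason: "missing_SKILL.md" };
    };
    match frontmatter_name(text) {
        None => SkillEntry::Invalid { path, reason: "missing_frontmatter_name" },
        Some(fm) if fm == dir_name => SkillEntry::Valid { name: fm },
        Some(fm) => SkillEntry::Drift {
            dir_name: dir_name.to_string(),
            frontmatter_name: fm,
        },
    }
}
```

With the reproduction above, `classify("wrong-name", ...)` yields `Drift` and `classify("no-skill-md", None)` yields `Invalid`, so neither case disappears from the output.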
446. **Config is loaded 2-3 times per command invocation, and each load re-emits identical deprecation warnings without deduplication: `status` triggers the `enabledPlugins` warning 3×, `doctor`/`mcp` trigger it 2× each, and only `version` (config-free) emits 0.** Dogfooded 2026-05-11 by Jobdori on `5a4cc506` in response to Clawhip pinpoint nudge at `1503388740595224717`.

Reproduction: with a `~/.claw/settings.json` containing the deprecated `enabledPlugins` key, run each command from a fresh empty cwd and count `warning: ... is deprecated` lines on stderr. `claw status 2>&1 >/dev/null | grep -c deprecated` returns **3**, `claw doctor` returns **2**, `claw mcp` returns **2**, and `claw version` returns **0**. Each duplicate is byte-identical (same file path, same line number, same field name). The pattern shows the config-load pipeline is invoked 2-3 times within a single command process, and warnings are emitted at each load without checking a `warned_files: HashSet<PathBuf>` deduplication set.

**Three sibling implications:**

(a) **Load count varies by command** (status:3, doctor:2, mcp:2, version:0). This suggests each command makes its own config-load call rather than going through a shared cached loader.

(b) **Noise pollution.** Users running `claw status` once see the same 64-character warning 3 times in their terminal scrollback, so real warnings (other config errors, genuine new deprecations) get lost in the duplicate noise.

(c) **Performance signal.** 3 config loads mean 3 rounds of JSON parsing of `~/.claw/settings.json`, `~/.claw.json`, `$CLAW_CONFIG_HOME/settings.json`, and the project-local `.claw.json` / `.claw/settings.json` / `.claw/settings.local.json`. For a workspace with 5 config files, that is 15 disk reads per `status` invocation, 10 of them redundant.

Earlier roadmap entries observed 3× (#424) and 4× (#425) warning counts at different HEADs. The count keeps fluctuating, which suggests the underlying issue is config-load fan-out that has never been refactored.
**Required fix shape:**

(a) Introduce a `ConfigLoader` cache scoped to the command-process lifetime. The first load reads files and emits warnings; subsequent calls hit the cache and emit zero warnings.

(b) Move config validation and warnings to a single canonical entry point: `ConfigLoader::load_with_diagnostics()` returns `(RuntimeConfig, Vec<Warning>)` exactly once.

(c) Every command that needs config goes through the cached loader instead of re-reading from disk.

(d) `doctor --output-format json` exposes a `config_load_count:int` field, so we can regression-test that loads are deduplicated.

(e) Regression test: any single command invocation emits each deprecation warning at most once.

**Why this matters:** repeated identical warnings train users to ignore stderr. Real warnings (a new deprecation, a config error from a different file, an MCP server failure) get drowned out by 3-4 copies of the same notice. The redundant reads in the 15-read worst case are wasted I/O that adds startup latency. The count also fluctuates between HEADs (3 at `6c0c305a`, 4 at `d7dbe951`, back to 3 at `5a4cc506`). That suggests config loads are being moved around during development without an architectural fix.

Cross-references:
- #424: deprecation warning 3×.
- #425: deprecation warning 4×.
- #421: cwd canonicalization, possibly tied to per-load symlink resolution.
- #428: default `permission_mode` loaded from the same config files.

Source: Jobdori live dogfood, `5a4cc506`, 2026-05-11.
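
Fix shapes (a)-(c) can be sketched as a loader that reads each config file once per process and returns warnings only from that first load. Several pieces are assumptions:
- `ConfigLoader` and `LoadedConfig` are hypothetical types.
- The file reader is injected as a closure, so the sketch needs no disk access.
- A plain substring check stands in for real deprecation detection.
- `load_count` plays the role of the proposed `config_load_count` field.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, PartialEq)]
pub struct Warning {
    pub path: PathBuf,
    pub message: String,
}

/// Stand-in for the parsed `RuntimeConfig`: raw text per source file.
#[derive(Debug)]
pub struct LoadedConfig {
    pub sources: Vec<(PathBuf, String)>,
}

/// Hypothetical process-lifetime loader; `read` abstracts disk access.
pub struct ConfigLoader<F> {
    read: F,
    cached: Option<LoadedConfig>,
    pub load_count: usize,
}

impl<F: Fn(&Path) -> Option<String>> ConfigLoader<F> {
    pub fn new(read: F) -> Self {
        ConfigLoader { read, cached: None, load_count: 0 }
    }

    /// First call reads each file once and returns its warnings;
    /// later calls reuse the cache and return no warnings.
    pub fn load_with_diagnostics(&mut self, paths: &[PathBuf]) -> (&LoadedConfig, Vec<Warning>) {
        let mut warnings = Vec::new();
        if self.cached.is_none() {
            self.load_count += 1;
            let mut seen: HashSet<&PathBuf> = HashSet::new();
            let mut sources = Vec::new();
            for path in paths {
                // A path listed twice is neither re-read nor re-warned.
                if !seen.insert(path) {
                    continue;
                }
                if let Some(text) = (self.read)(path.as_path()) {
                    if text.contains("\"enabledPlugins\"") {
                        warnings.push(Warning {
                            path: path.clone(),
                            message: "`enabledPlugins` is deprecated".to_string(),
                        });
                    }
                    sources.push((path.clone(), text));
                }
            }
            self.cached = Some(LoadedConfig { sources });
        }
        (self.cached.as_ref().expect("populated above"), warnings)
    }
}
```

Because every command would share one loader instance, a second `load_with_diagnostics` call in the same process returns the cached config with an empty warning list, and `load_count` stays at 1.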
447. **All JSON error envelopes go to STDERR, not STDOUT; stdout is empty (0 bytes) on every `--output-format json` failure. This breaks the standard automation pattern `output=$(claw cmd --output-format json)`, which captures nothing on error and forces awkward `2>&1` redirects just to see the JSON.** Dogfooded 2026-05-11 by Jobdori on `5ab969e7` in response to Clawhip pinpoint nudge at `1503396289071808523`.

Reproduction (stderr-vs-stdout discipline audit): `claw --no-such-flag --output-format json >stdout.txt 2>stderr.txt` → stdout = **0 bytes**, stderr = 115 bytes containing `` {"error":"unknown option: --no-such-flag","hint":"Run `claw --help` for usage.","kind":"cli_parse","type":"error"} ``. The same pattern holds across the four error envelopes probed:

| Envelope kind | stdout (bytes) | stderr (bytes) |
|---|---|---|
| `cli_parse` | 0 | 115 |
| `missing_credentials` | 0 | 853 (includes deprecation warnings ahead of the envelope) |
| `session_load_failed` | 0 | 322 |
| `invalid_model_syntax` | 0 | 199 |

Success paths route correctly: `claw status --output-format json` → stdout 1496 / stderr 0.

**The asymmetry is wrong on two axes:**

(a) **JSON-format output should always go to stdout, regardless of success or failure.** Consumers parse `stdout | jq .kind` and switch on the kind to detect errors. That only works if the envelope arrives on the same channel in both cases. claw's split forces consumers to capture both streams or use `2>&1`, which then mixes deprecation prose into the JSON stream and breaks parsing.

(b) **Deprecation/info warnings are interleaved ahead of the JSON error envelope on stderr.** When stderr is the only way to get the JSON, the deprecation warning prefix (`warning: ... enabledPlugins ... is deprecated`) precedes the JSON, making `tail -1 stderr.txt | jq .` fragile.

**Three sibling problems:**

(i) **It breaks the canonical Bash idiom** `if ! output=$(cmd --output-format json); then echo "$output" | jq .error; fi`. `$output` is empty on error, so `jq` sees nothing.

(ii) **It forces N-line stderr parsing.** To get the JSON envelope from stderr, automation must read until EOF, skip leading `warning:` lines, and parse only the last `{...}` JSON object. This brittle heuristic breaks as soon as more warnings are added.

(iii) **It inherits text-mode routing.** Text-mode error output ALSO goes to stderr (e.g., `claw --no-such-flag` → stderr `[error-kind: cli_parse]\nerror: ...`). That is correct for text mode, since stderr is the diagnostic channel. The bug is that JSON mode inherits the same routing.

**Required fix shape:**

(a) JSON error envelopes go to STDOUT when `--output-format json` is active.

(b) Text-mode error output stays on stderr (no change for the text path).

(c) Deprecation/info warnings stay on stderr in JSON mode. They are diagnostic prose, not part of the JSON contract, so the channels separate cleanly: JSON envelope on stdout, prose warnings on stderr.

(d) Add a `--quiet` / `--no-warn` flag to fully suppress stderr warnings for clean automation.

(e) Regression test: every `--output-format json` failure path emits the JSON envelope on stdout, exits non-zero, and never writes JSON to stderr.

**Why this matters:** the entire point of `--output-format json` is enabling automation. Splitting JSON success and error output across stdout and stderr defeats that purpose. Automation must capture both streams, work out which one carries the result, and parse mixed output.

Cross-references:
- #422: exit-code parity across error envelopes.
- #424: deprecation warnings noise.
- #428: envelope vs prose tension.
- #446: multi-load deprecation duplication.

Source: Jobdori live dogfood, `5ab969e7`, 2026-05-11.
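
A sketch of the routing in fix shapes (a)-(c): prose goes to stderr in both modes, while the JSON envelope goes to stdout. `report_error`, `ErrorEnvelope`, and `OutputFormat` are hypothetical names. The hand-rolled `json_str` escaper only keeps the example dependency-free; a real CLI would more likely serialize with a JSON library. The writers are passed in so a test can capture both streams.

```rust
use std::io::{self, Write};

/// Hypothetical envelope mirroring the `error` / `hint` / `kind` / `type` fields above.
pub struct ErrorEnvelope {
    pub kind: String,
    pub error: String,
    pub hint: Option<String>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OutputFormat {
    Text,
    Json,
}

/// Minimal JSON string escaping so the sketch needs no dependencies.
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Reports one failure and returns the exit code (always non-zero).
/// Prose always goes to `stderr`; in JSON mode the envelope goes to `stdout`.
pub fn report_error(
    format: OutputFormat,
    warnings: &[String],
    err: &ErrorEnvelope,
    stdout: &mut impl Write,
    stderr: &mut impl Write,
) -> io::Result<i32> {
    for w in warnings {
        writeln!(stderr, "warning: {w}")?;
    }
    match format {
        OutputFormat::Json => {
            let hint = err.hint.as_deref().map_or_else(|| "null".to_string(), json_str);
            writeln!(
                stdout,
                "{{\"error\":{},\"hint\":{},\"kind\":{},\"type\":\"error\"}}",
                json_str(&err.error),
                hint,
                json_str(&err.kind)
            )?;
        }
        OutputFormat::Text => {
            writeln!(stderr, "[error-kind: {}]\nerror: {}", err.kind, err.error)?;
        }
    }
    Ok(1)
}
```

With this split, `if ! output=$(claw cmd --output-format json); then echo "$output" | jq .kind; fi` would see the envelope, and warning lines would never reach `jq`.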
448. **`sandbox --output-format json` has contradictory state flags (`enabled:true, supported:false, active:false, filesystem_active:true, allowed_mounts:[]`): it claims the sandbox is "enabled" while the OS doesn't support namespace isolation, and an empty `allowed_mounts:[]` contradicts `filesystem_active:true filesystem_mode:"workspace-only"`.** Dogfooded 2026-05-11 by Jobdori on `7244a82b` in response to Clawhip pinpoint nudge at `1503403842920779917` (using the fresh-current-main runner at `/tmp/claw-dog-1430` per gajae's 14:00 protocol switch).

Reproduction: `claw sandbox --output-format json` on macOS (where `unshare` is unavailable) returns:

```json
{"active":false,"active_namespace":false,"active_network":false,"allowed_mounts":[],"enabled":true,"fallback_reason":"namespace isolation unavailable (requires Linux with `unshare`)","filesystem_active":true,"filesystem_mode":"workspace-only","in_container":false,"kind":"sandbox","markers":[],"requested_namespace":true,"requested_network":false,"supported":false}
```

**Three contradictions in the same envelope:**

(a) **`enabled:true` AND `supported:false`.** What does "enabled" mean if the OS doesn't support sandboxing? Read literally, the sandbox is *enabled but unsupported*, which is semantically meaningless. The likely intent is "the user requested a sandbox in config", but the field name `enabled` says "is ON". A better name would be `requested:true` or `config_intent:true`, with `enabled` reserved for the actually-active state.

(b) **`filesystem_active:true, filesystem_mode:"workspace-only"` AND `allowed_mounts:[]`.** If the filesystem fence is active in workspace-only mode, the workspace directory itself MUST be an allowed mount.
An empty `allowed_mounts:[]` array combined with `filesystem_active:true` means either (i) the fence is being misreported (it's not really active), (ii) the workspace is implicit and `allowed_mounts` only lists *additional* mounts, or (iii) the fence has no allowed paths and nothing is readable — all three are inconsistent with the user-facing summary. (c) `active:false` AND `filesystem_active:true`: the top-level `active` field is a single boolean summary, but it disagrees with `filesystem_active:true` (one component is active). Either `active` is "all components active" (then it should be `false` when any component is off) or "any component active" (then it should be `true` when filesystem is). The current value is `false` despite filesystem being active. **Sibling: no `claw sandbox --help`**: `claw sandbox status` and `claw sandbox --help` go to LLM-prompt fallback or hang (gajae confirmed at 13:00 that `sandbox status` returns typed `cli_parse` but `sandbox --help` is bounded — schema is non-uniform across help paths). **Required fix shape:** (a) rename `enabled` to `requested` or `config_intent` to disambiguate from "currently active"; (b) make `allowed_mounts` explicitly include the workspace when filesystem_mode is "workspace-only" (`allowed_mounts:[{path:"<cwd>",writable:true,reason:"workspace_root"}]`); (c) document the `active` aggregate semantics: pick either "all" or "any" composition rule and document the choice; (d) add `active_components:["filesystem"]` array as a richer alternative to the single boolean — surfaces exactly which sandbox subsystems are live; (e) regression test: when `filesystem_mode == "workspace-only"`, `allowed_mounts` MUST contain the cwd and `active` must agree with the documented composition rule. 
**Why this matters:** sandbox is the trust surface — automation that checks `sandbox.active == true` before running a risky LLM prompt sees `false` (no namespace, no network) and assumes no isolation, but `filesystem_active:true` means there IS partial isolation. The mixed signal forces consumers to OR all `*_active` fields together. Cross-references #428 (default permission_mode=danger-full-access — paired with sandbox-not-active means zero isolation), #444 (no broad-cwd guard — sandbox is the only safety net and its status is unclear). Source: Jobdori live dogfood, `7244a82b`, 2026-05-11.
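
A sketch of fix shapes (a)-(d) as a data model. It assumes an "any component active" composition rule; the entry leaves that choice open, and "all" would be equally valid if documented. `SandboxReport`, `Component`, and `Mount` are hypothetical types:
- `requested` replaces `enabled`.
- `active_components` replaces the bare boolean.
- The workspace root is listed explicitly in `allowed_mounts`.

```rust
/// Sandbox subsystems that can be live independently.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Component {
    Namespace,
    Network,
    Filesystem,
}

#[derive(Debug, PartialEq)]
pub struct Mount {
    pub path: String,
    pub writable: bool,
    pub reason: &'static str,
}

/// Hypothetical reshaped `claw sandbox` report.
#[derive(Debug)]
pub struct SandboxReport {
    /// Config intent (replaces the ambiguous `enabled`).
    pub requested: bool,
    /// Whether the OS supports namespace isolation.
    pub supported: bool,
    /// Exactly which subsystems are live.
    pub active_components: Vec<Component>,
    /// Explicit mounts, including the workspace root in workspace-only mode.
    pub allowed_mounts: Vec<Mount>,
}

impl SandboxReport {
    pub fn build(cwd: &str, requested: bool, namespace_supported: bool, workspace_only_fs: bool) -> Self {
        let mut active_components = Vec::new();
        let mut allowed_mounts = Vec::new();
        // Namespace isolation is live only if requested AND supported by the OS.
        if requested && namespace_supported {
            active_components.push(Component::Namespace);
        }
        // A workspace-only fence always lists the workspace root as a mount.
        if workspace_only_fs {
            active_components.push(Component::Filesystem);
            allowed_mounts.push(Mount { path: cwd.to_string(), writable: true, reason: "workspace_root" });
        }
        SandboxReport { requested, supported: namespace_supported, active_components, allowed_mounts }
    }

    /// Documented rule for this sketch: active if ANY component is live.
    pub fn active(&self) -> bool {
        !self.active_components.is_empty()
    }

    /// Invariant from fix shape (e): a live filesystem fence implies a workspace-root mount.
    pub fn is_consistent(&self, cwd: &str) -> bool {
        let fs_live = self.active_components.contains(&Component::Filesystem);
        let has_root = self
            .allowed_mounts
            .iter()
            .any(|m| m.path == cwd && m.reason == "workspace_root");
        !fs_live || has_root
    }
}
```

Take the macOS case from the reproduction: namespace requested but unsupported, filesystem fence in workspace-only mode. This model yields `active_components` of `[Filesystem]`, `active() == true`, and a single `workspace_root` mount. A consumer checking `active` would then no longer see `false` while the filesystem fence is live.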