Skip to content

fix(sandbox): preserve mutable OpenClaw config perms after doctor --fix (#4538)#4610

Merged
cv merged 14 commits into
NVIDIA:mainfrom
yimoj:fix/4538-preserve-mutable-openclaw-perms
Jun 5, 2026
Merged

fix(sandbox): preserve mutable OpenClaw config perms after doctor --fix (#4538)#4610
cv merged 14 commits into
NVIDIA:mainfrom
yimoj:fix/4538-preserve-mutable-openclaw-perms

Conversation

@yimoj
Copy link
Copy Markdown
Contributor

@yimoj yimoj commented Jun 1, 2026

Summary

openclaw doctor --fix enforces a single-user 700/600 state layout, which silently breaks NemoClaw's mutable config contract: the gateway UID shares the sandbox group, so /sandbox/.openclaw must stay setgid + group-writable (2770) and openclaw.json group-writable (660), or gateway-side config writes (control-UI toggles, etc.) EACCES. This preserves that contract after NemoClaw's own rebuild repair step and adds a detect/repair path for sandboxes a user already tightened.

Related Issue

Fixes #4538

Changes

  • Add shields.inspectMutableConfigPerms / repairMutableConfigPerms — pure contract logic in src/lib/shields/mutable-config-perms.ts, sandbox-bound wrappers in src/lib/shields/index.ts. Repair reuses unlockAgentConfig and refuses to weaken an active shields-up lock or a corrupt-state sandbox.
  • rebuild re-normalizes mutable perms right after its post-upgrade openclaw doctor --fix step.
  • nemoclaw <name> doctor gains a Config permissions check that detects the 700/600 drift and warns; nemoclaw <name> doctor --fix repairs it without a rebuild. --fix is rejected together with --json so JSON automation can't trigger a silent permission mutation.
  • Document that a sandbox restart self-heals the drift (the startup normalizer already restores the contract), and document the supported gateway write probe (setpriv … --init-groups / gosu gateway, not su).
  • Tests: contract predicates, inspect/repair orchestration, the doctor check rendering, and a shell-level regression that a tree tightened to 700/600 is restored to 2770/660 on startup.

Type of Change

  • Code change with doc updates

Verification

  • npm test passes (all failures isolated to the host umask 027 and the unrelated flaky CLI-dispatch tests; the affected files pass in isolation with plugin deps installed)
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • Targeted suites green: src/lib/shields/, src/lib/actions/sandbox/doctor-config-perms.test.ts, test/repro-2681-group-writable.test.ts

Signed-off-by: Yimo Jiang yimoj@nvidia.com

Summary by CodeRabbit

  • New Features

    • Added a --fix option to nemoclaw doctor to detect and (when requested) repair OpenClaw sandbox config permission drift; --fix is mutually exclusive with --json.
  • Bug Fixes

    • Doctor and rebuild now detect overly restrictive config permissions, attempt post-restore repair/verification, and surface incomplete steps with remediation hints; sandbox restart re-applies the mutable contract.
  • Documentation

    • Expanded command reference and troubleshooting with usage, warnings, non-rebuild repair flow, and probing guidance (prefer setpriv/gosu).
  • Tests

    • Added unit and regression tests covering detection, repair, verification, unlock/symlink edge cases, and rebuild behavior.

…ix (NVIDIA#4538)

OpenClaw's `openclaw doctor --fix` enforces a single-user 700/600 state
layout, which silently tightens NemoClaw's mutable config contract: the
gateway UID shares the `sandbox` group, so `/sandbox/.openclaw` must stay
setgid + group-writable (2770) and `openclaw.json` group-writable (660),
or gateway-side config writes (control-UI toggles, etc.) EACCES.

- Add `shields.inspectMutableConfigPerms` / `repairMutableConfigPerms`
  (pure logic in `shields/mutable-config-perms.ts`, sandbox-bound wrappers
  in `shields/index.ts`). Repair reuses `unlockAgentConfig` and refuses to
  weaken an active shields-up lock.
- Re-normalize mutable perms in `rebuild` right after its post-upgrade
  `openclaw doctor --fix` step.
- Add a `Config permissions` check to `nemoclaw <name> doctor` that detects
  the 700/600 drift; `nemoclaw <name> doctor --fix` repairs it without a
  rebuild. `--fix` is rejected with `--json` so JSON automation cannot
  trigger a silent permission mutation.
- Note in the startup normalizer and docs that a restart self-heals the
  drift, and document the supported gateway write probe (gosu / setpriv
  --init-groups, not su).

Adds unit coverage for the contract predicates, inspect/repair
orchestration, and the doctor check, plus a shell-level regression that a
tree tightened to 700/600 is restored to 2770/660 on startup.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 1, 2026

This repository limits contributors to 10 open pull requests. Please close or merge existing PRs before opening new ones.

@github-actions github-actions Bot closed this Jun 1, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b5fbfc7b-c627-44cd-b588-df56d08981b3

📥 Commits

Reviewing files that changed from the base of the PR and between eb46838 and 534a58a.

📒 Files selected for processing (5)
  • docs/reference/commands-nemohermes.mdx
  • docs/reference/commands.mdx
  • docs/reference/troubleshooting.mdx
  • scripts/nemoclaw-start.sh
  • src/lib/cli/public-display-defaults.ts
✅ Files skipped from review due to trivial changes (4)
  • docs/reference/commands-nemohermes.mdx
  • src/lib/cli/public-display-defaults.ts
  • scripts/nemoclaw-start.sh
  • docs/reference/commands.mdx
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/reference/troubleshooting.mdx

📝 Walkthrough

Walkthrough

Adds a mutually-exclusive --fix flag to sandbox doctor, implements OpenClaw mutable-config permission inspection and repair, exposes shields wrappers, wires repair into rebuild/post-restore, adds unit and regression tests, and updates docs and script comments describing detection and normalization of tightened 700/600 → expected 2770/660 permissions.

Changes

Mutable Config Permission Detection and Repair

Layer / File(s) Summary
Mutable config permission contract and predicates
src/lib/shields/mutable-config-perms.ts
Defines required directory (setgid + group-writable 2770) and file (660) mode constants, exported types for inspection/repair results, parseStatModeOwner, and strict dir/file predicates.
Shields: unlock helper & wrappers
src/lib/shields/index.ts
Embeds a Python no-symlink-follow unlock script, implements unlockConfigPathsNoSymlinkFollow, replaces inline per-file unlock loop, and exposes inspectMutableConfigPerms / repairMutableConfigPerms.
Shields tests
src/lib/shields/mutable-config-perms.test.ts
Adds predicate/parsing tests and scenario tests for inspection and repair behavior, posture gating, sensitive files, and error paths.
Doctor check factory
src/lib/actions/sandbox/doctor-config-perms.ts
Adds buildConfigPermsCheck (injected inspect/repair) with probe-failure handling, --fix repair + re-inspect flow, and final-status composition.
Doctor unit tests
src/lib/actions/sandbox/doctor-config-perms.test.ts
Comprehensive unit tests covering applies=false, probe errors, intact/tightened scenarios, repair success/failure, skipped repair, and re-inspection failure handling.
CLI flag and doctor orchestration
src/commands/sandbox/doctor.ts, src/lib/actions/sandbox/doctor.ts, src/lib/cli/public-display-defaults.ts
Adds mutually-exclusive --fix flag (exclusive with --json), updates usage/examples, parses --fix in runSandboxDoctor, rejects --fix + --json, wires the config-perms check into the doctor checks, and advertises --fix in CLI metadata.
Rebuild post-upgrade permission repair
src/lib/actions/sandbox/rebuild.ts
After post-restore doctor/seed steps, calls shields.repairMutableConfigPerms and logs skipped/verified/applied/incomplete outcomes; treats unverified permission repair as an incomplete rebuild and prints remediation hints.
Docs, scripts comments, and regression tests
docs/reference/commands.mdx, docs/reference/troubleshooting.mdx, docs/reference/commands-nemohermes.mdx, scripts/lib/sandbox-init.sh, scripts/nemoclaw-start.sh, test/*
Documents --fix behavior, expected permission contract, restart/rebuild recovery notes, verification probe guidance (use setpriv --init-groups/gosu), expands comments about normalization self-healing, adjusts test harness docker fake, and adds regression tests verifying normalization restores setgid/group-writable modes and that unlock refuses symlinked targets.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4155: Touches unlockAgentConfig/shields unlock behavior and overlaps at the same unlock/permission responsibility.
  • NVIDIA/NemoClaw#4707: Edits init_step_down_prefixes() region in sandbox-init scripts overlapping verification/step-down logic.

Suggested labels

fix, Sandbox

Suggested reviewers

  • ericksoa
  • cjagwani

"🐰 A tiny hop, a careful fix,
setgid returns, no permission tricks.
--fix hums softly, scripts stand guard,
symlinks refused, the checks on guard.
Configs restored — the rabbit's glad."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main fix: preserving mutable OpenClaw config permissions after running doctor --fix, directly addressing issue #4538.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@yimoj yimoj reopened this Jun 2, 2026
@yimoj
Copy link
Copy Markdown
Contributor Author

yimoj commented Jun 2, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
docs/reference/troubleshooting.mdx (1)

870-870: ⚡ Quick win

Replace colon with a period or em dash.

This line uses a colon to introduce an explanation rather than a list, which violates the formatting rule that colons should only introduce lists.

📝 Suggested fix
-Do not probe with `su -s /bin/sh gateway ...`: `su` does not initialize the gateway's supplementary groups the same way, so a group-write probe can spuriously report `EACCES` even when the mutable contract is intact.
+Do not probe with `su -s /bin/sh gateway ...`.
+`su` does not initialize the gateway's supplementary groups the same way, so a group-write probe can spuriously report `EACCES` even when the mutable contract is intact.

As per coding guidelines: "Colons should only introduce a list. Flag colons used as general punctuation between clauses."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/reference/troubleshooting.mdx` at line 870, Replace the colon after the
clause "Do not probe with `su -s /bin/sh gateway ...`" with a period or em dash
so the sentence reads as a standalone statement (e.g., "Do not probe with `su -s
/bin/sh gateway ...`. `su` does not initialize..." or use an em dash), ensuring
the colon is not used as general punctuation in the troubleshooting note.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/actions/sandbox/doctor-config-perms.ts`:
- Around line 84-97: The failure detail string currently ignores after.reason
when inspect(sandboxName) throws and after.reason is set; update the
construction of the detail (the return block using before, after, repairResult)
to include after.reason as a fallback before falling back to "unknown" — e.g.,
when fixed is false and after.applies is true with no repairResult.errors,
append or use after.reason (from MutableConfigPermsInspection) so the message
becomes `repair incomplete: ${after.applies ? (after.issues.join("; ") ||
repairResult.errors.join("; ") || after.reason) : repairResult.errors.join("; ")
|| after.reason || "unknown"}` (adjust formatting to match code style),
referencing the inspect call, the after variable, and repairResult to locate the
change.
- Around line 33-38: The inspect(sandboxName) call currently swallows exceptions
and returns null; change the catch to capture the thrown error (e) and return a
diagnostic check object indicating a warning/failure with the error message so
the probe is surfaced instead of skipped—e.g., in the catch for
inspect(sandboxName) create and return a Check/Inspection result (matching the
surrounding return type used by doctor-config-perms) with status "warn" or
"fail" and include e.message (and e.stack if available) in the details; keep
using MutableConfigPermsInspection/inspect names to locate the code path to
modify.

In `@src/lib/actions/sandbox/rebuild.ts`:
- Around line 853-861: The current handling of
shields.repairMutableConfigPerms(sandboxName) logs skipped or incomplete repairs
but still allows the rebuild to succeed; change the control flow in the
surrounding rebuild logic (where permRepair is computed) so that if
permRepair.applied === false OR permRepair.verified === false you treat the
rebuild as failed/incomplete: surface an explicit error to the user (using
log/console.error) and return or set the overall rebuild result to an error
state instead of continuing to success. Update any result/exit code or the
function that returns the rebuild outcome so callers see a non-success status
when repairMutableConfigPerms does not verify successfully (refer to
permRepair.applied, permRepair.verified, and permRepair.errors and the
surrounding rebuild function that currently prints success).

In `@src/lib/shields/mutable-config-perms.ts`:
- Around line 61-77: The predicates are too permissive; instead of testing
selected bits they must validate the full normalized mode equals the exact
contract (dir: 02770 / octal 0o2770, file: 00660 / octal 0o0660). Normalize the
input with mode.padStart(4,'0') (or parse to an int with
Number.parseInt(mode.padStart(4,'0'),8)) and then compare against the exact
expected value in dirSatisfiesMutableContract and fileSatisfiesMutableContract
(use '2770'/'0660' or numeric 0o2770/0o0660) so only exact matches pass.

---

Nitpick comments:
In `@docs/reference/troubleshooting.mdx`:
- Line 870: Replace the colon after the clause "Do not probe with `su -s /bin/sh
gateway ...`" with a period or em dash so the sentence reads as a standalone
statement (e.g., "Do not probe with `su -s /bin/sh gateway ...`. `su` does not
initialize..." or use an em dash), ensuring the colon is not used as general
punctuation in the troubleshooting note.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d85c2253-5f6a-4178-87f8-ce03299953ca

📥 Commits

Reviewing files that changed from the base of the PR and between ef1d4af and ef73178.

📒 Files selected for processing (14)
  • docs/reference/commands.mdx
  • docs/reference/troubleshooting.mdx
  • scripts/lib/sandbox-init.sh
  • scripts/nemoclaw-start.sh
  • src/commands/sandbox/doctor.ts
  • src/lib/actions/sandbox/doctor-config-perms.test.ts
  • src/lib/actions/sandbox/doctor-config-perms.ts
  • src/lib/actions/sandbox/doctor.ts
  • src/lib/actions/sandbox/rebuild.ts
  • src/lib/cli/public-display-defaults.ts
  • src/lib/shields/index.ts
  • src/lib/shields/mutable-config-perms.test.ts
  • src/lib/shields/mutable-config-perms.ts
  • test/repro-2681-group-writable.test.ts

Comment thread src/lib/actions/sandbox/doctor-config-perms.ts
Comment thread src/lib/actions/sandbox/doctor-config-perms.ts Outdated
Comment thread src/lib/actions/sandbox/rebuild.ts Outdated
Comment thread src/lib/shields/mutable-config-perms.ts
yimoj added 2 commits June 2, 2026 02:50
…utable-openclaw-perms

# Conflicts:
#	src/lib/shields/index.ts
…#4538)

- Validate the exact 2770/660 contract in the mutable-config predicates so
  owner-tightened (2670) and world-widened (2777/666) modes are flagged, not
  just selected group bits (CodeRabbit).
- doctor: surface an unexpected inspection-probe exception as a warn check
  instead of silently dropping the check; preserve the re-inspection failure
  reason in the repair-incomplete detail (CodeRabbit).
- rebuild: run the mutable-permission repair LAST in the OpenClaw post-restore
  sequence — after the WeChat seed helper, which atomically rewrites
  openclaw.json 0600 and would otherwise undo the just-verified repair (Codex).
- rebuild: when the final result is incomplete, report a failed state restore
  and an unverified permission repair independently so neither recovery hint is
  lost when both occur (Codex).

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
@yimoj
Copy link
Copy Markdown
Contributor Author

yimoj commented Jun 2, 2026

Pushed merge with upstream/main (resolved the shields/index.ts import conflict) plus fixes for all review feedback:

  • Exact contract validationdir/fileSatisfiesMutableContract now require the full normalized 2770/660, so owner-tightened (2670) and world-widened (2777/666) modes are flagged, not just group bits.
  • doctor probe — an unexpected inspection exception now renders a warn check instead of silently disappearing; the re-inspection failure reason is preserved in the repair incomplete detail.
  • rebuild ordering — the mutable-permission repair now runs after the WeChat seed helper (which atomically rewrites openclaw.json 0600), so the verified repair is no longer undone.
  • rebuild summary — a failed state restore and an unverified permission repair are now reported independently, so neither recovery hint is lost when both occur.

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2026

@yimoj Thanks for the thorough update! The fixes look well-considered — particularly tightening the contract predicates to exact 2770/660 (catching both under-permissioned and over-permissioned modes), surfacing inspection exceptions as warn rather than swallowing them silently, and fixing the rebuild ordering so the post-WeChat-seed repair isn't immediately undone. Triggering a full review now.

(´• ω •`) ♡

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/actions/sandbox/rebuild.ts`:
- Around line 886-899: Wrap the call to
shields.repairMutableConfigPerms(sandboxName) in a try/catch so unexpected
throws from validateName or getShieldsPosture (which may run
recoverExpiredAutoRestoreInline) do not abort the rebuild; on catch set
mutablePermsRepairUnverified = true and emit a warning/error log that includes
the caught error message/stack, then continue (do not rethrow). Keep existing
success/verified/verified=false handling for the normal permRepair return value
(repairMutableConfigPermsCore / unlockAgentConfig remain unchanged).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6e1277ae-e963-4508-a222-d429394bc960

📥 Commits

Reviewing files that changed from the base of the PR and between ef73178 and fe70a38.

📒 Files selected for processing (10)
  • docs/reference/commands.mdx
  • docs/reference/troubleshooting.mdx
  • scripts/nemoclaw-start.sh
  • src/lib/actions/sandbox/doctor-config-perms.test.ts
  • src/lib/actions/sandbox/doctor-config-perms.ts
  • src/lib/actions/sandbox/rebuild.ts
  • src/lib/shields/index.ts
  • src/lib/shields/mutable-config-perms.test.ts
  • src/lib/shields/mutable-config-perms.ts
  • test/repro-2681-group-writable.test.ts
💤 Files with no reviewable changes (1)
  • test/repro-2681-group-writable.test.ts
✅ Files skipped from review due to trivial changes (2)
  • scripts/nemoclaw-start.sh
  • docs/reference/commands.mdx
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/lib/shields/index.ts
  • docs/reference/troubleshooting.mdx
  • src/lib/shields/mutable-config-perms.ts
  • src/lib/shields/mutable-config-perms.test.ts

Comment thread src/lib/actions/sandbox/rebuild.ts
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/reference/commands.mdx`:
- Around line 484-491: Update the doctor command synopsis block and flag table
so the example is copyable and accurately shows the flags are mutually
exclusive: replace the console code fence (with `$` prompt) with a bash code
fence containing "nemoclaw my-assistant doctor [--json | --fix]" and update the
`--fix` flag description to append "Mutually exclusive with `--json`"; ensure
this change targets the synopsis and the table entries for the doctor command
(the doctor command example and the `--json`/`--fix` flag rows).

In `@docs/reference/troubleshooting.mdx`:
- Around line 855-868: Change the two "console" fenced blocks to copyable shell
blocks by switching the fences from ```console to ```bash (or ```sh) and remove
the leading "$" prompt characters; specifically update the "nemoclaw <sandbox>
doctor --fix" block and the verification block that uses setpriv/gosu so the
commands are plain, copyable lines (e.g., use "nemoclaw <sandbox> doctor --fix",
"setpriv --reuid=... -- sh -c 'echo ok >> /sandbox/.openclaw/openclaw.json'",
and the gosu variant) and ensure the file path /sandbox/.openclaw/openclaw.json
and the utilities setpriv/gosu are preserved as shown.

In `@src/lib/actions/sandbox/rebuild.ts`:
- Around line 888-891: The current branch treats every permRepair.applied ===
false as a benign skip; update the handling in the rebuild flow so that after
calling repairMutableConfigPerms() you check permRepair.reason and only
log-and-continue for the explicit "shields-up" (intentional locked) reason,
while any other non-applied reasons should mark the rebuild as incomplete (e.g.,
call the existing downgrade/fail path or set the rebuild status to failure).
Concretely, in the block that inspects permRepair from
repairMutableConfigPerms(), replace the unconditional no-op/log with a
conditional that compares permRepair.reason to the explicit shields-up sentinel
used by repairMutableConfigPerms(), logging and continuing only for that
sentinel and invoking the downgrade/error path for all other reasons so the
mutable OpenClaw contract restoration is treated as incomplete.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ec644f24-c385-45ac-b652-41c5055e0eaf

📥 Commits

Reviewing files that changed from the base of the PR and between 96946a1 and fe70a38.

📒 Files selected for processing (14)
  • docs/reference/commands.mdx
  • docs/reference/troubleshooting.mdx
  • scripts/lib/sandbox-init.sh
  • scripts/nemoclaw-start.sh
  • src/commands/sandbox/doctor.ts
  • src/lib/actions/sandbox/doctor-config-perms.test.ts
  • src/lib/actions/sandbox/doctor-config-perms.ts
  • src/lib/actions/sandbox/doctor.ts
  • src/lib/actions/sandbox/rebuild.ts
  • src/lib/cli/public-display-defaults.ts
  • src/lib/shields/index.ts
  • src/lib/shields/mutable-config-perms.test.ts
  • src/lib/shields/mutable-config-perms.ts
  • test/repro-2681-group-writable.test.ts

Comment thread docs/reference/commands.mdx Outdated
Comment thread docs/reference/troubleshooting.mdx Outdated
Comment thread src/lib/actions/sandbox/rebuild.ts Outdated
- rebuild: guard `shields.repairMutableConfigPerms()` in try/catch so an
  unexpected throw from the wrapper (validateName / getShieldsPosture inline
  auto-restore) cannot abort the rebuild — treat it as unverified and continue.
- rebuild: distinguish skip reasons — a "locked" (shields-up) skip stays benign,
  but an "unreadable" shields-state skip means the contract may still be broken,
  so it now downgrades the rebuild result. Add a `skipReason` discriminant to
  MutableConfigRepairResult to avoid brittle reason-string matching.
- docs: make the `doctor` synopsis copyable (`bash` fence, no `$`) and show
  `--json | --fix` are mutually exclusive; switch the troubleshooting repair and
  gateway-probe blocks to copyable `bash` fences.

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
@yimoj
Copy link
Copy Markdown
Contributor Author

yimoj commented Jun 2, 2026

Addressed all four review threads in commit 07c1f54ab (replied inline and resolved each):

  • rebuild — guard against wrapper throws: shields.repairMutableConfigPerms() is now wrapped in try/catch; an unexpected throw is treated as unverified and the rebuild continues.
  • rebuild — skip-reason handling: added a skipReason discriminant; only the intentional shields-up (locked) skip is benign, while an unreadable corrupt-shields-state skip now downgrades the rebuild.
  • docs (commands): doctor synopsis is a copyable bash block showing [--json | --fix] are mutually exclusive.
  • docs (troubleshooting): repair and setpriv/gosu probe blocks switched to copyable bash fences.

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 2, 2026

@yimoj Thanks for the follow-up — all four changes look solid:

  • Rebuild guard: wrapping shields.repairMutableConfigPerms() in try/catch and treating unexpected throws as unverified keeps the rebuild from hard-failing on a permission-repair edge case, which is the right failure mode here.
  • skipReason discriminant: distinguishing the intentional locked skip from the unreadable corrupt-state skip is an important correctness improvement — silently treating a corrupt shields state as benign would have hidden real problems.
  • Docs copyable blocks: switching both the doctor synopsis and the setpriv/gosu probe to fenced bash blocks makes them immediately copy-pasteable, which is a nice UX improvement for an operations-focused doc.

Triggering the review now.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@yimoj yimoj added the v0.0.57 Release target label Jun 2, 2026
@prekshivyas
Copy link
Copy Markdown
Contributor

@yimoj thanks for the upstream merge + review fixes here. Heads-up: it's still showing CONFLICTING against main as of the latest head — looks like a fresh conflict landed after your merge (main's been moving quickly during the v0.0.57 push). Mind re-syncing once more? Happy to review as soon as it's clean. Thanks!

@cv cv added v0.0.58 Release target and removed v0.0.57 Release target labels Jun 3, 2026
…utable-openclaw-perms

# Conflicts:
#	docs/reference/commands.mdx
#	src/lib/actions/sandbox/doctor.ts
@yimoj
Copy link
Copy Markdown
Contributor Author

yimoj commented Jun 3, 2026

@prekshivyas resolved.

@wscurran wscurran added area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery area: security Security controls, permissions, secrets, or hardening bug-fix PR fixes a bug or regression labels Jun 3, 2026
@wscurran
Copy link
Copy Markdown
Contributor

wscurran commented Jun 3, 2026

…utable-openclaw-perms

# Conflicts:
#	src/commands/sandbox/doctor.ts
cjagwani added a commit to kagura-agent/NemoClaw that referenced this pull request Jun 3, 2026
…4628 review)

Address @cv's PR NVIDIA#4628 review feedback: the 700/600 heal was too generic.
If a future caller routes a mutable-sandbox OpenClaw config dir (which
uses the 2770/660 contract per NVIDIA#4538 / PR NVIDIA#4610) through
`ensureConfigDir`, the walker would silently normalize it to 700/600 —
exactly the NVIDIA#4538 EACCES regression.

Add three predicates:
- `hostNemoclawDir()` — canonical `${HOME}/.nemoclaw`
- `isHostNemoclawRoot(dirPath)` — gate for the sibling sweep
- `isMutableSandboxConfigPath(targetPath)` — gate for the file-level heal

Wire them through:
- `ensureConfigDir`: skip dir-mode chmod when path is mutable-sandbox.
  Run `healRootLevelFiles` only when path IS the host nemoclaw root.
- `readConfigFile`: skip per-file chmod when path is mutable-sandbox.

Tests:
- 2 new scope-boundary tests (negative: arbitrary dir, no heal;
  positive: host nemoclaw root, heal fires).
- Existing heal tests (multi-sibling + symlink positive control)
  updated to use the new `withHome` helper so they place files under
  `<HOME>/.nemoclaw` instead of an arbitrary tmp dir — preserves the
  NVIDIA#4546 acceptance while respecting the new scope boundary.

18/18 tests pass.

Co-authored-by: kagura-agent <kagura.agent.ai@gmail.com>
Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
@cv cv added v0.0.59 Release target and removed v0.0.58 Release target labels Jun 4, 2026
Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
@yimoj
Copy link
Copy Markdown
Contributor Author

yimoj commented Jun 4, 2026

@coderabbitai resume

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 4, 2026

✅ Action performed

Reviews resumed.

yimoj and others added 4 commits June 4, 2026 05:07
Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
NVIDIA#4538)

Two acceptance gaps from NVIDIA#4538 not closed by the original PR:

- Troubleshooting: contrast the three relevant perm states (mutable
  default 2770/660, shields-up locked 444/755 root, the 700/600 drift)
  so issue ask NVIDIA#3, "docs cover both mutable and locked-down", is
  actually answered. Prior copy only documented mutable.

- mutable-config-perms: explain why NemoClaw uses 2770/660 vs the
  issue's expected 2775/664 (gateway shares the sandbox group, so the
  "other" bit is intentionally dropped). The predicate test already
  rejects 664; the rationale belonged in code.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
…VIDIA#4538)

Third acceptance gap from NVIDIA#4538 audit: inspect/repair asymmetry.

`unlockAgentConfig` (the repair callback) re-permissions both
`openclaw.json` AND the sensitive-file set (`.config-hash`, etc.).
`inspectMutableConfigPerms` only stat'd dir + main config, so the
user-facing `nemoclaw <name> doctor` reported "Config permissions OK"
even when `.config-hash` had drifted to `644 root:root`. Running
`--fix` would then silently mutate a file the check never named.

- Add optional `sensitiveFiles` to `MutableConfigTarget`.
- Iterate sensitive files in inspect, applying the same file
  contract; tolerate missing files (e.g. `.config-hash` before the
  first shields-up cycle).
- `resolveAgentConfig` already populates `sensitiveFiles`, so the
  wrapper in `shields/index.ts` needs no change — structural typing
  carries the field through.
- Two new tests: drift detection + missing-file tolerance.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
@cjagwani cjagwani requested a review from cv June 4, 2026 20:51
@cv cv added v0.0.60 Release target and removed v0.0.59 Release target labels Jun 4, 2026
Copy link
Copy Markdown
Contributor

@cjagwani cjagwani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. Pushed two follow-ups to close the audit gaps: a docs paragraph on the three perm states (mutable 2770/660, shields-up 444/755 root, the 700/600 drift) plus a comment on why 660 not 664, and a fix so inspect actually catches .config-hash drift since --fix was silently mutating it. Two new tests cover the inspect side.

Verified on a brev T4 box: 2770/660 fresh, drifts to 700/600 after openclaw doctor --fix, comes back to 2770/660 after nemoclaw doctor --fix.

@cv cv merged commit 9eb020e into NVIDIA:main Jun 5, 2026
29 checks passed
miyoungc added a commit that referenced this pull request Jun 6, 2026
## Summary
- Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the
dev announcement from discussion #4877.
- Fills the source-doc gaps found during release-prep review across
inference, policy tiers, command behavior, security boundaries, Hermes
dashboard/tooling, runtime context, and troubleshooting.
- Refreshes generated agent skills under `.agents/skills/` from the
current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`.

## Source summary
- #4037 -> `docs/reference/architecture.mdx`,
`docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents
system-only runtime context that stays out of visible chat.
- #4875 -> `docs/reference/architecture.mdx`,
`docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents
try-first sandbox network/filesystem guidance and clearer failure
classification.
- #4788 -> `docs/security/best-practices.mdx`,
`docs/about/release-notes.mdx`: Documents shared OpenClaw
device-approval policy for startup and connect.
- #4768 -> `docs/reference/network-policies.mdx`,
`docs/network-policy/integration-policy-examples.mdx`,
`docs/get-started/quickstart.mdx`,
`docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`:
Documents `weather`, `public-reference`, and Hermes managed-tool gateway
preset behavior.
- #3788 and #4864 -> `docs/reference/network-policies.mdx`,
`docs/reference/commands.mdx`: Documents non-interactive policy-tier
fail-fast behavior and interactive prompt fallback.
- #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware
default sandbox resolution for `list`, `status`, and `tunnel` commands.
- #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel
status` behavior.
- #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy
preset descriptions in `policy-list`.
- #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents
package-managed OpenShell gateway service and Docker-driver
gateway-marker behavior.
- #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent
gateway/dashboard cleanup isolation by sandbox name and port.
- #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU
patch rollback behavior.
- #4610 -> `docs/reference/troubleshooting.mdx`,
`docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission
guidance aligned and removes skipped experimental wording.
- #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling
for custom `onboard --from <Dockerfile>` contexts in generated skills.
- #4870 -> `docs/reference/commands.mdx`,
`docs/manage-sandboxes/runtime-controls.mdx`: Documents
`NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage.
- #4641 -> `docs/inference/inference-options.mdx`,
`docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM
platform-digest pulls and served-model id adoption.
- #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents
stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash
coverage.
- #4852 -> `docs/inference/use-local-inference.mdx`,
`docs/reference/troubleshooting.mdx`: Documents Ollama model fit
filtering, 16K context floor, cold-load retry, and failed-model
exclusion.
- #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents
API-family sync, Hermes `api_mode`, and Bedrock Runtime exception.
- #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents
Nemotron managed-inference native tool-search fallback.
- #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents
interactive multimodal input prompting.
- #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass
normalization in generated troubleshooting coverage.
- #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents
prebuilt Hermes dashboard assets and TUI recovery without runtime
rebuilds.
- #4854 -> `docs/inference/switch-inference-providers.mdx`,
`docs/reference/commands.mdx`: Documents Hermes proxy API-key
placeholder preservation during inference switches.
- #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`,
`.agents/skills/`: Keeps messaging enrollment behavior aligned with
manifest-hook implementation.
- #4771 -> `docs/security/best-practices.mdx`,
`docs/security/credential-storage.mdx`: Documents Hermes
placeholder-only secret boundary for sandbox-visible runtime files.
- #4787 -> `docs/security/best-practices.mdx`,
`docs/about/release-notes.mdx`: Documents expanded memory scanner
examples for OpenAI project keys and Slack app-level tokens.
- #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill
install mirroring into the agent home directory.
- #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep
structure and generated `.agents/skills/` refresh as the template for
this release.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/
--prefix nemoclaw-user --doc-platform fern-mdx --dry-run`
- `npm run docs`
- `git diff --check`
- skip-term scan across `docs/`, `.agents/skills/`, and `skills/`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit and pre-push hook suites, including markdownlint, gitleaks,
env-var docs gate, docs-to-skills verification, and skills YAML tests

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* DeepSeek-V4-Flash now available as default inference model for DGX
Station.
* Hermes dashboard improved with dedicated port and OAuth-authenticated
tool gateway selection.
* Added weather and public-reference policy presets for expanded agent
capabilities.
* Enhanced Ollama model selection with GPU memory filtering and
automatic retry for timeouts.

* **Bug Fixes**
  * Improved policy tier validation to prevent invalid configurations.
* Better sandbox cleanup scoping by port to prevent conflicts across
deployments.
  * Added GPU patch failure recovery with automatic rollback.

* **Documentation**
* Expanded troubleshooting guides for inference, security, and sandbox
lifecycle.
  * Added .dockerignore best practices for custom deployments.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery area: security Security controls, permissions, secrets, or hardening bug-fix PR fixes a bug or regression v0.0.60 Release target

Projects

None yet

5 participants