fix(onboard): early-validate NEMOCLAW_POLICY_TIER before preflight (#3741)#3788
Conversation
…3741) commands.md promises that "if the value does not match a known tier, onboarding exits with an error listing the valid options." In practice the validation only ran inside `selectPolicyTier()`, which is called after preflight + gateway + inference setup have all completed. A user with `NEMOCLAW_POLICY_TIER=invalid_tier` in their environment got through steps [1/8] preflight, [2/8] gateway, and [3/8] inference before the wizard finally noticed the typo — and a CI pipeline that pre-populates the API key could pass entirely unnoticed. Extract the env-name parsing + validation into `resolvePolicyTierFromEnv` and gate it twice: 1. Early in `onboard()` (right after the existing NEMOCLAW_PROVIDER fail-fast block) when the env var is explicitly set, so an invalid tier exits before any preflight side effects. 2. Inside `selectPolicyTier()` non-interactive branch, so callers that bypass the early gate keep the same contract. The early gate is intentionally a no-op when the env var is unset so the interactive wizard prompt still drives the default flow. Test plan - Added `rejects unknown NEMOCLAW_POLICY_TIER with a clear error and non-zero exit (#3741)` to test/policy-tiers-onboard.test.ts — exits status 1, prints the canonical "Unknown policy tier: invalid_tier. Valid: restricted, balanced, open" to stderr, and never reaches the unreachable marker. - `npx vitest run test/policy-tiers-onboard.test.ts` — 23/23 pass. - Manual: `NEMOCLAW_NON_INTERACTIVE=1 NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 NEMOCLAW_POLICY_TIER=invalid_tier node bin/nemoclaw.js onboard` — immediate exit with the expected error, no [1/8] preflight output. Signed-off-by: Shawn Xie <shaxie@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
💤 Files with no reviewable changes (3)
📝 WalkthroughWalkthroughCentralizes NEMOCLAW_POLICY_TIER handling in a new helper that normalizes and validates the env var, integrates it into selectPolicyTier(), and invokes an early validator in onboard() to fail fast on non-empty invalid values. Adds an integration test asserting the fast-fail behavior. ChangesEarly NEMOCLAW_POLICY_TIER validation
Sequence Diagram(s)sequenceDiagram
participant Onboard
participant PolicyTierEnv
Onboard->>PolicyTierEnv: validatePolicyTierEnvEarly() (if env set & non-blank)
PolicyTierEnv->>PolicyTierEnv: normalize value, getTier()
PolicyTierEnv-->>Onboard: return normalized tier OR call process.exit(1) with error
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 8039-8052: The large explanatory docblock above the function that
resolves NEMOCLAW_POLICY_TIER should be shortened to a compact summary that
preserves the contract (default "balanced", exits with status 1 on unknown
value, safe/pure aside from process.exit) and references the function name
resolve/validate behavior so callers know its guarantees; move the longer
historical/contextual paragraphs (notes about where it's called, issue numbers,
and extended rationale) into an external docs/ or issues/ entry and remove the
extra lines in this file (also apply the same trimming to the similar block at
the other location reported around the selectPolicyTier/use-site lines). Ensure
the shortened comment still mentions accepted-options behavior and that callers
can rely on early fail-fast semantics.
- Around line 8053-8063: resolvePolicyTierFromEnv() currently trims the env var
then treats an originally-whitespace-only value as empty, bypassing the earlier
explicit-env validation; update the function to detect when
process.env.NEMOCLAW_POLICY_TIER is set but trims to an empty string and treat
that as an explicit invalid value: use the raw env presence check
(process.env.NEMOCLAW_POLICY_TIER !== undefined) combined with the trimmed name
=== "" (or check raw.trim().length === 0) and, when true, log the same "Unknown
policy tier" message (using tiers.listTiers() for valid names) and exit(1); keep
the existing tiers.getTier(name) validation for non-empty trimmed names so all
other behavior is unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b3e7b1a2-b109-4126-be99-9f385463b36b
📒 Files selected for processing (2)
src/lib/onboard.tstest/policy-tiers-onboard.test.ts
Addresses CI feedback on #3788: - onboard-entrypoint-budget check fails on the original commit because src/lib/onboard.ts grew by +39/-7 lines. Per the workflow contract, src/lib/onboard.ts must be net-neutral or smaller; growth belongs under src/lib/onboard/**. - Move resolvePolicyTierFromEnv + a small validatePolicyTierEnvEarly helper into src/lib/onboard/policy-tier-env.ts. onboard.ts now imports them and the entrypoint diff becomes +4/-7 (net -3). - Also addresses CodeRabbit's edge case: a whitespace-only NEMOCLAW_POLICY_TIER (e.g. an accidentally-quoted blank in a CI env) used to bypass the early gate (`.trim() !== ""` is false), and then trip the validator only at the policy-tier wizard step. The new helper treats whitespace-only as unset and falls back to "balanced", matching the gate's intent. Test plan - `npx vitest run test/policy-tiers-onboard.test.ts` — 23/23 pass, including the existing "rejects unknown NEMOCLAW_POLICY_TIER" case. - Manual end-to-end: - `NEMOCLAW_POLICY_TIER=invalid_tier nemoclaw onboard ...` exits with "Unknown policy tier: invalid_tier. Valid: ..." before any preflight output. - `NEMOCLAW_POLICY_TIER=" " nemoclaw onboard ...` proceeds past validation (whitespace = unset = balanced). - `NEMOCLAW_POLICY_TIER=restricted nemoclaw onboard ...` proceeds past validation as before. Signed-off-by: Shawn Xie <shaxie@nvidia.com>
|
Thanks @coderabbitai — both findings addressed in the follow-up commit:
Existing positive/negative cases unchanged:
|
|
Tip For best results, initiate chat on the files or code changes.
All edge cases covered and 23/23 green — this is in great shape. 🐇 ✨ |
PR Review AdvisorFindings: 0 needs attention, 0 worth checking, 0 nice ideas Consider writing more tests for
This is an automated advisory review. A human maintainer must make the final merge decision. |
E2E Scenario Advisor RecommendationRequired scenario E2E: Dispatch required scenario E2E:
Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
|
🌿 Preview your docs: https://nvidia-preview-pr-3788.docs.buildwithfern.com/nemoclaw |
prekshivyas
left a comment
There was a problem hiding this comment.
Re-reviewed against the current head (7c5b963) — refreshing the earlier approval after the rework + main merge. The net change is small and focused (policy-tier-env.ts + one call in onboard.ts + docs):
validatePolicyTierEnvEarly()runs the check only whenNEMOCLAW_POLICY_TIERis explicitly set to a non-blank value; unset/blank is a no-op that flows through to the interactive default. So it fails fast only on an explicit invalid value (listing valid tiers via the canonicalgetTier/listTiers) and can't break the common unset path.- It's now called at the very top of
onboard()— beforeensureUsageNoticeConsent,getRequestedProviderHint, andacquireOnboardLock— so a typo'd tier errors immediately rather than after consent/preflight/lock. The post-approval reorder (from after-provider-hint to before-consent) is a refinement in the same fail-fast direction, not a change of intent. Main's #4454 FSM refactor is downstream of this pre-FSM gate and doesn't interact; the merge is clean.
policy-tiers-onboard.test.ts covers the early-validate behavior. CI green. Good to merge.
## Summary - Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the dev announcement from discussion #4877. - Fills the source-doc gaps found during release-prep review across inference, policy tiers, command behavior, security boundaries, Hermes dashboard/tooling, runtime context, and troubleshooting. - Refreshes generated agent skills under `.agents/skills/` from the current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`. ## Source summary - #4037 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents system-only runtime context that stays out of visible chat. - #4875 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents try-first sandbox network/filesystem guidance and clearer failure classification. - #4788 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents shared OpenClaw device-approval policy for startup and connect. - #4768 -> `docs/reference/network-policies.mdx`, `docs/network-policy/integration-policy-examples.mdx`, `docs/get-started/quickstart.mdx`, `docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`: Documents `weather`, `public-reference`, and Hermes managed-tool gateway preset behavior. - #3788 and #4864 -> `docs/reference/network-policies.mdx`, `docs/reference/commands.mdx`: Documents non-interactive policy-tier fail-fast behavior and interactive prompt fallback. - #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware default sandbox resolution for `list`, `status`, and `tunnel` commands. - #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel status` behavior. - #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy preset descriptions in `policy-list`. - #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents package-managed OpenShell gateway service and Docker-driver gateway-marker behavior. - #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent gateway/dashboard cleanup isolation by sandbox name and port. - #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU patch rollback behavior. - #4610 -> `docs/reference/troubleshooting.mdx`, `docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission guidance aligned and removes skipped experimental wording. - #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling for custom `onboard --from <Dockerfile>` contexts in generated skills. - #4870 -> `docs/reference/commands.mdx`, `docs/manage-sandboxes/runtime-controls.mdx`: Documents `NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage. - #4641 -> `docs/inference/inference-options.mdx`, `docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM platform-digest pulls and served-model id adoption. - #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash coverage. - #4852 -> `docs/inference/use-local-inference.mdx`, `docs/reference/troubleshooting.mdx`: Documents Ollama model fit filtering, 16K context floor, cold-load retry, and failed-model exclusion. - #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents API-family sync, Hermes `api_mode`, and Bedrock Runtime exception. - #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents Nemotron managed-inference native tool-search fallback. - #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents interactive multimodal input prompting. - #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass normalization in generated troubleshooting coverage. - #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents prebuilt Hermes dashboard assets and TUI recovery without runtime rebuilds. - #4854 -> `docs/inference/switch-inference-providers.mdx`, `docs/reference/commands.mdx`: Documents Hermes proxy API-key placeholder preservation during inference switches. - #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`, `.agents/skills/`: Keeps messaging enrollment behavior aligned with manifest-hook implementation. - #4771 -> `docs/security/best-practices.mdx`, `docs/security/credential-storage.mdx`: Documents Hermes placeholder-only secret boundary for sandbox-visible runtime files. - #4787 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents expanded memory scanner examples for OpenAI project keys and Slack app-level tokens. - #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill install mirroring into the agent home directory. - #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep structure and generated `.agents/skills/` refresh as the template for this release. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/ --prefix nemoclaw-user --doc-platform fern-mdx --dry-run` - `npm run docs` - `git diff --check` - skip-term scan across `docs/`, `.agents/skills/`, and `skills/` - `npm run build:cli` - `npm run typecheck:cli` - Commit and pre-push hook suites, including markdownlint, gitleaks, env-var docs gate, docs-to-skills verification, and skills YAML tests <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * DeepSeek-V4-Flash now available as default inference model for DGX Station. * Hermes dashboard improved with dedicated port and OAuth-authenticated tool gateway selection. * Added weather and public-reference policy presets for expanded agent capabilities. * Enhanced Ollama model selection with GPU memory filtering and automatic retry for timeouts. * **Bug Fixes** * Improved policy tier validation to prevent invalid configurations. * Better sandbox cleanup scoping by port to prevent conflicts across deployments. * Added GPU patch failure recovery with automatic rollback. * **Documentation** * Expanded troubleshooting guides for inference, security, and sandbox lifecycle. * Added .dockerignore best practices for custom deployments. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Summary
commands.mdpromisesNEMOCLAW_POLICY_TIERvalidation is upfront, but the check only ran insideselectPolicyTier()— which executes after preflight, gateway start, and inference setup have already had side effects.resolvePolicyTierFromEnv()helper and gate it both early inonboard()(mirrors the existingNEMOCLAW_PROVIDERfail-fast block right above) and insideselectPolicyTier().Bug reproduction (pre-fix, current main)
```
NEMOCLAW_NON_INTERACTIVE=1 NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1
NEMOCLAW_POLICY_TIER=invalid_tier NEMOCLAW_PROVIDER=ollama
NEMOCLAW_MODEL=qwen2.5:0.5b NEMOCLAW_SANDBOX_NAME=tier-test
CHAT_UI_URL=http://127.0.0.1:18789 nemoclaw onboard
[1/8] Preflight checks ← runs anyway
✓ Docker is running … ✓ NVIDIA GPU detected …
[2/8] Starting OpenShell gateway ← runs anyway
[3/8] Configuring inference (NIM) ← runs anyway
✓ Using Ollama on localhost:11434 (proxy on :11435)
Loading Ollama model: qwen2.5:0.5b …
(eventually fails for an unrelated reason — the bogus tier is never surfaced)
```
Behavior post-fix (same command)
```
Unknown policy tier: invalid_tier. Valid: restricted, balanced, open
```
Exit 1. Nothing else printed. No preflight, no gateway probe, no Ollama systemd override.
Test plan
rejects unknown NEMOCLAW_POLICY_TIER with a clear error and non-zero exit (#3741)intest/policy-tiers-onboard.test.ts— uses the existingrunScript/buildPreambleharness, asserts exit code 1, asserts stderr matches the canonical message, asserts the unreachable marker is not printed.npx vitest run test/policy-tiers-onboard.test.ts— 23/23 pass.node bin/nemoclaw.js onboard): verified both directions — invalid tier exits before [1/8] preflight, valid tier (restricted) proceeds normally past validation, no env var defaults tobalancedand proceeds normally.Fixes #3741.
Signed-off-by: Shawn Xie shaxie@nvidia.com
Summary by CodeRabbit
Bug Fixes
Documentation
Tests