From 6b753bf4ebf69d157170c477b96a9ae8daa19b16 Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Wed, 27 May 2026 15:18:22 -0700 Subject: [PATCH 01/26] docs(onboard): document FSM migration target Signed-off-by: Carlos Villela --- src/lib/onboard/machine/README.md | 111 ++++++++++++++++++++++++++++++ 1 file changed, 111 insertions(+) create mode 100644 src/lib/onboard/machine/README.md diff --git a/src/lib/onboard/machine/README.md b/src/lib/onboard/machine/README.md new file mode 100644 index 0000000000..752e8959f4 --- /dev/null +++ b/src/lib/onboard/machine/README.md @@ -0,0 +1,111 @@ + + + +# Onboard finite-state machine + +This directory contains the transitional onboarding finite-state-machine (FSM) layer. The current implementation records coarse state snapshots and emits machine events while the legacy `src/lib/onboard.ts` entrypoint is split into explicit state handlers. + +## Target architecture + +The target shape is a machine-driven onboarding runner: + +1. Normalize CLI flags, environment, session locking, and consent in `src/lib/onboard.ts`. +2. Build an onboarding context that contains sanitized operator choices, runtime dependencies, and mutable values returned by states. +3. Enter `runOnboardMachine(context)`. +4. Dispatch the current machine state to a handler. +5. Let the handler return an explicit state result such as advance, retry, branch, complete, or failed. +6. Apply the result through `OnboardRuntime`, which validates the transition, updates the persisted session snapshot, and emits redacted machine events. +7. Continue until the machine reaches `complete` or `failed`. + +In that final shape, `src/lib/onboard.ts` should be a thin entrypoint. State handlers should own state-specific prompts, resume validation, repair decisions, and side effects. + +## State ownership + +Machine states are coarse user-visible onboarding phases, not every subprocess or probe inside a phase. 
The current vocabulary is intentionally limited to major boundaries: + +- `init` +- `preflight` +- `gateway` +- `provider_selection` +- `inference` +- `sandbox` +- `openclaw` or `agent_setup` +- `policies` +- `finalizing` +- `post_verify` +- `complete` or `failed` + +A state handler may perform many smaller operations, but it should expose only stable, redacted state transitions and context updates to the FSM. + +## Session steps versus machine state + +The persisted onboarding session still tracks step-level progress for resumability. Step recording is older than the FSM and is currently used as a compatibility bridge. + +Long term: + +- `OnboardRuntime` should own machine transitions and machine revision increments. +- Session step helpers should record only step status (`pending`, `in_progress`, `complete`, `failed`, `skipped`). +- State handlers should return explicit results instead of implicitly moving the machine by calling step helpers. + +Until that migration completes, step helpers may still infer machine snapshots for compatibility with older sessions and tests. + +## Handler contract + +Each state handler should eventually follow this shape: + +```ts +type OnboardStateHandler = (context: OnboardContext) => Promise; +``` + +A handler should: + +- validate whether the state can be resumed or skipped; +- run state-local repairs before declaring a cached step reusable; +- perform the phase side effects; +- return the next state explicitly; +- keep secrets out of returned metadata and event context. + +A handler should not: + +- mutate the machine snapshot directly; +- jump to states outside the declared transition graph; +- rely on console output as the only observable diagnostic; +- store raw credentials, provider URLs with secrets, or other sensitive values in machine context. 
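
The handler contract above could be sketched roughly as follows. This is a minimal illustration, not the package's actual types: `SketchStateResult`, `SketchContext`, `handleGatewaySketch`, the context fields, and the `gateway` to `provider_selection` edge are all assumptions made for the example. Only the result kinds (advance, retry, failed) come from the README wording.

```ts
// Hypothetical sketch: every name and shape here is an assumption for
// illustration, not the real machine package API.
type SketchStateId = "gateway" | "provider_selection" | "failed";

// Explicit results a handler may return; the kinds follow the README wording.
type SketchStateResult =
  | { kind: "advance"; next: SketchStateId; metadata: Record<string, string> }
  | { kind: "retry"; reason: string }
  | { kind: "failed"; reason: string };

// Assumed context: cached step status plus injected runtime dependencies.
interface SketchContext {
  gatewayStepComplete: boolean;
  probeGateway: () => Promise<boolean>;
  startGateway: () => Promise<void>;
}

type SketchStateHandler = (context: SketchContext) => Promise<SketchStateResult>;

const handleGatewaySketch: SketchStateHandler = async (context) => {
  // Validate the cached step before declaring it reusable.
  if (context.gatewayStepComplete && (await context.probeGateway())) {
    return { kind: "advance", next: "provider_selection", metadata: { reused: "true" } };
  }
  try {
    // Perform the phase side effect.
    await context.startGateway();
  } catch {
    // Report the failure as a result instead of mutating machine state.
    // Only a generic reason is returned, so error text cannot leak secrets.
    return { kind: "failed", reason: "gateway start failed" };
  }
  // Return the next state explicitly.
  return { kind: "advance", next: "provider_selection", metadata: { reused: "false" } };
};
```

The handler never touches a machine snapshot; it only describes what should happen next, which leaves validation and persistence to the runtime.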
+ +## Runtime responsibilities + +`OnboardRuntime` is the intended authority for: + +- validating transitions against `transitions.ts`; +- applying safe session context updates; +- marking terminal states; +- emitting redacted lifecycle, state, repair, resume-conflict, and hook events; +- preserving compatibility with normalized older sessions. + +The runtime should reject invalid transitions before they can be persisted. + +## Event semantics + +Machine events are diagnostics and automation hooks. They must be safe to write to JSONL logs and attach to CI/E2E artifacts. + +Event payloads should include only stable, redacted context such as: + +- selected agent; +- sandbox name; +- provider and model names; +- endpoint origin, not full secret-bearing URLs; +- credential environment variable name, not credential value; +- policy presets and messaging channel names. + +Observers and hooks must not change onboarding behavior. A failing hook should emit hook failure diagnostics and let onboarding continue. + +## Migration stages + +The FSM migration is considered complete when: + +1. state metadata is defined once and derived by session, event, progress, and transition code; +2. live onboarding emits `onboard.started`, `onboard.resumed`, `resume.conflict`, terminal, state, skip, repair, and context events consistently; +3. handlers return explicit state results; +4. the runner applies all handler results through `OnboardRuntime`; +5. step helpers no longer implicitly own machine transitions; +6. `src/lib/onboard.ts` contains entrypoint setup and dependency wiring rather than state sequencing. 
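
As a rough illustration of the runner and runtime responsibilities described in the README above, the sketch below validates each transition before producing a new snapshot and loops until a terminal state is reached. The transition table, `applyTransition`, `runSketchMachine`, and the reduced state set are hypothetical; the real graph is said to live in `transitions.ts`, which this patch does not show.

```ts
// Hypothetical sketch: the table and names below are assumptions for
// illustration, not the contents of `transitions.ts` or `OnboardRuntime`.
type SketchState = "init" | "preflight" | "gateway" | "complete" | "failed";

const SKETCH_TRANSITIONS: Record<SketchState, readonly SketchState[]> = {
  init: ["preflight", "failed"],
  preflight: ["gateway", "failed"],
  gateway: ["complete", "failed"],
  complete: [],
  failed: [],
};

interface SketchSnapshot {
  state: SketchState;
  revision: number;
}

// Validate first, then build the next snapshot. An invalid transition throws
// before anything could be persisted.
function applyTransition(snapshot: SketchSnapshot, next: SketchState): SketchSnapshot {
  if (!SKETCH_TRANSITIONS[snapshot.state].includes(next)) {
    throw new Error(`Invalid transition: ${snapshot.state} -> ${next}`);
  }
  return { state: next, revision: snapshot.revision + 1 };
}

// In this sketch a state is terminal when it has no outgoing transitions.
function isTerminal(state: SketchState): boolean {
  return SKETCH_TRANSITIONS[state].length === 0;
}

type SketchHandler = () => SketchState;

// Runner loop: dispatch the current state to its handler, apply the result
// through the validating step, and repeat until a terminal state is reached.
function runSketchMachine(
  handlers: Partial<Record<SketchState, SketchHandler>>,
): SketchSnapshot {
  let snapshot: SketchSnapshot = { state: "init", revision: 0 };
  while (!isTerminal(snapshot.state)) {
    const handler = handlers[snapshot.state];
    // A missing handler is treated as a failure rather than a silent stall.
    const next: SketchState = handler ? handler() : "failed";
    snapshot = applyTransition(snapshot, next);
  }
  return snapshot;
}
```

Keeping validation in one function means a handler that returns an out-of-graph state fails loudly at the boundary instead of corrupting the persisted session.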
From fb1b32d0a8725934d3c49e77ca375abdcedf2c81 Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Wed, 27 May 2026 15:19:42 -0700 Subject: [PATCH 02/26] refactor(onboard): centralize machine state metadata Signed-off-by: Carlos Villela --- src/lib/onboard/machine/definition.test.ts | 85 ++++++++++++++++ src/lib/onboard/machine/definition.ts | 108 +++++++++++++++++++++ src/lib/onboard/machine/types.ts | 43 +++----- 3 files changed, 208 insertions(+), 28 deletions(-) create mode 100644 src/lib/onboard/machine/definition.test.ts create mode 100644 src/lib/onboard/machine/definition.ts diff --git a/src/lib/onboard/machine/definition.test.ts b/src/lib/onboard/machine/definition.test.ts new file mode 100644 index 0000000000..7fffa49ec5 --- /dev/null +++ b/src/lib/onboard/machine/definition.test.ts @@ -0,0 +1,85 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// SPDX-License-Identifier: Apache-2.0 + +import { describe, expect, it } from "vitest"; + +import { + getOnboardMachineStateDefinition, + ONBOARD_MACHINE_NON_TERMINAL_STATE_IDS, + ONBOARD_MACHINE_STATE_DEFINITIONS, + ONBOARD_MACHINE_STATE_IDS, + ONBOARD_MACHINE_TERMINAL_STATE_IDS, +} from "./definition"; + +const expectedStateOrder = [ + "init", + "preflight", + "gateway", + "provider_selection", + "inference", + "sandbox", + "agent_setup", + "openclaw", + "policies", + "finalizing", + "post_verify", + "complete", + "failed", +]; + +describe("onboard machine definition", () => { + it("is the canonical ordered state catalog", () => { + expect(ONBOARD_MACHINE_STATE_IDS).toEqual(expectedStateOrder); + expect(ONBOARD_MACHINE_STATE_DEFINITIONS.map((definition) => definition.state)).toEqual( + expectedStateOrder, + ); + }); + + it("derives terminal and non-terminal state catalogs from the same vocabulary", () => { + const terminalFromDefinitions = ONBOARD_MACHINE_STATE_DEFINITIONS.filter( + (definition) => definition.terminal, + ).map((definition) => 
definition.state); + const nonTerminalFromDefinitions = ONBOARD_MACHINE_STATE_DEFINITIONS.filter( + (definition) => !definition.terminal, + ).map((definition) => definition.state); + + expect(ONBOARD_MACHINE_TERMINAL_STATE_IDS).toEqual(terminalFromDefinitions); + expect(ONBOARD_MACHINE_NON_TERMINAL_STATE_IDS).toEqual(nonTerminalFromDefinitions); + }); + + it("keeps resumable step names unique", () => { + const stepNames = ONBOARD_MACHINE_STATE_DEFINITIONS.flatMap((definition) => + "stepName" in definition ? [definition.stepName] : [], + ); + + expect(new Set(stepNames).size).toBe(stepNames.length); + expect(stepNames).toEqual([ + "preflight", + "gateway", + "provider_selection", + "inference", + "sandbox", + "agent_setup", + "openclaw", + "policies", + ]); + }); + + it("keeps progress metadata attached only to state-backed steps", () => { + for (const definition of ONBOARD_MACHINE_STATE_DEFINITIONS) { + if (!("progress" in definition)) continue; + expect("stepName" in definition).toBe(true); + expect(definition.progress.total).toBe(8); + expect(definition.progress.number).toBeGreaterThanOrEqual(1); + expect(definition.progress.number).toBeLessThanOrEqual(definition.progress.total); + expect(definition.progress.title).not.toHaveLength(0); + } + }); + + it("looks up definitions by state", () => { + expect(getOnboardMachineStateDefinition("gateway")).toMatchObject({ + state: "gateway", + stepName: "gateway", + }); + }); +}); diff --git a/src/lib/onboard/machine/definition.ts b/src/lib/onboard/machine/definition.ts new file mode 100644 index 0000000000..0f873edf0b --- /dev/null +++ b/src/lib/onboard/machine/definition.ts @@ -0,0 +1,108 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// SPDX-License-Identifier: Apache-2.0 + +/** + * Canonical metadata for the coarse onboard finite-state machine. 
+ * + * Keep this file free of imports from the rest of the machine package so the + * core state vocabulary can be reused by type, transition, event, session, and + * progress helpers without introducing circular dependencies. + */ + +export const ONBOARD_MACHINE_STATE_DEFINITIONS = [ + { state: "init", terminal: false }, + { + state: "preflight", + terminal: false, + stepName: "preflight", + progress: { number: 1, total: 8, title: "Preflight checks" }, + }, + { + state: "gateway", + terminal: false, + stepName: "gateway", + progress: { number: 2, total: 8, title: "Starting OpenShell gateway" }, + }, + { + state: "provider_selection", + terminal: false, + stepName: "provider_selection", + progress: { number: 3, total: 8, title: "Configuring inference (NIM)" }, + }, + { + state: "inference", + terminal: false, + stepName: "inference", + progress: { number: 4, total: 8, title: "Setting up inference provider" }, + }, + { + state: "sandbox", + terminal: false, + stepName: "sandbox", + progress: { number: 6, total: 8, title: "Creating sandbox" }, + }, + { + state: "agent_setup", + terminal: false, + stepName: "agent_setup", + progress: { number: 7, total: 8, title: "Setting up agent inside sandbox" }, + }, + { + state: "openclaw", + terminal: false, + stepName: "openclaw", + progress: { number: 7, total: 8, title: "Setting up agent inside sandbox" }, + }, + { + state: "policies", + terminal: false, + stepName: "policies", + progress: { number: 8, total: 8, title: "Policy presets" }, + }, + { state: "finalizing", terminal: false }, + { state: "post_verify", terminal: false }, + { state: "complete", terminal: true }, + { state: "failed", terminal: true }, +] as const; + +export const ONBOARD_MACHINE_STATE_IDS = ONBOARD_MACHINE_STATE_DEFINITIONS.map( + (definition) => definition.state, +) as readonly OnboardMachineStateId[]; + +export const ONBOARD_MACHINE_TERMINAL_STATE_IDS = ["complete", "failed"] as const; + +export type OnboardTerminalMachineStateId = (typeof 
ONBOARD_MACHINE_TERMINAL_STATE_IDS)[number]; + +export type OnboardMachineStateId = (typeof ONBOARD_MACHINE_STATE_DEFINITIONS)[number]["state"]; + +export type OnboardNonTerminalMachineStateId = Exclude< + OnboardMachineStateId, + OnboardTerminalMachineStateId +>; + +export const ONBOARD_MACHINE_NON_TERMINAL_STATE_IDS = ONBOARD_MACHINE_STATE_DEFINITIONS.filter( + (definition): definition is Extract< + (typeof ONBOARD_MACHINE_STATE_DEFINITIONS)[number], + { terminal: false } + > => definition.terminal === false, +).map((definition) => definition.state) as readonly OnboardNonTerminalMachineStateId[]; + +export type OnboardMachineStateDefinition = (typeof ONBOARD_MACHINE_STATE_DEFINITIONS)[number]; + +export type OnboardMachineStateWithStepDefinition = Extract< + OnboardMachineStateDefinition, + { stepName: string } +>; + +export type OnboardMachineStateWithProgressDefinition = Extract< + OnboardMachineStateDefinition, + { progress: { number: number; total: number; title: string } } +>; + +export function getOnboardMachineStateDefinition( + state: OnboardMachineStateId, +): OnboardMachineStateDefinition { + const definition = ONBOARD_MACHINE_STATE_DEFINITIONS.find((entry) => entry.state === state); + if (!definition) throw new Error(`Unknown onboarding machine state: ${state}`); + return definition; +} diff --git a/src/lib/onboard/machine/types.ts b/src/lib/onboard/machine/types.ts index e1dca21e72..d5f00477ee 100644 --- a/src/lib/onboard/machine/types.ts +++ b/src/lib/onboard/machine/types.ts @@ -9,39 +9,26 @@ * probes, or policy application is out of scope for the initial FSM shell. 
*/ -export const ONBOARD_MACHINE_STATES = [ - "init", - "preflight", - "gateway", - "provider_selection", - "inference", - "sandbox", - "agent_setup", - "openclaw", - "policies", - "finalizing", - "post_verify", - "complete", - "failed", -] as const; +import { + ONBOARD_MACHINE_NON_TERMINAL_STATE_IDS, + ONBOARD_MACHINE_STATE_IDS, + ONBOARD_MACHINE_TERMINAL_STATE_IDS, + type OnboardMachineStateId, + type OnboardNonTerminalMachineStateId, + type OnboardTerminalMachineStateId, +} from "./definition"; + +export const ONBOARD_MACHINE_STATES = ONBOARD_MACHINE_STATE_IDS; -export type OnboardMachineState = (typeof ONBOARD_MACHINE_STATES)[number]; +export type OnboardMachineState = OnboardMachineStateId; -export const ONBOARD_TERMINAL_MACHINE_STATES = ["complete", "failed"] as const; +export const ONBOARD_TERMINAL_MACHINE_STATES = ONBOARD_MACHINE_TERMINAL_STATE_IDS; -export type OnboardTerminalMachineState = - (typeof ONBOARD_TERMINAL_MACHINE_STATES)[number]; +export type OnboardTerminalMachineState = OnboardTerminalMachineStateId; -export type OnboardNonTerminalMachineState = Exclude< - OnboardMachineState, - OnboardTerminalMachineState ->; +export type OnboardNonTerminalMachineState = OnboardNonTerminalMachineStateId; -export const ONBOARD_NON_TERMINAL_MACHINE_STATES: readonly OnboardNonTerminalMachineState[] = - ONBOARD_MACHINE_STATES.filter( - (state): state is OnboardNonTerminalMachineState => - !ONBOARD_TERMINAL_MACHINE_STATES.includes(state as OnboardTerminalMachineState), - ); +export const ONBOARD_NON_TERMINAL_MACHINE_STATES = ONBOARD_MACHINE_NON_TERMINAL_STATE_IDS; export const ONBOARD_MACHINE_EVENT_TYPES = [ "onboard.started", From c3e4ad63b31738ffad7f7c7bb306dfa8d6eca39a Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Wed, 27 May 2026 15:21:58 -0700 Subject: [PATCH 03/26] refactor(onboard): derive session step mapping from FSM metadata Signed-off-by: Carlos Villela --- src/lib/onboard/machine/definition.test.ts | 11 ++++++++ src/lib/onboard/machine/events.ts 
| 32 ++++++++++++++-------- 2 files changed, 31 insertions(+), 12 deletions(-) diff --git a/src/lib/onboard/machine/definition.test.ts b/src/lib/onboard/machine/definition.test.ts index 7fffa49ec5..6c5d87c6f7 100644 --- a/src/lib/onboard/machine/definition.test.ts +++ b/src/lib/onboard/machine/definition.test.ts @@ -10,6 +10,7 @@ import { ONBOARD_MACHINE_STATE_IDS, ONBOARD_MACHINE_TERMINAL_STATE_IDS, } from "./definition"; +import { ONBOARD_SESSION_STEP_TO_MACHINE_STATE } from "./events"; const expectedStateOrder = [ "init", @@ -65,6 +66,16 @@ describe("onboard machine definition", () => { ]); }); + it("derives the session step mapping from state definitions", () => { + const mappingFromDefinitions = Object.fromEntries( + ONBOARD_MACHINE_STATE_DEFINITIONS.flatMap((definition) => + "stepName" in definition ? [[definition.stepName, definition.state]] : [], + ), + ); + + expect(ONBOARD_SESSION_STEP_TO_MACHINE_STATE).toEqual(mappingFromDefinitions); + }); + it("keeps progress metadata attached only to state-backed steps", () => { for (const definition of ONBOARD_MACHINE_STATE_DEFINITIONS) { if (!("progress" in definition)) continue; diff --git a/src/lib/onboard/machine/events.ts b/src/lib/onboard/machine/events.ts index f6b7dca47c..2ce746167a 100644 --- a/src/lib/onboard/machine/events.ts +++ b/src/lib/onboard/machine/events.ts @@ -4,24 +4,32 @@ import type { JsonObject, JsonValue } from "../../core/json-types"; import { redactSensitiveText, redactUrl } from "../../security/redact"; import type { HermesAuthMethod, Session } from "../../state/onboard-session"; +import { + ONBOARD_MACHINE_STATE_DEFINITIONS, + type OnboardMachineStateWithStepDefinition, +} from "./definition"; import type { OnboardMachineContext, OnboardMachineEventType, OnboardMachineState, } from "./types"; -export const ONBOARD_SESSION_STEP_TO_MACHINE_STATE = { - preflight: "preflight", - gateway: "gateway", - provider_selection: "provider_selection", - inference: "inference", - sandbox: "sandbox", - 
agent_setup: "agent_setup", - openclaw: "openclaw", - policies: "policies", -} as const satisfies Readonly>; - -export type OnboardSessionStepName = keyof typeof ONBOARD_SESSION_STEP_TO_MACHINE_STATE; +type OnboardSessionStepDefinition = OnboardMachineStateWithStepDefinition; + +export type OnboardSessionStepName = OnboardSessionStepDefinition["stepName"]; + +type OnboardSessionStepToMachineState = { + readonly [StepName in OnboardSessionStepName]: Extract< + OnboardSessionStepDefinition, + { stepName: StepName } + >["state"]; +}; + +export const ONBOARD_SESSION_STEP_TO_MACHINE_STATE = Object.fromEntries( + ONBOARD_MACHINE_STATE_DEFINITIONS.flatMap((definition) => + "stepName" in definition ? [[definition.stepName, definition.state]] : [], + ), +) as OnboardSessionStepToMachineState; export interface OnboardMachineEvent { version: 1; From 603832c0c5d9e2ea2e9a8b27158ee00b8fd9bc93 Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Wed, 27 May 2026 15:24:39 -0700 Subject: [PATCH 04/26] refactor(onboard): derive progress labels from FSM metadata Signed-off-by: Carlos Villela --- src/lib/onboard.ts | 23 +++++--------- src/lib/onboard/machine/definition.ts | 1 - src/lib/onboard/machine/progress.test.ts | 38 ++++++++++++++++++++++++ src/lib/onboard/machine/progress.ts | 38 ++++++++++++++++++++++++ 4 files changed, 83 insertions(+), 17 deletions(-) create mode 100644 src/lib/onboard/machine/progress.test.ts create mode 100644 src/lib/onboard/machine/progress.ts diff --git a/src/lib/onboard.ts b/src/lib/onboard.ts index 761c3c2454..4fef39b2aa 100644 --- a/src/lib/onboard.ts +++ b/src/lib/onboard.ts @@ -419,6 +419,7 @@ const { handlePoliciesState }: typeof import("./onboard/machine/handlers/policie const { handlePreflightState }: typeof import("./onboard/machine/handlers/preflight") = require("./onboard/machine/handlers/preflight"); const { handleProviderInferenceState }: typeof import("./onboard/machine/handlers/provider-inference") = 
require("./onboard/machine/handlers/provider-inference"); const { handleSandboxState }: typeof import("./onboard/machine/handlers/sandbox") = require("./onboard/machine/handlers/sandbox"); +const { getOnboardProgressStep }: typeof import("./onboard/machine/progress") = require("./onboard/machine/progress"); const policies: typeof import("./policy") = require("./policy"); const tiers: typeof import("./policy/tiers") = require("./policy/tiers"); const { ensureUsageNoticeConsent } = require("./onboard/usage-notice"); @@ -6390,28 +6391,18 @@ const recordRepairEvent = onboardRuntimeBoundary.recordRepairEvent.bind(onboardR const recordPostVerifyStarted = onboardRuntimeBoundary.recordPostVerifyStarted.bind(onboardRuntimeBoundary); const recordSessionComplete = onboardRuntimeBoundary.recordSessionComplete.bind(onboardRuntimeBoundary); -const ONBOARD_STEP_INDEX: Record = { - preflight: { number: 1, title: "Preflight checks" }, - gateway: { number: 2, title: "Starting OpenShell gateway" }, - provider_selection: { number: 3, title: "Configuring inference (NIM)" }, - inference: { number: 4, title: "Setting up inference provider" }, - messaging: { number: 5, title: "Messaging channels" }, - sandbox: { number: 6, title: "Creating sandbox" }, - openclaw: { number: 7, title: "Setting up agent inside sandbox" }, - policies: { number: 8, title: "Policy presets" }, -}; - function skippedStepMessage( stepName: string, detail?: string | null, reason: "resume" | "reuse" = "resume", ): void { - let stepInfo = ONBOARD_STEP_INDEX[stepName]; - if (stepInfo && stepName === "openclaw") { - stepInfo = { ...stepInfo, title: `Setting up ${agentProductName()} inside sandbox` }; - } + const progressStep = getOnboardProgressStep(stepName); + const stepInfo = + progressStep && stepName === "openclaw" + ? 
{ ...progressStep, title: `Setting up ${agentProductName()} inside sandbox` } + : progressStep; if (stepInfo) { - step(stepInfo.number, 8, stepInfo.title); + step(stepInfo.number, stepInfo.total, stepInfo.title); } const prefix = reason === "reuse" ? "[reuse]" : "[resume]"; console.log(` ${prefix} Skipping ${stepName}${detail ? ` (${detail})` : ""}`); diff --git a/src/lib/onboard/machine/definition.ts b/src/lib/onboard/machine/definition.ts index 0f873edf0b..03903bfb34 100644 --- a/src/lib/onboard/machine/definition.ts +++ b/src/lib/onboard/machine/definition.ts @@ -45,7 +45,6 @@ export const ONBOARD_MACHINE_STATE_DEFINITIONS = [ state: "agent_setup", terminal: false, stepName: "agent_setup", - progress: { number: 7, total: 8, title: "Setting up agent inside sandbox" }, }, { state: "openclaw", diff --git a/src/lib/onboard/machine/progress.test.ts b/src/lib/onboard/machine/progress.test.ts new file mode 100644 index 0000000000..4c1ee2e99d --- /dev/null +++ b/src/lib/onboard/machine/progress.test.ts @@ -0,0 +1,38 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+// SPDX-License-Identifier: Apache-2.0 + +import { describe, expect, it } from "vitest"; + +import { ONBOARD_MACHINE_STATE_DEFINITIONS } from "./definition"; +import { getOnboardProgressStep, ONBOARD_PROGRESS_STEPS } from "./progress"; + +describe("onboard progress metadata", () => { + it("derives state-backed progress labels from machine definitions", () => { + for (const definition of ONBOARD_MACHINE_STATE_DEFINITIONS) { + if (!("progress" in definition)) continue; + expect(ONBOARD_PROGRESS_STEPS[definition.stepName]).toEqual(definition.progress); + } + }); + + it("preserves the existing eight-step onboarding labels", () => { + expect(ONBOARD_PROGRESS_STEPS).toEqual({ + preflight: { number: 1, total: 8, title: "Preflight checks" }, + gateway: { number: 2, total: 8, title: "Starting OpenShell gateway" }, + provider_selection: { number: 3, total: 8, title: "Configuring inference (NIM)" }, + inference: { number: 4, total: 8, title: "Setting up inference provider" }, + messaging: { number: 5, total: 8, title: "Messaging channels" }, + sandbox: { number: 6, total: 8, title: "Creating sandbox" }, + openclaw: { number: 7, total: 8, title: "Setting up agent inside sandbox" }, + policies: { number: 8, total: 8, title: "Policy presets" }, + }); + }); + + it("looks up known labels and ignores unknown steps", () => { + expect(getOnboardProgressStep("gateway")).toEqual({ + number: 2, + total: 8, + title: "Starting OpenShell gateway", + }); + expect(getOnboardProgressStep("not-a-step")).toBeNull(); + }); +}); diff --git a/src/lib/onboard/machine/progress.ts b/src/lib/onboard/machine/progress.ts new file mode 100644 index 0000000000..2cf485655e --- /dev/null +++ b/src/lib/onboard/machine/progress.ts @@ -0,0 +1,38 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+// SPDX-License-Identifier: Apache-2.0 + +import { + ONBOARD_MACHINE_STATE_DEFINITIONS, + type OnboardMachineStateWithProgressDefinition, +} from "./definition"; + +export interface OnboardProgressStep { + number: number; + total: number; + title: string; +} + +export type OnboardMachineProgressStepName = + OnboardMachineStateWithProgressDefinition["stepName"]; + +export type OnboardProgressStepName = OnboardMachineProgressStepName | "messaging"; + +const EXTRA_PROGRESS_STEPS = [ + { + stepName: "messaging", + progress: { number: 5, total: 8, title: "Messaging channels" }, + }, +] as const; + +export const ONBOARD_PROGRESS_STEPS = Object.fromEntries([ + ...ONBOARD_MACHINE_STATE_DEFINITIONS.flatMap((definition) => + "progress" in definition ? [[definition.stepName, definition.progress]] : [], + ), + ...EXTRA_PROGRESS_STEPS.map((definition) => [definition.stepName, definition.progress]), +]) as Readonly>; + +export function getOnboardProgressStep(stepName: string): OnboardProgressStep | null { + return Object.prototype.hasOwnProperty.call(ONBOARD_PROGRESS_STEPS, stepName) + ? 
ONBOARD_PROGRESS_STEPS[stepName as OnboardProgressStepName] + : null; +} From 4fad8e7cc0461d01f50dae21055a2a1c6d7232f5 Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Wed, 27 May 2026 17:02:42 -0700 Subject: [PATCH 05/26] fix(onboard): emit lifecycle events for onboarding start Signed-off-by: Carlos Villela --- src/lib/onboard.ts | 3 + src/lib/onboard/runtime-boundary.test.ts | 94 ++++++++++++++++++++++++ src/lib/onboard/runtime-boundary.ts | 10 ++- 3 files changed, 105 insertions(+), 2 deletions(-) create mode 100644 src/lib/onboard/runtime-boundary.test.ts diff --git a/src/lib/onboard.ts b/src/lib/onboard.ts index 4fef39b2aa..a05c9f16da 100644 --- a/src/lib/onboard.ts +++ b/src/lib/onboard.ts @@ -6382,6 +6382,7 @@ const onboardRuntimeBoundary = new OnboardRuntimeBoundary({ maybeForceE2eStepFailure, }); +const recordOnboardStarted = onboardRuntimeBoundary.recordOnboardStarted.bind(onboardRuntimeBoundary); const startRecordedStep = onboardRuntimeBoundary.startRecordedStep.bind(onboardRuntimeBoundary); const recordStepComplete = onboardRuntimeBoundary.recordStepComplete.bind(onboardRuntimeBoundary); const recordStepSkipped = onboardRuntimeBoundary.recordStepSkipped.bind(onboardRuntimeBoundary); @@ -6675,6 +6676,8 @@ async function onboard(opts: OnboardOptions = {}): Promise { ); } + await recordOnboardStarted(resume); + // Backstop for the resume path: a session may exist (so the early guard // skipped because resume === true) but never have recorded a sandboxName // — sandbox creation could have failed before that step ran. Without a diff --git a/src/lib/onboard/runtime-boundary.test.ts b/src/lib/onboard/runtime-boundary.test.ts new file mode 100644 index 0000000000..d81116ed86 --- /dev/null +++ b/src/lib/onboard/runtime-boundary.test.ts @@ -0,0 +1,94 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+// SPDX-License-Identifier: Apache-2.0 + +import { describe, expect, it } from "vitest"; + +import { + createSession, + filterSafeUpdates, + normalizeSession, + type Session, + type SessionUpdates, +} from "../state/onboard-session"; +import type { OnboardMachineEvent } from "./machine/events"; +import { OnboardRuntime, type OnboardRuntimeDeps } from "./machine/runtime"; +import { OnboardRuntimeBoundary } from "./runtime-boundary"; + +function cloneSession(session: Session): Session { + return normalizeSession(JSON.parse(JSON.stringify(session))) ?? session; +} + +function createRuntimeHarness() { + let session: Session | null = createSession(); + const events: OnboardMachineEvent[] = []; + const updateSession = (mutator: (value: Session) => Session | void): Session => { + const current = session ? cloneSession(session) : createSession(); + session = cloneSession(mutator(current) ?? current); + return cloneSession(session); + }; + const deps: OnboardRuntimeDeps = { + loadSession: () => (session ? cloneSession(session) : null), + createSession, + saveSession: (next) => { + session = cloneSession(next); + return cloneSession(session); + }, + updateSession, + markStepStarted: (stepName) => + updateSession((current) => { + current.steps[stepName].status = "in_progress"; + return current; + }), + markStepComplete: (stepName, updates: SessionUpdates = {}) => + updateSession((current) => { + current.steps[stepName].status = "complete"; + Object.assign(current, filterSafeUpdates(updates)); + return current; + }), + markStepSkipped: (stepName) => + updateSession((current) => { + current.steps[stepName].status = "skipped"; + return current; + }), + markStepFailed: (stepName, message) => + updateSession((current) => { + current.steps[stepName].status = "failed"; + current.failure = { step: stepName, message: message ?? 
null, recordedAt: "now" }; + return current; + }), + completeSession: (updates: SessionUpdates = {}) => + updateSession((current) => { + Object.assign(current, filterSafeUpdates(updates)); + current.status = "complete"; + return current; + }), + filterSafeUpdates, + emitEvent: (event) => events.push(event), + now: () => "2026-05-27T00:00:00.000Z", + }; + return { + createRuntime: () => new OnboardRuntime(deps), + events, + }; +} + +describe("OnboardRuntimeBoundary", () => { + it("records started and resumed lifecycle events through the runtime", async () => { + const harness = createRuntimeHarness(); + const boundary = new OnboardRuntimeBoundary({ + toSessionUpdates: (updates) => filterSafeUpdates(updates as SessionUpdates) as SessionUpdates, + maybeForceE2eStepFailure: () => undefined, + createRuntime: harness.createRuntime, + }); + + await boundary.recordOnboardStarted(false); + await boundary.recordOnboardStarted(true); + + expect(harness.events.map((event) => event.type)).toEqual([ + "onboard.started", + "onboard.resumed", + ]); + expect(harness.events[0]).toMatchObject({ state: "init" }); + expect(harness.events[1]).toMatchObject({ state: "init" }); + }); +}); diff --git a/src/lib/onboard/runtime-boundary.ts b/src/lib/onboard/runtime-boundary.ts index daa8a13367..e90166e17b 100644 --- a/src/lib/onboard/runtime-boundary.ts +++ b/src/lib/onboard/runtime-boundary.ts @@ -8,6 +8,7 @@ import type { OnboardMachineEventType, OnboardMachineState } from "./machine/typ export interface OnboardRuntimeBoundaryOptions { toSessionUpdates(updates: Record): SessionUpdates; maybeForceE2eStepFailure(stepName: string): void; + createRuntime?(): OnboardRuntime; } export class OnboardRuntimeBoundary { @@ -16,7 +17,7 @@ export class OnboardRuntimeBoundary { constructor(private readonly options: OnboardRuntimeBoundaryOptions) {} reset(): void { - this.runtime = new OnboardRuntime(); + this.runtime = this.options.createRuntime?.() ?? 
new OnboardRuntime(); } clear(): void { @@ -24,12 +25,13 @@ export class OnboardRuntimeBoundary { } getRuntime(): OnboardRuntime { - if (!this.runtime) this.runtime = new OnboardRuntime(); + if (!this.runtime) this.runtime = this.options.createRuntime?.() ?? new OnboardRuntime(); return this.runtime; } recorders() { return { + recordOnboardStarted: this.recordOnboardStarted.bind(this), startRecordedStep: this.startRecordedStep.bind(this), recordStepComplete: this.recordStepComplete.bind(this), recordStepSkipped: this.recordStepSkipped.bind(this), @@ -41,6 +43,10 @@ export class OnboardRuntimeBoundary { }; } + async recordOnboardStarted(resumed: boolean): Promise { + return this.getRuntime().start({ resumed }); + } + async startRecordedStep( stepName: string, updates: { From f99e9cbaa1820e89996b9059f4bb3a2c7a82c2d6 Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Wed, 27 May 2026 17:05:42 -0700 Subject: [PATCH 06/26] fix(onboard): emit machine events for resume conflicts Signed-off-by: Carlos Villela --- src/lib/onboard.ts | 2 ++ src/lib/onboard/machine/runtime.test.ts | 18 ++++++++++++++++++ src/lib/onboard/machine/runtime.ts | 19 +++++++++++++++++++ src/lib/onboard/runtime-boundary.test.ts | 21 +++++++++++++++++++++ src/lib/onboard/runtime-boundary.ts | 10 ++++++++++ 5 files changed, 70 insertions(+) diff --git a/src/lib/onboard.ts b/src/lib/onboard.ts index a05c9f16da..98a49eea55 100644 --- a/src/lib/onboard.ts +++ b/src/lib/onboard.ts @@ -6389,6 +6389,7 @@ const recordStepSkipped = onboardRuntimeBoundary.recordStepSkipped.bind(onboardR const recordStepFailed = onboardRuntimeBoundary.recordStepFailed.bind(onboardRuntimeBoundary); const recordStateSkipped = onboardRuntimeBoundary.recordStateSkipped.bind(onboardRuntimeBoundary); const recordRepairEvent = onboardRuntimeBoundary.recordRepairEvent.bind(onboardRuntimeBoundary); +const recordResumeConflict = onboardRuntimeBoundary.recordResumeConflict.bind(onboardRuntimeBoundary); const recordPostVerifyStarted = 
onboardRuntimeBoundary.recordPostVerifyStarted.bind(onboardRuntimeBoundary); const recordSessionComplete = onboardRuntimeBoundary.recordSessionComplete.bind(onboardRuntimeBoundary); @@ -6598,6 +6599,7 @@ async function onboard(opts: OnboardOptions = {}): Promise { }); if (resumeConflicts.length > 0) { for (const conflict of resumeConflicts) { + await recordResumeConflict(conflict); if (conflict.field === "sandbox") { console.error( ` Resumable state belongs to sandbox '${conflict.recorded}', not '${conflict.requested}'.`, diff --git a/src/lib/onboard/machine/runtime.test.ts b/src/lib/onboard/machine/runtime.test.ts index f098ba0dc3..d48da85e0a 100644 --- a/src/lib/onboard/machine/runtime.test.ts +++ b/src/lib/onboard/machine/runtime.test.ts @@ -209,6 +209,24 @@ describe("OnboardRuntime", () => { expect(events[1]).toMatchObject({ state: "post_verify" }); }); + it("emits redacted resume conflict events without mutating durable state", async () => { + const { runtime, events, getSession } = createHarness(sessionInState("provider_selection")); + + await runtime.emitResumeConflict({ + field: "fromDockerfile", + recorded: "/workspace/Dockerfile", + requested: "/tmp/Dockerfile", + metadata: { endpoint: "https://alice:secret@example.com/v1?token=super-secret" }, + }); + + expect(getSession().machine.state).toBe("provider_selection"); + expect(events).toHaveLength(1); + expect(events[0]).toMatchObject({ type: "resume.conflict", state: "provider_selection" }); + expect(events[0].metadata.field).toBe("fromDockerfile"); + expect(JSON.stringify(events)).not.toContain("super-secret"); + expect(JSON.stringify(events)).not.toContain("alice:secret"); + }); + it("emits skipped and repair events without mutating durable state", async () => { const { runtime, events, getSession } = createHarness(sessionInState("provider_selection")); diff --git a/src/lib/onboard/machine/runtime.ts b/src/lib/onboard/machine/runtime.ts index 2e5d584f3b..65516c3212 100644 --- 
a/src/lib/onboard/machine/runtime.ts +++ b/src/lib/onboard/machine/runtime.ts @@ -243,6 +243,25 @@ export class OnboardRuntime { return session; } + async emitResumeConflict(options: { + field: string; + recorded?: unknown; + requested?: unknown; + metadata?: Record | null; + }): Promise { + const session = this.ensureSession(); + this.emit("resume.conflict", session, { + state: session.machine.state, + metadata: { + ...eventMetadata(options.metadata), + field: options.field, + recorded: options.recorded ?? null, + requested: options.requested ?? null, + }, + }); + return session; + } + async emitRepairEvent( type: Extract< OnboardMachineEventType, diff --git a/src/lib/onboard/runtime-boundary.test.ts b/src/lib/onboard/runtime-boundary.test.ts index d81116ed86..21d6f1083e 100644 --- a/src/lib/onboard/runtime-boundary.test.ts +++ b/src/lib/onboard/runtime-boundary.test.ts @@ -91,4 +91,25 @@ describe("OnboardRuntimeBoundary", () => { expect(harness.events[0]).toMatchObject({ state: "init" }); expect(harness.events[1]).toMatchObject({ state: "init" }); }); + + it("records resume conflict diagnostics through the runtime", async () => { + const harness = createRuntimeHarness(); + const boundary = new OnboardRuntimeBoundary({ + toSessionUpdates: (updates) => filterSafeUpdates(updates as SessionUpdates) as SessionUpdates, + maybeForceE2eStepFailure: () => undefined, + createRuntime: harness.createRuntime, + }); + + await boundary.recordResumeConflict({ + field: "sandbox", + recorded: "old-sandbox", + requested: "new-sandbox", + }); + + expect(harness.events).toHaveLength(1); + expect(harness.events[0]).toMatchObject({ + type: "resume.conflict", + metadata: { field: "sandbox", recorded: "old-sandbox", requested: "new-sandbox" }, + }); + }); }); diff --git a/src/lib/onboard/runtime-boundary.ts b/src/lib/onboard/runtime-boundary.ts index e90166e17b..e2306e3ce5 100644 --- a/src/lib/onboard/runtime-boundary.ts +++ b/src/lib/onboard/runtime-boundary.ts @@ -37,6 +37,7 @@ export 
class OnboardRuntimeBoundary { recordStepSkipped: this.recordStepSkipped.bind(this), recordStateSkipped: this.recordStateSkipped.bind(this), recordRepairEvent: this.recordRepairEvent.bind(this), + recordResumeConflict: this.recordResumeConflict.bind(this), recordStepFailed: this.recordStepFailed.bind(this), recordPostVerifyStarted: this.recordPostVerifyStarted.bind(this), recordSessionComplete: this.recordSessionComplete.bind(this), @@ -83,6 +84,15 @@ export class OnboardRuntimeBoundary { return this.getRuntime().markSkipped(state, metadata); } + async recordResumeConflict(conflict: { + field: string; + recorded?: unknown; + requested?: unknown; + metadata?: Record | null; + }): Promise { + return this.getRuntime().emitResumeConflict(conflict); + } + async recordRepairEvent( type: Extract< OnboardMachineEventType, From 2b60df442657ef2d850f4b371d57450bf92ffeae Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Wed, 27 May 2026 17:07:45 -0700 Subject: [PATCH 07/26] refactor(onboard): introduce explicit state result types Signed-off-by: Carlos Villela --- src/lib/onboard/machine/result.test.ts | 62 ++++++++++++++++++ src/lib/onboard/machine/result.ts | 89 ++++++++++++++++++++++++++ 2 files changed, 151 insertions(+) create mode 100644 src/lib/onboard/machine/result.test.ts create mode 100644 src/lib/onboard/machine/result.ts diff --git a/src/lib/onboard/machine/result.test.ts b/src/lib/onboard/machine/result.test.ts new file mode 100644 index 0000000000..b995f6ac3b --- /dev/null +++ b/src/lib/onboard/machine/result.test.ts @@ -0,0 +1,62 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+// SPDX-License-Identifier: Apache-2.0 + +import { describe, expect, it } from "vitest"; + +import { + advanceTo, + branchTo, + completeOnboardMachine, + failOnboardMachine, + retryTo, + transitionTo, +} from "./result"; + +describe("onboard state result helpers", () => { + it("builds transition results with optional updates and metadata", () => { + expect( + transitionTo("gateway", { + updates: { sandboxName: "my-assistant" }, + metadata: { reason: "test" }, + }), + ).toEqual({ + type: "transition", + next: "gateway", + transitionKind: undefined, + updates: { sandboxName: "my-assistant" }, + metadata: { reason: "test" }, + }); + }); + + it("labels advance, retry, and branch transitions", () => { + expect(advanceTo("preflight")).toMatchObject({ + type: "transition", + next: "preflight", + transitionKind: "advance", + }); + expect(retryTo("provider_selection")).toMatchObject({ + type: "transition", + next: "provider_selection", + transitionKind: "retry", + }); + expect(branchTo("agent_setup")).toMatchObject({ + type: "transition", + next: "agent_setup", + transitionKind: "branch", + }); + }); + + it("builds terminal completion and failure results", () => { + expect(completeOnboardMachine({ sandboxName: "my-assistant" }, { verified: true })).toEqual({ + type: "complete", + updates: { sandboxName: "my-assistant" }, + metadata: { verified: true }, + }); + expect(failOnboardMachine("boom", { step: "gateway", metadata: { phase: 2 } })).toEqual({ + type: "failed", + error: "boom", + step: "gateway", + metadata: { phase: 2 }, + }); + }); +}); diff --git a/src/lib/onboard/machine/result.ts b/src/lib/onboard/machine/result.ts new file mode 100644 index 0000000000..e80fae20b5 --- /dev/null +++ b/src/lib/onboard/machine/result.ts @@ -0,0 +1,89 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+// SPDX-License-Identifier: Apache-2.0 + +import type { SessionUpdates } from "../../state/onboard-session"; +import type { OnboardMachineTransitionKind } from "./types"; +import type { OnboardMachineState } from "./types"; + +export interface OnboardStateTransitionResult { + type: "transition"; + next: OnboardMachineState; + transitionKind?: OnboardMachineTransitionKind; + updates?: SessionUpdates; + metadata?: Record | null; +} + +export interface OnboardStateCompleteResult { + type: "complete"; + updates?: SessionUpdates; + metadata?: Record | null; +} + +export interface OnboardStateFailedResult { + type: "failed"; + error: string | null; + step?: string | null; + metadata?: Record | null; +} + +export type OnboardStateResult = + | OnboardStateTransitionResult + | OnboardStateCompleteResult + | OnboardStateFailedResult; + +export function transitionTo( + next: OnboardMachineState, + options: { + transitionKind?: OnboardMachineTransitionKind; + updates?: SessionUpdates; + metadata?: Record | null; + } = {}, +): OnboardStateTransitionResult { + return { + type: "transition", + next, + transitionKind: options.transitionKind, + updates: options.updates, + metadata: options.metadata, + }; +} + +export function advanceTo( + next: OnboardMachineState, + options: Omit<Parameters<typeof transitionTo>[1], "transitionKind"> = {}, +): OnboardStateTransitionResult { + return transitionTo(next, { ...options, transitionKind: "advance" }); +} + +export function retryTo( + next: OnboardMachineState, + options: Omit<Parameters<typeof transitionTo>[1], "transitionKind"> = {}, +): OnboardStateTransitionResult { + return transitionTo(next, { ...options, transitionKind: "retry" }); +} + +export function branchTo( + next: OnboardMachineState, + options: Omit<Parameters<typeof transitionTo>[1], "transitionKind"> = {}, +): OnboardStateTransitionResult { + return transitionTo(next, { ...options, transitionKind: "branch" }); +} + +export function completeOnboardMachine( + updates: SessionUpdates = {}, + metadata: Record | null = null, +): OnboardStateCompleteResult { + return {
type: "complete", updates, metadata }; +} + +export function failOnboardMachine( + error: string | null, + options: { step?: string | null; metadata?: Record | null } = {}, +): OnboardStateFailedResult { + return { + type: "failed", + error, + step: options.step, + metadata: options.metadata, + }; +} From 30341b06786955d07f4ce6f96862a7ff76e1de5f Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Wed, 27 May 2026 17:09:54 -0700 Subject: [PATCH 08/26] refactor(onboard): apply explicit state results through runtime Signed-off-by: Carlos Villela --- src/lib/onboard/machine/runtime.test.ts | 66 +++++++++++++++++++++++++ src/lib/onboard/machine/runtime.ts | 28 +++++++++++ 2 files changed, 94 insertions(+) diff --git a/src/lib/onboard/machine/runtime.test.ts b/src/lib/onboard/machine/runtime.test.ts index d48da85e0a..512ee7f56b 100644 --- a/src/lib/onboard/machine/runtime.test.ts +++ b/src/lib/onboard/machine/runtime.test.ts @@ -12,6 +12,13 @@ import { type SessionUpdates, } from "../../state/onboard-session"; import type { OnboardMachineEvent } from "./events"; +import { + advanceTo, + branchTo, + completeOnboardMachine, + failOnboardMachine, + retryTo, +} from "./result"; import { OnboardRuntime, type OnboardRuntimeDeps } from "./runtime"; import { InvalidOnboardMachineTransitionError } from "./transitions"; @@ -159,6 +166,65 @@ describe("OnboardRuntime", () => { expect(JSON.stringify(events)).not.toContain("super-secret"); }); + it("applies explicit advance results through validated runtime transitions", async () => { + const { runtime, events, getSession } = createHarness(); + + await runtime.applyResult( + advanceTo("preflight", { + updates: { sandboxName: "my-assistant" }, + metadata: { source: "handler" }, + }), + ); + + expect(getSession()).toMatchObject({ + sandboxName: "my-assistant", + machine: { state: "preflight", revision: 1 }, + }); + expect(events.map((event) => event.type)).toEqual([ + "context.updated", + "state.exited", + "state.entered", + ]); + 
expect(events[0].metadata.fields).toEqual(["sandboxName"]); + expect(events[1]).toMatchObject({ state: "init", metadata: { source: "handler" } }); + expect(events[2]).toMatchObject({ state: "preflight", metadata: { source: "handler" } }); + }); + + it("applies explicit retry, branch, completion, and failure results", async () => { + const retryHarness = createHarness(sessionInState("inference")); + await retryHarness.runtime.applyResult(retryTo("provider_selection")); + expect(retryHarness.getSession().machine).toMatchObject({ state: "provider_selection" }); + + const branchHarness = createHarness(sessionInState("sandbox")); + await branchHarness.runtime.applyResult(branchTo("agent_setup")); + expect(branchHarness.getSession().machine).toMatchObject({ state: "agent_setup" }); + + const completeHarness = createHarness(sessionInState("post_verify")); + await completeHarness.runtime.applyResult(completeOnboardMachine({ sandboxName: "done" })); + expect(completeHarness.getSession()).toMatchObject({ + status: "complete", + sandboxName: "done", + machine: { state: "complete" }, + }); + + const failedHarness = createHarness(sessionInState("gateway")); + await failedHarness.runtime.applyResult(failOnboardMachine("boom", { step: "gateway" })); + expect(failedHarness.getSession()).toMatchObject({ + status: "failed", + failure: { step: "gateway", message: "boom" }, + machine: { state: "failed" }, + }); + }); + + it("rejects invalid explicit transition kinds before mutating context", async () => { + const { runtime, getSession } = createHarness(sessionInState("inference")); + + await expect( + runtime.applyResult(advanceTo("provider_selection", { updates: { sandboxName: "mutated" } })), + ).rejects.toThrow("expected advance, got retry"); + expect(getSession()).toMatchObject({ sandboxName: null, machine: { state: "inference" } }); + }); + it("fails non-terminal sessions with redacted failure events", async () => { const { runtime, events, getSession } = 
createHarness(sessionInState("gateway")); diff --git a/src/lib/onboard/machine/runtime.ts b/src/lib/onboard/machine/runtime.ts index 65516c3212..47cee9f0d2 100644 --- a/src/lib/onboard/machine/runtime.ts +++ b/src/lib/onboard/machine/runtime.ts @@ -9,6 +9,7 @@ import { emitOnboardMachineEvent, type OnboardMachineEvent, } from "./events"; +import type { OnboardStateResult } from "./result"; import { assertValidOnboardMachineTransition, canTransitionOnboardMachineState, @@ -197,6 +198,33 @@ export class OnboardRuntime { return updated; } + async applyResult(result: OnboardStateResult): Promise { + if (result.type === "complete") { + return this.complete(result.updates ?? {}); + } + if (result.type === "failed") { + return this.fail(result.error, { + step: result.step, + metadata: result.metadata, + }); + } + + const current = this.ensureSession(); + const transition = assertValidOnboardMachineTransition(current.machine.state, result.next); + if (result.transitionKind && transition.kind !== result.transitionKind) { + throw new Error( + `Invalid onboarding machine transition kind: ${current.machine.state} -> ${result.next} expected ${result.transitionKind}, got ${transition.kind}`, + ); + } + if (result.updates && Object.keys(this.deps.filterSafeUpdates(result.updates)).length > 0) { + await this.updateContext(result.updates, { + state: current.machine.state, + metadata: result.metadata, + }); + } + return this.transition(result.next, { metadata: result.metadata }); + } + async fail(message: string | null, options: OnboardRuntimeFailureOptions = {}): Promise { const current = this.ensureSession(); const from = current.machine.state; From d4ad2d9cb7bf1ba56f82cf544ae425531d65528a Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Thu, 28 May 2026 09:28:14 -0700 Subject: [PATCH 09/26] refactor(onboard): make finalization return FSM result Signed-off-by: Carlos Villela --- src/lib/onboard.ts | 6 ++--- .../machine/handlers/finalization.test.ts | 22 +++++++++---------- 
.../onboard/machine/handlers/finalization.ts | 13 ++++++----- src/lib/onboard/runtime-boundary.ts | 6 +++++ 4 files changed, 27 insertions(+), 20 deletions(-) diff --git a/src/lib/onboard.ts b/src/lib/onboard.ts index 98a49eea55..7d26fae8c6 100644 --- a/src/lib/onboard.ts +++ b/src/lib/onboard.ts @@ -6390,8 +6390,8 @@ const recordStepFailed = onboardRuntimeBoundary.recordStepFailed.bind(onboardRun const recordStateSkipped = onboardRuntimeBoundary.recordStateSkipped.bind(onboardRuntimeBoundary); const recordRepairEvent = onboardRuntimeBoundary.recordRepairEvent.bind(onboardRuntimeBoundary); const recordResumeConflict = onboardRuntimeBoundary.recordResumeConflict.bind(onboardRuntimeBoundary); +const recordStateResult = onboardRuntimeBoundary.recordStateResult.bind(onboardRuntimeBoundary); const recordPostVerifyStarted = onboardRuntimeBoundary.recordPostVerifyStarted.bind(onboardRuntimeBoundary); -const recordSessionComplete = onboardRuntimeBoundary.recordSessionComplete.bind(onboardRuntimeBoundary); function skippedStepMessage( stepName: string, @@ -7099,7 +7099,7 @@ async function onboard(opts: OnboardOptions = {}): Promise { }); session = policiesResult.session; - await handleFinalizationState({ + const finalizationResult = await handleFinalizationState({ sandboxName, model, provider, @@ -7114,7 +7114,6 @@ async function onboard(opts: OnboardOptions = {}): Promise { ensureAgentDashboardForward, verifyWebSearchInsideSandbox, recordPostVerifyStarted, - recordSessionComplete, toSessionUpdates: (updates) => toSessionUpdates(updates as Parameters[0]), removeLegacyCredentialsFile, cleanupStaleHostFiles, @@ -7152,6 +7151,7 @@ async function onboard(opts: OnboardOptions = {}): Promise { log: (message) => console.log(message), }, }); + await recordStateResult(finalizationResult.stateResult); traceCompleted = true; } finally { releaseOnboardLock(); diff --git a/src/lib/onboard/machine/handlers/finalization.test.ts b/src/lib/onboard/machine/handlers/finalization.test.ts index 
df6000b2e6..b70f2eef57 100644 --- a/src/lib/onboard/machine/handlers/finalization.test.ts +++ b/src/lib/onboard/machine/handlers/finalization.test.ts @@ -14,7 +14,6 @@ function createDeps(overrides: Partial 18789), postVerify: vi.fn(async () => createSession({ machine: { version: 1, state: "post_verify", stateEnteredAt: null, revision: 1 } })), - complete: vi.fn(async () => createSession({ status: "complete" })), removeLegacy: vi.fn(), cleanupHost: vi.fn(), recoverProcesses: vi.fn(), @@ -32,7 +31,6 @@ function createDeps(overrides: Partial) => updates as SessionUpdates, removeLegacyCredentialsFile: calls.removeLegacy, cleanupStaleHostFiles: calls.cleanupHost, @@ -81,12 +79,16 @@ describe("handleFinalizationState", () => { expect(calls.log).toHaveBeenCalledWith(" ✓ verified"); expect(calls.dashboard).toHaveBeenCalledWith("my-assistant", "model", "provider", null, null); expect(calls.postVerify).toHaveBeenCalledOnce(); - expect(calls.complete).toHaveBeenCalledWith({ - sandboxName: "my-assistant", - provider: "provider", - model: "model", - hermesAuthMethod: null, - hermesToolGateways: [], + expect(result.stateResult).toEqual({ + type: "complete", + updates: { + sandboxName: "my-assistant", + provider: "provider", + model: "model", + hermesAuthMethod: null, + hermesToolGateways: [], + }, + metadata: { state: "finalizing" }, }); expect(result.verificationDiagnostics).toEqual([" ✓ verified"]); }); @@ -98,9 +100,8 @@ describe("handleFinalizationState", () => { await handleFinalizationState({ ...baseOptions(deps), agent }); expect(calls.ensureAgentDashboard).toHaveBeenCalledWith("my-assistant", agent); - expect(calls.complete).toHaveBeenCalled(); expect(calls.ensureAgentDashboard.mock.invocationCallOrder[0]).toBeLessThan( - calls.complete.mock.invocationCallOrder[0], + calls.dashboard.mock.invocationCallOrder[0], ); expect(calls.dashboard).toHaveBeenCalledWith("my-assistant", "model", "provider", null, agent); }); @@ -115,7 +116,6 @@ describe("handleFinalizationState", () 
=> { await expect(handleFinalizationState(baseOptions(deps))).rejects.toThrow("verification failed"); expect(calls.postVerify).toHaveBeenCalledOnce(); - expect(calls.complete).not.toHaveBeenCalled(); expect(calls.dashboard).not.toHaveBeenCalled(); }); diff --git a/src/lib/onboard/machine/handlers/finalization.ts b/src/lib/onboard/machine/handlers/finalization.ts index 34e2dba224..5bc8f96ccb 100644 --- a/src/lib/onboard/machine/handlers/finalization.ts +++ b/src/lib/onboard/machine/handlers/finalization.ts @@ -1,7 +1,8 @@ // SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. // SPDX-License-Identifier: Apache-2.0 -import type { Session, SessionUpdates } from "../../../state/onboard-session"; +import type { Session } from "../../../state/onboard-session"; +import { completeOnboardMachine, type OnboardStateCompleteResult } from "../result"; export interface FinalizationStateOptions { sandboxName: string; @@ -17,8 +18,7 @@ export interface FinalizationStateOptions): number; recordPostVerifyStarted(): Promise; - recordSessionComplete(updates: SessionUpdates): Promise; - toSessionUpdates(updates: Record): SessionUpdates; + toSessionUpdates(updates: Record): NonNullable; removeLegacyCredentialsFile(): void; cleanupStaleHostFiles(): void; checkAndRecoverSandboxProcesses(sandboxName: string, options: { quiet: boolean }): void; @@ -46,7 +46,7 @@ export interface FinalizationStateOptions { + return this.getRuntime().applyResult(result); + } + async recordResumeConflict(conflict: { field: string; recorded?: unknown; From 356c9470245d6d7d7f4a50bc0a0bfa8e01763e68 Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Thu, 28 May 2026 09:30:58 -0700 Subject: [PATCH 10/26] refactor(onboard): make agent setup return FSM result Signed-off-by: Carlos Villela --- .../onboard/machine/handlers/agent-setup.test.ts | 15 +++++++++++++++ src/lib/onboard/machine/handlers/agent-setup.ts | 6 ++++-- 2 files changed, 19 insertions(+), 2 deletions(-) 
diff --git a/src/lib/onboard/machine/handlers/agent-setup.test.ts b/src/lib/onboard/machine/handlers/agent-setup.test.ts index f5dd3e5f65..9eb998ea4d 100644 --- a/src/lib/onboard/machine/handlers/agent-setup.test.ts +++ b/src/lib/onboard/machine/handlers/agent-setup.test.ts @@ -88,6 +88,13 @@ describe("handleAgentSetupState", () => { expect(calls.skipped).toHaveBeenCalledWith("openclaw"); expect(calls.setupOpenclaw).not.toHaveBeenCalled(); expect(result.session?.steps.openclaw.status).toBe("skipped"); + expect(result.stateResult).toEqual({ + type: "transition", + next: "policies", + transitionKind: "advance", + updates: undefined, + metadata: { state: "agent_setup" }, + }); }); it("skips OpenClaw setup on resume when OpenClaw is ready", async () => { @@ -108,6 +115,13 @@ describe("handleAgentSetupState", () => { expect.objectContaining({ sandboxName: "my-assistant", provider: "provider", model: "model" }), ); expect(calls.skipped).toHaveBeenCalledWith("agent_setup"); + expect(result.stateResult).toEqual({ + type: "transition", + next: "policies", + transitionKind: "advance", + updates: undefined, + metadata: { state: "openclaw" }, + }); expect(result.session).toMatchObject({ sandboxName: "my-assistant", provider: "provider", @@ -143,6 +157,7 @@ describe("handleAgentSetupState", () => { }), ); expect(calls.skipped).toHaveBeenCalledWith("agent_setup"); + expect(result.stateResult).toMatchObject({ next: "policies", transitionKind: "advance" }); expect(result.session).toMatchObject({ sandboxName: "my-assistant", provider: "provider", diff --git a/src/lib/onboard/machine/handlers/agent-setup.ts b/src/lib/onboard/machine/handlers/agent-setup.ts index 3b43bd69cb..4ec59f8c79 100644 --- a/src/lib/onboard/machine/handlers/agent-setup.ts +++ b/src/lib/onboard/machine/handlers/agent-setup.ts @@ -2,6 +2,7 @@ // SPDX-License-Identifier: Apache-2.0 import type { Session, SessionUpdates } from "../../../state/onboard-session"; +import { advanceTo, type OnboardStateTransitionResult 
} from "../result"; export interface AgentSetupStateOptions { agent: Agent | null; @@ -41,6 +42,7 @@ export interface AgentSetupStateOptions { export interface AgentSetupStateResult { session: Session | null; + stateResult: OnboardStateTransitionResult; } export async function handleAgentSetupState({ @@ -66,7 +68,7 @@ export async function handleAgentSetupState({ ); deps.ensureAgentDashboardForward(sandboxName, agent); session = await deps.recordStepSkipped("openclaw"); - return { session }; + return { session, stateResult: advanceTo("policies", { metadata: { state: "agent_setup" } }) }; } const resumeOpenclaw = resume && sandboxName && deps.isOpenclawReady(sandboxName); @@ -87,5 +89,5 @@ export async function handleAgentSetupState({ ); } session = await deps.recordStepSkipped("agent_setup"); - return { session }; + return { session, stateResult: advanceTo("policies", { metadata: { state: "openclaw" } }) }; } From 2296519e6d7875238bc02da68f9f0c0f97489b26 Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Thu, 28 May 2026 09:33:07 -0700 Subject: [PATCH 11/26] refactor(onboard): make policy setup return FSM result Signed-off-by: Carlos Villela --- src/lib/onboard/machine/handlers/policies.test.ts | 14 +++++++++++++- src/lib/onboard/machine/handlers/policies.ts | 11 ++++++++++- 2 files changed, 23 insertions(+), 2 deletions(-) diff --git a/src/lib/onboard/machine/handlers/policies.test.ts b/src/lib/onboard/machine/handlers/policies.test.ts index f2865b1e5c..7ccf7cf12e 100644 --- a/src/lib/onboard/machine/handlers/policies.test.ts +++ b/src/lib/onboard/machine/handlers/policies.test.ts @@ -91,7 +91,7 @@ describe("handlePoliciesState", () => { it("runs compatible endpoint smoke before policy selection", async () => { const { deps, calls } = createDeps(); - await handlePoliciesState(baseOptions(deps)); + const result = await handlePoliciesState(baseOptions(deps)); expect(calls.smoke).toHaveBeenCalledWith({ sandboxName: "my-assistant", @@ -121,6 +121,13 @@ 
describe("handlePoliciesState", () => { "policies", expect.objectContaining({ policyPresets: ["npm"] }), ); + expect(result.stateResult).toEqual({ + type: "transition", + next: "finalizing", + transitionKind: "advance", + updates: undefined, + metadata: { state: "policies", policyPresets: ["npm"] }, + }); }); it("uses recorded messaging channels when no active selection exists", async () => { @@ -158,6 +165,11 @@ describe("handlePoliciesState", () => { expect.objectContaining({ policyPresets: ["npm"] }), ); expect(result.appliedPolicyPresets).toEqual(["npm"]); + expect(result.stateResult).toMatchObject({ + next: "finalizing", + transitionKind: "advance", + metadata: { policyPresets: ["npm"] }, + }); }); it("reconciles unsupported recorded presets before interactive setup", async () => { diff --git a/src/lib/onboard/machine/handlers/policies.ts b/src/lib/onboard/machine/handlers/policies.ts index 586a312abc..d0c7305171 100644 --- a/src/lib/onboard/machine/handlers/policies.ts +++ b/src/lib/onboard/machine/handlers/policies.ts @@ -2,6 +2,7 @@ // SPDX-License-Identifier: Apache-2.0 import type { Session, SessionUpdates } from "../../../state/onboard-session"; +import { advanceTo, type OnboardStateTransitionResult } from "../result"; // Inlined to avoid pulling sandbox-agent's transitive runner.ts deps into // the generic state handler. 
Matches normalizeSandboxAgentName: trim, @@ -99,6 +100,7 @@ export interface PoliciesStateResult { session: Session | null; recordedMessagingChannels: string[]; appliedPolicyPresets: string[]; + stateResult: OnboardStateTransitionResult; } export async function handlePoliciesState({ @@ -206,5 +208,12 @@ export async function handlePoliciesState({ ); } - return { session, recordedMessagingChannels, appliedPolicyPresets }; + return { + session, + recordedMessagingChannels, + appliedPolicyPresets, + stateResult: advanceTo("finalizing", { + metadata: { state: "policies", policyPresets: appliedPolicyPresets }, + }), + }; } From 67a9a1e26a3428e0c51f8d56a98b7711aad059f0 Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Thu, 28 May 2026 09:35:40 -0700 Subject: [PATCH 12/26] refactor(onboard): make preflight and gateway return FSM results Signed-off-by: Carlos Villela --- src/lib/onboard/machine/handlers/gateway.test.ts | 7 +++++++ src/lib/onboard/machine/handlers/gateway.ts | 10 +++++++++- src/lib/onboard/machine/handlers/preflight.test.ts | 7 +++++++ src/lib/onboard/machine/handlers/preflight.ts | 5 +++++ 4 files changed, 28 insertions(+), 1 deletion(-) diff --git a/src/lib/onboard/machine/handlers/gateway.test.ts b/src/lib/onboard/machine/handlers/gateway.test.ts index 696e4940ac..b184fdb826 100644 --- a/src/lib/onboard/machine/handlers/gateway.test.ts +++ b/src/lib/onboard/machine/handlers/gateway.test.ts @@ -95,6 +95,13 @@ describe("handleGatewayState", () => { expect(calls.startGateway).toHaveBeenCalledWith({ type: "nvidia" }, { gpuPassthrough: true }); expect(calls.complete).toHaveBeenCalledWith("gateway"); expect(result.gatewayReuseState).toBe("missing"); + expect(result.stateResult).toEqual({ + type: "transition", + next: "provider_selection", + transitionKind: "advance", + updates: undefined, + metadata: { state: "gateway", gatewayReuseState: "missing" }, + }); }); it("reuses healthy gateways on fresh runs", async () => { diff --git 
a/src/lib/onboard/machine/handlers/gateway.ts b/src/lib/onboard/machine/handlers/gateway.ts index 461a19f924..6589db29cd 100644 --- a/src/lib/onboard/machine/handlers/gateway.ts +++ b/src/lib/onboard/machine/handlers/gateway.ts @@ -6,6 +6,7 @@ import type { GatewayReuseState } from "../../../state/gateway"; import type { Session } from "../../../state/onboard-session"; import type { GatewayContainerState } from "../../gateway-container-running"; import { withGatewayTrace } from "../../tracing"; +import { advanceTo, type OnboardStateTransitionResult } from "../result"; export interface GatewayStateOptions { resume: boolean; @@ -68,6 +69,7 @@ export interface GatewayStateOptions { export interface GatewayStateResult { gatewayReuseState: GatewayReuseState; session: Session | null; + stateResult: OnboardStateTransitionResult; } export async function handleGatewayState({ @@ -213,5 +215,11 @@ export async function handleGatewayState({ session = await deps.recordStepComplete("gateway"); } - return { gatewayReuseState, session }; + return { + gatewayReuseState, + session, + stateResult: advanceTo("provider_selection", { + metadata: { state: "gateway", gatewayReuseState }, + }), + }; } diff --git a/src/lib/onboard/machine/handlers/preflight.test.ts b/src/lib/onboard/machine/handlers/preflight.test.ts index f625a33de0..4b68f9b550 100644 --- a/src/lib/onboard/machine/handlers/preflight.test.ts +++ b/src/lib/onboard/machine/handlers/preflight.test.ts @@ -104,6 +104,13 @@ describe("handlePreflightState", () => { sandboxGpuDevice: "GPU-0", }); expect(result.gpuPassthrough).toBe(true); + expect(result.stateResult).toEqual({ + type: "transition", + next: "gateway", + transitionKind: "advance", + updates: undefined, + metadata: { state: "preflight", gpuPassthrough: true }, + }); }); it("skips full preflight on resume but re-detects GPU and revalidates CDI/sandbox GPU", async () => { diff --git a/src/lib/onboard/machine/handlers/preflight.ts 
b/src/lib/onboard/machine/handlers/preflight.ts index 599781119c..be28649cd8 100644 --- a/src/lib/onboard/machine/handlers/preflight.ts +++ b/src/lib/onboard/machine/handlers/preflight.ts @@ -3,6 +3,7 @@ import type { Session } from "../../../state/onboard-session"; import { withPreflightTrace } from "../../tracing"; +import { advanceTo, type OnboardStateTransitionResult } from "../result"; export type PreflightSandboxGpuFlag = "enable" | "disable" | null; @@ -86,6 +87,7 @@ export interface PreflightStateResult Date: Thu, 28 May 2026 09:38:19 -0700 Subject: [PATCH 13/26] refactor(onboard): make sandbox return branch FSM result Signed-off-by: Carlos Villela --- src/lib/onboard/machine/handlers/sandbox.test.ts | 7 +++++++ src/lib/onboard/machine/handlers/sandbox.ts | 9 +++++++++ 2 files changed, 16 insertions(+) diff --git a/src/lib/onboard/machine/handlers/sandbox.test.ts b/src/lib/onboard/machine/handlers/sandbox.test.ts index 52cf8a6db2..443166bc1e 100644 --- a/src/lib/onboard/machine/handlers/sandbox.test.ts +++ b/src/lib/onboard/machine/handlers/sandbox.test.ts @@ -153,6 +153,13 @@ describe("handleSandboxState", () => { expect(calls.setDefault).toHaveBeenCalledWith("my-assistant"); expect(calls.complete).toHaveBeenCalledWith("sandbox", expect.objectContaining({ sandboxName: "my-assistant" })); expect(result).toMatchObject({ sandboxName: "my-assistant", selectedMessagingChannels: ["telegram"], webSearchSupported: true }); + expect(result.stateResult).toEqual({ + type: "transition", + next: "openclaw", + transitionKind: "branch", + updates: undefined, + metadata: { state: "sandbox", sandboxName: "my-assistant", agent: "openclaw" }, + }); }); it("reuses a completed ready sandbox on resume", async () => { diff --git a/src/lib/onboard/machine/handlers/sandbox.ts b/src/lib/onboard/machine/handlers/sandbox.ts index efa5cf0adb..e7740fbf6d 100644 --- a/src/lib/onboard/machine/handlers/sandbox.ts +++ b/src/lib/onboard/machine/handlers/sandbox.ts @@ -3,6 +3,7 @@ import 
type { Session, SessionUpdates } from "../../../state/onboard-session"; import { withSandboxPhaseTrace } from "../../tracing"; +import { branchTo, type OnboardStateTransitionResult } from "../result"; export interface SandboxStateOptions { resume: boolean; @@ -98,6 +99,7 @@ export interface SandboxStateResult { selectedMessagingChannels: string[]; webSearchSupported: boolean; session: Session | null; + stateResult: OnboardStateTransitionResult; } function sameEffectiveTelegramRequireMention(left: boolean | null, right: boolean | null): boolean { @@ -335,5 +337,12 @@ export async function handleSandboxState Date: Thu, 28 May 2026 11:20:05 -0700 Subject: [PATCH 14/26] refactor(onboard): return FSM results from provider inference Signed-off-by: Carlos Villela --- .../handlers/provider-inference.test.ts | 23 +++++++++++++++++++ .../machine/handlers/provider-inference.ts | 18 +++++++++++++++ 2 files changed, 41 insertions(+) diff --git a/src/lib/onboard/machine/handlers/provider-inference.test.ts b/src/lib/onboard/machine/handlers/provider-inference.test.ts index 5414e898a5..2865973de1 100644 --- a/src/lib/onboard/machine/handlers/provider-inference.test.ts +++ b/src/lib/onboard/machine/handlers/provider-inference.test.ts @@ -157,6 +157,14 @@ describe("handleProviderInferenceState", () => { provider: "nvidia-prod", preferredInferenceApi: "openai-responses", }); + expect(result.stateResult).toEqual({ + type: "transition", + next: "sandbox", + transitionKind: "advance", + updates: undefined, + metadata: { state: "inference", provider: "nvidia-prod", model: "nvidia/test" }, + }); + expect(result.retryStateResults).toEqual([]); }); it("clears non-NVIDIA provider credentials when inference setup fails", async () => { @@ -347,6 +355,21 @@ describe("handleProviderInferenceState", () => { expect(setupInference).toHaveBeenCalledTimes(2); expect(result.model).toBe("good"); expect(calls.startStep).toHaveBeenCalledWith("provider_selection"); + 
expect(result.retryStateResults).toEqual([ + { + type: "transition", + next: "provider_selection", + transitionKind: "retry", + updates: undefined, + metadata: { + state: "inference", + provider: "nvidia-prod", + model: "bad", + reason: "selection_retry", + }, + }, + ]); + expect(result.stateResult).toMatchObject({ next: "sandbox", transitionKind: "advance" }); }); it("aborts before inference setup when the configuration summary is rejected", async () => { diff --git a/src/lib/onboard/machine/handlers/provider-inference.ts b/src/lib/onboard/machine/handlers/provider-inference.ts index 44d2cf5ed5..1a90147d2a 100644 --- a/src/lib/onboard/machine/handlers/provider-inference.ts +++ b/src/lib/onboard/machine/handlers/provider-inference.ts @@ -4,6 +4,7 @@ import type { WebSearchConfig } from "../../../inference/web-search"; import type { Session, SessionUpdates } from "../../../state/onboard-session"; import { withInferenceTrace, withProviderSelectionTrace } from "../../tracing"; +import { advanceTo, retryTo, type OnboardStateTransitionResult } from "../result"; export type ProviderInferenceRetry = { retry: "selection" } | { ok: true; retry?: undefined }; @@ -120,6 +121,8 @@ export interface ProviderInferenceStateResult { nimContainer: string | null; webSearchConfig: WebSearchConfig | null; session: Session | null; + stateResult: OnboardStateTransitionResult; + retryStateResults: OnboardStateTransitionResult[]; } function requireSelection( @@ -169,6 +172,7 @@ export async function handleProviderInferenceState({ const webSearchConfig = initial.webSearchConfig; let forceProviderSelection = initialForceProviderSelection; let allowToolsIncompatible = false; + const retryStateResults: OnboardStateTransitionResult[] = []; while (true) { let forceInferenceSetup = false; @@ -288,6 +292,11 @@ export async function handleProviderInferenceState({ clearStagedCredentialEnv(deps, credentialEnv); } if (inferenceResult?.retry === "selection") { + retryStateResults.push( + 
retryTo("provider_selection", { + metadata: { state: "inference", provider, model, reason: "selection_retry" }, + }), + ); forceProviderSelection = true; continue; } @@ -372,6 +381,11 @@ export async function handleProviderInferenceState({ clearStagedCredentialEnv(deps, credentialEnv); } if (inferenceResult?.retry === "selection") { + retryStateResults.push( + retryTo("provider_selection", { + metadata: { state: "inference", provider, model, reason: "selection_retry" }, + }), + ); forceProviderSelection = true; continue; } @@ -395,5 +409,9 @@ export async function handleProviderInferenceState({ nimContainer, webSearchConfig, session, + stateResult: advanceTo("sandbox", { + metadata: { state: "inference", provider, model }, + }), + retryStateResults, }; } From dbbb273a067af0faaf94b35509211b6c08a53b94 Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Thu, 28 May 2026 11:23:11 -0700 Subject: [PATCH 15/26] refactor(onboard): add FSM runner shell Signed-off-by: Carlos Villela --- src/lib/onboard/machine/runner.test.ts | 158 +++++++++++++++++++++++++ src/lib/onboard/machine/runner.ts | 71 +++++++++++ 2 files changed, 229 insertions(+) create mode 100644 src/lib/onboard/machine/runner.test.ts create mode 100644 src/lib/onboard/machine/runner.ts diff --git a/src/lib/onboard/machine/runner.test.ts b/src/lib/onboard/machine/runner.test.ts new file mode 100644 index 0000000000..558960f618 --- /dev/null +++ b/src/lib/onboard/machine/runner.test.ts @@ -0,0 +1,158 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+// SPDX-License-Identifier: Apache-2.0 + +import { describe, expect, it, vi } from "vitest"; + +import { + createSession, + filterSafeUpdates, + normalizeSession, + sanitizeFailure, + type Session, + type SessionUpdates, +} from "../../state/onboard-session"; +import { advanceTo, branchTo, completeOnboardMachine, failOnboardMachine, retryTo } from "./result"; +import { OnboardRuntime, type OnboardRuntimeDeps } from "./runtime"; +import { + MissingOnboardStateHandlerError, + runOnboardMachine, + type OnboardStateHandlers, +} from "./runner"; + +interface RunnerContext { + attempts: number; + visited: string[]; +} + +function cloneSession(session: Session): Session { + return normalizeSession(JSON.parse(JSON.stringify(session))) ?? session; +} + +function createRuntime(initialSession: Session = createSession()) { + let session = cloneSession(initialSession); + const updateSession = (mutator: (value: Session) => Session | void): Session => { + const next = mutator(cloneSession(session)) ?? 
session; + session = cloneSession(next); + return cloneSession(session); + }; + const deps: OnboardRuntimeDeps = { + loadSession: () => cloneSession(session), + createSession, + saveSession: (next) => { + session = cloneSession(next); + return cloneSession(session); + }, + updateSession, + markStepStarted: () => cloneSession(session), + markStepComplete: (_stepName, updates: SessionUpdates = {}) => + updateSession((current) => { + Object.assign(current, filterSafeUpdates(updates)); + return current; + }), + markStepSkipped: () => cloneSession(session), + markStepFailed: (_stepName, message) => + updateSession((current) => { + current.status = "failed"; + current.failure = sanitizeFailure({ step: _stepName, message, recordedAt: "now" }); + return current; + }), + completeSession: (updates: SessionUpdates = {}) => + updateSession((current) => { + Object.assign(current, filterSafeUpdates(updates)); + current.status = "complete"; + current.resumable = false; + return current; + }), + filterSafeUpdates, + emitEvent: () => undefined, + now: () => "2026-05-28T00:00:00.000Z", + }; + return new OnboardRuntime(deps); +} + +describe("runOnboardMachine", () => { + it("runs handlers until completion while applying retry and branch transitions", async () => { + const runtime = createRuntime(); + const calls: string[] = []; + const handlers: OnboardStateHandlers<RunnerContext> = { + init: () => advanceTo("preflight"), + preflight: () => advanceTo("gateway"), + gateway: () => advanceTo("provider_selection"), + provider_selection: () => advanceTo("inference"), + inference: (context) => { + calls.push(`inference:${context.attempts}`); + return context.attempts === 0 ? 
retryTo("provider_selection") : advanceTo("sandbox"); + }, + sandbox: () => branchTo("openclaw"), + openclaw: () => advanceTo("policies"), + policies: () => advanceTo("finalizing"), + finalizing: () => advanceTo("post_verify"), + post_verify: () => completeOnboardMachine({ sandboxName: "my-assistant" }), + }; + + const result = await runOnboardMachine({ + context: { attempts: 0, visited: [] } as RunnerContext, + runtime, + handlers, + updateContext: ({ context, state }) => ({ + attempts: state === "inference" ? context.attempts + 1 : context.attempts, + visited: [...context.visited, state], + }), + }); + + expect(result.session).toMatchObject({ + status: "complete", + sandboxName: "my-assistant", + machine: { state: "complete" }, + }); + expect(calls).toEqual(["inference:0", "inference:1"]); + expect(result.context.visited).toEqual([ + "init", + "preflight", + "gateway", + "provider_selection", + "inference", + "provider_selection", + "inference", + "sandbox", + "openclaw", + "policies", + "finalizing", + "post_verify", + ]); + }); + + it("stops on failed terminal results", async () => { + const runtime = createRuntime(); + const policies = vi.fn(() => advanceTo("finalizing")); + + const result = await runOnboardMachine({ + context: { attempts: 0, visited: [] } as RunnerContext, + runtime, + handlers: { + init: () => advanceTo("preflight"), + preflight: () => failOnboardMachine("preflight failed", { step: "preflight" }), + policies, + }, + }); + + expect(result.session).toMatchObject({ + status: "failed", + failure: { step: "preflight", message: "preflight failed" }, + machine: { state: "failed" }, + }); + expect(policies).not.toHaveBeenCalled(); + }); + + it("throws when a non-terminal state has no handler", async () => { + const runtime = createRuntime(); + + await expect( + runOnboardMachine({ + context: { attempts: 0, visited: [] } as RunnerContext, + runtime, + handlers: {}, + }), + ).rejects.toThrow(MissingOnboardStateHandlerError); + }); +}); diff --git 
a/src/lib/onboard/machine/runner.ts b/src/lib/onboard/machine/runner.ts new file mode 100644 index 0000000000..5e4db4174d --- /dev/null +++ b/src/lib/onboard/machine/runner.ts @@ -0,0 +1,71 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// SPDX-License-Identifier: Apache-2.0 + +import type { Session } from "../../state/onboard-session"; +import type { OnboardStateResult } from "./result"; +import { isTerminalOnboardMachineState } from "./transitions"; +import type { OnboardMachineState, OnboardNonTerminalMachineState } from "./types"; + +export type OnboardStateHandler<Context> = ( + context: Context, +) => Promise<OnboardStateResult> | OnboardStateResult; + +export type OnboardStateHandlers<Context> = Partial< + Record<OnboardNonTerminalMachineState, OnboardStateHandler<Context>> +>; + +export interface OnboardMachineRunnerRuntime { + session(): Promise<Session>; + applyResult(result: OnboardStateResult): Promise<Session>; +} + +export interface OnboardMachineRunnerOptions<Context> { + context: Context; + runtime: OnboardMachineRunnerRuntime; + handlers: OnboardStateHandlers<Context>; + updateContext?(input: { + context: Context; + state: OnboardMachineState; + result: OnboardStateResult; + session: Session; + }): Context | Promise<Context>; +} + +export interface OnboardMachineRunnerResult<Context> { + context: Context; + session: Session; +} + +export class MissingOnboardStateHandlerError extends Error { + readonly state: OnboardNonTerminalMachineState; + + constructor(state: OnboardNonTerminalMachineState) { + super(`Missing onboarding machine handler for state: ${state}`); + this.name = "MissingOnboardStateHandlerError"; + this.state = state; + } +} + +export async function runOnboardMachine<Context>({ + context: initialContext, + runtime, + handlers, + updateContext, +}: OnboardMachineRunnerOptions<Context>): Promise<OnboardMachineRunnerResult<Context>> { + let context = initialContext; + let session = await 
runtime.session(); + + while (!isTerminalOnboardMachineState(session.machine.state)) { + const state = session.machine.state; + const handler = handlers[state as OnboardNonTerminalMachineState]; + if 
(!handler) throw new MissingOnboardStateHandlerError(state as OnboardNonTerminalMachineState); + + const result = await handler(context); + session = await runtime.applyResult(result); + context = updateContext + ? await updateContext({ context, state, result, session }) + : context; + } + + return { context, session }; +} From 6b27a0bd6638fa95e928fe5c34b0ed9533e67c0d Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Thu, 28 May 2026 11:28:31 -0700 Subject: [PATCH 16/26] refactor(onboard): consume handler FSM results compatibly Signed-off-by: Carlos Villela --- src/lib/onboard.ts | 8 ++++++ src/lib/onboard/runtime-boundary.test.ts | 36 ++++++++++++++++++++++++ src/lib/onboard/runtime-boundary.ts | 15 ++++++++++ 3 files changed, 59 insertions(+) diff --git a/src/lib/onboard.ts b/src/lib/onboard.ts index 7d26fae8c6..621a0f8aa1 100644 --- a/src/lib/onboard.ts +++ b/src/lib/onboard.ts @@ -6391,6 +6391,7 @@ const recordStateSkipped = onboardRuntimeBoundary.recordStateSkipped.bind(onboar const recordRepairEvent = onboardRuntimeBoundary.recordRepairEvent.bind(onboardRuntimeBoundary); const recordResumeConflict = onboardRuntimeBoundary.recordResumeConflict.bind(onboardRuntimeBoundary); const recordStateResult = onboardRuntimeBoundary.recordStateResult.bind(onboardRuntimeBoundary); +const recordStateResultWithStepCompatibility = onboardRuntimeBoundary.recordStateResultWithStepCompatibility.bind(onboardRuntimeBoundary); const recordPostVerifyStarted = onboardRuntimeBoundary.recordPostVerifyStarted.bind(onboardRuntimeBoundary); function skippedStepMessage( @@ -6790,6 +6791,7 @@ async function onboard(opts: OnboardOptions = {}): Promise { }, }); if (resume && _preflightDashboardPort === null) preflightDashboardPortRangeAvailability(); // #3953 — resume must mirror preflight()'s fail-fast + await recordStateResultWithStepCompatibility(preflightResult.stateResult); session = preflightResult.session; const { sandboxGpuConfig, @@ -6862,6 +6864,7 @@ async function onboard(opts: 
OnboardOptions = {}): Promise { exitProcess: (code) => process.exit(code), }, }); + await recordStateResultWithStepCompatibility(gatewayResult.stateResult); session = gatewayResult.session; // #2753: prefer requestedSandboxName over an unconfirmed session name. @@ -6943,6 +6946,7 @@ async function onboard(opts: OnboardOptions = {}): Promise { }, }, }); + await recordStateResultWithStepCompatibility(providerInferenceResult.stateResult); session = providerInferenceResult.session; sandboxName = providerInferenceResult.sandboxName; const { @@ -7019,6 +7023,7 @@ async function onboard(opts: OnboardOptions = {}): Promise { exitProcess: (code) => process.exit(code), }, }); + await recordStateResultWithStepCompatibility(sandboxStateResult.stateResult); session = sandboxStateResult.session; sandboxName = sandboxStateResult.sandboxName; webSearchConfig = sandboxStateResult.webSearchConfig ?? null; @@ -7061,6 +7066,7 @@ async function onboard(opts: OnboardOptions = {}): Promise { toSessionUpdates: (updates) => toSessionUpdates(updates as Parameters[0]), }, }); + await recordStateResultWithStepCompatibility(agentSetupResult.stateResult); session = agentSetupResult.session; const policiesResult = await handlePoliciesState({ @@ -7097,9 +7103,11 @@ async function onboard(opts: OnboardOptions = {}): Promise { toSessionUpdates: (updates) => toSessionUpdates(updates as Parameters[0]), }, }); + await recordStateResultWithStepCompatibility(policiesResult.stateResult); session = policiesResult.session; const finalizationResult = await handleFinalizationState({ + sandboxName, model, provider, diff --git a/src/lib/onboard/runtime-boundary.test.ts b/src/lib/onboard/runtime-boundary.test.ts index 21d6f1083e..89d671da2c 100644 --- a/src/lib/onboard/runtime-boundary.test.ts +++ b/src/lib/onboard/runtime-boundary.test.ts @@ -11,6 +11,7 @@ import { type SessionUpdates, } from "../state/onboard-session"; import type { OnboardMachineEvent } from "./machine/events"; +import { advanceTo } from 
"./machine/result"; import { OnboardRuntime, type OnboardRuntimeDeps } from "./machine/runtime"; import { OnboardRuntimeBoundary } from "./runtime-boundary"; @@ -92,6 +93,41 @@ describe("OnboardRuntimeBoundary", () => { expect(harness.events[1]).toMatchObject({ state: "init" }); }); + it("applies state results unless legacy step helpers already advanced the machine", async () => { + const harness = createRuntimeHarness(); + const boundary = new OnboardRuntimeBoundary({ + toSessionUpdates: (updates) => filterSafeUpdates(updates as SessionUpdates) as SessionUpdates, + maybeForceE2eStepFailure: () => undefined, + createRuntime: harness.createRuntime, + }); + + await boundary.recordStateResultWithStepCompatibility(advanceTo("preflight", { metadata: { state: "init" } })); + await boundary.recordStateResultWithStepCompatibility(advanceTo("preflight", { metadata: { state: "init" } })); + await boundary.recordStateResultWithStepCompatibility(advanceTo("gateway", { metadata: { state: "preflight" } })); + + expect(harness.events.map((event) => event.type)).toEqual([ + "state.exited", + "state.entered", + "state.exited", + "state.entered", + ]); + expect(harness.events[1]).toMatchObject({ state: "preflight" }); + expect(harness.events[3]).toMatchObject({ state: "gateway" }); + }); + + it("ignores stale compatible state results when legacy tests leave the machine behind", async () => { + const harness = createRuntimeHarness(); + const boundary = new OnboardRuntimeBoundary({ + toSessionUpdates: (updates) => filterSafeUpdates(updates as SessionUpdates) as SessionUpdates, + maybeForceE2eStepFailure: () => undefined, + createRuntime: harness.createRuntime, + }); + + await boundary.recordStateResultWithStepCompatibility(advanceTo("gateway", { metadata: { state: "preflight" } })); + + expect(harness.events).toEqual([]); + }); + it("records resume conflict diagnostics through the runtime", async () => { const harness = createRuntimeHarness(); const boundary = new 
OnboardRuntimeBoundary({ diff --git a/src/lib/onboard/runtime-boundary.ts b/src/lib/onboard/runtime-boundary.ts index 58a970dfdd..31a4dbbea8 100644 --- a/src/lib/onboard/runtime-boundary.ts +++ b/src/lib/onboard/runtime-boundary.ts @@ -40,6 +40,7 @@ export class OnboardRuntimeBoundary { recordRepairEvent: this.recordRepairEvent.bind(this), recordResumeConflict: this.recordResumeConflict.bind(this), recordStateResult: this.recordStateResult.bind(this), + recordStateResultWithStepCompatibility: this.recordStateResultWithStepCompatibility.bind(this), recordStepFailed: this.recordStepFailed.bind(this), recordPostVerifyStarted: this.recordPostVerifyStarted.bind(this), recordSessionComplete: this.recordSessionComplete.bind(this), @@ -90,6 +91,20 @@ export class OnboardRuntimeBoundary { return this.getRuntime().applyResult(result); } + async recordStateResultWithStepCompatibility(result: OnboardStateResult): Promise<Session> { + const runtime = this.getRuntime(); + const current = await runtime.session(); + if (result.type !== "transition") return runtime.applyResult(result); + + if (current.machine.state === result.next) return current; + + const sourceState = + result.metadata && typeof result.metadata.state === "string" ? 
result.metadata.state : null; + if (sourceState && current.machine.state !== sourceState) return current; + + return runtime.applyResult(result); + } + async recordResumeConflict(conflict: { field: string; recorded?: unknown; From 44009ad23b63cce9b465670d0943490eb121cfef Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Thu, 28 May 2026 11:53:44 -0700 Subject: [PATCH 17/26] refactor(onboard): allow step recording without machine transitions Signed-off-by: Carlos Villela --- src/lib/state/onboard-session.test.ts | 22 ++++++++++++++++++ src/lib/state/onboard-session.ts | 33 ++++++++++++++++++++++----- 2 files changed, 49 insertions(+), 6 deletions(-) diff --git a/src/lib/state/onboard-session.test.ts b/src/lib/state/onboard-session.test.ts index be35e8f73d..5c8b35f380 100644 --- a/src/lib/state/onboard-session.test.ts +++ b/src/lib/state/onboard-session.test.ts @@ -144,6 +144,28 @@ describe("onboard session", () => { expect(loaded.machine.state).toBe("failed"); }); + it("can record step boundaries without mutating the machine snapshot", () => { + session.saveSession(session.createSession()); + + session.markStepStarted("preflight", { updateMachine: false }); + let loaded = requireLoadedSession(session.loadSession()); + expect(loaded.steps.preflight.status).toBe("in_progress"); + expect(loaded.machine).toMatchObject({ state: "init", revision: 0 }); + + session.markStepComplete("preflight", { sandboxName: "my-assistant" }, { updateMachine: false }); + loaded = requireLoadedSession(session.loadSession()); + expect(loaded.steps.preflight.status).toBe("complete"); + expect(loaded.sandboxName).toBe("my-assistant"); + expect(loaded.machine).toMatchObject({ state: "init", revision: 0 }); + + session.markStepFailed("gateway", "Gateway failed", { updateMachine: false }); + loaded = requireLoadedSession(session.loadSession()); + expect(loaded.steps.gateway.status).toBe("failed"); + expect(loaded.status).toBe("failed"); + expect(loaded.failure).toMatchObject({ step: "gateway", 
message: "Gateway failed" }); + expect(loaded.machine).toMatchObject({ state: "init", revision: 0 }); + }); + it("persists a compact machine snapshot across step boundaries", () => { session.saveSession(session.createSession()); let loaded = requireLoadedSession(session.loadSession()); diff --git a/src/lib/state/onboard-session.ts b/src/lib/state/onboard-session.ts index 26cbf08353..d100f19ddb 100644 --- a/src/lib/state/onboard-session.ts +++ b/src/lib/state/onboard-session.ts @@ -1005,7 +1005,20 @@ export function updateSession(mutator: (session: Session) => Session | void): Se return saveSession(next); } -export function markStepStarted(stepName: string): Session { +export interface StepMutationOptions { + /** + * Transitional FSM migration escape hatch. The legacy step helpers own the + * durable machine snapshot by default; new runtime-driven paths can set this + * false so step status is recorded without advancing the machine. + */ + updateMachine?: boolean; +} + +function shouldUpdateMachine(options: StepMutationOptions | undefined): boolean { + return options?.updateMachine !== false; +} + +export function markStepStarted(stepName: string, options: StepMutationOptions = {}): Session { let shouldEmit = false; const updatedSession = updateSession((session) => { const step = session.steps[stepName]; @@ -1019,7 +1032,7 @@ export function markStepStarted(stepName: string): Session { session.failure = null; session.status = "in_progress"; const state = machineStateFromOnboardSessionStep(stepName); - if (state) transitionMachineSnapshot(session, state, now); + if (state && shouldUpdateMachine(options)) transitionMachineSnapshot(session, state, now); shouldEmit = true; return session; }); @@ -1031,7 +1044,11 @@ export function markStepStarted(stepName: string): Session { return updatedSession; } -export function markStepComplete(stepName: string, updates: SessionUpdates = {}): Session { +export function markStepComplete( + stepName: string, + updates: SessionUpdates 
= {}, + options: StepMutationOptions = {}, +): Session { const safeUpdates = filterSafeUpdates(updates); let shouldEmit = false; const updatedSession = updateSession((session) => { @@ -1045,7 +1062,7 @@ export function markStepComplete(stepName: string, updates: SessionUpdates = {}) session.failure = null; Object.assign(session, safeUpdates); const nextState = nextMachineStateAfterCompletedStep(stepName, session); - if (nextState) transitionMachineSnapshot(session, nextState, now); + if (nextState && shouldUpdateMachine(options)) transitionMachineSnapshot(session, nextState, now); shouldEmit = true; return session; }); @@ -1088,7 +1105,11 @@ export function markStepSkipped(stepName: string): Session { return updatedSession; } -export function markStepFailed(stepName: string, message: string | null = null): Session { +export function markStepFailed( + stepName: string, + message: string | null = null, + options: StepMutationOptions = {}, +): Session { let shouldEmit = false; const updatedSession = updateSession((session) => { const step = session.steps[stepName]; @@ -1103,7 +1124,7 @@ export function markStepFailed(stepName: string, message: string | null = null): recordedAt: now, }); session.status = "failed"; - transitionMachineSnapshot(session, "failed", now); + if (shouldUpdateMachine(options)) transitionMachineSnapshot(session, "failed", now); shouldEmit = true; return session; }); From cd6e5f720366a2b50a55f88a95a0b010356f6053 Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Thu, 28 May 2026 11:56:24 -0700 Subject: [PATCH 18/26] refactor(onboard): plumb step mutation options through runtime Signed-off-by: Carlos Villela --- src/lib/onboard/machine/runtime.test.ts | 17 ++++++++++++++ src/lib/onboard/machine/runtime.ts | 31 +++++++++++++++++-------- src/lib/onboard/runtime-boundary.ts | 9 +++---- 3 files changed, 43 insertions(+), 14 deletions(-) diff --git a/src/lib/onboard/machine/runtime.test.ts b/src/lib/onboard/machine/runtime.test.ts index 
512ee7f56b..76ea68d4a9 100644 --- a/src/lib/onboard/machine/runtime.test.ts +++ b/src/lib/onboard/machine/runtime.test.ts @@ -123,6 +123,23 @@ describe("OnboardRuntime", () => { expect(events[1]).toMatchObject({ type: "onboard.resumed", state: "init" }); }); + it("forwards step mutation options to step recording dependencies", async () => { + const { runtime, getSession } = createHarness(); + + await runtime.markStepStarted("preflight", { updateMachine: false }); + await runtime.markStepComplete("preflight", { sandboxName: "my-assistant" }, { updateMachine: false }); + await runtime.markStepFailed("gateway", "boom", { updateMachine: false }); + + expect(getSession()).toMatchObject({ + sandboxName: "my-assistant", + status: "failed", + steps: { + preflight: { status: "complete" }, + gateway: { status: "failed" }, + }, + }); + }); + it("validates and persists explicit transitions", async () => { const { runtime, events, getSession } = createHarness(); diff --git a/src/lib/onboard/machine/runtime.ts b/src/lib/onboard/machine/runtime.ts index 47cee9f0d2..8ff35afbb4 100644 --- a/src/lib/onboard/machine/runtime.ts +++ b/src/lib/onboard/machine/runtime.ts @@ -3,7 +3,7 @@ import type { JsonObject } from "../../core/json-types"; import * as onboardSession from "../../state/onboard-session"; -import type { Session, SessionUpdates } from "../../state/onboard-session"; +import type { Session, SessionUpdates, StepMutationOptions } from "../../state/onboard-session"; import { createOnboardMachineEvent, emitOnboardMachineEvent, @@ -22,10 +22,10 @@ export interface OnboardRuntimeDeps { createSession(overrides?: Partial): Session; saveSession(session: Session): Session; updateSession(mutator: (session: Session) => Session | void): Session; - markStepStarted(stepName: string): Session; - markStepComplete(stepName: string, updates?: SessionUpdates): Session; + markStepStarted(stepName: string, options?: StepMutationOptions): Session; + markStepComplete(stepName: string, updates?: 
SessionUpdates, options?: StepMutationOptions): Session; markStepSkipped(stepName: string): Session; - markStepFailed(stepName: string, message?: string | null): Session; + markStepFailed(stepName: string, message?: string | null, options?: StepMutationOptions): Session; completeSession(updates?: SessionUpdates): Session; filterSafeUpdates(updates: SessionUpdates): Partial; emitEvent(event: OnboardMachineEvent): void; @@ -102,20 +102,31 @@ export class OnboardRuntime { return session; } - async markStepStarted(stepName: string): Promise { - return this.deps.markStepStarted(stepName); + async markStepStarted( + stepName: string, + options: StepMutationOptions = {}, + ): Promise { + return this.deps.markStepStarted(stepName, options); } - async markStepComplete(stepName: string, updates: SessionUpdates = {}): Promise { - return this.deps.markStepComplete(stepName, updates); + async markStepComplete( + stepName: string, + updates: SessionUpdates = {}, + options: StepMutationOptions = {}, + ): Promise { + return this.deps.markStepComplete(stepName, updates, options); } async markStepSkipped(stepName: string): Promise { return this.deps.markStepSkipped(stepName); } - async markStepFailed(stepName: string, message: string | null = null): Promise { - return this.deps.markStepFailed(stepName, message); + async markStepFailed( + stepName: string, + message: string | null = null, + options: StepMutationOptions = {}, + ): Promise { + return this.deps.markStepFailed(stepName, message, options); } async completeSession(updates: SessionUpdates = {}): Promise { diff --git a/src/lib/onboard/runtime-boundary.ts b/src/lib/onboard/runtime-boundary.ts index 31a4dbbea8..609a610b91 100644 --- a/src/lib/onboard/runtime-boundary.ts +++ b/src/lib/onboard/runtime-boundary.ts @@ -1,7 +1,7 @@ // SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
// SPDX-License-Identifier: Apache-2.0 -import type { Session, SessionUpdates } from "../state/onboard-session"; +import type { Session, SessionUpdates, StepMutationOptions } from "../state/onboard-session"; import type { OnboardStateResult } from "./machine/result"; import { OnboardRuntime } from "./machine/runtime"; import type { OnboardMachineEventType, OnboardMachineState } from "./machine/types"; @@ -10,6 +10,7 @@ export interface OnboardRuntimeBoundaryOptions { toSessionUpdates(updates: Record): SessionUpdates; maybeForceE2eStepFailure(stepName: string): void; createRuntime?(): OnboardRuntime; + stepMutationOptions?: StepMutationOptions; } export class OnboardRuntimeBoundary { @@ -61,7 +62,7 @@ export class OnboardRuntimeBoundary { } = {}, ): Promise { const runtime = this.getRuntime(); - await runtime.markStepStarted(stepName); + await runtime.markStepStarted(stepName, this.options.stepMutationOptions); if (Object.keys(updates).length > 0) { await runtime.updateContext(this.options.toSessionUpdates(updates)); } @@ -69,7 +70,7 @@ export class OnboardRuntimeBoundary { } async recordStepComplete(stepName: string, updates: SessionUpdates = {}): Promise { - return this.getRuntime().markStepComplete(stepName, updates); + return this.getRuntime().markStepComplete(stepName, updates, this.options.stepMutationOptions); } async recordStepSkipped(stepName: string): Promise { @@ -77,7 +78,7 @@ export class OnboardRuntimeBoundary { } async recordStepFailed(stepName: string, message: string | null): Promise { - return this.getRuntime().markStepFailed(stepName, message); + return this.getRuntime().markStepFailed(stepName, message, this.options.stepMutationOptions); } async recordStateSkipped( From e266e3b53d9147b88c8a5bcf448583119206c379 Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Thu, 28 May 2026 12:24:22 -0700 Subject: [PATCH 19/26] refactor(onboard): add record-only FSM runner adapter Signed-off-by: Carlos Villela --- .../machine/record-only-runner.test.ts | 148 
++++++++++++++++++ src/lib/onboard/machine/record-only-runner.ts | 54 +++++++ 2 files changed, 202 insertions(+) create mode 100644 src/lib/onboard/machine/record-only-runner.test.ts create mode 100644 src/lib/onboard/machine/record-only-runner.ts diff --git a/src/lib/onboard/machine/record-only-runner.test.ts b/src/lib/onboard/machine/record-only-runner.test.ts new file mode 100644 index 0000000000..968132506e --- /dev/null +++ b/src/lib/onboard/machine/record-only-runner.test.ts @@ -0,0 +1,148 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// SPDX-License-Identifier: Apache-2.0 + +import { describe, expect, it } from "vitest"; + +import { + createSession, + filterSafeUpdates, + normalizeSession, + type Session, + type SessionUpdates, + type StepMutationOptions, +} from "../../state/onboard-session"; +import type { OnboardMachineEvent } from "./events"; +import { advanceTo, branchTo, completeOnboardMachine } from "./result"; +import { OnboardRuntime, type OnboardRuntimeDeps } from "./runtime"; +import { + createRecordOnlyOnboardRuntimeBoundary, + runOnboardMachineWithRecordOnlySteps, +} from "./record-only-runner"; + +function cloneSession(session: Session): Session { + return normalizeSession(JSON.parse(JSON.stringify(session))) ?? session; +} + +function createHarness() { + let session = createSession(); + const events: OnboardMachineEvent[] = []; + + const updateSession = (mutator: (value: Session) => Session | void): Session => { + session = cloneSession(mutator(cloneSession(session)) ?? 
session); + return cloneSession(session); + }; + const maybeLegacyTransition = (state: Session["machine"]["state"], options?: StepMutationOptions) => { + if (options?.updateMachine === false) return; + session.machine = { + version: 1, + state, + stateEnteredAt: "legacy-step-transition", + revision: session.machine.revision + 1, + }; + }; + + const deps: OnboardRuntimeDeps = { + loadSession: () => cloneSession(session), + createSession, + saveSession: (next) => { + session = cloneSession(next); + return cloneSession(session); + }, + updateSession, + markStepStarted: (stepName: string, options?: StepMutationOptions) => + updateSession((current) => { + current.steps[stepName].status = "in_progress"; + if (stepName === "preflight") maybeLegacyTransition("preflight", options); + if (stepName === "gateway") maybeLegacyTransition("gateway", options); + return current; + }), + markStepComplete: (stepName: string, updates: SessionUpdates = {}, options?: StepMutationOptions) => + updateSession((current) => { + current.steps[stepName].status = "complete"; + Object.assign(current, filterSafeUpdates(updates)); + if (stepName === "preflight") maybeLegacyTransition("gateway", options); + if (stepName === "gateway") maybeLegacyTransition("provider_selection", options); + return current; + }), + markStepSkipped: (stepName) => + updateSession((current) => { + current.steps[stepName].status = "skipped"; + return current; + }), + markStepFailed: (stepName, message) => + updateSession((current) => { + current.steps[stepName].status = "failed"; + current.failure = { step: stepName, message: message ?? 
null, recordedAt: "now" }; + return current; + }), + completeSession: (updates: SessionUpdates = {}) => + updateSession((current) => { + Object.assign(current, filterSafeUpdates(updates)); + current.status = "complete"; + return current; + }), + filterSafeUpdates, + emitEvent: (event) => events.push(event), + now: () => "2026-05-28T00:00:00.000Z", + }; + + return { + events, + getSession: () => cloneSession(session), + boundary: createRecordOnlyOnboardRuntimeBoundary({ + toSessionUpdates: (updates) => filterSafeUpdates(updates as SessionUpdates) as SessionUpdates, + maybeForceE2eStepFailure: () => undefined, + createRuntime: () => new OnboardRuntime(deps), + }), + }; +} + +describe("record-only onboard runner", () => { + it("lets handlers record steps while the runner owns machine transitions", async () => { + const harness = createHarness(); + const recorders = harness.boundary.recorders(); + + const result = await runOnboardMachineWithRecordOnlySteps({ + boundary: harness.boundary, + context: { visited: [] as string[] }, + handlers: { + init: () => advanceTo("preflight"), + preflight: async () => { + await recorders.startRecordedStep("preflight"); + expect(harness.getSession().machine.state).toBe("preflight"); + await recorders.recordStepComplete("preflight"); + expect(harness.getSession().machine.state).toBe("preflight"); + return advanceTo("gateway"); + }, + gateway: async () => { + await recorders.startRecordedStep("gateway"); + expect(harness.getSession().machine.state).toBe("gateway"); + await recorders.recordStepComplete("gateway"); + expect(harness.getSession().machine.state).toBe("gateway"); + return advanceTo("provider_selection"); + }, + provider_selection: () => advanceTo("inference"), + inference: () => advanceTo("sandbox"), + sandbox: () => branchTo("openclaw"), + openclaw: () => advanceTo("policies"), + policies: () => advanceTo("finalizing"), + finalizing: () => advanceTo("post_verify"), + post_verify: () => completeOnboardMachine({ sandboxName: 
"my-assistant" }), + }, + updateContext: ({ context, state }) => ({ visited: [...context.visited, state] }), + }); + + expect(result.session).toMatchObject({ + status: "complete", + sandboxName: "my-assistant", + machine: { state: "complete" }, + steps: { + preflight: { status: "complete" }, + gateway: { status: "complete" }, + }, + }); + expect(result.context.visited).toContain("preflight"); + expect(result.context.visited).toContain("gateway"); + expect(harness.events.map((event) => event.type)).toContain("onboard.started"); + }); +}); diff --git a/src/lib/onboard/machine/record-only-runner.ts b/src/lib/onboard/machine/record-only-runner.ts new file mode 100644 index 0000000000..9490d76326 --- /dev/null +++ b/src/lib/onboard/machine/record-only-runner.ts @@ -0,0 +1,54 @@ +// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +// SPDX-License-Identifier: Apache-2.0 + +import type { StepMutationOptions } from "../../state/onboard-session"; +import { OnboardRuntimeBoundary, type OnboardRuntimeBoundaryOptions } from "../runtime-boundary"; +import { + runOnboardMachine, + type OnboardMachineRunnerOptions, + type OnboardMachineRunnerResult, +} from "./runner"; + +export type RecordOnlyOnboardRuntimeBoundaryOptions = Omit< + OnboardRuntimeBoundaryOptions, + "stepMutationOptions" +> & { + stepMutationOptions?: Omit; +}; + +export interface RecordOnlyOnboardMachineRunnerOptions + extends Omit, "runtime"> { + boundary: OnboardRuntimeBoundary; + resumed?: boolean; + emitLifecycleEvent?: boolean; +} + +export function createRecordOnlyOnboardRuntimeBoundary( + options: RecordOnlyOnboardRuntimeBoundaryOptions, +): OnboardRuntimeBoundary { + return new OnboardRuntimeBoundary({ + ...options, + stepMutationOptions: { ...options.stepMutationOptions, updateMachine: false }, + }); +} + +/** + * Run the FSM with step recorders configured for status-only mutations. 
+ * + * This is the adapter path for the post-legacy architecture: handlers may keep + * using step boundary helpers for resumability, but those helpers do not move + * `session.machine`; the runner applies every machine transition explicitly via + * `OnboardRuntime.applyResult()`. + */ +export async function runOnboardMachineWithRecordOnlySteps({ + boundary, + resumed = false, + emitLifecycleEvent = true, + ...options +}: RecordOnlyOnboardMachineRunnerOptions): Promise> { + if (emitLifecycleEvent) await boundary.recordOnboardStarted(resumed); + return runOnboardMachine({ + ...options, + runtime: boundary.getRuntime(), + }); +} From 796ed7be16f14d097bc3a87544072c5092aeb4f2 Mon Sep 17 00:00:00 2001 From: Carlos Villela Date: Thu, 4 Jun 2026 14:05:47 -0700 Subject: [PATCH 20/26] refactor(onboard): address FSM runner review feedback Signed-off-by: Carlos Villela --- .agents/catalog-skills.yaml | 40 - .../nemoclaw-contributor-update-docs/SKILL.md | 11 +- .../SKILL.md | 257 +- .../SKILL.md | 174 ++ .../reference/candidate-selection.md | 2 +- .../reference/scoring-comments-and-logging.md | 4 +- .agents/skills/nemoclaw-skills-guide/SKILL.md | 20 +- .../nemoclaw-user-agent-skills/SKILL.md | 86 +- .../evals/evals.json | 11 + .../SKILL.md | 337 +-- .../evals/evals.json | 11 + .../references/inference-options.md | 332 ++- .../references/set-up-sub-agent.md | 34 +- .../references/switch-inference-providers.md | 184 +- .../references/tool-calling-reliability.md | 64 +- .../nemoclaw-user-configure-security/SKILL.md | 8 +- .../evals/evals.json | 11 + .../references/best-practices.md | 134 +- .../references/credential-storage.md | 39 +- .../references/openclaw-controls.md | 14 +- .../nemoclaw-user-deploy-remote/SKILL.md | 98 +- .../evals/evals.json | 11 + .../references/brev-web-ui.md | 16 +- .../references/install-openclaw-plugins.md | 54 +- .../references/sandbox-hardening.md | 61 +- .../skills/nemoclaw-user-get-started/SKILL.md | 212 +- .../evals/evals.json | 11 + 
.../references/prerequisites.md | 31 +- .../references/quickstart-hermes.md | 121 +- .../references/windows-preparation.md | 50 +- .../nemoclaw-user-manage-policy/SKILL.md | 124 +- .../evals/evals.json | 11 + .../references/approve-network-requests.md | 18 +- .../references/integration-policy-examples.md | 232 +- .../nemoclaw-user-manage-sandboxes/SKILL.md | 217 +- .../evals/evals.json | 11 + .../references/backup-restore.md | 206 +- .../references/install-plugins-hermes.md | 114 + .../references/messaging-channels.md | 172 +- .../references/runtime-controls.md | 80 +- .../references/workspace-files.md | 55 +- .../nemoclaw-user-monitor-sandbox/SKILL.md | 75 +- .../evals/evals.json | 11 + .../skills/nemoclaw-user-overview/SKILL.md | 11 +- .../nemoclaw-user-overview/evals/evals.json | 11 + .../references/ecosystem-hermes.md | 93 + .../references/ecosystem.md | 16 +- .../references/how-it-works.md | 69 +- .../references/overview.md | 50 +- .../references/release-notes.md | 84 +- .../skills/nemoclaw-user-reference/SKILL.md | 14 +- .../nemoclaw-user-reference/evals/evals.json | 11 + .../references/architecture.md | 78 +- .../references/cli-selection-guide.md | 128 +- .../references/commands.md | 1011 ++++++-- .../references/network-policies.md | 25 +- .../references/troubleshooting.md | 582 +++-- .coderabbit.yaml | 35 + .github/CODEOWNERS | 1 - .github/catalog-skills-signing-flow.md | 100 +- .github/pr-bodies/catalog-skills-refresh.md | 15 - .github/workflows/base-image.yaml | 8 +- .github/workflows/catalog-skills-refresh.yaml | 142 -- .github/workflows/code-scanning.yaml | 10 +- ...t.yaml => codebase-growth-guardrails.yaml} | 40 +- .github/workflows/commit-lint.yaml | 2 +- .github/workflows/dco-check.yaml | 2 +- .github/workflows/docker-pin-check.yaml | 2 +- .github/workflows/docs-cli-parity-pr.yaml | 2 +- .github/workflows/docs-links-pr.yaml | 6 +- .github/workflows/docs-preview-pr.yaml | 16 +- .github/workflows/e2e-advisor.yaml | 15 +- 
.github/workflows/e2e-branch-validation.yaml | 2 +- .github/workflows/e2e-scenarios-all.yaml | 105 +- .github/workflows/e2e-scenarios.yaml | 186 +- .github/workflows/e2e-script.yaml | 43 +- .github/workflows/installer-hash-check.yaml | 2 +- .github/workflows/macos-e2e.yaml | 2 +- .github/workflows/main.yaml | 2 +- .github/workflows/nightly-e2e.yaml | 603 +++-- .github/workflows/ollama-proxy-e2e.yaml | 2 +- .github/workflows/platform-vitest-main.yaml | 4 +- .github/workflows/pr-review-advisor.yaml | 6 +- .github/workflows/pr-self-hosted.yaml | 14 +- .github/workflows/pr.yaml | 9 +- .github/workflows/regression-e2e.yaml | 104 +- .github/workflows/release-latest-tag.yaml | 39 + .github/workflows/sandbox-images-and-e2e.yaml | 12 +- .github/workflows/wsl-e2e.yaml | 2 +- .markdownlint-cli2.yaml | 2 + .pre-commit-config.yaml | 17 +- AGENTS.md | 4 +- CONTRIBUTING.md | 56 +- Dockerfile | 142 +- Dockerfile.base | 49 +- README.md | 287 +-- agents/hermes/Dockerfile | 26 +- agents/hermes/Dockerfile.base | 26 +- agents/hermes/config/hermes-config.ts | 41 +- agents/hermes/manifest.yaml | 12 + agents/hermes/plugin/__init__.py | 196 +- agents/hermes/policy-additions.yaml | 11 +- agents/hermes/policy-permissive.yaml | 21 - agents/hermes/start.sh | 274 ++- agents/openclaw/policy-permissive.yaml | 21 - biome.json | 4 + ci/platform-matrix.json | 2 +- docs/CONTRIBUTING.md | 87 +- docs/_components/AgentGuide.tsx | 124 + docs/about/ecosystem-hermes.mdx | 102 + docs/about/ecosystem.mdx | 16 +- docs/about/how-it-works.mdx | 77 +- docs/about/overview.mdx | 58 +- docs/about/release-notes.mdx | 146 +- docs/deployment/brev-web-ui.mdx | 22 +- docs/deployment/deploy-to-remote-gpu.mdx | 96 +- docs/deployment/install-openclaw-plugins.mdx | 52 +- docs/deployment/sandbox-hardening.mdx | 59 +- docs/get-started/prerequisites.mdx | 39 +- docs/get-started/quickstart-hermes.mdx | 129 +- docs/get-started/quickstart.mdx | 239 +- docs/get-started/windows-preparation.mdx | 50 +- docs/index.mdx | 128 +- 
docs/index.yml | 444 ++-- docs/inference/inference-options.mdx | 342 ++- docs/inference/set-up-sub-agent.mdx | 36 +- docs/inference/switch-inference-providers.mdx | 203 +- docs/inference/tool-calling-reliability.mdx | 64 +- docs/inference/use-local-inference.mdx | 353 +-- docs/manage-sandboxes/backup-restore.mdx | 210 +- .../install-plugins-hermes.mdx | 125 + docs/manage-sandboxes/lifecycle.mdx | 237 +- docs/manage-sandboxes/messaging-channels.mdx | 210 +- docs/manage-sandboxes/runtime-controls.mdx | 86 +- docs/manage-sandboxes/workspace-files.mdx | 81 +- docs/monitoring/monitor-sandbox-activity.mdx | 89 +- .../approve-network-requests.mdx | 20 +- .../customize-network-policy.mdx | 142 +- .../integration-policy-examples.mdx | 232 +- docs/reference/architecture.mdx | 98 +- docs/reference/cli-selection-guide.mdx | 160 +- docs/reference/commands-nemohermes.mdx | 1682 ++++++++++++++ docs/reference/commands.mdx | 1238 +++++++--- docs/reference/network-policies.mdx | 27 +- docs/reference/troubleshooting.mdx | 667 +++--- docs/resources/agent-skills.mdx | 12 +- docs/security/best-practices.mdx | 142 +- docs/security/credential-storage.mdx | 51 +- docs/security/openclaw-controls.mdx | 12 +- fern/docs.yml | 99 +- fern/fern.config.json | 2 +- install.sh | 13 +- .../policies/openclaw-sandbox-permissive.yaml | 21 - .../policies/presets/claude-code.yaml | 41 + .../openclaw-diagnostics-otel-local.yaml | 27 + nemoclaw-blueprint/policies/presets/pypi.yaml | 3 + .../scripts/nemotron-inference-fix.js | 259 ++- .../scripts/telegram-diagnostics.js | 163 +- .../scripts/whatsapp-qr-compact.js | 86 + nemoclaw/openclaw.plugin.json | 9 + nemoclaw/src/blueprint/runner.test.ts | 31 +- nemoclaw/src/blueprint/runner.ts | 18 +- nemoclaw/src/index.ts | 10 +- nemoclaw/src/register.test.ts | 60 +- .../src/security/safe-resolve-path.test.ts | 49 + nemoclaw/src/security/safe-resolve-path.ts | 65 + nemoclaw/src/security/secret-scanner.test.ts | 63 +- nemoclaw/src/security/secret-scanner.ts | 84 +- 
package.json | 10 +- schemas/openclaw-plugin.schema.json | 43 +- ...ld.js => benchmark-sandbox-image-build.ts} | 100 +- scripts/bootstrap-windows.ps1 | 341 ++- scripts/bump-version.ts | 2 + scripts/convert-docs-to-fern.mjs | 341 --- ...-tier-selector.js => dev-tier-selector.ts} | 42 +- scripts/docs-to-skills.py | 540 +++-- scripts/export-catalog-skills.py | 407 ---- scripts/generate-openclaw-config.mts | 1242 ++++++++++ scripts/generate-openclaw-config.py | 956 -------- scripts/generate-platform-docs.py | 11 +- scripts/install-openshell.sh | 11 +- scripts/install.sh | 177 +- scripts/lib/runtime.sh | 47 - scripts/lib/sandbox-init.sh | 197 +- scripts/nemoclaw-start.sh | 891 ++++++- scripts/openclaw-build-messaging-plugins.py | 184 ++ scripts/release-cut-tag.sh | 95 + scripts/release-latest-tag.sh | 111 + scripts/release-notes-data.ts | 175 ++ scripts/release-plan.ts | 255 ++ scripts/release-wait-latest.sh | 102 + scripts/scorecard/build-slack-blocks.ts | 204 ++ scripts/seed-wechat-accounts.py | 6 +- scripts/setup-spark.sh | 172 -- scripts/sync-agent-variant-docs.ts | 477 ++++ ...fern-preview.mjs => watch-fern-preview.ts} | 73 +- skills/README.md | 19 + skills/catalog-metadata.json | 24 + .../nemoclaw-user-agent-skills/BENCHMARK.md | 64 + skills/nemoclaw-user-agent-skills/SKILL.md | 10 + .../evals/evals.json | 20 + .../references/agent-skills.md | 2 - .../nemoclaw-user-agent-skills/skill-card.md | 51 + .../nemoclaw-user-agent-skills/skill.oms.sig | 1 + .../BENCHMARK.md | 75 + .../SKILL.md | 285 +++ .../evals/evals.json | 92 + .../references/inference-options.md | 142 ++ .../references/set-up-sub-agent.md | 120 + .../references/switch-inference-providers.md | 207 ++ .../references/tool-calling-reliability.md | 164 ++ .../references/use-local-inference-details.md | 151 ++ .../skill-card.md | 52 + .../skill.oms.sig | 1 + .../BENCHMARK.md | 67 + .../nemoclaw-user-configure-security/SKILL.md | 16 + .../evals/evals.json | 56 + .../references/best-practices.md | 511 ++++ 
.../references/credential-storage.md | 110 + .../references/openclaw-controls.md | 121 + .../skill-card.md | 51 + .../skill.oms.sig | 1 + .../nemoclaw-user-deploy-remote/BENCHMARK.md | 70 + skills/nemoclaw-user-deploy-remote/SKILL.md | 177 ++ .../evals/evals.json | 74 + .../references/brev-web-ui.md | 155 ++ .../references/install-openclaw-plugins.md | 93 + .../references/sandbox-hardening.md | 127 + .../nemoclaw-user-deploy-remote/skill-card.md | 50 + .../nemoclaw-user-deploy-remote/skill.oms.sig | 1 + skills/nemoclaw-user-get-started/BENCHMARK.md | 64 + skills/nemoclaw-user-get-started/SKILL.md | 192 ++ .../evals/evals.json | 74 + .../references/prerequisites.md | 67 + .../references/quickstart-details.md | 165 ++ .../references/quickstart-hermes.md | 168 ++ .../references/windows-preparation.md | 144 ++ .../nemoclaw-user-get-started/skill-card.md | 51 + .../nemoclaw-user-get-started/skill.oms.sig | 1 + .../nemoclaw-user-manage-policy/BENCHMARK.md | 67 + skills/nemoclaw-user-manage-policy/SKILL.md | 296 +++ .../evals/evals.json | 56 + .../references/approve-network-requests.md | 64 + .../customize-network-policy-details.md | 13 + .../references/integration-policy-examples.md | 336 +++ .../nemoclaw-user-manage-policy/skill-card.md | 55 + .../nemoclaw-user-manage-policy/skill.oms.sig | 1 + .../BENCHMARK.md | 69 + .../nemoclaw-user-manage-sandboxes/SKILL.md | 266 +++ .../evals/evals.json | 92 + .../references/backup-restore.md | 166 ++ .../references/lifecycle-details.md | 26 + .../references/messaging-channels.md | 285 +++ .../references/runtime-controls.md | 41 + .../references/workspace-files.md | 105 + .../skill-card.md | 52 + .../skill.oms.sig | 1 + .../BENCHMARK.md | 64 + skills/nemoclaw-user-monitor-sandbox/SKILL.md | 93 + .../evals/evals.json | 20 + .../skill-card.md | 51 + .../skill.oms.sig | 1 + skills/nemoclaw-user-overview/BENCHMARK.md | 68 + skills/nemoclaw-user-overview/SKILL.md | 17 + .../nemoclaw-user-overview/evals/evals.json | 92 + 
.../references/ecosystem.md | 94 + .../references/how-it-works.md | 104 + .../nemoclaw-highlevel-component-diagram.png | Bin 0 -> 880587 bytes .../references/overview.md | 63 + .../references/release-notes.md | 244 ++ skills/nemoclaw-user-overview/skill-card.md | 51 + skills/nemoclaw-user-overview/skill.oms.sig | 1 + skills/nemoclaw-user-reference/BENCHMARK.md | 72 + skills/nemoclaw-user-reference/SKILL.md | 18 + .../nemoclaw-user-reference/evals/evals.json | 92 + .../references/architecture.md | 260 +++ .../references/cli-selection-guide.md | 205 ++ .../references/commands.md | 1381 +++++++++++ .../references/network-policies.md | 125 + .../references/troubleshooting.md | 1434 ++++++++++++ skills/nemoclaw-user-reference/skill-card.md | 55 + skills/nemoclaw-user-reference/skill.oms.sig | 1 + src/commands/debug.ts | 11 + .../global-oclif-command-adapters.test.ts | 45 + src/commands/inference/set.ts | 27 +- src/commands/list.ts | 8 +- src/commands/sandbox/agents.ts | 24 + src/commands/sandbox/agents/add.ts | 36 + src/commands/sandbox/agents/delete.ts | 36 + src/commands/sandbox/channels/add.ts | 6 +- src/commands/sandbox/channels/mutate.test.ts | 48 + src/commands/sandbox/channels/status.ts | 49 + src/commands/sandbox/doctor.ts | 14 +- .../sandbox/oclif-command-adapters.test.ts | 46 + src/commands/sandbox/sessions.ts | 37 + src/commands/sandbox/sessions/delete.ts | 79 + src/commands/sandbox/sessions/list.ts | 36 + src/commands/sandbox/sessions/reset.ts | 83 + src/commands/sandbox/skill.test.ts | 43 +- src/commands/sandbox/skill.ts | 9 +- src/commands/sandbox/skill/remove.ts | 36 + src/commands/sandbox/status.ts | 29 +- src/commands/tunnel.ts | 22 + .../actions/gateway-drift-preflight.test.ts | 37 + src/lib/actions/inference-set.test.ts | 85 +- src/lib/actions/inference-set.ts | 72 +- src/lib/actions/installer/plan.test.ts | 2 +- src/lib/actions/sandbox/agents/passthrough.ts | 59 + .../actions/sandbox/channel-status.test.ts | 480 ++++ 
src/lib/actions/sandbox/channel-status.ts | 583 +++++ .../actions/sandbox/connect-vllm-preflight.ts | 24 + src/lib/actions/sandbox/connect.ts | 148 +- src/lib/actions/sandbox/destroy.ts | 60 +- src/lib/actions/sandbox/docker-health.test.ts | 72 +- src/lib/actions/sandbox/docker-health.ts | 104 +- src/lib/actions/sandbox/doctor.ts | 149 +- src/lib/actions/sandbox/forward-health.ts | 61 + .../sandbox/gateway-failure-classifier.ts | 183 +- src/lib/actions/sandbox/gateway-state.ts | 46 +- .../sandbox/hermes-dashboard-recovery.test.ts | 149 ++ .../sandbox/hermes-dashboard-recovery.ts | 73 + src/lib/actions/sandbox/host-aliases.ts | 142 +- src/lib/actions/sandbox/logs.ts | 15 + .../sandbox/policy-channel-conflict.test.ts | 567 +++++ src/lib/actions/sandbox/policy-channel.ts | 514 +++- src/lib/actions/sandbox/process-recovery.ts | 137 +- .../sandbox/rebuild-gpu-opt-out.test.ts | 180 ++ .../actions/sandbox/rebuild-gpu-opt-out.ts | 58 + src/lib/actions/sandbox/rebuild.ts | 73 +- .../sandbox/sandbox-container-owner.ts | 46 + .../actions/sandbox/sessions/delete.test.ts | 168 ++ src/lib/actions/sandbox/sessions/delete.ts | 103 + .../sandbox/sessions/gateway-rpc-envelope.ts | 63 + .../sandbox/sessions/gateway-rpc.test.ts | 116 + .../actions/sandbox/sessions/gateway-rpc.ts | 64 + .../actions/sandbox/sessions/passthrough.ts | 45 + .../actions/sandbox/sessions/paths.test.ts | 79 + src/lib/actions/sandbox/sessions/paths.ts | 56 + .../actions/sandbox/sessions/reset.test.ts | 150 ++ src/lib/actions/sandbox/sessions/reset.ts | 153 ++ src/lib/actions/sandbox/skill-install.test.ts | 221 ++ src/lib/actions/sandbox/skill-install.ts | 139 +- .../sandbox/slack-channel-validation.ts | 57 + src/lib/actions/sandbox/snapshot.ts | 25 +- src/lib/actions/sandbox/status-preflight.ts | 180 ++ src/lib/actions/sandbox/status-snapshot.ts | 261 +++ src/lib/actions/sandbox/status.test.ts | 266 ++- src/lib/actions/sandbox/status.ts | 342 +-- ...legram-channel-bridge-verification.test.ts | 62 + 
.../telegram-channel-bridge-verification.ts | 39 + src/lib/actions/uninstall/run-plan.test.ts | 294 ++- src/lib/actions/uninstall/run-plan.ts | 146 +- src/lib/actions/update.test.ts | 15 +- src/lib/actions/update.ts | 20 +- src/lib/actions/upgrade-sandboxes.ts | 15 +- src/lib/adapters/docker/index.ts | 1 + src/lib/adapters/docker/pull.test.ts | 220 ++ src/lib/adapters/docker/pull.ts | 260 +++ src/lib/adapters/docker/runtime.test.ts | 58 + src/lib/adapters/docker/runtime.ts | 33 + src/lib/adapters/http/probe.test.ts | 158 ++ src/lib/adapters/http/probe.ts | 66 +- .../adapters/openshell/gateway-drift.test.ts | 196 +- src/lib/adapters/openshell/gateway-drift.ts | 274 ++- src/lib/agent/dashboard-ui.ts | 125 + src/lib/agent/defs.test.ts | 31 + src/lib/agent/defs.ts | 10 +- src/lib/agent/onboard.test.ts | 44 +- src/lib/agent/onboard.ts | 7 +- src/lib/agent/runtime.test.ts | 25 + src/lib/agent/runtime.ts | 46 +- src/lib/channel-runtime-status.test.ts | 447 ++++ src/lib/channel-runtime-status.ts | 376 +++ src/lib/cli/command-registry.test.ts | 35 +- src/lib/cli/public-argv-translation.test.ts | 100 +- src/lib/cli/public-argv-translation.ts | 13 + src/lib/cli/public-display-agents.ts | 23 + src/lib/cli/public-display-defaults.ts | 33 +- src/lib/cli/public-display-layout.ts | 14 + src/lib/cli/public-display-sessions.ts | 39 + src/lib/cli/stdout-guard.ts | 28 + src/lib/core/ports.ts | 4 +- src/lib/core/require-value.test.ts | 21 + src/lib/core/shell-quote.test.ts | 30 + src/lib/core/wait.ts | 19 +- src/lib/credentials/store.ts | 2 +- src/lib/diagnostics/debug-command.test.ts | 119 + src/lib/diagnostics/debug-command.ts | 41 +- src/lib/diagnostics/debug.test.ts | 91 +- src/lib/diagnostics/debug.ts | 73 +- src/lib/diagnostics/tarball.ts | 70 + src/lib/domain/installer/ref.test.ts | 6 +- src/lib/domain/installer/ref.ts | 10 +- src/lib/hermes-dashboard.test.ts | 55 + src/lib/hermes-dashboard.ts | 66 + src/lib/inference/config.test.ts | 90 +- src/lib/inference/config.ts | 46 + 
src/lib/inference/gpu-trust.test.ts | 105 + src/lib/inference/gpu-trust.ts | 73 + src/lib/inference/local.test.ts | 89 +- src/lib/inference/local.ts | 43 +- src/lib/inference/nim.test.ts | 489 +++- src/lib/inference/nim.ts | 147 +- .../inference/ollama-model-registry.test.ts | 2 +- src/lib/inference/ollama-model-registry.ts | 2 +- src/lib/inference/ollama/model-size.test.ts | 22 +- src/lib/inference/ollama/proxy.test.ts | 8 +- src/lib/inference/ollama/proxy.ts | 7 +- src/lib/inference/onboard-probes.test.ts | 198 +- src/lib/inference/onboard-probes.ts | 152 +- src/lib/inference/vllm-models.test.ts | 92 + src/lib/inference/vllm-models.ts | 113 +- .../inference/vllm-runtime-context.test.ts | 90 + src/lib/inference/vllm-runtime-context.ts | 63 + src/lib/inference/vllm.ts | 314 +-- src/lib/inventory/index.test.ts | 4 +- src/lib/messaging-channel-config.test.ts | 30 + src/lib/messaging-channel-config.ts | 67 +- src/lib/messaging/applier/agent-config.ts | 605 +++++ src/lib/messaging/applier/index.ts | 8 + .../messaging/applier/openshell-provider.ts | 154 ++ src/lib/messaging/applier/plan-filter.ts | 34 + src/lib/messaging/applier/policy.ts | 36 + .../messaging/applier/setup-applier.test.ts | 765 ++++++ src/lib/messaging/applier/setup-applier.ts | 178 ++ src/lib/messaging/applier/types.ts | 110 + .../messaging/channels/discord/manifest.ts | 168 ++ src/lib/messaging/channels/index.ts | 28 + src/lib/messaging/channels/manifests.test.ts | 423 ++++ src/lib/messaging/channels/slack/manifest.ts | 132 ++ .../hooks/get-me-reachability.test.ts | 89 + .../telegram/hooks/get-me-reachability.ts | 91 + .../channels/telegram/hooks/index.ts | 4 + .../messaging/channels/telegram/manifest.ts | 156 ++ .../channels/wechat/hooks/health-check.ts | 23 + .../channels/wechat/hooks/ilink-login.ts | 144 ++ .../wechat/hooks/implementations.test.ts | 225 ++ .../messaging/channels/wechat/hooks/index.ts | 32 + .../wechat/hooks/seed-openclaw-account.ts | 149 ++ 
src/lib/messaging/channels/wechat/manifest.ts | 177 ++ .../messaging/channels/whatsapp/manifest.ts | 44 + .../compiler/engines/agent-render-engine.ts | 53 + .../compiler/engines/build-step-engine.ts | 34 + .../engines/credential-binding-engine.ts | 33 + .../compiler/engines/health-check-engine.ts | 17 + .../compiler/engines/policy-resolver.ts | 55 + .../compiler/engines/state-update-engine.ts | 24 + .../messaging/compiler/engines/template.ts | 88 + src/lib/messaging/compiler/index.ts | 6 + .../compiler/manifest-compiler.test.ts | 637 +++++ .../messaging/compiler/manifest-compiler.ts | 387 +++ src/lib/messaging/compiler/types.ts | 23 + .../compiler/workflow-planner.test.ts | 373 +++ .../messaging/compiler/workflow-planner.ts | 109 + src/lib/messaging/hooks/builtins.ts | 33 + src/lib/messaging/hooks/common/index.ts | 4 + .../hooks/common/token-paste.test.ts | 194 ++ src/lib/messaging/hooks/common/token-paste.ts | 144 ++ src/lib/messaging/hooks/hook-runner.test.ts | 359 +++ src/lib/messaging/hooks/hook-runner.ts | 112 + src/lib/messaging/hooks/index.ts | 8 + src/lib/messaging/hooks/registry.ts | 50 + src/lib/messaging/hooks/types.ts | 61 + src/lib/messaging/index.ts | 4 + src/lib/messaging/manifest/registry.test.ts | 3 +- src/lib/messaging/manifest/types.test.ts | 90 +- src/lib/messaging/manifest/types.ts | 156 +- src/lib/name-validation.ts | 31 + src/lib/onboard.ts | 1067 +++------ src/lib/onboard/agent-fixed-forward.ts | 59 + src/lib/onboard/bridge-dns-preflight.test.ts | 22 + src/lib/onboard/bridge-dns-preflight.ts | 45 +- src/lib/onboard/cancel-rollback.test.ts | 211 ++ src/lib/onboard/cancel-rollback.ts | 160 ++ src/lib/onboard/dashboard.ts | 12 +- src/lib/onboard/default-preservation.test.ts | 48 + src/lib/onboard/default-preservation.ts | 36 + src/lib/onboard/docker-cdi.test.ts | 258 ++ src/lib/onboard/docker-cdi.ts | 455 +++- .../docker-driver-gateway-failure.test.ts | 48 +- .../onboard/docker-driver-gateway-failure.ts | 5 + 
.../docker-driver-gateway-launch.test.ts | 56 + .../onboard/docker-driver-gateway-launch.ts | 34 +- .../docker-gpu-local-inference.test.ts | 420 ++++ src/lib/onboard/docker-gpu-local-inference.ts | 384 ++- src/lib/onboard/docker-gpu-patch.test.ts | 428 ++++ src/lib/onboard/docker-gpu-patch.ts | 464 +++- src/lib/onboard/docker-gpu-sandbox-create.ts | 106 +- .../docker-gpu-supervisor-reconnect.test.ts | 159 ++ .../docker-gpu-supervisor-reconnect.ts | 158 ++ .../dockerfile-patch-extra-agents.test.ts | 151 ++ src/lib/onboard/dockerfile-patch.test.ts | 78 +- src/lib/onboard/dockerfile-patch.ts | 33 + src/lib/onboard/gateway-binding.test.ts | 183 ++ src/lib/onboard/gateway-binding.ts | 100 + src/lib/onboard/gateway-destroy.test.ts | 35 +- src/lib/onboard/gateway-destroy.ts | 15 +- src/lib/onboard/gateway-gpu-passthrough.ts | 1 + src/lib/onboard/gateway-lifecycle.ts | 5 +- src/lib/onboard/gateway-reuse.ts | 8 +- .../gateway-sandbox-reachability.test.ts | 225 +- .../onboard/gateway-sandbox-reachability.ts | 76 +- src/lib/onboard/gateway-start-failure.test.ts | 80 + src/lib/onboard/gateway-start-failure.ts | 40 + src/lib/onboard/gpu-recovery.test.ts | 5 +- src/lib/onboard/gpu-recovery.ts | 1 + src/lib/onboard/hermes-dashboard.test.ts | 69 + src/lib/onboard/hermes-dashboard.ts | 209 ++ src/lib/onboard/host-gateway-process.test.ts | 289 +++ src/lib/onboard/host-gateway-process.ts | 368 +++ src/lib/onboard/host-proxy-env.ts | 41 + .../onboard/host-service-reachability.test.ts | 136 ++ src/lib/onboard/host-service-reachability.ts | 270 +++ src/lib/onboard/inference-providers/hermes.ts | 138 ++ src/lib/onboard/inference-providers/index.ts | 30 + .../inference-providers/ollama-local.ts | 114 + src/lib/onboard/inference-providers/remote.ts | 136 ++ src/lib/onboard/inference-providers/routed.ts | 47 + src/lib/onboard/inference-providers/types.ts | 217 ++ .../onboard/inference-providers/vllm-local.ts | 76 + src/lib/onboard/initial-policy.test.ts | 38 +- 
src/lib/onboard/initial-policy.ts | 2 + src/lib/onboard/landlock-warning.ts | 41 + src/lib/onboard/local-inference-topology.ts | 138 +- src/lib/onboard/machine/definition.test.ts | 17 + src/lib/onboard/machine/definition.ts | 36 +- .../machine/handlers/finalization.test.ts | 10 + .../onboard/machine/handlers/finalization.ts | 10 + .../onboard/machine/handlers/policies.test.ts | 94 + src/lib/onboard/machine/handlers/policies.ts | 35 + .../machine/handlers/preflight.test.ts | 24 +- .../handlers/provider-inference.test.ts | 65 + .../machine/handlers/provider-inference.ts | 14 + .../onboard/machine/handlers/sandbox.test.ts | 4 +- src/lib/onboard/machine/handlers/sandbox.ts | 4 +- src/lib/onboard/machine/progress.ts | 4 + src/lib/onboard/machine/result.test.ts | 30 +- src/lib/onboard/machine/result.ts | 40 +- src/lib/onboard/machine/runner.test.ts | 61 + src/lib/onboard/machine/runner.ts | 29 + src/lib/onboard/machine/runtime.test.ts | 37 +- src/lib/onboard/machine/runtime.ts | 42 +- .../onboard/messaging-channel-setup.test.ts | 129 + src/lib/onboard/messaging-channel-setup.ts | 154 +- src/lib/onboard/messaging-config.test.ts | 47 + src/lib/onboard/messaging-config.ts | 12 +- src/lib/onboard/missing-credential-hints.ts | 13 + src/lib/onboard/model-router.ts | 35 + src/lib/onboard/ollama-install-menu.test.ts | 20 + src/lib/onboard/ollama-install-menu.ts | 53 +- src/lib/onboard/ollama-probe-failure.test.ts | 171 ++ src/lib/onboard/ollama-probe-failure.ts | 60 + src/lib/onboard/ollama-proxy-reachability.ts | 265 +-- src/lib/onboard/ollama-startup.test.ts | 177 ++ src/lib/onboard/ollama-startup.ts | 32 +- .../openclaw-otel-policy-presets.test.ts | 92 + .../onboard/openclaw-otel-policy-presets.ts | 63 + src/lib/onboard/openclaw-runtime-env.test.ts | 56 + src/lib/onboard/openclaw-runtime-env.ts | 25 + src/lib/onboard/openshell-pin.ts | 12 +- src/lib/onboard/policy-carryforward.test.ts | 52 +- src/lib/onboard/policy-carryforward.ts | 31 + 
.../onboard/policy-preset-persistence.test.ts | 217 ++ src/lib/onboard/policy-preset-persistence.ts | 173 ++ src/lib/onboard/policy-presets.ts | 9 +- src/lib/onboard/policy-selection.ts | 51 +- src/lib/onboard/preflight-cdi.test.ts | 345 +++ src/lib/onboard/preflight.test.ts | 406 +--- src/lib/onboard/preflight.ts | 257 +- src/lib/onboard/prompt-helpers.test.ts | 90 + src/lib/onboard/prompt-helpers.ts | 25 +- src/lib/onboard/provider-key-fallback.test.ts | 18 +- src/lib/onboard/provider-key-fallback.ts | 6 +- src/lib/onboard/providers.test.ts | 94 +- src/lib/onboard/providers.ts | 42 +- src/lib/onboard/routed-inference.test.ts | 137 ++ src/lib/onboard/routed-inference.ts | 125 + src/lib/onboard/runtime-boundary.ts | 8 +- src/lib/onboard/sandbox-gpu-mode.ts | 5 + src/lib/onboard/sandbox-gpu-preflight.test.ts | 107 +- src/lib/onboard/sandbox-gpu-preflight.ts | 136 +- src/lib/onboard/sandbox-readiness-tracing.ts | 59 +- .../onboard/sandbox-registry-metadata.test.ts | 115 +- src/lib/onboard/sandbox-registry-metadata.ts | 10 +- src/lib/onboard/slack-validation.test.ts | 234 ++ src/lib/onboard/slack-validation.ts | 270 +++ src/lib/onboard/telegram-reachability.test.ts | 114 + src/lib/onboard/telegram-reachability.ts | 112 + src/lib/onboard/ufw-auto-apply.ts | 207 ++ .../onboard/verify-channel-runtime.test.ts | 48 + src/lib/onboard/verify-channel-runtime.ts | 70 + src/lib/onboard/windows-host-ollama.test.ts | 64 + src/lib/onboard/windows-host-ollama.ts | 11 +- .../onboard/wsl-docker-desktop-gpu.test.ts | 110 + src/lib/onboard/wsl-docker-desktop-gpu.ts | 124 +- src/lib/policy/index.ts | 38 +- src/lib/runner.ts | 10 +- src/lib/runtime-recovery.test.ts | 20 +- src/lib/runtime-recovery.ts | 12 +- src/lib/sandbox/build-context.ts | 8 +- src/lib/sandbox/channels-command-support.ts | 15 +- src/lib/sandbox/channels.ts | 5 +- src/lib/sandbox/config.ts | 4 +- src/lib/sandbox/version.test.ts | 21 + src/lib/sandbox/version.ts | 11 +- src/lib/sandbox/whatsapp-diagnostics.test.ts | 396 
++++ src/lib/sandbox/whatsapp-diagnostics.ts | 614 +++++ src/lib/shields/audit.ts | 10 +- src/lib/shields/index.test.ts | 163 +- src/lib/shields/index.ts | 413 +++- src/lib/shields/seal.test.ts | 95 + src/lib/shields/seal.ts | 36 + src/lib/shields/state-dir-lock.ts | 557 +++++ src/lib/shields/timer.test.ts | 70 + src/lib/shields/timer.ts | 40 +- src/lib/shields/verify-lock.test.ts | 192 ++ src/lib/shields/verify-lock.ts | 57 +- src/lib/skill-install.test.ts | 15 +- src/lib/skill-install.ts | 97 +- src/lib/skill-name.ts | 16 + src/lib/skill-remote.test.ts | 156 ++ src/lib/skill-remote.ts | 181 ++ src/lib/state/config-io.test.ts | 172 ++ src/lib/state/config-io.ts | 113 +- src/lib/state/gateway.ts | 67 +- src/lib/state/registry.ts | 51 +- src/lib/tunnel/command-support.ts | 11 + src/lib/validation.ts | 39 + src/lib/verify-deployment.test.ts | 221 ++ src/lib/verify-deployment.ts | 216 +- test/agent-variant-docs.test.ts | 69 + test/bootstrap-windows.test.ts | 297 ++- test/catalog-skills-export.test.ts | 210 -- test/channels-add-preset.test.ts | 1324 ++++++++++- test/check-docs-links.test.ts | 146 +- test/cli.test.ts | 2070 ++++++++++++++++- test/config-set-nested-ssrf.test.ts | 120 +- ...tials.test.js => credentials-shim.test.ts} | 18 +- test/detect-vllm-profile.test.ts | 20 +- test/e2e-gateway-isolation.sh | 18 +- test/e2e-scenario-advisor.test.ts | 366 ++- test/e2e-scenario/docs/MIGRATION.md | 242 +- test/e2e-scenario/docs/README.md | 175 +- .../framework-tests/e2e-lib-helpers.test.ts | 7 +- .../e2e-migration-inventory-lock.test.ts | 104 +- .../e2e-scenario-matrix.test.ts | 123 + .../e2e-scenario-registry.test.ts | 19 - .../e2e-scenario-resolver.test.ts | 15 + .../framework-tests/e2e-suite-runner.test.ts | 1 + .../manifests/hermes-nvidia-slack.yaml | 1 + .../helpers/emit-context-from-plan.sh | 4 + .../nemoclaw_scenarios/scenarios.yaml | 126 + .../scenarios/assertions/registry.ts | 5 +- .../scenarios/migration-inventory.ts | 181 -- test/e2e-scenario/scenarios/run.ts | 
103 +- test/e2e-scenario/scenarios/runner-routing.ts | 58 + .../scenarios/scenarios/baseline.ts | 2 +- .../hermes/01-history-writable.sh | 131 ++ .../lib/messaging_providers.sh | 10 + .../slack/00-slack-provider-state.sh | 137 ++ .../validation_suites/suites.yaml | 2 + test/e2e-script-workflow.test.ts | 103 +- test/e2e/brev-e2e.test.ts | 2 +- test/e2e/e2e-cloud-experimental/check-docs.sh | 335 ++- test/e2e/lib/discord-rest-policy-proof.sh | 301 +++ test/e2e/lib/fake-discord-message-api.cjs | 157 ++ test/e2e/lib/fake-telegram-api.cjs | 156 ++ test/e2e/lib/slack-api-proof.sh | 165 +- test/e2e/lib/telegram-api-proof.sh | 311 +++ ...st-bedrock-runtime-compatible-anthropic.sh | 3 + test/e2e/test-channels-add-remove.sh | 30 + test/e2e/test-channels-stop-start.sh | 30 +- test/e2e/test-diagnostics.sh | 61 + .../test-docker-unreachable-gateway-start.sh | 162 ++ test/e2e/test-double-onboard.sh | 91 +- test/e2e/test-full-e2e.sh | 37 + test/e2e/test-gateway-drift-preflight.sh | 188 ++ test/e2e/test-gpu-e2e.sh | 25 + test/e2e/test-hermes-discord-e2e.sh | 46 +- test/e2e/test-hermes-e2e.sh | 167 ++ test/e2e/test-hermes-inference-switch.sh | 3 + test/e2e/test-hermes-slack-e2e.sh | 23 +- .../test-issue-4462-scope-upgrade-approval.sh | 1025 ++++++++ test/e2e/test-messaging-providers.sh | 817 ++++++- test/e2e/test-network-policy.sh | 168 +- test/e2e/test-openclaw-skill-cli-e2e.sh | 340 +++ test/e2e/test-openclaw-slack-pairing.sh | 11 + test/e2e/test-rebuild-openclaw.sh | 87 +- test/e2e/test-sandbox-operations.sh | 45 + test/e2e/test-sessions-agents-cli.sh | 457 ++++ test/e2e/test-shields-config.sh | 121 + test/e2e/test-strict-tool-call-probe.sh | 377 +++ test/e2e/test-token-rotation.sh | 28 + test/gateway-failure-classifier.test.ts | 319 +++ test/gateway-final-failure-cleanup.test.ts | 9 + test/gateway-state-reconcile-2276.test.ts | 160 +- test/gateway-state.test.ts | 21 + test/generate-hermes-config.test.ts | 70 +- test/generate-openclaw-config.test.ts | 980 +++++++- 
test/get-ollama-model-options.test.ts | 4 +- test/helpers/e2e-workflow-contract.ts | 2 + .../helpers/onboard-smoke-verifier-harness.ts | 109 + test/hermes-plugin-handlers.test.ts | 294 +++ test/hermes-share-mount-deps.test.ts | 69 + test/hermes-start.test.ts | 420 +++- test/install-docker-group-reexec.test.ts | 185 ++ test/install-preflight.test.ts | 254 +- test/install-stage-from-stdin.test.ts | 195 ++ test/internal-cli.test.ts | 2 +- test/internal-commands-docs.test.ts | 92 + test/nemoclaw-start-plugin-refresh.test.ts | 377 +++ test/nemoclaw-start-reconcile.test.ts | 191 +- test/nemoclaw-start.test.ts | 1302 ++++++++++- test/nemotron-inference-fix.test.ts | 158 +- ...-local-openclaw-config-propagation.test.ts | 99 + test/ollama-pull-timeout.test.ts | 2 +- test/ollama-tools-capability.test.ts | 6 +- test/onboard-dashboard.test.ts | 54 + test/onboard-gateway-runtime.test.ts | 28 + test/onboard-lifecycle.test.ts | 313 +++ test/onboard-messaging.test.ts | 116 +- test/onboard-ollama-autostart.test.ts | 140 +- test/onboard-openshell-install-stream.test.ts | 57 + test/onboard-policy-suggestions.test.ts | 91 +- test/onboard-preset-diff.test.ts | 10 +- test/onboard-sandbox-name.test.ts | 119 +- test/onboard-selection.test.ts | 616 ++++- test/onboard-smoke-verifier.test.ts | 40 + test/onboard.test.ts | 208 +- test/openclaw-build-messaging-plugins.test.ts | 272 +++ test/policies.test.ts | 378 ++- test/pr-review-advisor.test.ts | 4 + test/process-recovery.test.ts | 93 +- test/rebuild-credential-preflight.test.ts | 64 +- test/registry.test.ts | 33 + test/release-latest-tag.test.ts | 613 +++++ test/repro-2010.test.ts | 14 +- test/repro-2681-group-writable.test.ts | 34 +- test/{runner.test.js => runner-basic.test.ts} | 8 +- test/runner.test.ts | 44 + test/runtime-shell.test.ts | 39 - test/sandbox-build-context.test.ts | 8 +- test/sandbox-connect-inference.test.ts | 145 +- test/sandbox-container-owner.test.ts | 88 + test/sandbox-init.test.ts | 264 ++- 
test/sandbox-provisioning.test.ts | 113 +- test/sandbox-status-json-stdout.test.ts | 59 + test/sandbox-stuck-recovery.test.ts | 20 +- test/scorecard-blocks.test.ts | 258 ++ test/security-c2-dockerfile-injection.test.ts | 64 +- test/seed-wechat-accounts.test.ts | 8 +- test/shields-up-runtime-perms.test.ts | 458 ++++ test/skills-frontmatter.test.ts | 27 +- test/snapshot-gateway-guard.test.ts | 76 + test/snapshot-shields-guard.test.ts | 141 ++ test/snapshot.test.ts | 21 +- test/stdout-guard.test.ts | 61 + test/sync-agent-variant-docs.test.ts | 53 + test/telegram-diagnostics.test.ts | 246 ++ test/validate-blueprint.test.ts | 42 + test/validate-config-schemas.test.ts | 26 + test/validate-e2e-coverage.test.ts | 51 + test/wait.test.ts | 71 +- test/whatsapp-qr-compact.test.ts | 265 +++ tools/e2e-advisor/scenarios-schema.json | 57 + tools/e2e-advisor/scenarios.mts | 757 +++--- tools/e2e-scenarios/workflow-boundary.mts | 19 + tools/pr-review-advisor/analyze.mts | 38 +- tools/pr-review-advisor/comment.mts | 57 +- tsconfig.cli.json | 2 +- 770 files changed, 85434 insertions(+), 13127 deletions(-) delete mode 100644 .agents/catalog-skills.yaml create mode 100644 .agents/skills/nemoclaw-maintainer-release-notes/SKILL.md create mode 100644 .agents/skills/nemoclaw-user-agent-skills/evals/evals.json create mode 100644 .agents/skills/nemoclaw-user-configure-inference/evals/evals.json create mode 100644 .agents/skills/nemoclaw-user-configure-security/evals/evals.json create mode 100644 .agents/skills/nemoclaw-user-deploy-remote/evals/evals.json create mode 100644 .agents/skills/nemoclaw-user-get-started/evals/evals.json create mode 100644 .agents/skills/nemoclaw-user-manage-policy/evals/evals.json create mode 100644 .agents/skills/nemoclaw-user-manage-sandboxes/evals/evals.json create mode 100644 .agents/skills/nemoclaw-user-manage-sandboxes/references/install-plugins-hermes.md create mode 100644 .agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json create mode 100644 
.agents/skills/nemoclaw-user-overview/evals/evals.json create mode 100644 .agents/skills/nemoclaw-user-overview/references/ecosystem-hermes.md create mode 100644 .agents/skills/nemoclaw-user-reference/evals/evals.json delete mode 100644 .github/pr-bodies/catalog-skills-refresh.md delete mode 100644 .github/workflows/catalog-skills-refresh.yaml rename .github/workflows/{onboard-entrypoint-budget.yaml => codebase-growth-guardrails.yaml} (64%) create mode 100644 .github/workflows/release-latest-tag.yaml create mode 100644 docs/_components/AgentGuide.tsx create mode 100644 docs/about/ecosystem-hermes.mdx create mode 100644 docs/manage-sandboxes/install-plugins-hermes.mdx create mode 100644 docs/reference/commands-nemohermes.mdx create mode 100644 nemoclaw-blueprint/policies/presets/claude-code.yaml create mode 100644 nemoclaw-blueprint/policies/presets/openclaw-diagnostics-otel-local.yaml create mode 100644 nemoclaw-blueprint/scripts/whatsapp-qr-compact.js create mode 100644 nemoclaw/src/security/safe-resolve-path.test.ts create mode 100644 nemoclaw/src/security/safe-resolve-path.ts rename scripts/{benchmark-sandbox-image-build.js => benchmark-sandbox-image-build.ts} (65%) mode change 100755 => 100644 delete mode 100644 scripts/convert-docs-to-fern.mjs rename scripts/{dev-tier-selector.js => dev-tier-selector.ts} (61%) delete mode 100755 scripts/export-catalog-skills.py create mode 100755 scripts/generate-openclaw-config.mts delete mode 100755 scripts/generate-openclaw-config.py create mode 100755 scripts/openclaw-build-messaging-plugins.py create mode 100755 scripts/release-cut-tag.sh create mode 100755 scripts/release-latest-tag.sh create mode 100644 scripts/release-notes-data.ts create mode 100644 scripts/release-plan.ts create mode 100755 scripts/release-wait-latest.sh create mode 100644 scripts/scorecard/build-slack-blocks.ts delete mode 100755 scripts/setup-spark.sh create mode 100644 scripts/sync-agent-variant-docs.ts rename scripts/{watch-fern-preview.mjs => 
watch-fern-preview.ts} (68%) mode change 100755 => 100644 create mode 100644 skills/README.md create mode 100644 skills/catalog-metadata.json create mode 100644 skills/nemoclaw-user-agent-skills/BENCHMARK.md create mode 100644 skills/nemoclaw-user-agent-skills/SKILL.md create mode 100644 skills/nemoclaw-user-agent-skills/evals/evals.json rename {.agents/skills => skills}/nemoclaw-user-agent-skills/references/agent-skills.md (97%) create mode 100644 skills/nemoclaw-user-agent-skills/skill-card.md create mode 100644 skills/nemoclaw-user-agent-skills/skill.oms.sig create mode 100644 skills/nemoclaw-user-configure-inference/BENCHMARK.md create mode 100644 skills/nemoclaw-user-configure-inference/SKILL.md create mode 100644 skills/nemoclaw-user-configure-inference/evals/evals.json create mode 100644 skills/nemoclaw-user-configure-inference/references/inference-options.md create mode 100644 skills/nemoclaw-user-configure-inference/references/set-up-sub-agent.md create mode 100644 skills/nemoclaw-user-configure-inference/references/switch-inference-providers.md create mode 100644 skills/nemoclaw-user-configure-inference/references/tool-calling-reliability.md create mode 100644 skills/nemoclaw-user-configure-inference/references/use-local-inference-details.md create mode 100644 skills/nemoclaw-user-configure-inference/skill-card.md create mode 100644 skills/nemoclaw-user-configure-inference/skill.oms.sig create mode 100644 skills/nemoclaw-user-configure-security/BENCHMARK.md create mode 100644 skills/nemoclaw-user-configure-security/SKILL.md create mode 100644 skills/nemoclaw-user-configure-security/evals/evals.json create mode 100644 skills/nemoclaw-user-configure-security/references/best-practices.md create mode 100644 skills/nemoclaw-user-configure-security/references/credential-storage.md create mode 100644 skills/nemoclaw-user-configure-security/references/openclaw-controls.md create mode 100644 skills/nemoclaw-user-configure-security/skill-card.md create mode 100644 
skills/nemoclaw-user-configure-security/skill.oms.sig create mode 100644 skills/nemoclaw-user-deploy-remote/BENCHMARK.md create mode 100644 skills/nemoclaw-user-deploy-remote/SKILL.md create mode 100644 skills/nemoclaw-user-deploy-remote/evals/evals.json create mode 100644 skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md create mode 100644 skills/nemoclaw-user-deploy-remote/references/install-openclaw-plugins.md create mode 100644 skills/nemoclaw-user-deploy-remote/references/sandbox-hardening.md create mode 100644 skills/nemoclaw-user-deploy-remote/skill-card.md create mode 100644 skills/nemoclaw-user-deploy-remote/skill.oms.sig create mode 100644 skills/nemoclaw-user-get-started/BENCHMARK.md create mode 100644 skills/nemoclaw-user-get-started/SKILL.md create mode 100644 skills/nemoclaw-user-get-started/evals/evals.json create mode 100644 skills/nemoclaw-user-get-started/references/prerequisites.md create mode 100644 skills/nemoclaw-user-get-started/references/quickstart-details.md create mode 100644 skills/nemoclaw-user-get-started/references/quickstart-hermes.md create mode 100644 skills/nemoclaw-user-get-started/references/windows-preparation.md create mode 100644 skills/nemoclaw-user-get-started/skill-card.md create mode 100644 skills/nemoclaw-user-get-started/skill.oms.sig create mode 100644 skills/nemoclaw-user-manage-policy/BENCHMARK.md create mode 100644 skills/nemoclaw-user-manage-policy/SKILL.md create mode 100644 skills/nemoclaw-user-manage-policy/evals/evals.json create mode 100644 skills/nemoclaw-user-manage-policy/references/approve-network-requests.md create mode 100644 skills/nemoclaw-user-manage-policy/references/customize-network-policy-details.md create mode 100644 skills/nemoclaw-user-manage-policy/references/integration-policy-examples.md create mode 100644 skills/nemoclaw-user-manage-policy/skill-card.md create mode 100644 skills/nemoclaw-user-manage-policy/skill.oms.sig create mode 100644 
skills/nemoclaw-user-manage-sandboxes/BENCHMARK.md create mode 100644 skills/nemoclaw-user-manage-sandboxes/SKILL.md create mode 100644 skills/nemoclaw-user-manage-sandboxes/evals/evals.json create mode 100644 skills/nemoclaw-user-manage-sandboxes/references/backup-restore.md create mode 100644 skills/nemoclaw-user-manage-sandboxes/references/lifecycle-details.md create mode 100644 skills/nemoclaw-user-manage-sandboxes/references/messaging-channels.md create mode 100644 skills/nemoclaw-user-manage-sandboxes/references/runtime-controls.md create mode 100644 skills/nemoclaw-user-manage-sandboxes/references/workspace-files.md create mode 100644 skills/nemoclaw-user-manage-sandboxes/skill-card.md create mode 100644 skills/nemoclaw-user-manage-sandboxes/skill.oms.sig create mode 100644 skills/nemoclaw-user-monitor-sandbox/BENCHMARK.md create mode 100644 skills/nemoclaw-user-monitor-sandbox/SKILL.md create mode 100644 skills/nemoclaw-user-monitor-sandbox/evals/evals.json create mode 100644 skills/nemoclaw-user-monitor-sandbox/skill-card.md create mode 100644 skills/nemoclaw-user-monitor-sandbox/skill.oms.sig create mode 100644 skills/nemoclaw-user-overview/BENCHMARK.md create mode 100644 skills/nemoclaw-user-overview/SKILL.md create mode 100644 skills/nemoclaw-user-overview/evals/evals.json create mode 100644 skills/nemoclaw-user-overview/references/ecosystem.md create mode 100644 skills/nemoclaw-user-overview/references/how-it-works.md create mode 100644 skills/nemoclaw-user-overview/references/images/nemoclaw-highlevel-component-diagram.png create mode 100644 skills/nemoclaw-user-overview/references/overview.md create mode 100644 skills/nemoclaw-user-overview/references/release-notes.md create mode 100644 skills/nemoclaw-user-overview/skill-card.md create mode 100644 skills/nemoclaw-user-overview/skill.oms.sig create mode 100644 skills/nemoclaw-user-reference/BENCHMARK.md create mode 100644 skills/nemoclaw-user-reference/SKILL.md create mode 100644 
skills/nemoclaw-user-reference/evals/evals.json create mode 100644 skills/nemoclaw-user-reference/references/architecture.md create mode 100644 skills/nemoclaw-user-reference/references/cli-selection-guide.md create mode 100644 skills/nemoclaw-user-reference/references/commands.md create mode 100644 skills/nemoclaw-user-reference/references/network-policies.md create mode 100644 skills/nemoclaw-user-reference/references/troubleshooting.md create mode 100644 skills/nemoclaw-user-reference/skill-card.md create mode 100644 skills/nemoclaw-user-reference/skill.oms.sig create mode 100644 src/commands/sandbox/agents.ts create mode 100644 src/commands/sandbox/agents/add.ts create mode 100644 src/commands/sandbox/agents/delete.ts create mode 100644 src/commands/sandbox/channels/status.ts create mode 100644 src/commands/sandbox/sessions.ts create mode 100644 src/commands/sandbox/sessions/delete.ts create mode 100644 src/commands/sandbox/sessions/list.ts create mode 100644 src/commands/sandbox/sessions/reset.ts create mode 100644 src/commands/sandbox/skill/remove.ts create mode 100644 src/commands/tunnel.ts create mode 100644 src/lib/actions/sandbox/agents/passthrough.ts create mode 100644 src/lib/actions/sandbox/channel-status.test.ts create mode 100644 src/lib/actions/sandbox/channel-status.ts create mode 100644 src/lib/actions/sandbox/connect-vllm-preflight.ts create mode 100644 src/lib/actions/sandbox/forward-health.ts create mode 100644 src/lib/actions/sandbox/hermes-dashboard-recovery.test.ts create mode 100644 src/lib/actions/sandbox/hermes-dashboard-recovery.ts create mode 100644 src/lib/actions/sandbox/policy-channel-conflict.test.ts create mode 100644 src/lib/actions/sandbox/rebuild-gpu-opt-out.test.ts create mode 100644 src/lib/actions/sandbox/rebuild-gpu-opt-out.ts create mode 100644 src/lib/actions/sandbox/sandbox-container-owner.ts create mode 100644 src/lib/actions/sandbox/sessions/delete.test.ts create mode 100644 src/lib/actions/sandbox/sessions/delete.ts 
create mode 100644 src/lib/actions/sandbox/sessions/gateway-rpc-envelope.ts create mode 100644 src/lib/actions/sandbox/sessions/gateway-rpc.test.ts create mode 100644 src/lib/actions/sandbox/sessions/gateway-rpc.ts create mode 100644 src/lib/actions/sandbox/sessions/passthrough.ts create mode 100644 src/lib/actions/sandbox/sessions/paths.test.ts create mode 100644 src/lib/actions/sandbox/sessions/paths.ts create mode 100644 src/lib/actions/sandbox/sessions/reset.test.ts create mode 100644 src/lib/actions/sandbox/sessions/reset.ts create mode 100644 src/lib/actions/sandbox/skill-install.test.ts create mode 100644 src/lib/actions/sandbox/slack-channel-validation.ts create mode 100644 src/lib/actions/sandbox/status-preflight.ts create mode 100644 src/lib/actions/sandbox/status-snapshot.ts create mode 100644 src/lib/actions/sandbox/telegram-channel-bridge-verification.test.ts create mode 100644 src/lib/actions/sandbox/telegram-channel-bridge-verification.ts create mode 100644 src/lib/adapters/docker/pull.test.ts create mode 100644 src/lib/adapters/docker/runtime.test.ts create mode 100644 src/lib/adapters/docker/runtime.ts create mode 100644 src/lib/agent/dashboard-ui.ts create mode 100644 src/lib/channel-runtime-status.test.ts create mode 100644 src/lib/channel-runtime-status.ts create mode 100644 src/lib/cli/public-display-agents.ts create mode 100644 src/lib/cli/public-display-layout.ts create mode 100644 src/lib/cli/public-display-sessions.ts create mode 100644 src/lib/cli/stdout-guard.ts create mode 100644 src/lib/core/require-value.test.ts create mode 100644 src/lib/core/shell-quote.test.ts create mode 100644 src/lib/diagnostics/tarball.ts create mode 100644 src/lib/hermes-dashboard.test.ts create mode 100644 src/lib/hermes-dashboard.ts create mode 100644 src/lib/inference/gpu-trust.test.ts create mode 100644 src/lib/inference/gpu-trust.ts create mode 100644 src/lib/inference/vllm-runtime-context.test.ts create mode 100644 
src/lib/inference/vllm-runtime-context.ts create mode 100644 src/lib/messaging/applier/agent-config.ts create mode 100644 src/lib/messaging/applier/index.ts create mode 100644 src/lib/messaging/applier/openshell-provider.ts create mode 100644 src/lib/messaging/applier/plan-filter.ts create mode 100644 src/lib/messaging/applier/policy.ts create mode 100644 src/lib/messaging/applier/setup-applier.test.ts create mode 100644 src/lib/messaging/applier/setup-applier.ts create mode 100644 src/lib/messaging/applier/types.ts create mode 100644 src/lib/messaging/channels/discord/manifest.ts create mode 100644 src/lib/messaging/channels/index.ts create mode 100644 src/lib/messaging/channels/manifests.test.ts create mode 100644 src/lib/messaging/channels/slack/manifest.ts create mode 100644 src/lib/messaging/channels/telegram/hooks/get-me-reachability.test.ts create mode 100644 src/lib/messaging/channels/telegram/hooks/get-me-reachability.ts create mode 100644 src/lib/messaging/channels/telegram/hooks/index.ts create mode 100644 src/lib/messaging/channels/telegram/manifest.ts create mode 100644 src/lib/messaging/channels/wechat/hooks/health-check.ts create mode 100644 src/lib/messaging/channels/wechat/hooks/ilink-login.ts create mode 100644 src/lib/messaging/channels/wechat/hooks/implementations.test.ts create mode 100644 src/lib/messaging/channels/wechat/hooks/index.ts create mode 100644 src/lib/messaging/channels/wechat/hooks/seed-openclaw-account.ts create mode 100644 src/lib/messaging/channels/wechat/manifest.ts create mode 100644 src/lib/messaging/channels/whatsapp/manifest.ts create mode 100644 src/lib/messaging/compiler/engines/agent-render-engine.ts create mode 100644 src/lib/messaging/compiler/engines/build-step-engine.ts create mode 100644 src/lib/messaging/compiler/engines/credential-binding-engine.ts create mode 100644 src/lib/messaging/compiler/engines/health-check-engine.ts create mode 100644 src/lib/messaging/compiler/engines/policy-resolver.ts create mode 
100644 src/lib/messaging/compiler/engines/state-update-engine.ts create mode 100644 src/lib/messaging/compiler/engines/template.ts create mode 100644 src/lib/messaging/compiler/index.ts create mode 100644 src/lib/messaging/compiler/manifest-compiler.test.ts create mode 100644 src/lib/messaging/compiler/manifest-compiler.ts create mode 100644 src/lib/messaging/compiler/types.ts create mode 100644 src/lib/messaging/compiler/workflow-planner.test.ts create mode 100644 src/lib/messaging/compiler/workflow-planner.ts create mode 100644 src/lib/messaging/hooks/builtins.ts create mode 100644 src/lib/messaging/hooks/common/index.ts create mode 100644 src/lib/messaging/hooks/common/token-paste.test.ts create mode 100644 src/lib/messaging/hooks/common/token-paste.ts create mode 100644 src/lib/messaging/hooks/hook-runner.test.ts create mode 100644 src/lib/messaging/hooks/hook-runner.ts create mode 100644 src/lib/messaging/hooks/index.ts create mode 100644 src/lib/messaging/hooks/registry.ts create mode 100644 src/lib/messaging/hooks/types.ts create mode 100644 src/lib/onboard/agent-fixed-forward.ts create mode 100644 src/lib/onboard/cancel-rollback.test.ts create mode 100644 src/lib/onboard/cancel-rollback.ts create mode 100644 src/lib/onboard/default-preservation.test.ts create mode 100644 src/lib/onboard/default-preservation.ts create mode 100644 src/lib/onboard/docker-cdi.test.ts create mode 100644 src/lib/onboard/docker-gpu-local-inference.test.ts create mode 100644 src/lib/onboard/docker-gpu-supervisor-reconnect.test.ts create mode 100644 src/lib/onboard/docker-gpu-supervisor-reconnect.ts create mode 100644 src/lib/onboard/dockerfile-patch-extra-agents.test.ts create mode 100644 src/lib/onboard/gateway-binding.test.ts create mode 100644 src/lib/onboard/gateway-binding.ts create mode 100644 src/lib/onboard/gateway-start-failure.test.ts create mode 100644 src/lib/onboard/gateway-start-failure.ts create mode 100644 src/lib/onboard/hermes-dashboard.test.ts create mode 100644 
src/lib/onboard/hermes-dashboard.ts create mode 100644 src/lib/onboard/host-gateway-process.test.ts create mode 100644 src/lib/onboard/host-gateway-process.ts create mode 100644 src/lib/onboard/host-proxy-env.ts create mode 100644 src/lib/onboard/host-service-reachability.test.ts create mode 100644 src/lib/onboard/host-service-reachability.ts create mode 100644 src/lib/onboard/inference-providers/hermes.ts create mode 100644 src/lib/onboard/inference-providers/index.ts create mode 100644 src/lib/onboard/inference-providers/ollama-local.ts create mode 100644 src/lib/onboard/inference-providers/remote.ts create mode 100644 src/lib/onboard/inference-providers/routed.ts create mode 100644 src/lib/onboard/inference-providers/types.ts create mode 100644 src/lib/onboard/inference-providers/vllm-local.ts create mode 100644 src/lib/onboard/landlock-warning.ts create mode 100644 src/lib/onboard/missing-credential-hints.ts create mode 100644 src/lib/onboard/ollama-probe-failure.test.ts create mode 100644 src/lib/onboard/ollama-probe-failure.ts create mode 100644 src/lib/onboard/ollama-startup.test.ts create mode 100644 src/lib/onboard/openclaw-otel-policy-presets.test.ts create mode 100644 src/lib/onboard/openclaw-otel-policy-presets.ts create mode 100644 src/lib/onboard/openclaw-runtime-env.test.ts create mode 100644 src/lib/onboard/openclaw-runtime-env.ts create mode 100644 src/lib/onboard/policy-preset-persistence.test.ts create mode 100644 src/lib/onboard/policy-preset-persistence.ts create mode 100644 src/lib/onboard/preflight-cdi.test.ts create mode 100644 src/lib/onboard/prompt-helpers.test.ts create mode 100644 src/lib/onboard/routed-inference.test.ts create mode 100644 src/lib/onboard/routed-inference.ts create mode 100644 src/lib/onboard/slack-validation.test.ts create mode 100644 src/lib/onboard/slack-validation.ts create mode 100644 src/lib/onboard/telegram-reachability.test.ts create mode 100644 src/lib/onboard/telegram-reachability.ts create mode 100644 
src/lib/onboard/ufw-auto-apply.ts create mode 100644 src/lib/onboard/verify-channel-runtime.test.ts create mode 100644 src/lib/onboard/verify-channel-runtime.ts create mode 100644 src/lib/onboard/windows-host-ollama.test.ts create mode 100644 src/lib/sandbox/whatsapp-diagnostics.test.ts create mode 100644 src/lib/sandbox/whatsapp-diagnostics.ts create mode 100644 src/lib/shields/seal.test.ts create mode 100644 src/lib/shields/seal.ts create mode 100644 src/lib/shields/state-dir-lock.ts create mode 100644 src/lib/skill-name.ts create mode 100644 src/lib/skill-remote.test.ts create mode 100644 src/lib/skill-remote.ts create mode 100644 test/agent-variant-docs.test.ts delete mode 100644 test/catalog-skills-export.test.ts rename test/{credentials.test.js => credentials-shim.test.ts} (85%) create mode 100644 test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts delete mode 100644 test/e2e-scenario/scenarios/migration-inventory.ts create mode 100644 test/e2e-scenario/scenarios/runner-routing.ts create mode 100755 test/e2e-scenario/validation_suites/hermes/01-history-writable.sh create mode 100755 test/e2e/lib/fake-discord-message-api.cjs create mode 100755 test/e2e/lib/fake-telegram-api.cjs create mode 100755 test/e2e/lib/telegram-api-proof.sh create mode 100755 test/e2e/test-docker-unreachable-gateway-start.sh create mode 100755 test/e2e/test-issue-4462-scope-upgrade-approval.sh create mode 100755 test/e2e/test-openclaw-skill-cli-e2e.sh create mode 100755 test/e2e/test-sessions-agents-cli.sh create mode 100755 test/e2e/test-strict-tool-call-probe.sh create mode 100644 test/helpers/onboard-smoke-verifier-harness.ts create mode 100644 test/install-docker-group-reexec.test.ts create mode 100644 test/install-stage-from-stdin.test.ts create mode 100644 test/internal-commands-docs.test.ts create mode 100644 test/nemoclaw-start-plugin-refresh.test.ts create mode 100644 test/ollama-local-openclaw-config-propagation.test.ts create mode 100644 
test/onboard-lifecycle.test.ts create mode 100644 test/onboard-openshell-install-stream.test.ts create mode 100644 test/onboard-smoke-verifier.test.ts create mode 100644 test/openclaw-build-messaging-plugins.test.ts create mode 100644 test/release-latest-tag.test.ts rename test/{runner.test.js => runner-basic.test.ts} (89%) create mode 100644 test/sandbox-container-owner.test.ts create mode 100644 test/sandbox-status-json-stdout.test.ts create mode 100644 test/scorecard-blocks.test.ts create mode 100644 test/shields-up-runtime-perms.test.ts create mode 100644 test/snapshot-shields-guard.test.ts create mode 100644 test/stdout-guard.test.ts create mode 100644 test/sync-agent-variant-docs.test.ts create mode 100644 test/telegram-diagnostics.test.ts create mode 100644 test/whatsapp-qr-compact.test.ts create mode 100644 tools/e2e-advisor/scenarios-schema.json mode change 100644 => 100755 tools/e2e-advisor/scenarios.mts diff --git a/.agents/catalog-skills.yaml b/.agents/catalog-skills.yaml deleted file mode 100644 index 94616a5f33..0000000000 --- a/.agents/catalog-skills.yaml +++ /dev/null @@ -1,40 +0,0 @@ -# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# SPDX-License-Identifier: Apache-2.0 - -# Explicit allowlist for NemoClaw skills exported to the NVIDIA Verified Skills catalog. -# Keep this file deterministic: no timestamps, no generated comments, and no implicit globs. -# The export is regenerated with: python3 scripts/export-catalog-skills.py -version: 1 -source: .agents/skills -export: skills -include: - - skill: nemoclaw-skills-guide - rationale: Public index for user-facing NemoClaw skills. - - skill: nemoclaw-user-agent-skills - rationale: Public user documentation skill. - - skill: nemoclaw-user-configure-inference - rationale: Public user documentation skill. - - skill: nemoclaw-user-configure-security - rationale: Public user documentation skill. 
- - skill: nemoclaw-user-deploy-remote - rationale: Public user documentation skill. - - skill: nemoclaw-user-get-started - rationale: Public user documentation skill. - - skill: nemoclaw-user-manage-policy - rationale: Public user documentation skill. - - skill: nemoclaw-user-manage-sandboxes - rationale: Public user documentation skill. - - skill: nemoclaw-user-monitor-sandbox - rationale: Public user documentation skill. - - skill: nemoclaw-user-overview - rationale: Public user documentation skill. - - skill: nemoclaw-user-reference - rationale: Public user documentation skill. -exclude: - - pattern: nemoclaw-maintainer-* - rationale: Internal maintainer workflows are not catalog/customer-facing. - - pattern: nemoclaw-contributor-* - rationale: Contributor workflows are repo-local until explicitly approved for catalog publication. -metadata: - minNemoClawVersion: "0.1.0" - testedNemoClawVersion: "0.1.0" diff --git a/.agents/skills/nemoclaw-contributor-update-docs/SKILL.md b/.agents/skills/nemoclaw-contributor-update-docs/SKILL.md index 7ca7041190..f1d097711f 100644 --- a/.agents/skills/nemoclaw-contributor-update-docs/SKILL.md +++ b/.agents/skills/nemoclaw-contributor-update-docs/SKILL.md @@ -116,6 +116,8 @@ Write the doc update following these conventions: - **Start sections with an introductory sentence** that orients the reader. - **No superlatives.** Say what the feature does, not how great it is. - **Copyable code examples use language-specific fences** such as `bash`, `sh`, or `powershell`, without prompt markers. +- **Shared NemoClaw CLI examples use `$$nemoclaw`.** In shared OpenClaw/Hermes variant pages, write host CLI examples with the `$$nemoclaw` build-time placeholder so the docs build renders `nemoclaw` on OpenClaw pages and `nemohermes` on Hermes pages before Fern renders fenced code blocks. 
+- **Do not duplicate code blocks for binary-name-only differences.** Use one fenced block with `$$nemoclaw` when the only difference is `nemoclaw` versus `nemohermes`; keep `` only when the surrounding text, flags, behavior, or setup steps actually differ. - **Use `console` only for terminal transcripts** that include prompts, output, or interactive sessions. - **Include the SPDX header** if creating a new page. - **Match existing frontmatter format** if creating a new page. @@ -132,6 +134,13 @@ When updating an existing page: - Do not reorganize sections unless the change requires it. - Update any cross-references or "Next Steps" links if relevant. +**Release prep only:** When updating `docs/about/release-notes.mdx`: + +- For each release-note bullet that corresponds to a deeper doc page, end the bullet with `For more information, refer to [DOC PAGE](/doc/path).` +- Link to the most specific existing page that explains the behavior, command, setup flow, or troubleshooting path. +- Do not add a link when no deeper page exists or when the only possible target is unrelated or too broad. +- Keep the source docs link as a normal MDX link. The docs-to-skills generator will convert it to the appropriate generated skill reference where needed. + When creating a new page: - Follow the frontmatter template from existing pages in `docs/`. @@ -217,7 +226,7 @@ User says: "Catch up the docs for everything merged since v0.1.0." 4. Read the commit diffs and current doc pages. 5. Draft doc updates reflecting the source code changes in the commits following the style guide. 6. **Release prep only:** Determine the release label from the user-requested release version. -7. **Release prep only:** Run `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`. +7. **Release prep only:** Run `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/ --prefix nemoclaw-user --doc-platform fern-mdx`. 8. Present the summary. 9. 
Build with `npm run docs` to verify. 10. **Release prep only:** Commit changes and open a pull request with the `documentation` label and the corresponding `vX.Y.Z` release label. Include a concise summary of the doc updates and a source summary that links each identified merged PR to its matching doc page. Include the PR number, affected doc page, links, and description of the doc change in this shape: diff --git a/.agents/skills/nemoclaw-maintainer-cut-release-tag/SKILL.md b/.agents/skills/nemoclaw-maintainer-cut-release-tag/SKILL.md index 36ea0c754f..a1db694bac 100644 --- a/.agents/skills/nemoclaw-maintainer-cut-release-tag/SKILL.md +++ b/.agents/skills/nemoclaw-maintainer-cut-release-tag/SKILL.md @@ -1,226 +1,151 @@ --- name: nemoclaw-maintainer-cut-release-tag -description: Cut a new semver release — bump all version strings via bump-version.ts, open a release PR, and after merge tag main and push. Use when cutting a release, tagging a version, shipping a build, or preparing a deployment. Trigger keywords - cut tag, release tag, new tag, cut release, tag version, ship it. +description: Creates deterministic NemoClaw semver release tags on origin/main and drafts release notes. Use when cutting a release, tagging a version, shipping a build, creating vX.Y.Z tags, or preparing release announcements. user_invocable: true --- + + + # Cut Release Tag -Bump all version strings, open a release PR, and after merge create annotated semver + `latest` tags on `origin/main`. +Use the release scripts only. Do not run raw `git tag`, `git push`, `gh api`, or version-bump commands by hand for the normal release flow. -This skill delegates the version-bump work to `scripts/bump-version.ts` (invoked via `npm run bump:version`). That script updates package.json (root + plugin), blueprint.yaml, installer defaults, docs config, and versioned doc links — then runs the build and tests before opening a PR. 
+The release is one annotated semver tag on an already-merged `origin/main` commit. The GitHub workflow moves `latest`; release admins promote `lkg` manually after validation. -## Prerequisites +## Hard Rules -- You must be in the NemoClaw git repository. -- You must have push access to `origin` (NVIDIA/NemoClaw). -- The nightly E2E suite should have passed before tagging. Check with the user if unsure. +- Tag only the commit captured in a generated release plan. +- Ask the maintainer to paste the exact confirmation phrase from the plan before cutting the tag. +- Push only the semver tag (`vX.Y.Z`) from the agent-controlled step. +- Never push `latest` or `lkg` from this skill. +- Never move, delete, or force-push an existing remote semver tag unless the maintainer explicitly starts protected-tag remediation. +- Draft release notes locally. Do not create the GitHub Discussion; the maintainer does that. -## Step 1: Determine the Current Version +## Workflow -Fetch all tags and find the latest semver tag: +Copy this checklist and update it as you proceed: -```bash -git fetch origin --tags -git tag --sort=-v:refname | grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | head -1 +```text +Release Progress: +- [ ] Step 1: Generate release plan +- [ ] Step 2: Show plan and exact confirmation phrase +- [ ] Step 3: Cut the semver tag from the confirmed plan +- [ ] Step 4: Wait for workflow-managed latest +- [ ] Step 5: Generate release-note data and draft Markdown +- [ ] Step 6: Hand off announcement steps ``` -Parse the major, minor, and patch components from this tag. 
+### Step 1: Generate Release Plan -## Step 2: Ask the User Which Bump +Run exactly one of: -Present the options with the **patch bump as default**: +```bash +npm run release:plan -- --bump patch +npm run release:plan -- --bump minor +npm run release:plan -- --bump major +``` -- **Patch** (default): `vX.Y.(Z+1)` — bug fixes, small changes -- **Minor**: `vX.(Y+1).0` — new features, larger changes -- **Major**: `v(X+1).0.0` — breaking changes +Patch is the default if the maintainer says "yes", "go", or similar without choosing. -Show the concrete version strings. Example prompt: +The script writes a plan outside the checkout root, for example: -> Current tag: `v0.0.2` -> -> Which version bump? -> -> 1. **Patch** → `v0.0.3` (default) -> 2. **Minor** → `v0.1.0` -> 3. **Major** → `v1.0.0` +```text +../nemoclaw-release-v0.0.58/plan.json +``` -Wait for the user to confirm before proceeding. If they just say "yes", "go", "do it", or similar, use the patch default. +### Step 2: Show Plan and Ask for Exact Confirmation -## Step 3: Show What's Being Tagged +Read the generated `plan.json` and show the maintainer: -Show the user the commit that will be tagged and the changelog since the last tag: +- previous tag, +- next tag, +- target `origin/main` commit and headline, +- plan hash, +- forbidden operations, +- exact confirmation phrase. -```bash -git log --oneline origin/main -1 -git log --oneline ..origin/main +Ask the maintainer to paste the exact phrase: + +```text +CONFIRM RELEASE vX.Y.Z ``` -Ask for confirmation before proceeding. +Do not proceed on a generic "yes" at this step. -## Step 4: Run the Version Bump Script +### Step 3: Cut the Semver Tag -First, preview the plan with `--dry-run`: +Run the cut script with the plan and the maintainer's exact phrase: ```bash -npm run bump:version -- --dry-run +npm run release:cut -- --plan --confirm "CONFIRM RELEASE vX.Y.Z " ``` -Show the dry-run output to the user. 
After confirmation, ask the user which mode they want: - -### Option A: PR mode (default, recommended) +The script verifies a clean worktree, unchanged `origin/main`, tag availability, target reachability, and remote peeled tag state. It writes: -```bash -npm run bump:version -- +```text +/cut-result.json ``` -This will: +If the script fails, stop and report the error. Do not improvise git commands. -1. Update all version strings across the repo -2. Run the build and tests -3. Create a `release/` branch and open a release PR against `main` +### Step 4: Wait for Workflow-Managed `latest` -In PR mode, tagging is deferred — proceed to Step 5 after the PR merges. - -### Option B: Direct mode (no PR) +Run: ```bash -npm run bump:version -- --no-create-pr --push +npm run release:wait-latest -- --plan ``` -This will: - -1. Update all version strings across the repo -2. Run the build and tests -3. Commit directly on `main` -4. Create annotated `v` and `latest` tags -5. Push the commit and both tags to origin +The script waits until `vX.Y.Z^{}` and `latest^{}` both peel to the planned commit and verifies `lkg` did not change from the plan. It writes: -In direct mode, tagging and pushing are handled by the script — skip to Step 6. - -If the user wants to skip tests (e.g., they already ran them), add `--skip-tests` to either mode. +```text +/latest-result.json +``` -## Step 5: Create and Push Tags (PR mode only, after PR merge) +If it fails, report the failed workflow/status. Do not manually move `latest`. -Skip this step if you used direct mode in Step 4 — the script already tagged and pushed. 
+### Step 5: Generate Release-Note Data and Draft Markdown -Once the release PR is merged into `main`, create the annotated tag, move `latest`, and push: +Collect deterministic release-note input: ```bash -git fetch origin main --tags -git tag -a origin/main -m "" - -# Move the latest tag (delete old, create new) -git tag -d latest 2>/dev/null || true -git tag -a latest origin/main -m "latest" - -# Push both tags (force-push latest since it moves) -git push origin -git push origin latest --force +npm run release:notes-data -- --plan ``` -## Step 6: Verify +This writes: -```bash -git ls-remote --tags origin | grep -E '(|latest)' +```text +/notes-data.json ``` -Confirm both tags point to the same commit on the remote. +If `notes-data.json` has `status: "partial"` or non-empty `pullRequestWarnings`, report the warnings and ask the maintainer whether to fetch/fill the missing PR metadata before drafting. -## Step 7: Conditionally Sweep Stale-Issue Verification Labels +Draft release notes from `notes-data.json` using the style from `nemoclaw-maintainer-release-notes`. Save only Markdown, outside the checkout root: -Strip `fixed-on-latest` from open issues only when the verification has actually gone stale or a regression risk appeared since we verified — never blanket-sweep. A blanket sweep on every release re-verifies labels that were freshly applied yesterday, wasting Brev cost and creating noise. The skill's by-design path uses the existing repo `status: wont-fix` label, which is **not** swept (also applied for non-skill triage reasons, so clearing it would erase human work). `verify-inconclusive` is also kept on the same conditional cascade as `fixed-on-latest`. - -**Decision cascade per labeled-and-open issue:** - -| Order | Check | Action | -|---|---|---| -| 1 | Project [NVIDIA/199](https://github.com/orgs/NVIDIA/projects/199) status == **Done** | **Skip clear** — maintainer already accepted the verification; label can stay until the issue closes. 
| -| 2 | More than 14 days since the skill marker comment AND status != Done | **Clear** — verification is stale; reporter never confirmed in the review window. Re-verify on next skill run. | -| 3 | A PR merged since the marker date touches the paths the comment cited in `Relevant changes since v0.0.X` | **Clear** — regression risk; what was "fixed" may have been re-broken. | -| — | else | **Skip clear** — verification still holds; skill won't re-run on this issue (still excluded by Step 3 marker-TTL plus the live label). | +```text +/release-note-draft.md +``` -Closed issues are not iterated (the `--state open` filter on the listing excludes them implicitly). +Do not create or update a GitHub Discussion. -Requires the `project` scope on the maintainer's gh CLI for the Project 199 status lookup. If missing, run `gh auth refresh -h github.com -s project` in a real terminal once (OAuth device-code flow). With the scope absent, the sweep falls back to the **time + regression** logic alone (skips check #1) and logs a warning. +### Step 6: Hand Off Announcement -```bash -PROJECT_NUMBER=199 -TODAY_TS=$(date -u +%s) -HAVE_PROJECT_SCOPE=0 -gh auth status 2>&1 | grep -q "'project'" && HAVE_PROJECT_SCOPE=1 || \ - echo "[release-sweep] WARN gh missing 'project' scope — Done-state check disabled this run" - -for label in fixed-on-latest verify-inconclusive; do - for n in $(gh issue list --repo NVIDIA/NemoClaw --state open --label "$label" --json number -q '.[].number'); do - - # 1. Project Done-state check (only if we have project scope) - if [ "$HAVE_PROJECT_SCOPE" = "1" ]; then - STATUS=$(gh api graphql -F num="$n" -f query=' - query($num: Int!) { - repository(owner: "NVIDIA", name: "NemoClaw") { - issue(number: $num) { - projectItems(first: 100) { - nodes { - project { number } - fieldValueByName(name: "Status") { - ... 
on ProjectV2ItemFieldSingleSelectValue { name } - } - } - } - } - } - }' --jq '.data.repository.issue.projectItems.nodes[] | select(.project.number == 199) | .fieldValueByName.name' 2>/dev/null | head -1) - if [ "$STATUS" = "Done" ]; then - echo "[release-sweep] kept #$n ($label) — Project 199 status is Done" - continue - fi - fi - - # 2. Find the most recent skill marker comment date - MARKER_DATE=$(gh issue view "$n" --repo NVIDIA/NemoClaw --json comments \ - --jq '.comments | map(select(.body | test("nemoclaw-verify-stale v\\d+ \\d{4}-\\d{2}-\\d{2}"))) | last | .body | (capture("nemoclaw-verify-stale v\\d+ (?\\d{4}-\\d{2}-\\d{2})") // {}) | .d // empty') - if [ -z "$MARKER_DATE" ]; then - # Label exists but no skill marker — applied manually; leave alone. - echo "[release-sweep] kept #$n ($label) — no skill marker, label applied manually" - continue - fi - - AGE_DAYS=$(( (TODAY_TS - $(date -u -j -f "%Y-%m-%d" "$MARKER_DATE" +%s 2>/dev/null || date -u -d "$MARKER_DATE" +%s)) / 86400 )) - if [ "$AGE_DAYS" -ge 14 ]; then - gh issue edit "$n" --repo NVIDIA/NemoClaw --remove-label "$label" - echo "[release-sweep] cleared #$n ($label) — stale (verified ${AGE_DAYS}d ago, reporter not confirmed)" - continue - fi - - # 3. Regression check — any PR-merge commit since MARKER_DATE touch the paths the - # comment's `Relevant changes since v0.0.X` block cited? - PATHS=$(gh issue view "$n" --repo NVIDIA/NemoClaw --json comments \ - --jq '.comments | map(select(.body | test("nemoclaw-verify-stale v\\d+"))) | last | .body' \ - | grep -oE '`[a-zA-Z0-9_/.-]+\.(ts|js|sh|py|yaml|yml|md)`' | tr -d '`' | sort -u) - if [ -n "$PATHS" ]; then - # Run from the current directory — Step 1's prerequisite already requires the maintainer - # to be inside the NemoClaw repo, and hardcoding ~/NemoClaw breaks anyone with a non-default - # checkout location. 
- REGRESSED=$(git log --since="$MARKER_DATE" origin/main --name-only --format=oneline -- $PATHS 2>/dev/null | head -1) - if [ -n "$REGRESSED" ]; then - gh issue edit "$n" --repo NVIDIA/NemoClaw --remove-label "$label" - echo "[release-sweep] cleared #$n ($label) — regression risk (commits since ${MARKER_DATE} touch implicated paths)" - continue - fi - fi - - echo "[release-sweep] kept #$n ($label) — verified ${AGE_DAYS}d ago, no Done state, no regression touch" - done -done -``` +Return: -The verification record itself stays in each issue's comment history — only the labels are reset, and only when the cascade above fires. +- release tag, +- confirmed release commit, +- plan path and plan hash, +- `cut-result.json`, `latest-result.json`, and `notes-data.json` paths, +- Markdown draft path, +- suggested discussion title: `NemoClaw is out`, +- reminder: maintainer creates the Announcement discussion and shares its link in external channels. -## Important Notes +## Recovery -- NEVER tag without explicit user confirmation of the version. -- NEVER tag a branch other than `origin/main`. -- Always use annotated tags (`-a`), not lightweight tags. -- The `latest` tag is a floating tag that always points to the most recent release — it requires `--force` to push. -- The version string passed to `npm run bump:version` should NOT have a `v` prefix (e.g., `0.0.3`, not `v0.0.3`). The script adds the `v` prefix for tags internally. +- Plan generation fails: fix the named precondition, then regenerate the plan. +- `origin/main` moved after plan generation: regenerate the plan and ask for the new exact confirmation phrase. +- Remote semver tag already exists: stop; do not retag unless the maintainer explicitly starts protected-tag remediation. +- `latest` workflow fails or times out: report the workflow/status; do not move `latest` manually. 
+- `latest` workflow rejects a rollback: keep `latest` unchanged, inspect the plan target commit, and regenerate the plan for the current `origin/main` tip if appropriate.
+- `lkg` changed: stop and escalate to a release admin.
diff --git a/.agents/skills/nemoclaw-maintainer-release-notes/SKILL.md b/.agents/skills/nemoclaw-maintainer-release-notes/SKILL.md
new file mode 100644
index 0000000000..c25c3f436e
--- /dev/null
+++ b/.agents/skills/nemoclaw-maintainer-release-notes/SKILL.md
@@ -0,0 +1,174 @@
+---
+name: nemoclaw-maintainer-release-notes
+description: Drafts NemoClaw release notes from live GitHub tag and compare data. Produces the repo's narrative release-note style with three lead paragraphs, categorized shipped changes, why-it-matters bullets, and external-only contributor thanks. Use after cutting a release tag or when asked to draft release notes, prepare an announcement, write a changelog, or summarize v0.0.x.
+user_invocable: true
+---
+
+
+
+
+# NemoClaw Maintainer Release Notes
+
+Draft NemoClaw release notes from live release data. The house style is:
+
+- three narrative lead paragraphs,
+- a categorized list of shipped changes,
+- one "what changed and why it matters / why we did it" bullet for every included shipped change,
+- external-only contributor thanks,
+- visible `#NNNN` GitHub links.
+
+Create a local Markdown draft. Do not create or update a GitHub Discussion; the maintainer posts the announcement manually.
+
+## Prerequisites
+
+- You must be in the NemoClaw git repository.
+- `gh` must be authenticated for `NVIDIA/NemoClaw`.
+- The release tag should already exist. If the user is still cutting the tag, use `nemoclaw-maintainer-cut-release-tag` first.
+- Use live GitHub and remote tag state, not memory or a stale local branch.
+- If `/notes-data.json` exists from `npm run release:notes-data`, use it as the starting source of truth and query GitHub only to fill missing fields.
+- If `notes-data.json` has `status: "partial"` or non-empty `pullRequestWarnings`, report those warnings and ask the maintainer whether to fetch or fill the missing PR metadata before drafting.
+
+## Step 1: Verify the Release Range
+
+Identify the current release tag and previous release tag. If the user gives only the current version, derive the previous semver tag from remote tags.
+
+```bash
+git ls-remote https://github.com/NVIDIA/NemoClaw.git \
+ refs/heads/main \
+ refs/tags/ 'refs/tags/^{}' \
+ refs/tags/ 'refs/tags/^{}' \
+ refs/tags/latest 'refs/tags/latest^{}'
+```
+
+Confirm:
+
+- `^{}` peels to the intended release commit.
+- `latest` points to the same peeled commit unless the user explicitly says otherwise.
+- The compare range is `...`.
+
+## Step 2: Collect the Shipped Surface
+
+If `notes-data.json` exists, read it first. Otherwise, use the compare API as the first source of truth:
+
+```bash
+gh api repos/NVIDIA/NemoClaw/compare/... \
+ --jq '{status,ahead_by,total_commits,commits:[.commits[] | {sha:.sha, headline:(.commit.message|split("\n")[0]), author:.commit.author.name}], files:[.files[] | {filename,status,changes}]}'
+```
+
+For each PR number in commit headlines, collect live PR metadata:
+
+```bash
+gh pr view --repo NVIDIA/NemoClaw \
+ --json number,title,author,headRepositoryOwner,url,mergeCommit,labels,body,mergedAt
+```
+
+Also inspect any shipped commit that does not have a PR number in the headline. Include it only if it is a real shipped change worth announcing.
+
+## Step 3: Decide What to Include
+
+Include the shipped product, docs, release-surface, and CI confidence changes that a reader should know about.
+
+For each included item, write:
+
+- what changed,
+- why it matters or why we did it,
+- a visible PR link like `[#4474](https://github.com/NVIDIA/NemoClaw/pull/4474)`.
+
+Be careful with sensitive internal cleanup:
+
+- Do not count testing reverts or guardrail reversions as release value unless the user explicitly asks for a full raw changelog.
+- If a revert-like commit must be mentioned, use neutral language and do not frame it as someone else's mistake.
+- Avoid public wording that could embarrass a teammate.
+
+## Step 4: Categorize the Changes
+
+Use categories that match the release surface. Prefer 4-6 sections. Common categories:
+
+- OpenClaw, Sandbox, and Network Stability
+- Windows, WSL, and Onboarding Recovery
+- Messaging and OpenClaw Runtime Activation
+- Hermes and Inference
+- Skills, Docs, and Release Surface
+- CI and Release Confidence
+
+Every included shipped change should appear in exactly one category unless the user asks for a shorter note.
+
+## Step 5: Handle Contributor Credit
+
+By default, thank external contributors only. Do not thank NVIDIA/internal contributors by GitHub ID unless the user explicitly asks.
+
+Determine external contributors from live GitHub state:
+
+```bash
+for login in ; do
+ code=$(gh api -i orgs/NVIDIA/members/$login 2>/dev/null \
+ | sed -n '1s/.* \([0-9][0-9][0-9]\).*/\1/p' || true)
+ printf '%s %s\n' "$login" "${code:-unknown}"
+done
+```
+
+Interpretation:
+
+- `204` means the account is a visible member of `NVIDIA`.
+- `404` means the account is not a visible member and should be treated as external for release-note thanks.
+
+Replay PRs need special care:
+
+- If a maintainer replayed an external PR, inspect the replay PR body and the original PR.
+- Credit the original external GitHub username in the issue-level bullet for the shipped replay.
+- Also thank that username in the final thanks section.
+- Do not mention affiliations, organizations, domains, or companies unless the user explicitly asks. Use the GitHub username only.
+
+Example:
+
+```markdown
+- [#4474](https://github.com/NVIDIA/NemoClaw/pull/4474) replays and narrows the Hermes Provider host-smoke fix originally contributed by @shannonsands in [#4385](https://github.com/NVIDIA/NemoClaw/pull/4385). ...
+
+## Thank you
+
+Thank you to external contributor @shannonsands for the original Hermes Provider smoke-check contribution in [#4385](https://github.com/NVIDIA/NemoClaw/pull/4385), which was replayed and narrowed into [#4474](https://github.com/NVIDIA/NemoClaw/pull/4474) for this release.
+```
+
+## Step 6: Draft the Narrative
+
+Write the top section as exactly three paragraphs unless the user asks otherwise.
+
+If the user requests a theme, let it shape the paragraphs. If the user asks for the voice of Carl Sagan, keep it subtle: cosmic scale, humility, clarity, and wonder, but no parody, no quotes, and no overdone imitation.
+
+Suggested structure:
+
+1. Stability theme and infrastructure/security boundary changes.
+2. User-facing workflow stability: messaging, Hermes, inference, onboarding.
+3. Maintenance stability: skills, docs, checks, and release confidence.
+
+Keep the prose warm and polished, but concrete. Tie the narrative to actual PRs in the release range.
+
+## Step 7: Write a Local Draft First
+
+Create a Markdown draft outside the checkout root so the repo stays clean, for example:
+
+```bash
+../nemoclaw--release-note-draft.md
+```
+
+The Markdown body is the source the maintainer can paste into GitHub Discussions.
+
+Stop here. The maintainer creates the GitHub Discussion and shares the announcement link.
+
+## Output
+
+For a draft-only run, return:
+
+- Markdown draft path,
+- a short note about the compare range and any excluded revert/test-cleanup items.
+
+Also return the suggested discussion title: `NemoClaw is out`.
+
+## Hard Rules
+
+- Never create or update a GitHub Discussion from this skill.
+- Never draft from memory alone; use live `gh api compare` and PR metadata.
+- Never mention contributor affiliation unless the user explicitly asks. +- Never thank internal contributors by default; keep thanks external-only. +- Never include testing reverts as release-value bullets unless explicitly asked for a raw changelog. +- Never create duplicate release Discussions. diff --git a/.agents/skills/nemoclaw-maintainer-verify-stale/reference/candidate-selection.md b/.agents/skills/nemoclaw-maintainer-verify-stale/reference/candidate-selection.md index f19252141d..a62baf96b3 100644 --- a/.agents/skills/nemoclaw-maintainer-verify-stale/reference/candidate-selection.md +++ b/.agents/skills/nemoclaw-maintainer-verify-stale/reference/candidate-selection.md @@ -74,7 +74,7 @@ Apply these rules in order. Drop any issue that fails a rule. **Idempotency:** drop if **either** of these is true: -- The issue carries a `fixed-on-latest` or `verify-inconclusive` label. (Cleared by the release sweep in `nemoclaw-maintainer-cut-release-tag` so the issue re-opens on each release.) The by-design path uses the existing repo `status: wont-fix` label, which is already covered by the issue-type skip rule above — no separate idempotency clause needed for that path. +- The issue carries a `fixed-on-latest` or `verify-inconclusive` label. These labels are persistent; rerun verification only when a maintainer explicitly targets the issue or removes the label. The by-design path uses the existing repo `status: wont-fix` label, which is already covered by the issue-type skip rule above — no separate idempotency clause needed for that path. - A comment matching `` was posted **within the last 7 days**. The regex matches any marker version (`v1`, `v2`, …) so future skill versions can re-verify older-marked issues by tightening the regex (e.g. require a specific marker version). 
The marker carries a date so the candidate filter can apply a TTL — useful for the still-reproduces case (Step 9), where no label is applied and we want next week's run to re-verify rather than skip forever. Implementation — match the marker against each comment's `createdAt`. Use `gh issue view --json comments` (single-issue mode already fetches this; batch mode's `gh issue list` also returns the comment array per issue): diff --git a/.agents/skills/nemoclaw-maintainer-verify-stale/reference/scoring-comments-and-logging.md b/.agents/skills/nemoclaw-maintainer-verify-stale/reference/scoring-comments-and-logging.md index b536e4c33a..d4f1fe37f0 100644 --- a/.agents/skills/nemoclaw-maintainer-verify-stale/reference/scoring-comments-and-logging.md +++ b/.agents/skills/nemoclaw-maintainer-verify-stale/reference/scoring-comments-and-logging.md @@ -470,10 +470,10 @@ Never stage or commit the log to the NemoClaw repo. - macOS verification *via the Brev path*. Brev offers no macOS instances. The Step 6.7 local-first short-circuit *does* run on a maintainer's macOS laptop — so manual single-issue runs against pure-CLI bugs work on macOS. The weekly batch cron is Linux-only because that path always uses Brev. - Issues requiring third-party integration credentials (Slack, Discord, Telegram, Hermes, OpenClaw, WeChat). - Service-account bot identity. v1 runs under each maintainer's own GitHub credentials. -- Versioned labels. A single `fixed-on-latest` label is swept on each release cut. +- Versioned labels. `fixed-on-latest` and `verify-inconclusive` are persistent maintainer-review labels, not per-release labels. --- ## Companion Behavior -`nemoclaw-maintainer-cut-release-tag` sweeps `fixed-on-latest` and `verify-inconclusive` from all open issues at release time. Without that sweep, "latest" drifts and verifications go stale silently. 
The by-design path uses the existing repo `status: wont-fix` label; that label is **not** swept (it's also applied for non-skill reasons such as scope or priority decisions, and clearing it would erase human triage work). +`nemoclaw-maintainer-cut-release-tag` does not sweep issue labels during release. A `fixed-on-latest` or `verify-inconclusive` label stays until a maintainer removes it or explicitly re-runs verification for that issue. The by-design path uses the existing repo `status: wont-fix` label; that label is also persistent because it is applied for non-skill reasons such as scope or priority decisions. diff --git a/.agents/skills/nemoclaw-skills-guide/SKILL.md b/.agents/skills/nemoclaw-skills-guide/SKILL.md index 1f744377c0..5e0f6e9287 100644 --- a/.agents/skills/nemoclaw-skills-guide/SKILL.md +++ b/.agents/skills/nemoclaw-skills-guide/SKILL.md @@ -1,6 +1,7 @@ --- name: "nemoclaw-skills-guide" description: "Start here. Introduces what NemoClaw is, what agent skills are available, and which skill to use for a given task. Use when discovering NemoClaw capabilities, choosing the right skill, or orienting in the project. Trigger keywords - skills, capabilities, what can I do, help, guide, index, overview, start here." +license: "Apache-2.0" --- # NemoClaw Skills Guide @@ -16,15 +17,15 @@ Load the specific skill you need after identifying it here. Skills are grouped into three buckets by audience. The prefix in each skill name indicates who it is for. -### `nemoclaw-user-*` (9 skills) +### `nemoclaw-user-*` (10 skills) For end users operating a NemoClaw sandbox. Covers installation, inference configuration, network policy management, monitoring, remote deployment, security configuration, workspace management, and reference material. -### `nemoclaw-maintainer-*` (8 skills) +### `nemoclaw-maintainer-*` (12 skills) For project maintainers. 
-Covers the daily maintainer cadence (morning standup, daytime loop, evening handoff), cutting releases, finding PRs to review, normalizing issue and PR title tags, performing security code reviews, and verifying whether stale bug reports still reproduce on the latest release. +Covers the daily maintainer cadence (morning standup, daytime loop, evening handoff), cutting releases, drafting release notes, finding PRs to review, comparing PRs, cross-issue sweeps, triage, normalizing issue and PR title tags, performing security code reviews, and verifying whether stale bug reports still reproduce on the latest release. ### `nemoclaw-contributor-*` (2 skills) @@ -47,6 +48,7 @@ Covers creating pull requests that follow the project template and drafting docu | `nemoclaw-user-configure-security` | Review the risk framework for every configurable security control, understand credential storage, and assess posture trade-offs. | | `nemoclaw-user-manage-sandboxes` | Manage day-two sandbox operations, including status, logs, diagnostics, rebuilds, upgrades, messaging channels, workspace files, backup, and restore. | | `nemoclaw-user-reference` | CLI command reference, plugin and blueprint architecture, baseline network policies, and troubleshooting guide. | +| `nemoclaw-user-agent-skills` | Describes the agent skills shipped with NemoClaw and how to access them by cloning the repository. | ### Maintainer Skills @@ -54,10 +56,14 @@ Covers creating pull requests that follow the project template and drafting docu | Skill | Summary | |-------|---------| | `nemoclaw-maintainer-morning` | Morning standup: triage the backlog, determine the day's target version, label selected items, surface stragglers, and output the daily plan. | +| `nemoclaw-maintainer-triage` | Suggest and optionally apply labels for issues and PRs using the live NemoClaw triage instructions. 
| +| `nemoclaw-maintainer-cross-issue-sweep` | Scan open issues for adjacent fixes or contradiction risks when reviewing a PR. | | `nemoclaw-maintainer-day` | Daytime loop: pick the highest-value version-targeted item and execute the right workflow (merge gate, salvage, security sweep, test gaps, hotspot cooling, or sequencing). Designed for `/loop`. | | `nemoclaw-maintainer-evening` | End-of-day handoff: check version progress, bump stragglers to the next patch, generate a QA handoff summary, and cut the release tag. | -| `nemoclaw-maintainer-cut-release-tag` | Cut an annotated semver tag on main, move the `latest` floating tag, and push both to origin. | +| `nemoclaw-maintainer-cut-release-tag` | Cut an annotated semver tag on a maintainer-confirmed `origin/main` commit; the GitHub workflow moves `latest`, and `lkg` stays manual. | +| `nemoclaw-maintainer-release-notes` | Draft release notes from live tag/compare data, with the three-paragraph narrative, categorized change list, and external-only contributor thanks. | | `nemoclaw-maintainer-find-review-pr` | Find open PRs labeled security + priority-high, link each to its issue, detect duplicates, and present a review summary. | +| `nemoclaw-maintainer-pr-comparator` | Compare competing PRs for the same issue and recommend which one to merge. | | `nemoclaw-maintainer-normalize-title-tags` | Preview and remove bracketed `NemoClaw` title tags from issues and PRs case-insensitively, even when the tag appears later in the title. | | `nemoclaw-maintainer-security-code-review` | Perform a 9-category security review of a PR or issue, producing per-category PASS/WARNING/FAIL verdicts. | | `nemoclaw-maintainer-verify-stale` | Verify whether old bug reports still reproduce on latest. Reuses or provisions a Brev box (CPU or GPU), runs the extracted reproducer, scores confidence, and posts an evidence-backed comment with `fixed-on-latest` or `verify-inconclusive`. Tag-only — never auto-closes. 
| @@ -81,8 +87,8 @@ Skills are cumulative. Each role includes the skills from the roles above it: | Role | Skills included | Count | Start with | |------|----------------|-------|------------| -| User | `nemoclaw-user-*` | 9 | `nemoclaw-user-get-started` | -| Contributor | `nemoclaw-user-*` + `nemoclaw-contributor-*` | 11 | `nemoclaw-user-overview` | -| Maintainer | All skills | 19 | `nemoclaw-maintainer-morning` | +| User | `nemoclaw-user-*` | 10 | `nemoclaw-user-get-started` | +| Contributor | `nemoclaw-user-*` + `nemoclaw-contributor-*` | 12 | `nemoclaw-user-overview` | +| Maintainer | All skills | 24 | `nemoclaw-maintainer-morning` | After identifying the role, present the applicable skills from the Skill Catalog above and recommend the starting skill. diff --git a/.agents/skills/nemoclaw-user-agent-skills/SKILL.md b/.agents/skills/nemoclaw-user-agent-skills/SKILL.md index 1d33a7e988..fca8d94d8c 100644 --- a/.agents/skills/nemoclaw-user-agent-skills/SKILL.md +++ b/.agents/skills/nemoclaw-user-agent-skills/SKILL.md @@ -1,13 +1,89 @@ --- name: "nemoclaw-user-agent-skills" description: "Describes the agent skills shipped with NemoClaw and how to access them by cloning the repository. Use when users ask about AI agent support, coding assistant integration, or the .agents/skills/ directory. Trigger keywords - nemoclaw agent skills, ai coding assistant, cursor, claude code, copilot." +license: "Apache-2.0" --- - - - # NemoClaw Agent Skills for Your AI Coding Assistant -## References +NemoClaw ships agent skills that are generated directly from this documentation. +Each skill is a converted version of one or more doc pages, structured so AI coding assistants can consume it as context. +This means you can interact with the full NemoClaw documentation as skills inside your agent chat session, instead of reading the docs separately. 
+ +Ask your assistant a question about NemoClaw and it responds with the same guidance found in these docs, adapted to your current situation. +Skills cover installation, inference configuration, network policy management, monitoring, deployment, security, workspace management, and the CLI reference. + +**Note:** + +If you are a contributor and have cloned the full NemoClaw repository, the full set of skills, including contributor and maintainer skills, is already available at the project root. +Open the `NemoClaw` directory in your coding assistant and the skills load automatically. +This page is for users who installed NemoClaw with the installer and do not have a local clone. + +## Get the Skills + +Fetch only the skills from the NemoClaw repository without downloading the full source tree. + +```bash +git clone --filter=blob:none --no-checkout https://github.com/NVIDIA/NemoClaw.git +cd NemoClaw +git sparse-checkout set --no-cone '/.agents/skills/nemoclaw-user-*/**' '/.agents/skills/nemoclaw-skills-guide/**' '/.claude/**' '/AGENTS.md' '/CLAUDE.md' +git checkout +``` + +Open the `NemoClaw` directory in your AI coding assistant. +The assistant discovers the skills in `.agents/skills/` and uses them to answer NemoClaw questions with project-specific guidance. + +You can keep the skills inside the cloned directory or copy `.agents/skills/` to a global location (such as `~/.cursor/skills/` or `~/.claude/skills/`) so they are available across all your projects. +The choice depends on whether you want NemoClaw skills scoped to one workspace or accessible everywhere. + +## Update the Skills + +The sparse checkout filter is saved, so `git pull` fetches only updated skills without downloading the full source tree. +Run `git pull` after each NemoClaw release to pick up new and updated skills. + +## Available Skills + +The following user skills ship with NemoClaw. 
+ +| Skill | Summary | +|-------|---------| +| `nemoclaw-user-overview` | What NemoClaw is, ecosystem placement (OpenClaw + OpenShell + NemoClaw), how it works internally, and release notes. | +| `nemoclaw-user-get-started` | Install NemoClaw, launch a sandbox, and run the first agent prompt. | +| `nemoclaw-user-configure-inference` | Choose inference providers during onboarding, switch models without restarting, and set up local inference servers (Ollama, vLLM, TensorRT-LLM, NIM). | +| `nemoclaw-user-manage-policy` | Approve or deny blocked egress requests in the TUI and customize the sandbox network policy (add, remove, or modify allowed endpoints). | +| `nemoclaw-user-monitor-sandbox` | Check sandbox health, read logs, and trace agent behavior to diagnose problems. | +| `nemoclaw-user-deploy-remote` | Deploy NemoClaw to a remote GPU instance, set up the Telegram bridge, and review sandbox container hardening. | +| `nemoclaw-user-configure-security` | Review the risk framework for every configurable security control, understand credential storage, and assess posture trade-offs. | +| `nemoclaw-user-manage-sandboxes` | Manage day-two sandbox operations, including status, logs, diagnostics, rebuilds, upgrades, messaging channels, workspace files, backup, and restore. | +| `nemoclaw-user-reference` | CLI command reference, plugin and blueprint architecture, baseline network policies, and troubleshooting guide. | + +## Example Questions and Triggered Skills + +After opening the cloned repository in your coding assistant, ask a NemoClaw question in natural language. +The assistant matches your question to the relevant skill and follows the guidance it contains. + +Examples of questions your assistant can answer with these skills: + +| Question | Skill triggered | +|----------|-----------------| +| "How do I install NemoClaw?" | `nemoclaw-user-get-started` | +| "Switch my inference provider to Ollama." 
| `nemoclaw-user-configure-inference` | +| "A network request was blocked. How do I approve it?" | `nemoclaw-user-manage-policy` | +| "Show me the sandbox logs." | `nemoclaw-user-monitor-sandbox` | +| "How do I deploy NemoClaw to a remote GPU?" | `nemoclaw-user-deploy-remote` | +| "What security controls can I configure?" | `nemoclaw-user-configure-security` | +| "Back up my agent workspace files." | `nemoclaw-user-manage-sandboxes` | +| "What CLI commands are available?" | `nemoclaw-user-reference` | + +You can also reference a skill directly by name if you know which one you need. + +## AI Coding Assistants that You Can Use with NemoClaw Skills + +The NemoClaw agent skills follow the [Agent Skills best practices](https://agentskills.io/skill-creation/best-practices) and the [Claude Skills best practices](https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices). +The following table shows how each AI coding assistant can use the NemoClaw skills. -- **Load [references/agent-skills.md](references/agent-skills.md)** when users ask about AI agent support, coding assistant integration, or the .agents/skills/ directory. Describes the agent skills shipped with NemoClaw and how to access them by cloning the repository. +| Assistant | Skill discovery | +|-----------|----------------| +| Cursor | Reads `AGENTS.md` at the project root, which references `.agents/skills/`. | +| Claude Code | Follows the `.claude/skills/` symlink, which points to `.agents/skills/`. | +| Other assistants | Point the assistant to `.agents/skills/` if it supports project-level skill loading. | diff --git a/.agents/skills/nemoclaw-user-agent-skills/evals/evals.json b/.agents/skills/nemoclaw-user-agent-skills/evals/evals.json new file mode 100644 index 0000000000..922afeb949 --- /dev/null +++ b/.agents/skills/nemoclaw-user-agent-skills/evals/evals.json @@ -0,0 +1,11 @@ +[ + { + "id": "docs-resources-agent-skills-001", + "question": "I'm looking at NemoClaw agent skills. 
Help me find a skill that can guide installation, policy, inference, or operations so I can delegate the right workflow to my AI coding assistant.", + "expected_skill": "nemoclaw-user-agent-skills", + "ground_truth": "A NemoClaw-specific answer that helps the user find a skill that can guide installation, policy, inference, or operations and gives enough concrete guidance, decision criteria, verification steps, or risk framing to delegate the right workflow to my AI coding assistant.", + "expected_behavior": [ + "Uses the expected_skill and does not make up answers if it cannot find the answer from the skill." + ] + } +] diff --git a/.agents/skills/nemoclaw-user-configure-inference/SKILL.md b/.agents/skills/nemoclaw-user-configure-inference/SKILL.md index f43d2631eb..72b4ad14a7 100644 --- a/.agents/skills/nemoclaw-user-configure-inference/SKILL.md +++ b/.agents/skills/nemoclaw-user-configure-inference/SKILL.md @@ -1,11 +1,9 @@ --- name: "nemoclaw-user-configure-inference" -description: "Connects NemoClaw to a local inference server. Use when setting up Ollama, vLLM, TensorRT-LLM, NIM, or any OpenAI-compatible local model server with NemoClaw. Trigger keywords - nemoclaw local inference, ollama nemoclaw, vllm nemoclaw, local model server, openai compatible endpoint, switch nemoclaw inference model, change inference runtime, nemoclaw additional model, nemoclaw sub-agent model, openclaw sub-agent, agents.list, sessions_spawn, vlm-demo, nemoclaw tool calling, ollama tool calls, vllm tool-call-parser, raw json in tui, nemoclaw inference options, nemoclaw onboarding providers, nemoclaw inference routing." +description: "Connects NemoClaw to a local inference server. Use when setting up Ollama, vLLM, TensorRT-LLM, NIM, or any OpenAI-compatible local model server with NemoClaw. 
Trigger keywords - nemoclaw local inference, ollama nemoclaw, vllm nemoclaw, local model server, openai compatible endpoint, switch nemoclaw inference model, change inference runtime, nemoclaw additional model, nemoclaw sub-agent model, openclaw sub-agent, agents.list, sessions_spawn, vlm-demo, nemoclaw inference options, nemoclaw onboarding providers, nemoclaw inference routing, nemoclaw tool calling, ollama tool calls, vllm tool-call-parser, raw json in tui." +license: "Apache-2.0" --- - - - # Use a Local Inference Server ## Gotchas @@ -14,9 +12,14 @@ description: "Connects NemoClaw to a local inference server. Use when setting up ## Prerequisites -- NemoClaw installed. + + +- NemoClaw installed. Refer to the Quickstart (use the `nemoclaw-user-get-started` skill) if you have not installed yet. +- NemoClaw installed. Refer to Quickstart with Hermes (use the `nemoclaw-user-get-started` skill) if you have not installed yet. - A local model server running, or a supported Ollama, vLLM, or NIM setup that the NemoClaw onboard wizard can use, start, or install. +import { AgentOnly } from "../_components/AgentGuide"; + NemoClaw can route inference to a model server running on your machine instead of a cloud API. This page covers Ollama, compatible-endpoint paths for other servers, and experimental managed options for vLLM and NVIDIA NIM. @@ -27,13 +30,14 @@ OpenShell intercepts inference traffic and forwards it to the local endpoint you ## Ollama Ollama is the default local inference option. -The onboard wizard detects Ollama automatically when it is installed or running on the host. +The onboard wizard detects Ollama automatically when you have installed it or started it on the host. -If Ollama is installed but not running, NemoClaw starts it for you. +If you installed Ollama but have not started it, NemoClaw starts it for you. On macOS and Linux, the wizard can also offer to install Ollama when it is not present. 
When the host Ollama is below the minimum version NemoClaw expects for its starter models (currently `0.7.0`), the wizard surfaces an explicit **Upgrade Ollama** entry in the provider menu instead of silently reusing the older daemon, and the express setup path resolves to that entry. The wizard inspects both the CLI binary (`ollama --version`) and the locally running daemon (`/api/version` on `:11434`) so the upgrade entry still appears when only one side is stale, for example a fresh user-local binary paired with the original system daemon. -The gate skips Windows-host Ollama reached from WSL via `host.docker.internal`; the separate **Use / Start / Install Ollama on Windows host** entries handle that case and run their own actions on the Windows side. +The gate skips Windows-host Ollama reached from WSL through `host.docker.internal`. +The separate **Use / Start / Install Ollama on Windows host** entries handle that case and run their own actions on the Windows side. On macOS, the wizard runs the platform install or upgrade path with `brew upgrade ollama`. On Linux, the wizard runs the official `https://ollama.com/install.sh` path. Upgrades on Linux always take the sudo-driven system path because the sudo-free user-local fallback would leave the existing system daemon on `:11434` serving the stale binary. @@ -44,7 +48,7 @@ On WSL, the wizard can use, start, restart, or install Ollama on the Windows hos ### Linux Install Modes -On native Linux, the install path picks between a system install (under `/usr/local`, via the official `https://ollama.com/install.sh`) and a sudo-free user-local install (under `${HOME}/.local`). +On native Linux, the install path picks between a system install (under `/usr/local`, using the official `https://ollama.com/install.sh`) and a sudo-free user-local install (under `${HOME}/.local`). NemoClaw selects the mode automatically: - Running as root or with passwordless sudo (`sudo -n true` returns 0) selects the system install. 
@@ -55,27 +59,31 @@ NemoClaw selects the mode automatically: Override the detection with `NEMOCLAW_OLLAMA_INSTALL_MODE=system` or `NEMOCLAW_OLLAMA_INSTALL_MODE=user`. The user-local install replicates only the binary extraction step of the official installer. -It downloads the release tarball, extracts it to `${HOME}/.local`, and launches `${HOME}/.local/bin/ollama serve` once. -It does not configure a systemd service, does not create the `ollama` system user, and does not install CUDA drivers, so the daemon must be relaunched manually after a reboot. +It downloads the release tarball, extracts it to `${HOME}/.local`, and launches `${HOME}/.local/bin/ollama serve` one time. +It does not configure a systemd service, does not create the `ollama` system user, and does not install CUDA drivers, so you must relaunch the daemon manually after a reboot. NemoClaw also prints a one-line `PATH` hint if `${HOME}/.local/bin` is not already on your `PATH`; you can add `export PATH="${HOME}/.local/bin:$PATH"` to your shell profile to invoke `ollama` directly. Both modes rely on `zstd` for archive extraction. On Debian and Ubuntu, the system path uses `sudo apt-get` to install `zstd` automatically and explains the prompt before continuing. -The user-local path cannot bootstrap system packages without elevation, so if `zstd` is missing it prints per-distro install hints and exits — install `zstd` manually, then rerun onboarding. +The user-local path cannot bootstrap system packages without elevation. +If `zstd` is missing, it prints per-distro install hints and exits. +Install `zstd` manually, then rerun onboarding. Run the onboard wizard. -```console -$ nemoclaw onboard +```bash +nemoclaw onboard ``` Select **Local Ollama** from the provider list. -NemoClaw lists installed models or offers starter models if none are installed. +NemoClaw lists installed models or offers starter models if you have not installed any. 
On hosts where the larger starter models fit the currently available GPU memory, the starter list includes `qwen3.6:35b` and selects it by default. When another GPU workload is using most of the memory at onboard time, NemoClaw downgrades the menu to the largest model that still fits. It pulls the selected model, loads it into memory, and validates it before continuing. +When Ollama reports a loaded-model context length, NemoClaw uses that value for the `contextWindow` baked into `openclaw.json` unless you set `NEMOCLAW_CONTEXT_WINDOW` yourself. If the selected model declares that it does not support tool calling, onboarding stops with guidance to choose a model whose `ollama show ` capabilities include `tools`. The validation also requires structured chat-completions tool calls. If the model leaks tool-call JSON as plain message text, onboarding stops so you can choose a model that returns tool calls in the expected response field. +If a host-side validation probe times out, NemoClaw retries the Ollama tool-call validation with a larger timeout before failing the setup. On WSL, if you choose the Windows-host Ollama path, NemoClaw uses `host.docker.internal:11434` and pulls missing models through the Ollama HTTP API instead of requiring the `ollama` CLI inside WSL. ### WSL with Windows-Host Ollama @@ -83,8 +91,8 @@ On WSL, if you choose the Windows-host Ollama path, NemoClaw uses `host.docker.i When NemoClaw runs inside WSL, the provider menu can include Windows-host Ollama actions: - Use Ollama on Windows host when the Windows daemon is already reachable. -- Restart Ollama on Windows host when the daemon is installed but only bound to Windows loopback. -- Start Ollama on Windows host when Ollama is installed but not running. +- Restart Ollama on Windows host when you installed the daemon but bound it only to Windows loopback. +- Start Ollama on Windows host when you installed Ollama but have not started it. 
- Install Ollama on Windows host when Windows does not have Ollama installed. The install and restart paths set `OLLAMA_HOST=0.0.0.0:11434` on the Windows side so Docker and WSL can reach the daemon through `host.docker.internal`. @@ -94,20 +102,25 @@ If the HTTP endpoint is not reachable yet, NemoClaw also checks for the Windows If the daemon does not become reachable, onboarding prints PowerShell commands you can run to inspect the Windows-side process and port state. Use one Ollama instance on port `11434` at a time. If both WSL and Windows-host Ollama are running, pick the intended menu entry during onboarding so NemoClaw validates and pulls models against the right daemon. +Windows-host Ollama requires Docker Desktop WSL integration because the sandbox reaches the Windows daemon through Docker Desktop's WSL routing path. +If NemoClaw detects native Docker Engine inside WSL, the provider menu labels Windows-host Ollama actions as requiring Docker Desktop integration. +Selecting one of those actions in the unsupported native Docker topology exits early with a remediation message instead of trying to start or install Ollama on Windows. + + **Warning:** Ollama is convenient for local chat, but some model/template combinations can return tool calls as plain text under realistic agent load. If the TUI shows raw JSON such as `{"name":"memory_search","arguments":{...}}` instead of running a tool, switch to vLLM with `--enable-auto-tool-choice` and the correct -`--tool-call-parser`. See Tool-Calling Reliability (use the `nemoclaw-user-configure-inference` skill). +`--tool-call-parser`. See [Tool-Calling Reliability](references/tool-calling-reliability.md). + ### Authenticated Reverse Proxy On non-WSL hosts, NemoClaw keeps Ollama bound to `127.0.0.1:11434` and starts a token-gated reverse proxy on `0.0.0.0:11435`. The native install/start paths also reset NemoClaw-managed systemd launches to the loopback binding. 
-Containers and other hosts on the local network reach Ollama only through the -proxy, which validates a Bearer token before forwarding requests. +Containers and other hosts on the local network reach Ollama only through the proxy, which validates a Bearer token before forwarding requests. On that native path, NemoClaw never exposes Ollama without authentication. WSL Ollama paths do not use this proxy. @@ -127,22 +140,19 @@ For non-WSL Ollama setups, the onboard wizard manages the proxy automatically: On native Linux hosts, a firewall can allow the host proxy health check while still blocking sandbox containers on the OpenShell Docker bridge. When the sandbox-side proxy probe fails with a TCP error, onboarding exits before it saves the inference route and prints a command like: -```console -$ sudo ufw allow from to any port 11435 proto tcp -$ nemoclaw onboard +```bash +sudo ufw allow from to any port 11435 proto tcp +nemoclaw onboard ``` If the probe cannot run, for example because Docker Desktop or WSL uses a different host routing model, onboarding continues and relies on the regular proxy health check. -The sandbox provider is configured to use proxy port `11435` with the generated -token as its `OPENAI_API_KEY` credential. -OpenShell's L7 proxy injects the token at egress, so the agent inside the -sandbox never sees the token directly. +NemoClaw configures the sandbox provider to use proxy port `11435` with the generated token as its `OPENAI_API_KEY` credential. +OpenShell's L7 proxy injects the token at egress, so the agent inside the sandbox never sees the token directly. All proxy endpoints require the Bearer token, including `GET /api/tags`. -Internal health and reachability checks run via the proxy treat any HTTP -response (including `401`) as proof the proxy is alive — they only fail -when nothing answers at all. +Internal health and reachability checks run through the proxy treat any HTTP response, including `401`, as proof the proxy is alive. 
+They fail only when nothing answers at all. If Ollama is already running on a non-loopback address when you start onboard, the wizard restarts it on `127.0.0.1:11434` so the proxy is the only network @@ -156,8 +166,8 @@ This does not delete downloaded model files. ### Non-Interactive Setup -```console -$ NEMOCLAW_PROVIDER=ollama \ +```bash +NEMOCLAW_PROVIDER=ollama \ NEMOCLAW_MODEL=qwen2.5:14b \ nemoclaw onboard --non-interactive --yes ``` @@ -166,8 +176,9 @@ If `NEMOCLAW_MODEL` is not set, NemoClaw selects a default model based on availa If `NEMOCLAW_MODEL` names a known bootstrap model (for example `qwen3.6:35b`) that does not fit the host's currently available GPU memory, NemoClaw warns and falls back to the largest known model that does fit. Unknown or custom tags (any value the bootstrap registry has not seen) are still passed through; the Ollama runner validates the choice itself. -`--yes` (or `NEMOCLAW_YES=1`) authorises the Ollama model download without an interactive confirmation prompt. -Under `--non-interactive`, `--yes` (or `NEMOCLAW_YES=1`) is required to authorise the download — onboard exits otherwise, since it cannot prompt. +`--yes` (or `NEMOCLAW_YES=1`) authorizes the Ollama model download without an interactive confirmation prompt. +Under `--non-interactive`, include `--yes` (or `NEMOCLAW_YES=1`) to authorize the download. +Onboard exits otherwise because it cannot prompt. Run onboard without `--non-interactive` to get the interactive `[y/N]` prompt that shows the model size before downloading. | Variable | Purpose | @@ -176,251 +187,28 @@ Run onboard without `--non-interactive` to get the interactive `[y/N]` prompt th | `NEMOCLAW_MODEL` | Ollama model tag to use. Optional. | | `NEMOCLAW_YES` | Set to `1` to auto-accept the model-download confirmation prompt. Optional. | -## OpenAI-Compatible Server - -This option works with any server that implements `/v1/chat/completions`, including vLLM, TensorRT-LLM, llama.cpp, LocalAI, and others. 
-For compatible endpoints, NemoClaw uses `/v1/chat/completions` by default. -This avoids a class of failures where local backends accept `/v1/responses` requests but silently drop the system prompt and tool definitions. -To opt in to `/v1/responses`, set `NEMOCLAW_PREFERRED_API=openai-responses` before running onboard. - -Start your model server. -The examples below use vLLM, but any OpenAI-compatible server works. - -```console -$ vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000 -``` - -Run the onboard wizard. - -```console -$ nemoclaw onboard -``` - -When the wizard asks you to choose an inference provider, select **Other OpenAI-compatible endpoint**. -Enter the base URL of your local server, for example `http://localhost:8000/v1`. - -The wizard prompts for an API key. -If your server does not require authentication, enter any non-empty string (for example, `dummy`). - -NemoClaw validates the endpoint by sending a test inference request before continuing. -The wizard probes `/v1/chat/completions` by default for the compatible-endpoint provider. -If you set `NEMOCLAW_PREFERRED_API=openai-responses`, NemoClaw probes `/v1/responses` instead and only selects it when the response includes the streaming events OpenClaw requires. -If a reasoning model returns only reasoning content before producing a final answer, NemoClaw retries the smoke request with a larger response budget. -Route, configuration, and authentication failures still fail immediately. - -### Non-Interactive Setup +## Compatible Local Servers -Set the following environment variables for scripted or CI/CD deployments. - -```console -$ NEMOCLAW_PROVIDER=custom \ - NEMOCLAW_ENDPOINT_URL=http://localhost:8000/v1 \ - NEMOCLAW_MODEL=meta-llama/Llama-3.1-8B-Instruct \ - COMPATIBLE_API_KEY=dummy \ - nemoclaw onboard --non-interactive -``` - -| Variable | Purpose | -|---|---| -| `NEMOCLAW_PROVIDER` | Set to `custom` for an OpenAI-compatible endpoint. 
| -| `NEMOCLAW_ENDPOINT_URL` | Base URL of the local server. | -| `NEMOCLAW_MODEL` | Model ID as reported by the server. | -| `COMPATIBLE_API_KEY` | API key for the endpoint. Use any non-empty value if authentication is not required. | +Use **Other OpenAI-compatible endpoint** for vLLM, TensorRT-LLM, llama.cpp, LocalAI, NIM, SGLang, or another server that implements `/v1/chat/completions`. +For compatible endpoints, NemoClaw uses `/v1/chat/completions` by default because some local backends accept `/v1/responses` but drop system prompts or tool definitions. +Set `NEMOCLAW_PREFERRED_API=openai-responses` only after you have verified that the backend streams the events OpenClaw requires. -### Selecting the API Path +For the full compatible-endpoint prompt flow, non-interactive variables, API-path controls, managed vLLM profiles, NIM setup, and timeout settings, refer to [Inference Options](references/inference-options.md#setup-details-for-local-and-compatible-providers). -For the compatible-endpoint provider, `/v1/chat/completions` is the default. -NemoClaw tests streaming events during onboarding and uses chat completions -without probing the Responses API. +## Managed vLLM and NIM -To opt in to `/v1/responses`, set `NEMOCLAW_PREFERRED_API` before running onboard: - -```console -$ NEMOCLAW_PREFERRED_API=openai-responses nemoclaw onboard -``` - -The wizard then probes `/v1/responses` and only selects it when streaming -support is complete. -If the probe fails, the wizard falls back to `/v1/chat/completions` -automatically. -You can use this variable in both interactive and non-interactive mode. - -| Variable | Values | Default | -|---|---|---| -| `NEMOCLAW_PREFERRED_API` | `openai-completions`, `openai-responses` | `openai-completions` for compatible endpoints | - -If you already onboarded and the sandbox is failing at runtime, re-run -`nemoclaw onboard` to re-probe the endpoint and bake the correct API path -into the image. 
-Refer to Switch Inference Models (use the `nemoclaw-user-configure-inference` skill) for details. - -## Anthropic-Compatible Server - -If your local server implements the Anthropic Messages API (`/v1/messages`), choose **Other Anthropic-compatible endpoint** during onboarding instead. - -```console -$ nemoclaw onboard -``` - -For non-interactive setup, use `NEMOCLAW_PROVIDER=anthropicCompatible` and set `COMPATIBLE_ANTHROPIC_API_KEY`. - -```console -$ NEMOCLAW_PROVIDER=anthropicCompatible \ - NEMOCLAW_ENDPOINT_URL=http://localhost:8080 \ - NEMOCLAW_MODEL=my-model \ - COMPATIBLE_ANTHROPIC_API_KEY=dummy \ - nemoclaw onboard --non-interactive -``` - -## vLLM - -When vLLM is already running on `localhost:8000`, NemoClaw can detect it automatically and query the `/v1/models` endpoint to determine the loaded model. -On supported Linux hosts with NVIDIA GPUs, the onboard wizard can also install or start a managed vLLM container for you. - -For an already-running vLLM server, run `nemoclaw onboard` and select **Local vLLM [experimental]** from the provider list. - -```console -$ nemoclaw onboard -``` - -If vLLM is already running, NemoClaw detects the running model and validates the endpoint. -If vLLM is not running and your host matches a DGX Spark or DGX Station managed profile, NemoClaw shows the **Install vLLM** or **Start vLLM** entry by default. -Generic Linux NVIDIA GPU hosts still require `NEMOCLAW_EXPERIMENTAL=1` or `NEMOCLAW_PROVIDER=install-vllm` before the managed entry appears. -NemoClaw pulls the vLLM image, downloads model weights into `~/.cache/huggingface`, starts the `nemoclaw-vllm` container on `localhost:8000`, and prints progress markers while the model loads. -The first run can take 10 to 30 minutes. -Later runs reuse the cached image and model weights. 
- -Managed vLLM uses these profiles: - -| Host profile | Default model | -|---|---| -| DGX Spark | `Qwen/Qwen3.6-27B-FP8` | -| DGX Station | `Qwen/Qwen3.6-27B-FP8` | -| Linux with an NVIDIA GPU | `nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8` | - -**Note:** - -NemoClaw forces the `chat/completions` API path for vLLM. -The vLLM `/v1/responses` endpoint does not run the `--tool-call-parser`, so tool calls arrive as raw text. - -### Non-Interactive Setup - -Use an already-running vLLM server: - -```console -$ NEMOCLAW_PROVIDER=vllm \ - nemoclaw onboard --non-interactive -``` - -Install or start managed vLLM when a supported profile is detected. -On DGX Spark and DGX Station, `NEMOCLAW_PROVIDER=install-vllm` is enough for non-interactive runs; add `NEMOCLAW_EXPERIMENTAL=1` on generic Linux NVIDIA GPU hosts. - -```console -$ NEMOCLAW_PROVIDER=install-vllm \ - nemoclaw onboard --non-interactive -``` - -NemoClaw records the model returned by vLLM's `/v1/models` endpoint. -Start vLLM with the model you want before onboarding if you manage the server yourself. - -### Override the Managed-vLLM Model - -Managed vLLM serves the profile default unless you select a different registry entry. -Export `NEMOCLAW_VLLM_MODEL=` before invoking the installer to choose a different model from the registry. -NemoClaw uses the matching `vllm serve` flags, including the reasoning parser, tool-call parser, and `--max-model-len`. -Recognised slugs: - -| Slug | Hugging Face model | Notes | -|---|---|---| -| `qwen3.6-27b` | `Qwen/Qwen3.6-27B-FP8` | Default on DGX Spark and DGX Station profiles | -| `nemotron-3-nano-4b` | `nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8` | Default on the generic Linux + NVIDIA GPU profile | -| `deepseek-r1-distill-70b` | `deepseek-ai/DeepSeek-R1-Distill-Llama-70B` | Gated. Requires Hugging Face license acceptance | - -The slug is case-insensitive; the full Hugging Face id is also accepted. -An unrecognised value fails fast with a list of valid slugs. 
- -Gated models require a Hugging Face token; export it before onboarding so NemoClaw can forward it into the managed vLLM container: - -```console -$ export HF_TOKEN= -$ NEMOCLAW_PROVIDER=install-vllm \ - NEMOCLAW_VLLM_MODEL=deepseek-r1-distill-70b \ - nemoclaw onboard --non-interactive -``` - -`HUGGING_FACE_HUB_TOKEN` is accepted as an alternative. -The token check runs on the host before any docker pull, so a missing or empty token aborts onboarding before bandwidth is spent on a 401. - -## NVIDIA NIM (Experimental) - -NemoClaw can pull, start, and manage a NIM container on hosts with a NIM-capable NVIDIA GPU. - -Set the experimental flag and run onboard. - -```console -$ NEMOCLAW_EXPERIMENTAL=1 nemoclaw onboard -``` - -Select **Local NVIDIA NIM [experimental]** from the provider list. -NemoClaw filters available models by GPU VRAM, pulls the NIM container image, starts it, and waits for it to become healthy before continuing. -On hosts with mixed NVIDIA GPU models, the preflight summary shows each detected GPU model and the total VRAM so you can confirm which device class the model selection used. - -NIM container images are hosted on `nvcr.io` and require NGC registry authentication before `docker pull` succeeds. -If Docker is not already logged in to `nvcr.io`, onboard prompts for an [NGC API key](https://org.ngc.nvidia.com/setup/api-key) and runs `docker login nvcr.io` over `--password-stdin` so the key is never written to disk or shell history. -The prompt masks the key during input and retries once on a bad key before failing. -In non-interactive mode, onboard exits with login instructions if Docker is not already authenticated; run `docker login nvcr.io` yourself, then re-run `nemoclaw onboard --non-interactive`. -If `NGC_API_KEY` or `NVIDIA_API_KEY` is already exported, NemoClaw passes it into the managed NIM container through the process environment instead of command-line arguments. 
-If the NIM container exits before the health endpoint becomes ready, onboarding stops early and prints the last container log lines. - -**Note:** - -NIM uses vLLM internally. -The same `chat/completions` API path restriction applies. - -### Non-Interactive Setup - -```console -$ NEMOCLAW_EXPERIMENTAL=1 \ - NEMOCLAW_PROVIDER=nim \ - nemoclaw onboard --non-interactive -``` - -To select a specific model, set `NEMOCLAW_MODEL`. - -## Timeout Configuration - -Local inference requests use a default timeout of 180 seconds. -Large prompts on hardware such as DGX Spark can exceed shorter timeouts, so NemoClaw sets a higher default for Ollama, vLLM, NIM, and compatible-endpoint setup. - -To override the timeout, set the `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` environment variable before onboarding: - -```console -$ export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300 -$ nemoclaw onboard -``` - -The value is in seconds. -This setting is baked into the sandbox at build time. -Changing it after onboarding requires re-running `nemoclaw onboard`. - -`NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` only governs the inference-server validation probe. -The post-create readiness wait (image build, gateway upload, in-sandbox boot) has its own budget, `NEMOCLAW_SANDBOX_READY_TIMEOUT`, also defaulting to 180 seconds. -On hosts where the sandbox image takes minutes to build or upload — large quantised models, DGX Station first runs, or remote VMs over a slow link — raise both together: - -```console -$ export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300 -$ export NEMOCLAW_SANDBOX_READY_TIMEOUT=600 -$ nemoclaw onboard -``` +NemoClaw can use an already-running vLLM server on `localhost:8000`, start managed vLLM on supported NVIDIA GPU hosts, or manage a local NIM container when `NEMOCLAW_EXPERIMENTAL=1` is set. +Managed vLLM records the model returned by `/v1/models` and uses runtime metadata such as `max_model_len` when available. +NIM uses the same chat-completions API path restriction as vLLM. 
-If onboard ends with `Sandbox '' was created but did not become ready within 180s`, refer to Troubleshooting (use the `nemoclaw-user-reference` skill). +For registry slugs, Hugging Face token requirements, NGC login behavior, and non-interactive examples, refer to [Inference Options](references/inference-options.md#setup-details-for-local-and-compatible-providers). ## Verify the Configuration After onboarding completes, confirm the active provider and model. -```console -$ nemoclaw status +```bash +nemoclaw status ``` The output shows the provider label (for example, "Local vLLM" or "Other OpenAI-compatible endpoint") and the active model. @@ -430,12 +218,12 @@ If `Inference` is healthy but `Inference (auth proxy)` is not, rerun onboarding ## Switch Models at Runtime You can change the model without re-running onboard. -Refer to Switch Inference Models (use the `nemoclaw-user-configure-inference` skill) for the full procedure. +Refer to [Switch Inference Models](references/switch-inference-providers.md) for the full procedure. For compatible endpoints, the command is: -```console -$ nemoclaw inference set --provider compatible-endpoint --model +```bash +nemoclaw inference set --provider compatible-endpoint --model ``` If the provider itself needs to change (for example, switching from vLLM to a cloud API), pass the new provider to `nemoclaw inference set`. @@ -444,9 +232,12 @@ If the provider itself needs to change (for example, switching from vLLM to a cl - **Load [references/switch-inference-providers.md](references/switch-inference-providers.md)** when switching inference providers, changing the model runtime, or reconfiguring inference routing. Changes the active inference model without restarting the sandbox. - **Load [references/set-up-sub-agent.md](references/set-up-sub-agent.md)** when users ask how to add a second model, configure a sub-agent model, use Omni for vision tasks, configure agents.list, or use sessions_spawn in NemoClaw. 
Shows the NemoClaw-specific file paths and update flow for adding an auxiliary OpenClaw sub-agent model. -- **[references/tool-calling-reliability.md](references/tool-calling-reliability.md)** — Explains Ollama tool-call leak symptoms, when vLLM with a tool-call parser is recommended, and how to repoint NemoClaw to a parser-aware local endpoint. - **Load [references/inference-options.md](references/inference-options.md)** when explaining which providers are available, what the onboard wizard presents, or how inference routing works. Lists all inference providers offered during NemoClaw onboarding. +- **[references/tool-calling-reliability.md](references/tool-calling-reliability.md)** — Explains Ollama tool-call leak symptoms, when to use vLLM with a tool-call parser, and how to repoint NemoClaw to a parser-aware local endpoint. ## Related Skills +- [Inference Options](references/inference-options.md) for the full list of providers available during onboarding. +- [Tool-Calling Reliability](references/tool-calling-reliability.md) for diagnosing raw JSON tool-call output with local models. +- [Switch Inference Models](references/switch-inference-providers.md) for runtime model switching. - `nemoclaw-user-get-started` — Quickstart (use the `nemoclaw-user-get-started` skill) for first-time installation diff --git a/.agents/skills/nemoclaw-user-configure-inference/evals/evals.json b/.agents/skills/nemoclaw-user-configure-inference/evals/evals.json new file mode 100644 index 0000000000..44f8cca76b --- /dev/null +++ b/.agents/skills/nemoclaw-user-configure-inference/evals/evals.json @@ -0,0 +1,11 @@ +[ + { + "id": "docs-inference-inference-options-001", + "question": "I'm choosing an inference option during onboarding. 
Help me compare hosted providers, local servers, and compatible endpoints so I can select a model path that fits my privacy, cost, and reliability needs.", + "expected_skill": "nemoclaw-user-configure-inference", + "ground_truth": "A NemoClaw-specific answer that helps the user compare hosted providers, local servers, and compatible endpoints and gives enough concrete guidance, decision criteria, verification steps, or risk framing to select a model path that fits my privacy, cost, and reliability needs.", + "expected_behavior": [ + "Uses the expected_skill and does not make up answers if it cannot find the answer from the skill." + ] + } +] diff --git a/.agents/skills/nemoclaw-user-configure-inference/references/inference-options.md b/.agents/skills/nemoclaw-user-configure-inference/references/inference-options.md index dc5441bdf5..56ede783ed 100644 --- a/.agents/skills/nemoclaw-user-configure-inference/references/inference-options.md +++ b/.agents/skills/nemoclaw-user-configure-inference/references/inference-options.md @@ -1,10 +1,20 @@ - - # NemoClaw Inference Options +import { AgentOnly } from "../_components/AgentGuide"; + NemoClaw supports multiple inference providers. -During onboarding, the `nemoclaw onboard` wizard presents a numbered list of providers to choose from. -Your selection determines where the agent's inference traffic is routed. +During onboarding, the NemoClaw onboarding wizard presents a numbered list of providers to choose from. +Your selection determines where NemoClaw routes the agent's inference traffic. + + +For OpenClaw onboarding, use `nemoclaw onboard`. +The provider flow is the same, with the NVIDIA Endpoints route available for OpenClaw Agent. + + + +For Hermes onboarding, use `nemoclaw onboard`. +The provider flow is the same, with the Hermes Provider route available for Hermes Agent. 
+ ## How Inference Routing Works @@ -37,7 +47,7 @@ NemoClaw uses provider-specific local tokens for those routes, and rebuilds of l The onboard wizard presents the following provider options by default. The first six are always available. -Ollama appears when it is installed or running on the host. +Ollama appears when you have installed or started it on the host. Local vLLM appears when NemoClaw detects a running vLLM server. The managed install/start vLLM entry appears by default on DGX Spark and DGX Station, and appears on generic Linux NVIDIA GPU hosts after opt-in. @@ -50,14 +60,14 @@ The managed install/start vLLM entry appears by default on DGX Spark and DGX Sta | Other Anthropic-compatible endpoint | Routes to any server that implements the Anthropic Messages API (`/v1/messages`). The wizard prompts for a base URL and model name. Set `COMPATIBLE_ANTHROPIC_API_KEY`. | You provide the model name. | | Google Gemini | Routes to Google's OpenAI-compatible chat-completions endpoint. NemoClaw skips the Responses-API probe because Gemini does not support `/v1/responses`. Set `GEMINI_API_KEY`. | `gemini-3.1-pro-preview`, `gemini-3.1-flash-lite-preview`, `gemini-3-flash-preview`, `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-2.5-flash-lite` | | Hermes Provider | Routes Hermes Agent through the host OpenShell provider registered by NemoClaw when onboarding Hermes Agent. | Curated Hermes Provider models such as `moonshotai/kimi-k2.6`, `openai/gpt-5.4-mini`, and `z-ai/glm-5.1`. | -| Local Ollama | Routes to a local Ollama instance on `localhost:11434`. NemoClaw detects installed models, offers starter models if none are present, pulls and warms the selected model, and validates it. | Selected during onboarding. For more information, refer to Use a Local Inference Server (use the `nemoclaw-user-configure-inference` skill). | +| Local Ollama | Routes to a local Ollama instance on `localhost:11434`. 
NemoClaw detects installed models, offers starter models if none are present, pulls and warms the selected model, and validates it. | Selected during onboarding. For more information, refer to [Use a Local Inference Server](../SKILL.md). | | Model Router | Starts a host-side router on port `4000`, registers it as an OpenAI-compatible provider, and keeps the sandbox pointed at `inference.local`. Set `NEMOCLAW_PROVIDER=routed` for non-interactive setup. | The router pool defines the model names. | ## Choosing the Right Option for Nemotron NVIDIA Nemotron models expose OpenAI-compatible APIs across every supported deployment surface, so two onboarding options can route to Nemotron. -| Where Nemotron is hosted | Onboard wizard option | Why | +| Nemotron Host | Onboard Wizard Option | Why | |---|---|---| | `build.nvidia.com` (NVIDIA-hosted) | **Option 1: NVIDIA Endpoints** | NemoClaw sets the base URL to `https://integrate.api.nvidia.com/v1` for you and validates the model against the build catalog. | | Self-hosted NIM container | **Option 3: Other OpenAI-compatible endpoint** | NIM exposes an OpenAI-compatible `/v1/chat/completions` route. Point the base URL at your NIM service and enter the Nemotron model ID. | @@ -74,14 +84,53 @@ When you select it, NemoClaw starts the router proxy on the host, waits for its The sandbox does not call the router port directly. The router model pool lives in `nemoclaw-blueprint/router/pool-config.yaml`. +Edit that file to define which models the router can choose from. The default pool routes between NVIDIA-hosted Nemotron models and uses the `tolerance` value to choose the lowest-cost model whose predicted quality stays within the configured threshold. 
+ +```yaml +routing: + method: prefill + checkpoint: llm-router/checkpoints/prefill_router_qwen08b.pt + tolerance: 0.20 + encoder: Qwen/Qwen3.5-0.8B + +models: + - name: nano + litellm_model: "openai/nvidia/nvidia/Nemotron-3-Nano-30B-A3B" + cost_per_m_input_tokens: 0.05 + api_base: "https://inference-api.nvidia.com" + + - name: super + litellm_model: "openai/nvidia/nvidia/nemotron-3-super-v3" + cost_per_m_input_tokens: 0.10 + api_base: "https://inference-api.nvidia.com" +``` + +The `tolerance` parameter controls the accuracy-cost tradeoff. + +| Value | Behavior | +|-------|----------| +| `0.0` | Always pick the most accurate model. | +| `0.20` | Allow up to 20 percentage points below the best for a cheaper model (default). | +| `1.0` | Always pick the cheapest model. | + +The router runs on the host, not inside the sandbox. + +```text +Sandbox (agent) ──> OpenShell Gateway (L7 proxy) ──> Model Router (:4000) ──> NVIDIA API + └── PrefillRouter selects model +``` + +Credentials flow through the OpenShell provider system. +The sandbox never sees raw API keys. + To use the router in scripted setup, set: -```console -$ NEMOCLAW_PROVIDER=routed NVIDIA_API_KEY= nemoclaw onboard --non-interactive +```bash +NEMOCLAW_PROVIDER=routed NVIDIA_API_KEY= nemoclaw onboard --non-interactive ``` -### Host Python requirement +### Host Python Requirement The Model Router runs in a host-side virtual environment that NemoClaw creates during onboarding. 
NemoClaw probes `python3.13`, `python3.12`, `python3.11`, `python3.10`, and bare `python3`, and adopts the first interpreter that satisfies both of: @@ -94,18 +143,19 @@ This surfaces issues like Homebrew `python@3.14` whose `pyexpat` extension fails To pin a specific interpreter, set `NEMOCLAW_MODEL_ROUTER_PYTHON` to its absolute path before running `nemoclaw onboard`: -```console -$ NEMOCLAW_MODEL_ROUTER_PYTHON=/opt/homebrew/bin/python3.12 nemoclaw onboard +```bash +NEMOCLAW_MODEL_ROUTER_PYTHON=/opt/homebrew/bin/python3.12 nemoclaw onboard ``` The pin is strict. NemoClaw probes only that interpreter and aborts with the failure reason if it does not qualify, rather than silently falling back to a different python on `PATH`. -Relative command names such as `python3.12` are rejected; use `command -v python3.12` to find the absolute path. +NemoClaw rejects relative command names such as `python3.12`. +Use `command -v python3.12` to find the absolute path. If `python -m venv` itself fails for a probe-clean interpreter (for example, a corrupt ensurepip seed), NemoClaw retries with the next healthy candidate when no pin is set; with a pin set, the failure stops onboarding so you can fix or repoint the pinned python. ## Caveated Local Options -The following local inference options are caveated. +The following local inference options have caveats. Local NIM and generic Linux managed vLLM install/start require `NEMOCLAW_EXPERIMENTAL=1`; DGX Spark and DGX Station managed vLLM entries appear by default. An already-running vLLM server appears directly in the onboarding selection list. @@ -114,29 +164,267 @@ An already-running vLLM server appears directly in the onboarding selection list | Local NVIDIA NIM | NIM-capable GPU detected | Pulls and manages a NIM container. | | Local vLLM | vLLM running on `localhost:8000`, or a supported DGX Spark, DGX Station, or Linux NVIDIA GPU profile | Auto-detects the loaded model when vLLM is already running. 
Can install or start a managed vLLM container by default on DGX Spark/Station and after opt-in on generic Linux NVIDIA GPU hosts. | -For setup instructions, refer to Use a Local Inference Server (use the `nemoclaw-user-configure-inference` skill). +For setup instructions, refer to [Use a Local Inference Server](../SKILL.md). ## Validation NemoClaw validates the selected provider and model before creating the sandbox. If credential validation fails, the wizard asks whether to re-enter the API key, choose a different provider, retry, or exit. -Transient upstream validation failures are retried before the wizard reports a provider failure. +The wizard retries transient upstream validation failures before it reports a provider failure. The `nvapi-` prefix check applies only to `NVIDIA_API_KEY`. Other provider credentials, such as `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, and compatible endpoint keys, use provider-aware validation during retry. | Provider type | Validation method | |---|---| | OpenAI | Tries `/responses` first, then `/chat/completions`. | -| NVIDIA Endpoints | Validates via `/v1/chat/completions` only; the `/v1/responses` probe is skipped because NVIDIA Build does not expose `/v1/responses` (returns 404 for every model). | -| Google Gemini | Validates via Gemini's OpenAI-compatible chat-completions path only; the `/v1/responses` probe is skipped because Gemini does not support the Responses API. | +| NVIDIA Endpoints | Validates through `/v1/chat/completions` only; NemoClaw skips the `/v1/responses` probe because NVIDIA Build does not expose `/v1/responses` (returns 404 for every model). | +| Google Gemini | Validates through Gemini's OpenAI-compatible chat-completions path only; NemoClaw skips the `/v1/responses` probe because Gemini does not support the Responses API. | | Other OpenAI-compatible endpoint | Tries `/v1/responses` first with a tool-calling probe; falls back to `/v1/chat/completions`. 
Selected runtime API defaults to `/v1/chat/completions`; set `NEMOCLAW_PREFERRED_API=openai-responses` to allow `/v1/responses` at runtime when validation succeeds. | | Anthropic-compatible | Tries `/v1/messages`. | | NVIDIA Endpoints (manual model entry) | Validates the model name against the catalog API. | | Compatible endpoints | Sends a real inference request because many proxies do not expose a `/models` endpoint. For OpenAI-compatible endpoints, the probe tries `/v1/responses` first then falls back to `/v1/chat/completions`; the selected runtime API defaults to `/v1/chat/completions`. Set `NEMOCLAW_PREFERRED_API=openai-responses` to allow `/v1/responses` at runtime when validation succeeds. | -| Local NVIDIA NIM | Validates via `/v1/chat/completions` only; the `/v1/responses` probe is skipped (same as NVIDIA Endpoints). | +| Local NVIDIA NIM | Validates through `/v1/chat/completions` only; NemoClaw skips the `/v1/responses` probe (same as NVIDIA Endpoints). | + +## Setup Details for Local and Compatible Providers + +The sections below collect the detailed setup prompts and environment variables for local and compatible inference providers. +Use them when the quickstart or local inference guide points you here for exact command shapes. + +## OpenAI-Compatible Server + +This option works with any server that implements `/v1/chat/completions`, including vLLM, TensorRT-LLM, llama.cpp, LocalAI, and others. +For compatible endpoints, NemoClaw uses `/v1/chat/completions` by default. +This avoids a class of failures where local backends accept `/v1/responses` requests but silently drop the system prompt and tool definitions. +To opt in to `/v1/responses`, set `NEMOCLAW_PREFERRED_API=openai-responses` before running onboard. + +Start your model server. +The examples below use vLLM, but any OpenAI-compatible server works. + +```bash +vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000 +``` + +Run the onboard wizard. 
+ +```bash +nemoclaw onboard +``` + +When the wizard asks you to choose an inference provider, select **Other OpenAI-compatible endpoint**. +Enter the base URL of your local server, for example `http://localhost:8000/v1`. + +The wizard prompts for an API key. +If your server does not require authentication, enter any non-empty string (for example, `dummy`). + +NemoClaw validates the endpoint by sending a test inference request before continuing. +The wizard probes `/v1/chat/completions` by default for the compatible-endpoint provider. +If you set `NEMOCLAW_PREFERRED_API=openai-responses`, NemoClaw probes `/v1/responses` instead and only selects it when the response includes the streaming events OpenClaw requires. +If a reasoning model returns only reasoning content before producing a final answer, NemoClaw retries the smoke request with a larger response budget. +Route, configuration, and authentication failures still fail immediately. + +### Non-Interactive Setup + +Set the following environment variables for scripted or CI/CD deployments. + +```bash +NEMOCLAW_PROVIDER=custom \ + NEMOCLAW_ENDPOINT_URL=http://localhost:8000/v1 \ + NEMOCLAW_MODEL=meta-llama/Llama-3.1-8B-Instruct \ + COMPATIBLE_API_KEY=dummy \ + nemoclaw onboard --non-interactive +``` + +| Variable | Purpose | +|---|---| +| `NEMOCLAW_PROVIDER` | Set to `custom` for an OpenAI-compatible endpoint. | +| `NEMOCLAW_ENDPOINT_URL` | Base URL of the local server. | +| `NEMOCLAW_MODEL` | Model ID as reported by the server. | +| `COMPATIBLE_API_KEY` | API key for the endpoint. Use any non-empty value if authentication is not required. | + +### Selecting the API Path + +For the compatible-endpoint provider, `/v1/chat/completions` is the default. +NemoClaw tests streaming events during onboarding and uses chat completions +without probing the Responses API. 
+ +To opt in to `/v1/responses`, set `NEMOCLAW_PREFERRED_API` before running onboard: + +```bash +NEMOCLAW_PREFERRED_API=openai-responses nemoclaw onboard +``` + +The wizard then probes `/v1/responses` and only selects it when streaming +support is complete. +If the probe fails, the wizard falls back to `/v1/chat/completions` +automatically. +You can use this variable in both interactive and non-interactive mode. + +| Variable | Values | Default | +|---|---|---| +| `NEMOCLAW_PREFERRED_API` | `openai-completions`, `openai-responses` | `openai-completions` for compatible endpoints | + +If you already onboarded and the sandbox is failing at runtime, re-run `nemoclaw onboard` to re-probe the endpoint and bake the correct API path +into the image. +Refer to [Switch Inference Models](switch-inference-providers.md) for more information. + +## Anthropic-Compatible Server + +If your local server implements the Anthropic Messages API (`/v1/messages`), choose **Other Anthropic-compatible endpoint** during onboarding instead. + +```bash +nemoclaw onboard +``` + +For non-interactive setup, use `NEMOCLAW_PROVIDER=anthropicCompatible` and set `COMPATIBLE_ANTHROPIC_API_KEY`. + +```bash +NEMOCLAW_PROVIDER=anthropicCompatible \ + NEMOCLAW_ENDPOINT_URL=http://localhost:8080 \ + NEMOCLAW_MODEL=my-model \ + COMPATIBLE_ANTHROPIC_API_KEY=dummy \ + nemoclaw onboard --non-interactive +``` + +## vLLM + +When vLLM is already running on `localhost:8000`, NemoClaw can detect it automatically and query the `/v1/models` endpoint to determine the loaded model. +On supported Linux hosts with NVIDIA GPUs, the onboard wizard can also install or start a managed vLLM container for you. + +For an already-running vLLM server, run `nemoclaw onboard` and select **Local vLLM [experimental]** from the provider list. + +If vLLM is already running, NemoClaw detects the running model and validates the endpoint. 
+When vLLM exposes runtime metadata such as `max_model_len`, NemoClaw uses that value for the `contextWindow` baked into `openclaw.json` unless you set `NEMOCLAW_CONTEXT_WINDOW` yourself. +If vLLM is not running and your host matches a DGX Spark or DGX Station managed profile, NemoClaw shows the **Install vLLM** or **Start vLLM** entry by default. +Generic Linux NVIDIA GPU hosts still require `NEMOCLAW_EXPERIMENTAL=1` or `NEMOCLAW_PROVIDER=install-vllm` before the managed entry appears. +NemoClaw pulls the vLLM image, downloads model weights into `~/.cache/huggingface`, starts the `nemoclaw-vllm` container on `localhost:8000`, streams Hugging Face download progress, and polls `/v1/models` until the model is ready. +If Docker pull output stops making progress, a watchdog stops the stalled pull instead of failing slow but active downloads on a fixed wall-clock timeout. +If vLLM never becomes ready, NemoClaw prints a short tail of the vLLM container logs before exiting. +The first run can take 10 to 30 minutes. +Later runs reuse the cached image and model weights. + +Managed vLLM uses these profiles: + +| Host profile | Default model | +|---|---| +| DGX Spark | `nvidia/Qwen3.6-35B-A3B-NVFP4` | +| DGX Station | `Qwen/Qwen3.6-27B-FP8` | +| Linux with an NVIDIA GPU | `nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8` | + +**Note:** + +NemoClaw forces the `chat/completions` API path for vLLM. +The vLLM `/v1/responses` endpoint does not run the `--tool-call-parser`, so tool calls arrive as raw text. + +### Non-Interactive Setup + +Use an already-running vLLM server: + +```bash +NEMOCLAW_PROVIDER=vllm \ + nemoclaw onboard --non-interactive +``` + +Install or start managed vLLM when NemoClaw detects a supported profile. +On DGX Spark and DGX Station, `NEMOCLAW_PROVIDER=install-vllm` is enough for non-interactive runs; add `NEMOCLAW_EXPERIMENTAL=1` on generic Linux NVIDIA GPU hosts. 
+ +```bash +NEMOCLAW_PROVIDER=install-vllm \ + nemoclaw onboard --non-interactive +``` + +NemoClaw records the model returned by vLLM's `/v1/models` endpoint. +Start vLLM with the model you want before onboarding if you manage the server yourself. + +### Override the Managed-vLLM Model + +Managed vLLM serves the profile default unless you select a different registry entry. +Export `NEMOCLAW_VLLM_MODEL=` before invoking the installer to choose a different model from the registry. +NemoClaw uses the matching `vllm serve` flags, including the reasoning parser, tool-call parser, and `--max-model-len`. +Recognized slugs are: + +| Slug | Hugging Face model | Notes | +|---|---|---| +| `qwen3.6-27b` | `Qwen/Qwen3.6-27B-FP8` | Default on the DGX Station profile | +| `qwen3.6-35b-a3b-nvfp4` | `nvidia/Qwen3.6-35B-A3B-NVFP4` | Default on the DGX Spark profile | +| `nemotron-3-nano-4b` | `nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8` | Default on the generic Linux + NVIDIA GPU profile | +| `deepseek-r1-distill-70b` | `deepseek-ai/DeepSeek-R1-Distill-Llama-70B` | Gated. Requires Hugging Face license acceptance | + +The slug is case-insensitive; the full Hugging Face id is also accepted. +An unrecognized value fails fast with a list of valid slugs. + +Gated models require a Hugging Face token; export it before onboarding so NemoClaw can forward it into the managed vLLM container: + +```bash +export HF_TOKEN= +NEMOCLAW_PROVIDER=install-vllm \ + NEMOCLAW_VLLM_MODEL=deepseek-r1-distill-70b \ + nemoclaw onboard --non-interactive +``` + +NemoClaw accepts `HUGGING_FACE_HUB_TOKEN` as an alternative. +The token check runs on the host before any docker pull, so a missing or empty token aborts onboarding before bandwidth is spent on a 401. + +## NVIDIA NIM (Experimental) + +NemoClaw can pull, start, and manage a NIM container on hosts with a NIM-capable NVIDIA GPU. + +Set the experimental flag and run onboard. 
+ +```bash +NEMOCLAW_EXPERIMENTAL=1 nemoclaw onboard +``` + +Select **Local NVIDIA NIM [experimental]** from the provider list. +NemoClaw filters available models by GPU VRAM, pulls the NIM container image, starts it, and waits for it to become healthy before continuing. +On hosts with mixed NVIDIA GPU models, the preflight summary shows each detected GPU model and the total VRAM so you can confirm which device class the model selection used. + +NVIDIA hosts NIM container images on `nvcr.io`, and `docker pull` requires NGC registry authentication. +If Docker is not already logged in to `nvcr.io`, onboard prompts for an [NGC API key](https://org.ngc.nvidia.com/setup/api-key) and runs `docker login nvcr.io` over `--password-stdin` so the key is never written to disk or shell history. +The prompt masks the key during input and retries one time on a bad key before failing. +In non-interactive mode, onboard exits with login instructions if Docker is not already authenticated; run `docker login nvcr.io` yourself, then re-run `nemoclaw onboard --non-interactive`. +If `NGC_API_KEY` or `NVIDIA_API_KEY` is already exported, NemoClaw passes it into the managed NIM container through the process environment instead of command-line arguments. +If the NIM container exits before the health endpoint becomes ready, onboarding stops early and prints the last container log lines. + +**Note:** + +NIM uses vLLM internally. +The same `chat/completions` API path restriction applies. + +## Timeout Configuration + +Local inference requests use a default timeout of 180 seconds. +Large prompts on hardware such as DGX Spark can exceed shorter timeouts, so NemoClaw sets a higher default for Ollama, vLLM, NIM, and compatible-endpoint setup. + +To override the timeout, set the `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` environment variable before onboarding: + +```bash +export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300 +nemoclaw onboard +``` + +The value is in seconds. 
+NemoClaw bakes this setting into the sandbox at build time. +Changing it after onboarding requires re-running `nemoclaw onboard`. + +`NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` only governs the inference-server validation probe. +During local Ollama setup, NemoClaw treats host-side curl process timeouts as retryable probe failures and retries with a larger timeout before it reports a validation failure. +NemoClaw also retries Docker runtime detection with a longer `docker info` timeout before it chooses the local inference route. +The post-create readiness wait (image build, gateway upload, in-sandbox boot) has its own budget, `NEMOCLAW_SANDBOX_READY_TIMEOUT`, also defaulting to 180 seconds. +On hosts where the sandbox image takes minutes to build or upload, raise both settings together. +Examples include large quantized models, DGX Station first runs, and remote VMs over a slow link. + +```bash +export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300 +export NEMOCLAW_SANDBOX_READY_TIMEOUT=600 +nemoclaw onboard +``` + +If onboard ends with `Sandbox '' was created but did not become ready within 180s`, refer to Troubleshooting (use the `nemoclaw-user-reference` skill). ## Next Steps -- Use a Local Inference Server (use the `nemoclaw-user-configure-inference` skill) for Ollama, vLLM, NIM, and compatible-endpoint setup details. -- Tool-Calling Reliability (use the `nemoclaw-user-configure-inference` skill) for deciding when Ollama is enough and when vLLM with a parser is safer. -- Switch Inference Models (use the `nemoclaw-user-configure-inference` skill) for changing the model at runtime without re-onboarding. +- [Use a Local Inference Server](../SKILL.md) for Ollama, vLLM, NIM, and compatible-endpoint setup details. + +- [Tool-Calling Reliability](tool-calling-reliability.md) for deciding when Ollama is enough and when vLLM with a parser is safer. + +- [Switch Inference Models](switch-inference-providers.md) for changing the model at runtime without re-onboarding. 
diff --git a/.agents/skills/nemoclaw-user-configure-inference/references/set-up-sub-agent.md b/.agents/skills/nemoclaw-user-configure-inference/references/set-up-sub-agent.md index c88799775a..48cf9b38c4 100644 --- a/.agents/skills/nemoclaw-user-configure-inference/references/set-up-sub-agent.md +++ b/.agents/skills/nemoclaw-user-configure-inference/references/set-up-sub-agent.md @@ -1,5 +1,3 @@ - - # Set Up Task-Specific Sub-Agents OpenClaw documents the sub-agent behavior, `sessions_spawn` tool, `agents.list` configuration, tool policy, nesting, and auth model in [Sub-Agents](https://docs.openclaw.ai/tools/subagents). @@ -37,17 +35,17 @@ It keeps the primary `main` agent on the normal NemoClaw inference route and add | Sub-agent model | `nvidia-omni/private/nvidia/nemotron-3-nano-omni-reasoning-30b-a3b` | | Delegation tool | `sessions_spawn` | -Omni is used as the specialist model for image tasks. +The sub-agent uses Omni as the specialist model for image tasks. The primary orchestration model remains responsible for conversation, planning, and deciding when to delegate. ## Update the Sandbox Config Fetch the current OpenClaw config from the sandbox, patch it with your auxiliary provider and `agents.list` changes, then upload it back. -```console -$ export SANDBOX=my-assistant -$ export DOCKER_CTR=openshell-cluster-nemoclaw -$ docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- cat /sandbox/.openclaw/openclaw.json > /tmp/openclaw.json +```bash +export SANDBOX=my-assistant +export DOCKER_CTR=openshell-cluster-nemoclaw +docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- cat /sandbox/.openclaw/openclaw.json > /tmp/openclaw.json ``` Create `/tmp/openclaw.updated.json` with the OpenClaw sub-agent config. @@ -56,13 +54,13 @@ For the Omni example, the demo provides `vlm-demo/vlm-subagent/openclaw-patch.py Upload the patched config and refresh the hash. 
In the default mutable state, this keeps the local hash consistent but does not make it tamper-proof; lock the config root-owned and read-only afterward if the sandbox should enforce config integrity at startup. -```console -$ docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 644 /sandbox/.openclaw/openclaw.json -$ docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 644 /sandbox/.openclaw/.config-hash -$ cat /tmp/openclaw.updated.json | docker exec -i "$DOCKER_CTR" kubectl exec -i -n openshell "$SANDBOX" -c agent -- sh -c 'cat > /sandbox/.openclaw/openclaw.json' -$ docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- /bin/bash -c "cd /sandbox/.openclaw && sha256sum openclaw.json > .config-hash" -$ docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 444 /sandbox/.openclaw/openclaw.json -$ docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 444 /sandbox/.openclaw/.config-hash +```bash +docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 644 /sandbox/.openclaw/openclaw.json +docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 644 /sandbox/.openclaw/.config-hash +cat /tmp/openclaw.updated.json | docker exec -i "$DOCKER_CTR" kubectl exec -i -n openshell "$SANDBOX" -c agent -- sh -c 'cat > /sandbox/.openclaw/openclaw.json' +docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- /bin/bash -c "cd /sandbox/.openclaw && sha256sum openclaw.json > .config-hash" +docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 444 /sandbox/.openclaw/openclaw.json +docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 444 /sandbox/.openclaw/.config-hash ``` Check `/tmp/gateway.log` after upload and confirm the gateway hot-reloaded the provider or `agents.list` change. 
@@ -77,10 +75,10 @@ For the Omni example: ``` Use the same provider ID that appears in `models.providers`, such as `nvidia-omni`. -After uploading the auth profile, make sure the sub-agent directory is owned by the sandbox user: +After uploading the auth profile, make sure the sandbox user owns the sub-agent directory: -```console -$ docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chown -R sandbox:sandbox /sandbox/.openclaw/agents/vision-operator +```bash +docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chown -R sandbox:sandbox /sandbox/.openclaw/agents/vision-operator ``` ## Allow Auxiliary Provider Egress @@ -116,5 +114,5 @@ Use the [`vlm-demo`](https://github.com/brevdev/nemoclaw-demos/tree/main/vlm-dem Use the following resources for more information: - Refer to [OpenClaw Sub-Agents](https://docs.openclaw.ai/tools/subagents) for `sessions_spawn`, `agents.list`, nesting, tool policy, and auth behavior. -- Refer to Switch Inference Providers (use the `nemoclaw-user-configure-inference` skill) to change the primary orchestration model instead of adding a sub-agent model. +- Refer to [Switch Inference Providers](switch-inference-providers.md) to change the primary orchestration model instead of adding a sub-agent model. - Refer to Workspace Files (use the `nemoclaw-user-manage-sandboxes` skill) to understand per-agent workspace directories. 
diff --git a/.agents/skills/nemoclaw-user-configure-inference/references/switch-inference-providers.md b/.agents/skills/nemoclaw-user-configure-inference/references/switch-inference-providers.md index f02a9e7794..92089e996c 100644 --- a/.agents/skills/nemoclaw-user-configure-inference/references/switch-inference-providers.md +++ b/.agents/skills/nemoclaw-user-configure-inference/references/switch-inference-providers.md @@ -1,9 +1,9 @@ - - # Switch Inference Models at Runtime +import { AgentOnly } from "../_components/AgentGuide"; + Change the active inference model while the sandbox is running. -No restart is required. +You do not need to restart the sandbox. ## Prerequisites @@ -12,100 +12,122 @@ No restart is required. ## Switch to a Different Model + Use `nemoclaw inference set` with the provider and model that match the upstream you want to use. The command updates the OpenShell inference route and synchronizes the running agent config. For OpenClaw, it updates `agents.defaults.model.primary` and the matching provider namespace. + + +Use `nemoclaw inference set` with the provider and model that match the upstream you want to use. +The command updates the OpenShell inference route and synchronizes the running agent config. For Hermes, it updates `/sandbox/.hermes/config.yaml` (`model.default`, `model.base_url`, and `model.provider: custom`) without rebuilding or restarting Hermes. +Pass `--sandbox ` when you do not want to use the default registered sandbox. +Under `nemoclaw`, pass `--sandbox ` when you have registered more than one Hermes sandbox. + + Pass `--sandbox ` when you do not want to use the default registered sandbox. -Under `nemohermes`, pass `--sandbox ` when more than one Hermes sandbox is registered. 
+ ### NVIDIA Endpoints -```console -$ nemoclaw inference set --provider nvidia-prod --model nvidia/nemotron-3-super-120b-a12b +```bash +nemoclaw inference set --provider nvidia-prod --model nvidia/nemotron-3-super-120b-a12b ``` ### OpenAI -```console -$ nemoclaw inference set --provider openai-api --model gpt-5.4 +```bash +nemoclaw inference set --provider openai-api --model gpt-5.4 ``` ### Anthropic -```console -$ nemoclaw inference set --provider anthropic-prod --model claude-sonnet-4-6 +```bash +nemoclaw inference set --provider anthropic-prod --model claude-sonnet-4-6 ``` ### Google Gemini -```console -$ nemoclaw inference set --provider gemini-api --model gemini-2.5-flash +```bash +nemoclaw inference set --provider gemini-api --model gemini-2.5-flash ``` ### Compatible Endpoints If you onboarded a custom compatible endpoint, switch models with the provider created for that endpoint: -```console -$ nemoclaw inference set --provider compatible-endpoint --model +```bash +nemoclaw inference set --provider compatible-endpoint --model ``` -```console -$ nemoclaw inference set --provider compatible-anthropic-endpoint --model +```bash +nemoclaw inference set --provider compatible-anthropic-endpoint --model ``` + + ### Hermes Provider For a NemoClaw-managed Hermes sandbox, use the Hermes alias with the registered Hermes Provider route: -```console -$ nemohermes inference set --provider hermes-provider --model openai/gpt-5.4-mini +```bash +nemoclaw inference set --provider hermes-provider --model openai/gpt-5.4-mini ``` + + #### Switching from Responses API to Chat Completions -If onboarding selected `/v1/responses` but the agent fails at runtime (for -example, because the backend does not emit the streaming events OpenClaw -requires), re-run onboarding so the wizard re-probes the endpoint and bakes -the correct API path into the image: +If onboarding selected `/v1/responses` but the agent fails at runtime, re-run onboarding so the wizard re-probes the endpoint and 
bakes the correct API path into the image. +This can happen when the backend does not emit the streaming events OpenClaw requires. -```console -$ nemoclaw onboard +```bash +nemoclaw onboard ``` Select the same provider and endpoint again. -The updated streaming probe will detect incomplete `/v1/responses` support -and select `/v1/chat/completions` automatically. +The updated streaming probe detects incomplete `/v1/responses` support and selects `/v1/chat/completions` automatically. -For the compatible-endpoint provider, NemoClaw uses `/v1/chat/completions` by -default, so no env var is required to keep the safe path. -To opt in to `/v1/responses` for a backend you have verified end to end, set -`NEMOCLAW_PREFERRED_API` before onboarding: +For the compatible-endpoint provider, NemoClaw uses `/v1/chat/completions` by default, so you do not need an environment variable to keep the safe path. +To opt in to `/v1/responses` for a backend you have verified end to end, set `NEMOCLAW_PREFERRED_API` before onboarding: -```console -$ NEMOCLAW_PREFERRED_API=openai-responses nemoclaw onboard +```bash +NEMOCLAW_PREFERRED_API=openai-responses nemoclaw onboard ``` **Note:** -`NEMOCLAW_INFERENCE_API_OVERRIDE` patches the config at container startup but -does not update the Dockerfile ARG baked into the image. -If you recreate the sandbox without the override env var, the image reverts to -the original API path. +`NEMOCLAW_INFERENCE_API_OVERRIDE` patches the config at container startup but does not update the Dockerfile ARG baked into the image. +If you recreate the sandbox without the override environment variable, the image reverts to the original API path. A fresh `nemoclaw onboard` is the reliable fix because it updates both the session and the baked image. ## Cross-Provider Switching + Switching to a different provider family (for example, from NVIDIA Endpoints to Anthropic) also uses `nemoclaw inference set`. 
The command updates both the gateway route and the OpenClaw provider namespace in the running sandbox config. +If the in-sandbox config sync fails after the gateway route is updated, NemoClaw keeps the host registry aligned with the gateway and prints a rebuild hint. +Run the rebuild before relying on the running agent if the warning says the image config could not be patched. -```console -$ nemoclaw inference set --provider anthropic-prod --model claude-sonnet-4-6 --no-verify +```bash +nemoclaw inference set --provider anthropic-prod --model claude-sonnet-4-6 --no-verify ``` + + +Switching to a different provider family (for example, from NVIDIA Endpoints to Anthropic) also uses `nemoclaw inference set`. +The command updates both the gateway route and `/sandbox/.hermes/config.yaml`. +If the Hermes config sync fails after the gateway route is updated, NemoClaw keeps the host registry aligned with the gateway and prints a rebuild hint. +Run the rebuild before relying on the running agent if the warning says the image config could not be patched. + +```bash +nemoclaw inference set --provider anthropic-prod --model claude-sonnet-4-6 --no-verify +``` + + + Use `--no-verify` only when OpenShell cannot verify the provider at switch time but you have already confirmed the provider and credential. ## Tune Model Metadata @@ -122,26 +144,40 @@ To change these values, set the corresponding environment variables before runni | `NEMOCLAW_AGENT_TIMEOUT` | Positive integer (seconds) | `600` | | `NEMOCLAW_AGENT_HEARTBEAT_EVERY` | Go-style duration (`30m`, `1h`, `0m` to disable) | `unset` (OpenClaw default) | -Invalid values are ignored, and the default bakes into the image. +NemoClaw ignores invalid values and bakes the default into the image. +For Local Ollama, onboarding loads the selected model first and uses Ollama's reported runtime context length when `NEMOCLAW_CONTEXT_WINDOW` is unset. 
+For local vLLM, onboarding uses the runtime `max_model_len` value when the server reports one and `NEMOCLAW_CONTEXT_WINDOW` is unset. Use `NEMOCLAW_INFERENCE_INPUTS=text,image` only for a model that accepts image input through the selected provider. -```console -$ export NEMOCLAW_CONTEXT_WINDOW=65536 -$ export NEMOCLAW_MAX_TOKENS=8192 -$ export NEMOCLAW_REASONING=true -$ export NEMOCLAW_INFERENCE_INPUTS=text,image -$ export NEMOCLAW_AGENT_TIMEOUT=1800 -$ export NEMOCLAW_AGENT_HEARTBEAT_EVERY=0m -$ nemoclaw onboard -``` - -`NEMOCLAW_AGENT_TIMEOUT` controls the per-request inference timeout baked into -`agents.defaults.timeoutSeconds`. Increase it for slow local inference (for -example, CPU-only Ollama or vLLM on modest hardware). NemoClaw writes this -value into `openclaw.json` during onboarding. The default sandbox may keep that -file writable for agent state, but direct in-sandbox edits are not the supported -or durable way to change NemoClaw-managed defaults. Rebuild the sandbox via -`nemoclaw onboard` to apply a new value. +```bash +export NEMOCLAW_CONTEXT_WINDOW=65536 +export NEMOCLAW_MAX_TOKENS=8192 +export NEMOCLAW_REASONING=true +export NEMOCLAW_INFERENCE_INPUTS=text,image +export NEMOCLAW_AGENT_TIMEOUT=1800 +export NEMOCLAW_AGENT_HEARTBEAT_EVERY=0m +nemoclaw onboard +``` + + + +`NEMOCLAW_AGENT_TIMEOUT` controls the per-request inference timeout baked into `agents.defaults.timeoutSeconds`. +Increase it for slow local inference, such as CPU-only Ollama or vLLM on modest hardware. +NemoClaw writes this value into `openclaw.json` during onboarding. +The default sandbox can keep that file writable for agent state, but direct in-sandbox edits are not the supported or durable way to change NemoClaw-managed defaults. +Rebuild the sandbox with `nemoclaw onboard` to apply a new value. + + + + +`NEMOCLAW_AGENT_TIMEOUT` controls the per-request inference timeout baked into the Hermes sandbox image. 
+Increase it for slow local inference, such as CPU-only Ollama or vLLM on modest hardware. +Direct in-sandbox edits are not the supported or durable way to change NemoClaw-managed defaults. +Rebuild the sandbox with `nemoclaw onboard` to apply a new value. + + + + `NEMOCLAW_AGENT_HEARTBEAT_EVERY` sets `agents.defaults.heartbeat.every`. This controls OpenClaw's periodic main-session agent turn. @@ -150,15 +186,22 @@ The OpenClaw default is 30 minutes (1 hour for Anthropic OAuth / Claude CLI reus Tune the cadence with a duration string like `5m` or `2h`, or set `0m` to disable the periodic turns entirely. Disabling also drops `HEARTBEAT.md` from normal-run bootstrap context per upstream behavior, so the model no longer sees heartbeat-only instructions. NemoClaw writes this value into `openclaw.json` during onboarding. -The in-sandbox `openclaw config set` command is not the supported path for -NemoClaw-managed build-time defaults, and direct file edits are overwritten by a -rebuild. Rebuild the sandbox via `nemoclaw onboard --resume` to apply a new value. +The in-sandbox `openclaw config set` command is not the supported path for NemoClaw-managed build-time defaults, and a rebuild overwrites direct file edits. +Rebuild the sandbox with `nemoclaw onboard --resume` to apply a new value. + + + + +Hermes does not use OpenClaw's `HEARTBEAT.md` wake-up mechanism. +Rebuild the sandbox with `nemoclaw onboard --resume` to apply build-time inference metadata changes. + + These variables are build-time settings. If you change them on an existing sandbox, recreate the sandbox so the new values bake into the image: -```console -$ nemoclaw onboard --resume --recreate-sandbox +```bash +nemoclaw onboard --resume --recreate-sandbox ``` ## Verify the Active Model @@ -187,20 +230,33 @@ Run `nemoclaw onboard` to configure one. 
Run the status command when you also need sandbox, service, and messaging health: -```console -$ nemoclaw status +```bash +nemoclaw status ``` The status output includes the active provider, model, and endpoint with the rest of the sandbox state. ## Notes + + - The host keeps provider credentials. - The sandbox continues to use `inference.local`. - `nemoclaw inference set` patches the selected running OpenClaw or Hermes sandbox config and recomputes its config hash. - Use `nemoclaw onboard --resume --recreate-sandbox` for build-time settings such as context window, max tokens, reasoning mode, heartbeat cadence, or image contents. - Local Ollama and local vLLM routes use local provider tokens rather than `OPENAI_API_KEY`. Rebuilds of older local-inference sandboxes clear the stale OpenAI credential requirement automatically. + + + +- The host keeps provider credentials. +- The sandbox continues to use `inference.local`. +- `nemoclaw inference set` patches the selected running Hermes sandbox config and recomputes its config hash. +- Use `nemoclaw onboard --resume --recreate-sandbox` for build-time settings such as context window, max tokens, reasoning mode, heartbeat cadence, or image contents. +- Local Ollama and local vLLM routes use local provider tokens rather than `OPENAI_API_KEY`. Rebuilds of older local-inference sandboxes clear the stale OpenAI credential requirement automatically. + + + ## Related Topics -- Inference Options (use the `nemoclaw-user-configure-inference` skill) for the full list of providers available during onboarding. +- [Inference Options](inference-options.md) for the full list of providers available during onboarding. 
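Because NemoClaw ignores invalid values for the build-time variables and bakes the default into the image, it can help to check values before onboarding. The following is a minimal pre-flight sketch, not NemoClaw's own validation. The helper names `is_positive_int`, `is_go_duration`, and `check` are hypothetical. The duration pattern accepts only one integer followed by `s`, `m`, or `h` (for example `30m`, `1h`, or `0m`), which is narrower than full Go duration syntax. The fallback values are samples for this sketch only.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds for a positive integer such as 600.
is_positive_int() {
  [[ "$1" =~ ^[1-9][0-9]*$ ]]
}

# Hypothetical helper: succeeds for a simple duration such as 30m, 1h, or 0m.
# Assumption: a single integer followed by one of s, m, or h.
is_go_duration() {
  [[ "$1" =~ ^[0-9]+(s|m|h)$ ]]
}

# Report whether a value passes the given check.
check() {
  local name="$1" value="$2" validator="$3"
  if "$validator" "$value"; then
    echo "$name=$value looks valid"
  else
    echo "$name=$value looks invalid; the default would likely apply"
  fi
}

# Fall back to sample values when the variables are not exported.
check NEMOCLAW_AGENT_TIMEOUT "${NEMOCLAW_AGENT_TIMEOUT:-600}" is_positive_int
check NEMOCLAW_AGENT_HEARTBEAT_EVERY "${NEMOCLAW_AGENT_HEARTBEAT_EVERY:-30m}" is_go_duration
```

Run a check like this in the same shell where you export the variables, before `nemoclaw onboard --resume --recreate-sandbox`, so a typo does not silently fall back to the default.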
diff --git a/.agents/skills/nemoclaw-user-configure-inference/references/tool-calling-reliability.md b/.agents/skills/nemoclaw-user-configure-inference/references/tool-calling-reliability.md index 7d860c9d42..d415ce4dda 100644 --- a/.agents/skills/nemoclaw-user-configure-inference/references/tool-calling-reliability.md +++ b/.agents/skills/nemoclaw-user-configure-inference/references/tool-calling-reliability.md @@ -1,11 +1,7 @@ - - # Tool-Calling Reliability for Local Inference -Local inference is useful for privacy, cost control, and offline development, but -tool-calling agents place stricter demands on the model server than simple chat. -The model server must return structured `tool_calls`, not a JSON-looking string -inside normal assistant text. +Local inference is useful for privacy, cost control, and offline development, but tool-calling agents place stricter demands on the model server than simple chat. +The model server must return structured `tool_calls`, not a JSON-looking string inside normal assistant text. Use this page when the TUI shows raw JSON such as: @@ -13,8 +9,7 @@ Use this page when the TUI shows raw JSON such as: {"arguments":{"query":"robotics"},"name":"memory_search"} ``` -If that appears as text in the assistant reply, OpenClaw cannot dispatch the -tool because the inference response did not include a structured tool call. +If that appears as text in the assistant reply, OpenClaw cannot dispatch the tool because the inference response did not include a structured tool call. ## Quick Choice Guide @@ -28,9 +23,8 @@ tool because the inference response did not include a structured tool call. | Multi-turn tool dispatch | Risky | Yes | Ollama can work well for lightweight local chat and some simple tool surfaces. -For OpenClaw-style agent loops with multiple tools, long instructions, or -multi-turn dispatch, use a server that exposes OpenAI-compatible -`/v1/chat/completions` with a tool-call parser. vLLM is the common local choice. 
+For OpenClaw-style agent loops with multiple tools, long instructions, or multi-turn dispatch, use a server that exposes OpenAI-compatible `/v1/chat/completions` with a tool-call parser. +vLLM is the common local choice. ## Symptom @@ -41,20 +35,17 @@ The common failure mode is: - The gateway treats the response as normal text. - No tool runs, and the user sees raw JSON in the TUI. -This is different from a network or policy block. `nemoclaw status`, -`nemoclaw logs`, and `nemoclaw debug --quick` can all look healthy while -tool dispatch still fails inside the conversation. +This is different from a network or policy block. +`nemoclaw status`, `nemoclaw logs`, and `nemoclaw debug --quick` can all look healthy while tool dispatch still fails inside the conversation. ## Recommended Fix -For persistent NemoClaw use, start vLLM with auto tool choice and the parser that -matches your model family, then rerun onboarding and select **Local vLLM -[experimental]** or **Other OpenAI-compatible endpoint**. +For persistent NemoClaw use, start vLLM with auto tool choice and the parser that matches your model family, then rerun onboarding and select **Local vLLM [experimental]** or **Other OpenAI-compatible endpoint**. For Hermes 3 style models, a known-good vLLM command shape is: -```console -$ vllm serve /models/Hermes-3-Llama-3.1-8B \ +```bash +vllm serve /models/Hermes-3-Llama-3.1-8B \ --served-model-name hermes-3-llama-3.1-8b \ --enable-auto-tool-choice \ --tool-call-parser hermes \ @@ -93,22 +84,20 @@ services: Then onboard against that endpoint: -```console -$ NEMOCLAW_PROVIDER=custom \ +```bash +NEMOCLAW_PROVIDER=custom \ NEMOCLAW_ENDPOINT_URL=http://localhost:8002/v1 \ NEMOCLAW_MODEL=hermes-3-llama-3.1-8b \ COMPATIBLE_API_KEY=$VLLM_API_KEY \ nemoclaw onboard --non-interactive ``` -If the endpoint does not require authentication, set `COMPATIBLE_API_KEY` to any -non-empty placeholder, such as `dummy`. 
+If the endpoint does not require authentication, set `COMPATIBLE_API_KEY` to any non-empty placeholder, such as `dummy`. ## Advanced Temporary Repointing -NemoClaw-managed sandboxes normally block direct `openclaw config set` writes -inside the sandbox because those edits do not survive rebuilds. Prefer rerunning -`nemoclaw onboard` for a persistent provider change. +NemoClaw-managed sandboxes normally block direct `openclaw config set` writes inside the sandbox because those edits do not survive rebuilds. +Prefer rerunning `nemoclaw onboard` for a persistent provider change. If you are intentionally testing a mutable OpenClaw config, prepare a batch file like this: @@ -134,15 +123,13 @@ like this: } ``` -Apply it only in environments where OpenClaw config writes are allowed: +Apply it only in environments where OpenClaw allows config writes: -```console -$ openclaw config set --batch-file /sandbox/.openclaw/vllm-tool-calls.json +```bash +openclaw config set --batch-file /sandbox/.openclaw/vllm-tool-calls.json ``` -After testing, persist the working provider through `nemoclaw onboard` so the -sandbox image, OpenShell inference route, and host-managed credentials stay in -sync. +After testing, persist the working provider through `nemoclaw onboard` so the sandbox image, OpenShell inference route, and host-managed credentials stay in sync. ## Verify the Fix @@ -150,15 +137,12 @@ After switching to vLLM, ask for an action that should use a tool. Good signs: - The TUI does not show JSON blobs as assistant text. - The gateway log shows tool dispatch and a follow-up answer. -- `nemoclaw status` reports the local vLLM or compatible endpoint as the - active provider. +- `nemoclaw status` reports the local vLLM or compatible endpoint as the active provider. -If JSON still appears as text, confirm that vLLM was started with both -`--enable-auto-tool-choice` and the correct `--tool-call-parser` value for your -model. 
+If JSON still appears as text, confirm that you started vLLM with both `--enable-auto-tool-choice` and the correct `--tool-call-parser` value for your model. ## Next Steps -- Use a Local Inference Server (use the `nemoclaw-user-configure-inference` skill) -- Inference Options (use the `nemoclaw-user-configure-inference` skill) -- Switch Inference Models (use the `nemoclaw-user-configure-inference` skill) +- [Use a Local Inference Server](../SKILL.md) +- [Inference Options](inference-options.md) +- [Switch Inference Models](switch-inference-providers.md) diff --git a/.agents/skills/nemoclaw-user-configure-security/SKILL.md b/.agents/skills/nemoclaw-user-configure-security/SKILL.md index e6cd9522d5..865e4aa6d8 100644 --- a/.agents/skills/nemoclaw-user-configure-security/SKILL.md +++ b/.agents/skills/nemoclaw-user-configure-security/SKILL.md @@ -1,15 +1,13 @@ --- name: "nemoclaw-user-configure-security" description: "Presents a risk framework for every configurable security control in NemoClaw. Use when evaluating security posture, reviewing sandbox security defaults, or assessing control trade-offs. Trigger keywords - nemoclaw security best practices, sandbox security controls risk framework, nemoclaw credential storage, openshell provider, api key security, openclaw security controls, nemoclaw security boundary, prompt injection, tool access control." +license: "Apache-2.0" --- - - - -# NemoClaw Security Best Practices: Controls, Risks, and Posture Profiles +# NemoClaw User Configure Security ## References - **Load [references/best-practices.md](references/best-practices.md)** when evaluating security posture, reviewing sandbox security defaults, or assessing control trade-offs. Presents a risk framework for every configurable security control in NemoClaw. -- **Load [references/openclaw-controls.md](references/openclaw-controls.md)** when reviewing the security boundary between NemoClaw and OpenClaw or assessing what NemoClaw does not cover. 
Lists OpenClaw security controls that operate independently of NemoClaw, including prompt injection detection, tool access control, rate limiting, environment variable policy, audit framework, supply chain scanning, messaging access policy, context visibility, and safe regex. - **Load [references/credential-storage.md](references/credential-storage.md)** when reviewing how credentials are handled, locating a stored credential, or assessing the storage threat model. Covers where NemoClaw stores provider credentials, why nothing is persisted to host disk, and how the OpenShell gateway acts as the single system of record. +- **Load [references/openclaw-controls.md](references/openclaw-controls.md)** when reviewing the security boundary between NemoClaw and OpenClaw or assessing what NemoClaw does not cover. Lists OpenClaw security controls that operate independently of NemoClaw, including prompt injection detection, tool access control, rate limiting, environment variable policy, audit framework, supply chain scanning, messaging access policy, context visibility, and safe regex. diff --git a/.agents/skills/nemoclaw-user-configure-security/evals/evals.json b/.agents/skills/nemoclaw-user-configure-security/evals/evals.json new file mode 100644 index 0000000000..22708120bf --- /dev/null +++ b/.agents/skills/nemoclaw-user-configure-security/evals/evals.json @@ -0,0 +1,11 @@ +[ + { + "id": "docs-security-best-practices-001", + "question": "I'm evaluating NemoClaw security best practices. 
Help me understand the risk posture of each configurable control so I can justify the setup to my team or security reviewers.", + "expected_skill": "nemoclaw-user-configure-security", + "ground_truth": "A NemoClaw-specific answer that helps the user understand the risk posture of each configurable control and gives enough concrete guidance, decision criteria, verification steps, or risk framing to justify the setup to my team or security reviewers.", + "expected_behavior": [ + "Uses the expected_skill and does not make up answers if it cannot find the answer from the skill." + ] + } +] diff --git a/.agents/skills/nemoclaw-user-configure-security/references/best-practices.md b/.agents/skills/nemoclaw-user-configure-security/references/best-practices.md index b0f3f0fb32..7893b472e4 100644 --- a/.agents/skills/nemoclaw-user-configure-security/references/best-practices.md +++ b/.agents/skills/nemoclaw-user-configure-security/references/best-practices.md @@ -1,24 +1,24 @@ - - # NemoClaw Security Best Practices: Controls, Risks, and Posture Profiles +import { AgentOnly } from "../_components/AgentGuide"; + NemoClaw ships with deny-by-default security controls across four layers: network, filesystem, process, and inference. You can tune every control, but each change shifts the risk profile. -This page documents every configurable knob, its default, what it protects, the concrete risk of relaxing it, and a recommendation for common use cases. +This page documents each configurable control, its default, what it protects, the concrete risk of relaxing it, and a recommendation for common use cases. For background on how the layers fit together, refer to How It Works (use the `nemoclaw-user-overview` skill). ## Protection Layers at a Glance NemoClaw enforces security at four layers. -NemoClaw locks some when it creates the sandbox and requires a restart to change them. +NemoClaw locks some controls when it creates the sandbox and requires a restart to change them. 
You can hot-reload others while the sandbox runs. -The following diagram shows the default posture immediately after `nemoclaw onboard`, before you approve any endpoints or apply any presets. +The following diagram shows the default posture immediately after onboarding, before you approve any endpoints or apply any presets. ```mermaid flowchart TB - subgraph HOST["Your Machine: default posture after nemoclaw onboard"] + subgraph HOST["Your Machine: default posture after onboarding"] direction TB YOU["👤 Operator"] @@ -70,15 +70,16 @@ flowchart TB | Network | Unauthorized outbound connections and data exfiltration. | OpenShell gateway | Yes. Use `openshell policy set` or operator approval. | | Filesystem | System binary tampering, credential theft, config manipulation. | Landlock LSM + container mounts | Landlock layout: no. Requires sandbox re-creation. Use host-side NemoClaw commands for durable config changes. | | Process | Privilege escalation, fork bombs, syscall abuse. | Container runtime (Docker/K8s `securityContext`) | No. Requires sandbox re-creation. | -| Inference | Credential exposure, unauthorized model access, cost overruns. | OpenShell gateway | Yes. Use `nemoclaw inference set`. | +| Inference | Credential exposure, unauthorized model access, cost overruns. | OpenShell gateway | Yes. Use the NemoClaw inference switching command. | ## Network Controls NemoClaw controls which hosts, ports, and HTTP methods the sandbox can reach, and lets operators approve or deny requests in real time. +Network policy allowlists do not disable OpenShell's SSRF guard; see Customize the Network Policy (use the `nemoclaw-user-manage-policy` skill) for the interaction between egress rules and internal-address blocking. ### Deny-by-Default Egress -The sandbox blocks all outbound connections unless you explicitly list the endpoint in the policy file `nemoclaw-blueprint/policies/openclaw-sandbox.yaml`. 
+The sandbox blocks all outbound connections unless you explicitly list the endpoint in the applicable baseline policy files. | Aspect | Detail | |---|---| @@ -92,7 +93,7 @@ The sandbox blocks all outbound connections unless you explicitly list the endpo Each network policy entry restricts which executables can reach the endpoint using the `binaries` field. OpenShell identifies the calling binary by reading `/proc//exe` (the kernel-trusted executable path, not `argv[0]`), walking the process tree for ancestor binaries, and computing a SHA256 hash of each binary on first use. -If someone replaces a binary while the sandbox runs, the hash mismatch triggers an immediate deny. +If someone replaces a binary while the sandbox runs, the hash mismatch immediately denies the request. | Aspect | Detail | |---|---| @@ -126,12 +127,12 @@ The `protocol` field on an endpoint controls whether the proxy also inspects ind ### Operator Approval Flow -When the agent reaches an unlisted endpoint, OpenShell blocks the request and prompts the operator in the TUI. +When the agent reaches an unlisted endpoint, OpenShell blocks the request and prompts you in the TUI. | Aspect | Detail | |---|---| | Default | Enabled. The gateway blocks all unlisted endpoints and requires approval. | -| What you can change | The system merges approved endpoints into the sandbox's policy as a new durable revision. They persist across sandbox restarts within the same sandbox instance. However, when you destroy and recreate the sandbox (for example, by running `nemoclaw onboard`), the policy resets to the baseline defined in the blueprint. | +| What you can change | The system merges approved endpoints into the sandbox's policy as a new durable revision. They persist across sandbox restarts within the same sandbox instance. However, when you destroy and recreate the sandbox through onboarding, the policy resets to the baseline defined in the blueprint. 
| | Risk if relaxed | Approving an endpoint permanently widens the running sandbox's policy. If you approve a broad domain (such as a CDN that hosts arbitrary content), the agent can fetch anything from that domain until you destroy and recreate the sandbox. | | Recommendation | Review each blocked request before approving. If you find yourself approving the same endpoint repeatedly, add it to the baseline policy with appropriate binary and path restrictions. To reset approved endpoints, destroy and recreate the sandbox. | @@ -143,6 +144,7 @@ NemoClaw ships preset policy files in `nemoclaw-blueprint/policies/presets/` for |---|---|---| | `brave` | Brave Search API. | Agent can issue search queries. | | `brew` | Homebrew (Linuxbrew) package manager. The sandbox base image includes the `brew` binary; this preset opens network egress to GitHub and the Homebrew formulae index so `brew install` can fetch bottles. | Allows installing arbitrary Homebrew packages, which may contain malicious code. | +| `claude-code` | Claude Code CLI API, telemetry, and crash-report endpoints. | Allows a separately installed Claude Code CLI to reach Anthropic and telemetry hosts with its own credentials. Do not use this preset for NemoClaw inference routing. | | `discord` | Discord REST API, WebSocket gateway, CDN. | CDN endpoint (`cdn.discordapp.com`) allows GET to any path. WebSocket uses `access: full` (no inspection). | | `github` | GitHub and GitHub REST API. | Gives agent read/write access to repositories and issues via `git`. | | `huggingface` | Hugging Face Hub (download-only) and inference router. | Allows downloading arbitrary models and datasets. POST is restricted to the inference router only. | @@ -173,6 +175,8 @@ The container mounts system directories read-only to prevent the agent from modi ### Agent Config Directory + + The `/sandbox/.openclaw` directory contains the OpenClaw gateway configuration (model routing, CORS settings, channel config). 
The current entrypoint reads the gateway auth token from OpenClaw config when present, exports it as `OPENCLAW_GATEWAY_TOKEN`, and writes it to `/tmp/nemoclaw-proxy-env.sh` so interactive sandbox sessions can reach the gateway through system-wide shell hooks. In root mode, the gateway process still runs as the separate `gateway` user, but the token is intentionally available to sandbox shells for local gateway access. @@ -180,19 +184,54 @@ In root mode, the gateway process still runs as the separate `gateway` user, but Writable agent state such as plugins, skills, hooks, and workspace metadata lives directly under `/sandbox/.openclaw`. By default, this directory starts writable so the agent can manage its own config, install skills, and write to standard home-directory paths natively. -For sensitive workloads, use a reviewed host-side immutability workflow after initial setup so config and writable state entry points cannot be changed by the sandbox user. - -- **DAC permissions (default).** The sandbox user owns `/sandbox/.openclaw` with mode `700` and `openclaw.json` with mode `600`, so the agent can read and write config directly. +For sensitive workloads, use a reviewed host-side immutability workflow after initial setup so the sandbox user cannot change config and high-risk state entry points. +The immutability workflow locks high-risk state directories (`skills`, `hooks`, `cron`, `agents`, `extensions`, `plugins`, `workspace`, `memory`, `devices`, `canvas`, `telegram`, `wechat`, `whatsapp`, `platforms`, `weixin`, `profiles`, `skins`) to `root:sandbox` with `chmod -R go-w`. +The OpenClaw gateway (a member of the `sandbox` group) keeps read access to plugin and agent code; the sandbox user can no longer write them. +The same workflow also locks the secret-bearing directories (`credentials`, `identity`, `pairing`) to `root:root 700` with `chmod -R go-rwX`. +Neither the sandbox user nor the gateway can read those secrets while the lock is active. 
+Restoring the mutable-default posture returns both groups to `sandbox:sandbox 2770`. +The list is the union of state directories declared by every shipped agent manifest; the lock helper silently skips dirs that aren't present in a given agent's config tree. +Two exemption kinds keep runtime data writable. +The lock inventory omits top-level Hermes runtime dirs (`sessions/`, `memories/`, `logs/`, `cache/`, `plans/`) and the image-build-regenerated `openclaw-weixin/`; the lock helper never touches those paths. +Inside a locked tree, the helper restores `agents//sessions/` to `sandbox:sandbox 2770` after the surrounding `agents/` lock so the OpenClaw TUI can create and write session metadata under an otherwise root-owned parent. +If any high-risk state-dir root is a symlink when the lock runs, the lock helper refuses to proceed and reports "Config not locked: state dir root is a symlink" instead of following the link with privileged `chown -R` / `chmod -R`. + +- **DAC permissions (default).** The sandbox user owns `/sandbox/.openclaw` with mode `2770` (setgid `sandbox:sandbox`) and `openclaw.json` with mode `660`, so the agent and its group can read and write config directly. A reviewed host-side immutability workflow should compare the intended ownership and mode with the live sandbox filesystem before treating the config tree as locked. - **Config integrity hash.** The image includes a SHA256 hash of `openclaw.json`. In the default mutable state, `.config-hash` is sandbox-owned and is not a tamper-proof trust anchor, so startup does not fail closed on that hash. When the hash is root-owned and read-only, startup enforces it and refuses to start if the hash does not match. +- **Content integrity seal.** + A clean immutable config lock can capture a SHA-256 seal of `openclaw.json` and other locked files into host-side state. 
+ Verification recomputes hashes inside the sandbox and surfaces drift on mismatch, so a host-root tamper that flips permissions back to `444 root:root` after rewriting the file is still flagged. + Sandboxes locked before the seal landed have no recorded hash; permission-only verification cannot prove their bytes match the image original, so the seal is **not** a retroactive proof of integrity for legacy state. + The same limitation applies when the locked file set grew after the existing seal was captured. + Rebuild the sandbox for a known-good baseline before trusting a new seal. - **Gateway token environment.** The gateway exports `OPENCLAW_GATEWAY_TOKEN` and writes it to `/tmp/nemoclaw-proxy-env.sh` for interactive sandbox sessions. Keep this in mind when deciding whether a workload should run with mutable config or an immutable config posture. | Aspect | Detail | |---|---| -| Default | The sandbox keeps `/sandbox/.openclaw` writable (`700 sandbox:sandbox`), sets `openclaw.json` to `600 sandbox:sandbox`, lets the agent manage state directly, and has the gateway place `OPENCLAW_GATEWAY_TOKEN` in `/tmp/nemoclaw-proxy-env.sh` for interactive shells. | +| Default | The sandbox keeps `/sandbox/.openclaw` writable (`2770 sandbox:sandbox`), sets `openclaw.json` to `660 sandbox:sandbox`, lets the agent manage state directly, and has the gateway place `OPENCLAW_GATEWAY_TOKEN` in `/tmp/nemoclaw-proxy-env.sh` for interactive shells. | | What you can change | Apply a reviewed host-side immutability workflow to lock config and state directories with DAC permissions and the immutable flag where available. | | Risk of default | A writable `.openclaw` directory lets the agent modify its own gateway config: disabling CORS or redirecting inference to an attacker-controlled endpoint. | | Recommendation | For always-on assistants handling sensitive workloads, lock config after initial setup. For development workflows, the writable default is appropriate. 
| + + + +The `/sandbox/.hermes` directory contains Hermes runtime configuration, generated environment settings, logs, platform state, and durable database state. +NemoClaw writes `config.yaml` and `.env` during onboarding and rebuilds. +Direct edits to these files can be overwritten when NemoClaw regenerates the image. + +Hermes also stores runtime state such as `state.db`, logs, and platform sessions under the `.hermes` tree. +Messaging sessions such as WhatsApp pairing can remain mutable by design so they survive rebuilds. + +| Aspect | Detail | +|---|---| +| Default | The Hermes config tree contains NemoClaw-generated config plus mutable runtime state. | +| What you can change | Use host-side NemoClaw commands for durable model, provider, messaging, and policy changes; inspect files directly only for debugging. | +| Risk of direct edits | Direct edits to generated config can drift from the host registry and may be lost on rebuild. | +| Recommendation | For sensitive workloads, keep generated config under NemoClaw control and back up Hermes state before destructive operations. | + + + ### Writable Paths The agent has read-write access to `/sandbox`, `/tmp`, and `/dev/null`. @@ -228,20 +267,28 @@ When the entrypoint switches from root to the `sandbox` and `gateway` users, it The initial entrypoint drop removes `cap_sys_admin`, `cap_sys_ptrace`, `cap_net_raw`, `cap_dac_override`, `cap_sys_chroot`, `cap_fsetid`, `cap_setfcap`, `cap_mknod`, `cap_audit_write`, and `cap_net_bind_service`. During `setpriv` step-down, the child process also loses `cap_setuid`, `cap_setgid`, `cap_fowner`, `cap_chown`, and `cap_kill`. -This is best-effort: if `capsh` is not available or `CAP_SETPCAP` is not in the bounding set, the entrypoint logs a warning and continues with the default capability set. +This behavior is best effort: if `capsh` is not available or `CAP_SETPCAP` is not in the bounding set, the entrypoint logs a warning and continues with the default capability set. 
If `setpriv` is unavailable, the entrypoint falls back to `gosu` and logs a warning that the remaining bounding-set capabilities were retained for the child process. -For additional protection, pass `--cap-drop=ALL` with `docker run` or Compose (see Sandbox Hardening (use the `nemoclaw-user-deploy-remote` skill)). + +To make the drop fail-closed instead of best-effort, set `NEMOCLAW_REQUIRE_CAP_DROP=1` in the entrypoint environment. +The agent then refuses to start unless the agent process tree's bounding set is verified free of the dangerous capabilities, so it will not boot on a host whose bounding set still holds them — typically one that cannot perform the drop (no `CAP_SETPCAP`, or `capsh` missing) and was not given a clean bounding set by the container runtime. +This is opt-in because such hosts are common (many cloud VMs, Docker Desktop, WSL); leaving it unset preserves the best-effort default. +The check covers the agent process tree only — a `nemoclaw connect` shell is spawned by the container runtime outside that tree and is not affected (tracked in [NVIDIA/OpenShell#1452](https://github.com/NVIDIA/OpenShell/issues/1452)). + + +For additional protection, pass `--cap-drop=ALL` with `docker run` or Compose. Refer to Sandbox Hardening. + | Aspect | Detail | |---|---| | Default | The entrypoint drops dangerous capabilities at startup using `capsh`, then uses `setpriv` during user step-down when possible. Best-effort. | -| What you can change | When launching with `docker run` directly, pass `--cap-drop=ALL --cap-add=NET_BIND_SERVICE` for stricter enforcement. In the standard NemoClaw flow (with `nemoclaw onboard`), the entrypoint handles capability dropping automatically. | +| What you can change | When launching with `docker run` directly, pass `--cap-drop=ALL --cap-add=NET_BIND_SERVICE` for stricter enforcement. In the standard NemoClaw onboarding flow, the entrypoint handles capability dropping automatically. 
| | Risk if relaxed | `CAP_SYS_ADMIN` and `CAP_SYS_PTRACE` expand kernel and process attack surface. `CAP_NET_RAW` allows raw socket access for network sniffing. `CAP_DAC_OVERRIDE` bypasses filesystem permission checks. If `capsh` or `setpriv` cannot run, the container retains more of the runtime-provided capability set. | | Recommendation | Run on an image that includes `capsh` and `setpriv` (the NemoClaw image includes them). For defense-in-depth, also pass `--cap-drop=ALL` at the container runtime level. | ### Gateway Process Isolation -The OpenClaw gateway runs as a separate `gateway` user, not as the `sandbox` user that runs the agent. +The in-sandbox gateway runs as a separate `gateway` user, not as the `sandbox` user that runs the agent. | Aspect | Detail | |---|---| @@ -265,7 +312,7 @@ The `no-new-privileges` flag prevents processes from gaining additional privileg A process limit caps the number of processes the sandbox user can spawn. The entrypoint sets both soft and hard limits using `ulimit -u 512`. -This is best-effort: if the container runtime restricts `ulimit` modification, the entrypoint logs a security warning and continues without the limit. +This behavior is best effort: if the container runtime restricts `ulimit` modification, the entrypoint logs a security warning and continues without the limit. | Aspect | Detail | |---|---| @@ -318,7 +365,7 @@ A registry compromise or accidental force-push cannot silently swap the sandbox | Default | `nemoclaw-blueprint/blueprint.yaml` pins the sandbox image by digest. A CI regression test blocks any mutable-tag reference from merging. | | What you can change | Contributors bumping the sandbox image must update the digest in `blueprint.yaml`. Release tooling should rewrite the digest automatically. | | Risk if relaxed | Reverting to a mutable tag (`:latest`) allows a registry-side change to replace the sandbox image without any blueprint update, which is a supply-chain risk. 
| -| Recommendation | Always reference the sandbox image by digest. If you build a custom image with `nemoclaw onboard --from`, the digest constraint does not apply to your local build. | +| Recommendation | Always reference the sandbox image by digest. If you build a custom image with the onboarding `--from` path, the digest constraint does not apply to your local build. | ### Auth Profile Permissions @@ -334,6 +381,8 @@ This prevents other users on the host from reading stored credentials. ## Gateway Authentication Controls + + The OpenClaw gateway authenticates devices that connect to the Control UI dashboard. NemoClaw hardens these defaults at image build time. @@ -381,6 +430,15 @@ The auto-pair watcher automatically approves device pairing requests from recogn | Risk if relaxed | Approving all device types without validation lets rogue or unexpected clients pair with the gateway unchallenged. | | Recommendation | No action needed. The entrypoint handles this automatically. If you see `[auto-pair] rejected unknown client=...` in the logs, investigate the source of the unexpected connection. | + + + +Hermes exposes an OpenAI-compatible API on the forwarded Hermes port and can optionally expose the native Hermes dashboard. +Do not publish those endpoints on shared or public networks unless you put them behind your own access controls. +NemoClaw still keeps provider credentials in OpenShell and routes model traffic through `inference.local`. + + + ### CLI Secret Redaction The CLI automatically redacts secret patterns (API keys, bearer tokens, provider credentials) from command output and error messages before logging them. @@ -390,21 +448,31 @@ The CLI automatically redacts secret patterns (API keys, bearer tokens, provider | Default | Enabled. The runner redacts secrets from stdout, stderr, and thrown error messages. | | What you can change | This is not a user-facing knob. The CLI enforces it on all command output paths. 
| | Risk if relaxed | Without redaction, secrets could appear in terminal scrollback, log files, or debug output shared in bug reports. | -| Recommendation | No action needed. If you share `nemoclaw debug` output, verify that no secrets appear in the collected diagnostics. | +| Recommendation | No action needed. If you share NemoClaw debug output, verify that no secrets appear in the collected diagnostics. | ### Memory Secret Scanner + + The NemoClaw plugin blocks the agent from writing likely secrets (API keys, tokens, private keys) into persistent memory files. The scanner intercepts Write, Edit, and similar tool calls targeting memory and workspace paths before they reach disk. | Aspect | Detail | |---|---| | Default | Enabled. The plugin registers a `before_tool_call` hook that scans for 14 high-confidence secret patterns. | -| What it covers | Examples include `.openclaw/memory/`, `.openclaw/workspace/`, `.openclaw/agents/`, `.openclaw/skills/`, `.openclaw/hooks/`, `.openclaw/credentials/`, `.openclaw/openclaw.json`, `.nemoclaw/`, and `MEMORY.md`; the exact coverage is defined by `MEMORY_PATH_SEGMENTS` and enforced through `isMemoryPath()`. | +| What it covers | Three classifiers, all enforced through `isMemoryPath()`: (1) absolute `MEMORY_PATH_SEGMENTS` such as `/.openclaw/memory/`, `/.openclaw/workspace/`, `/.openclaw/agents/`, `/.openclaw/skills/`, `/.openclaw/hooks/`, `/.openclaw/credentials/`, `/.openclaw/openclaw.json`, `/.nemoclaw/`; (2) canonical workspace basenames in `MEMORY_BASENAMES` (`IDENTITY.md`, `MEMORY.md`, `SOUL.md`, `USER.md`, `AGENTS.md`) matched regardless of the surrounding path; and (3) lexically-normalized workspace-relative writes matching `MEMORY_RELATIVE_PREFIXES` (`.openclaw/`, `.nemoclaw/`, `memory/`) or named workspace daily memory paths, for embedded-fallback mode where the host's path resolver is unavailable. | | What you can change | This is not a user-facing knob. The plugin enforces it automatically. 
| | Risk if relaxed | Without scanning, the agent could persist API keys or tokens in memory files that survive across sessions and backups. | | Recommendation | No action needed. If a write is blocked, the agent receives an actionable error listing the detected patterns. | + + + +Hermes does not use the OpenClaw NemoClaw plugin memory scanner. +Keep secrets in environment variables or OpenShell providers, and avoid writing raw credentials to Hermes state files or workspace content. + + + ## Inference Controls OpenShell routes all inference traffic through the gateway to isolate provider credentials from the sandbox. @@ -419,7 +487,7 @@ The agent never receives the provider API key. | Default | The agent talks to `inference.local`. The host owns the credential and upstream endpoint. | | What you can change | You cannot configure this architecture. The system always enforces it. | | Risk if bypassed | If the agent could reach an inference endpoint directly (by adding it to the network policy), it would need an API key. Since the sandbox does not contain credentials, this acts as defense-in-depth. However, adding an inference provider's host to the network policy without going through OpenShell routing could let the agent use a stolen or hardcoded key. | -| Recommendation | Do not add inference provider hosts (such as `api.openai.com` or `api.anthropic.com`) to the network policy. Use OpenShell inference routing instead. | +| Recommendation | Do not add inference provider hosts (such as `api.openai.com` or `api.anthropic.com`) to the network policy for NemoClaw model traffic. Use OpenShell inference routing instead. The `claude-code` preset is a separate opt-in exception for running the Claude Code CLI with its own credentials, not a way to configure NemoClaw inference. | ### Provider Trust Tiers @@ -438,12 +506,14 @@ Different inference providers have different trust and cost profiles. 
### Experimental Providers -The `NEMOCLAW_EXPERIMENTAL=1` environment variable gates local NVIDIA NIM and generic Linux managed vLLM install/start. DGX Spark and DGX Station managed vLLM entries are offered by default, and an already-running vLLM server on `localhost:8000` is offered in the menu without a flag, because selecting either is an explicit user action. +The `NEMOCLAW_EXPERIMENTAL=1` environment variable gates local NVIDIA NIM and generic Linux managed vLLM install/start. +DGX Spark and DGX Station managed vLLM entries appear by default. +An already-running vLLM server on `localhost:8000` also appears in the menu without a flag because selecting it is an explicit user action. | Aspect | Detail | |---|---| | Default | Local NVIDIA NIM and generic Linux managed vLLM install/start are hidden. DGX Spark and DGX Station managed vLLM entries, plus already-running vLLM on `localhost:8000`, are offered when detected. | -| What you can change | Set `NEMOCLAW_EXPERIMENTAL=1` before running `nemoclaw onboard` to surface Local NIM and generic Linux managed vLLM. To request only the managed vLLM path non-interactively, set `NEMOCLAW_PROVIDER=install-vllm`. | +| What you can change | Set `NEMOCLAW_EXPERIMENTAL=1` before onboarding to surface Local NIM and generic Linux managed vLLM. To request only the managed vLLM path non-interactively, set `NEMOCLAW_PROVIDER=install-vllm`. | | Risk if selected | NemoClaw has not fully validated these providers. NIM requires a NIM-capable GPU. The managed vLLM path pulls a container image and starts it on a supported NVIDIA GPU host. Misconfiguration can cause failed inference or unexpected behavior. | | Recommendation | Use experimental providers only for evaluation. Do not rely on them for always-on assistants. | @@ -489,16 +559,16 @@ The following patterns weaken security without providing meaningful benefit. 
| Omitting `protocol: rest` on REST API endpoints without a compatibility reason | Endpoints without a `protocol` field use L4-only enforcement. The proxy allows the TCP stream through after checking host, port, and binary, but cannot see or filter individual HTTP requests. | Add `protocol: rest` with explicit `rules` to enable per-request method and path control on REST APIs. Use L4 pass-through only for documented cases such as npm/Yarn on Node 22, where the client requires a CONNECT tunnel that L7 inspection would break. | | Adding endpoints to the baseline policy for one-off requests | Adding an endpoint to the baseline policy makes it permanently reachable across all sandbox instances. | Use operator approval. Approved endpoints persist within the sandbox instance but reset when you destroy and recreate the sandbox. | | Relying solely on the entrypoint for capability drops | The entrypoint drops dangerous capabilities using `capsh`, but this is best-effort. If `capsh` is unavailable or `CAP_SETPCAP` is not in the bounding set, the container runs with the default capability set. | Pass `--cap-drop=ALL` at the container runtime level as defense-in-depth. | -| Leaving `/sandbox/.openclaw` writable on sensitive workloads | This directory contains the OpenClaw gateway configuration. A writable `.openclaw` lets the agent disable CORS, redirect inference routing, or weaken gateway protections. | Lock config for always-on assistants handling sensitive data. | -| Adding inference provider hosts to the network policy | Direct network access to an inference host bypasses credential isolation and usage tracking. | Use OpenShell inference routing instead of adding hosts like `api.openai.com` or `api.anthropic.com` to the network policy. | +| Leaving generated agent config writable on sensitive workloads | The generated config tree contains model routing, channel settings, and runtime integration state (`/sandbox/.openclaw` for OpenClaw, `/sandbox/.hermes` for Hermes). 
Writable config lets the agent drift from host-managed policy and routing. | Keep generated config under NemoClaw control for always-on assistants handling sensitive data. | +| Adding inference provider hosts to the network policy for NemoClaw inference | Direct network access to an inference host bypasses credential isolation and usage tracking. | Use OpenShell inference routing instead of adding hosts like `api.openai.com` or `api.anthropic.com` to the network policy. Apply `claude-code` only when intentionally running the separate Claude Code CLI inside the sandbox. | | Disabling device auth for remote deployments | Without device auth, any device on the network can connect to the gateway without pairing. Combined with a cloudflared tunnel, this makes the dashboard publicly accessible and unauthenticated. | Keep `NEMOCLAW_DISABLE_DEVICE_AUTH` at its default (`0`). Only set it to `1` for local headless or development environments. | ## Known Limitations | Limitation | Impact | Mitigation | |-----------|--------|------------| -| `openclaw agent --local` bypasses gateway | Secret scanning, network policy, and inference auth are not enforced when the agent runs in local mode. | A runtime warning is emitted when `--local` is detected. Avoid `--local` for production workflows. A future OpenClaw-level hook will close this gap. | -| Direct filesystem writes bypass secret scanner | The scanner intercepts OpenClaw tool calls, not raw filesystem writes (e.g., `echo secret > file`). | Landlock restricts writable paths. The scanner is application-layer defense-in-depth, not a filesystem-level control. | +| Bypassing managed gateway paths | Network policy and inference auth are not enforced when an agent runtime is launched outside the NemoClaw-managed gateway path. | Use NemoClaw-managed sandbox entrypoints for production workflows. 
| +| Direct filesystem writes bypass application-layer scanners | Application-layer scanners can intercept agent tool calls, not arbitrary raw filesystem writes (e.g., `echo secret > file`). | Landlock restricts writable paths. Application-layer scanning is defense-in-depth, not a filesystem-level control. | | Base64/hex-encoded secrets are not detected | Content-based regex scanning cannot detect encoded or obfuscated secrets. | Use environment variables or credential stores instead of writing secrets to files. | ## Related Topics @@ -506,6 +576,8 @@ The following patterns weaken security without providing meaningful benefit. - Network Policies (use the `nemoclaw-user-reference` skill) for the full baseline policy reference. - Customize the Network Policy (use the `nemoclaw-user-manage-policy` skill) for static and dynamic policy changes. - Approve or Deny Network Requests (use the `nemoclaw-user-manage-policy` skill) for the operator approval flow. -- Sandbox Hardening (use the `nemoclaw-user-deploy-remote` skill) for container-level security measures. + +- Sandbox Hardening for container-level security measures. + - Inference Options (use the `nemoclaw-user-configure-inference` skill) for provider configuration details. - How It Works (use the `nemoclaw-user-overview` skill) for the protection layer architecture. diff --git a/.agents/skills/nemoclaw-user-configure-security/references/credential-storage.md b/.agents/skills/nemoclaw-user-configure-security/references/credential-storage.md index 24b520cb1d..707cc961d5 100644 --- a/.agents/skills/nemoclaw-user-configure-security/references/credential-storage.md +++ b/.agents/skills/nemoclaw-user-configure-security/references/credential-storage.md @@ -1,31 +1,37 @@ - - # Credential Storage +import { AgentOnly } from "../_components/AgentGuide"; + NemoClaw does not persist provider credentials to host disk. The OpenShell gateway is the only system of record for stored credentials. 
-When you provide a provider credential — interactively during `nemoclaw onboard` or via an environment variable — NemoClaw holds the value in memory only long enough to register it with the OpenShell gateway through `openshell provider create` or `openshell provider update`.
+When you provide a provider credential, either interactively during `nemoclaw onboard` or through an environment variable, NemoClaw holds the value in memory only long enough to register it with the OpenShell gateway through `openshell provider create` or `openshell provider update`.
 The gateway stores the credential and the OpenShell L7 proxy substitutes it into outbound requests at egress, so sandboxed agents see placeholders instead of the raw secret.
+
 The sandbox-side OpenClaw gateway token is generated at container startup and is not rotated through provider credential commands.
+
+
+Hermes API credentials and provider credentials are managed through the same OpenShell provider boundary; generated Hermes runtime files are recreated during rebuilds.
+
 ## Where Credentials Live
 Provider credentials live in the OpenShell gateway store.
 List what is registered with:
-```console
-$ openshell provider list
+```bash
+openshell provider list
 ```
 Or, equivalently, through NemoClaw:
-```console
-$ nemoclaw credentials list
+```bash
+nemoclaw credentials list
 ```
-Both surface the provider names that the gateway holds credentials for. The values themselves cannot be read back from the CLI; this is a deliberate property of OpenShell.
+Both commands show the provider names registered with the gateway.
+The values themselves cannot be read back from the CLI; this is a deliberate property of OpenShell.
 NemoClaw still keeps non-secret operational state under `~/.nemoclaw/` (such as the sandbox registry).
 That directory is created with mode `0700` and contains no credential material.
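
As a rough illustration of the `0700` mode mentioned for `~/.nemoclaw/`, the sketch below checks that a directory is accessible to its owner only. It runs against a scratch directory rather than the real state directory, and `dir_mode` and `is_owner_only` are hypothetical helper names, not NemoClaw commands. It assumes GNU `stat`, with a BSD/macOS fallback.

```bash
# Hypothetical helper: print the octal permission bits of a path.
# Tries GNU coreutils syntax first, then the BSD/macOS form.
dir_mode() {
  stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# Hypothetical helper: succeed only when the path is private to its owner.
is_owner_only() {
  [ "$(dir_mode "$1")" = "700" ]
}

# Demonstrate on a scratch directory, not on the real ~/.nemoclaw/.
scratch="$(mktemp -d)"
chmod 700 "$scratch"
if is_owner_only "$scratch"; then
  echo "owner-only"
else
  echo "too permissive: $(dir_mode "$scratch")"
fi
```

Pointing `is_owner_only` at `~/.nemoclaw` on a real host would be one way to spot-check the documented mode.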
@@ -44,8 +50,8 @@ This means you can:
 `nemoclaw deploy` (which provisions a remote Brev box) cannot read secrets back from the gateway, so it requires every credential to be present in the host environment at invocation time.
 A typical deploy invocation looks like:
-```console
-$ NVIDIA_API_KEY=nvapi-... \
+```bash
+NVIDIA_API_KEY=nvapi-... \
   HF_TOKEN=hf_... \
   TELEGRAM_BOT_TOKEN=... \
   nemoclaw deploy my-instance
@@ -61,7 +67,8 @@ When a private repo requires authentication NemoClaw runs `gh auth token`, which
 The GitHub CLI prefers an OS keychain when one is reachable: macOS Keychain on macOS, Windows Credential Manager on Windows, and Linux Secret Service (libsecret + a running D-Bus session) on Linux.
 On hosts where no keychain is reachable (CI runners, headless launches, WSL without a session bus, macOS contexts where Keychain access is blocked, etc.) `gh auth login` falls back to a `gh`-managed file under `~/.config/gh/` with mode `0600`.
-NemoClaw treats both backends identically: `gh auth token` returns the value, and NemoClaw stages it in `process.env` for the current run only.
+NemoClaw treats both backends identically.
+`gh auth token` returns the value, and NemoClaw stages it in `process.env` for the current run only.
 If `gh` is not installed or not logged in, NemoClaw prompts for a personal access token for that single run; the prompted value is held in process memory and is not written to host disk.
 Run `gh auth login` if you want a persistent backing store (whichever one applies on your host) so future runs do not prompt.
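
The lookup order described for private repositories could be sketched as follows. This is only an illustration, not NemoClaw's implementation: `resolve_github_token` is a hypothetical helper, and `gh auth token` is the only GitHub CLI call it relies on. The value is printed for the caller to hold in memory and is never written to disk.

```bash
# Hypothetical helper: print a GitHub token for the current run only.
# The parenthesised body runs in a subshell, so `token` never leaks
# into the caller's shell variables.
resolve_github_token() (
  token=""
  # Prefer the GitHub CLI when it is installed and logged in.
  if command -v gh >/dev/null 2>&1; then
    token="$(gh auth token 2>/dev/null || true)"
  fi
  # Otherwise prompt once; terminal echo is disabled only on a real TTY.
  if [ -z "$token" ]; then
    printf 'GitHub personal access token: ' >&2
    if [ -t 0 ]; then stty -echo; fi
    IFS= read -r token || true
    if [ -t 0 ]; then stty echo; printf '\n' >&2; fi
  fi
  printf '%s\n' "$token"
)
```

A caller could capture the value with `token="$(resolve_github_token)"` and hand it only to the process that needs it, which mirrors the "current run only" behavior described above.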
@@ -84,14 +91,14 @@ If `~/.nemoclaw/credentials.json` remains after a rebuild or other credential lo
 The simplest way to replace a stored value is to rerun onboarding with the new value in your environment:
-```console
-$ NVIDIA_API_KEY=nvapi-new-value nemoclaw onboard
+```bash
+NVIDIA_API_KEY=nvapi-new-value nemoclaw onboard
 ```
 To remove a credential from the gateway entirely:
-```console
-$ nemoclaw credentials reset
+```bash
+nemoclaw credentials reset
 ```
 `` is the OpenShell provider name (run `nemoclaw credentials list` first if you are not sure).
@@ -107,4 +114,4 @@ On the next run NemoClaw prompts again unless the credential is supplied through
 ## Related Files
-For the broader sandbox security model and operational trade-offs, see Security Best Practices (use the `nemoclaw-user-configure-security` skill) and Architecture (use the `nemoclaw-user-reference` skill).
+For the broader sandbox security model and operational trade-offs, see [Security Best Practices](best-practices.md) and Architecture (use the `nemoclaw-user-reference` skill).
diff --git a/.agents/skills/nemoclaw-user-configure-security/references/openclaw-controls.md b/.agents/skills/nemoclaw-user-configure-security/references/openclaw-controls.md
index 155cfbe04b..13e31c8702 100644
--- a/.agents/skills/nemoclaw-user-configure-security/references/openclaw-controls.md
+++ b/.agents/skills/nemoclaw-user-configure-security/references/openclaw-controls.md
@@ -1,5 +1,3 @@
-
-
 # OpenClaw Security Controls Beyond NemoClaw's Scope
 NemoClaw provides infrastructure-layer security through sandbox isolation, network policy, filesystem restrictions, SSRF validation, and credential handling.
@@ -7,7 +5,7 @@ It delegates all application-layer security to OpenClaw.
 This page documents areas where NemoClaw adds no independent protection beyond what OpenClaw already provides.
 The details below reflect the OpenClaw documentation at the time of writing.
-Consult the [OpenClaw Security docs](https://docs.openclaw.ai/gateway/security/index) for the current state. +Consult the [OpenClaw Security docs](https://docs.openclaw.ai/gateway/security) for the current state. ## Prompt Injection Detection and Prevention @@ -58,7 +56,7 @@ OpenClaw blocks environment variables that could enable code injection, privileg ## Security Audit Framework -OpenClaw runs automated security checks (50+ distinct check types) that cover configuration, credential handling, and sandbox posture. +OpenClaw runs more than 50 distinct automated security checks that cover configuration, credential handling, and sandbox posture. Run `openclaw security audit` to see all findings for your deployment. These checks include: @@ -94,7 +92,7 @@ OpenClaw controls who can interact with the agent through direct messages and gr | Control | Detail | |---|---| -| DM policy modes | 4 modes: open, disabled, pairing, allowlist | +| DM policy modes | Four modes: open, disabled, pairing, allowlist | | Group policies | Per-group access rules | | Per-sender authorization | Individual sender gating | | Command authorization | Command-level access control | @@ -112,10 +110,10 @@ OpenClaw restricts what supplemental context the agent can see and how it can mo ## Safe Regex (ReDoS Prevention) -OpenClaw includes safe regex compilation to prevent Regular Expression Denial of Service (ReDoS) attacks. +OpenClaw includes safe regex compilation to prevent regular expression denial of service (ReDoS) attacks. The implementation detects unsafe nested quantifiers, bounds input length, and caches results. ## Next Steps -- Security Best Practices (use the `nemoclaw-user-configure-security` skill) for NemoClaw's own security controls and risk framework. -- Credential Storage (use the `nemoclaw-user-configure-security` skill) for how NemoClaw stores and protects provider credentials. 
+- [Security Best Practices](best-practices.md) for NemoClaw's own security controls and risk framework. +- [Credential Storage](credential-storage.md) for how NemoClaw stores and protects provider credentials. diff --git a/.agents/skills/nemoclaw-user-deploy-remote/SKILL.md b/.agents/skills/nemoclaw-user-deploy-remote/SKILL.md index 60c8b060da..cb0a62547b 100644 --- a/.agents/skills/nemoclaw-user-deploy-remote/SKILL.md +++ b/.agents/skills/nemoclaw-user-deploy-remote/SKILL.md @@ -1,17 +1,15 @@ --- name: "nemoclaw-user-deploy-remote" description: "Explains how to run NemoClaw on a remote GPU instance, including the deprecated Brev compatibility path and the preferred installer plus onboard flow. Use when deploying NemoClaw to a remote VM, onboarding a Brev instance, or migrating away from the legacy `nemoclaw deploy` wrapper. Trigger keywords - deploy nemoclaw remote gpu, nemoclaw brev cloud deployment, nemoclaw plugins, openclaw plugins, install openclaw plugin, nemoclaw onboard from dockerfile, nemoclaw brev web ui, nemoclaw getting started, brev quickstart, nvidia nemotron agent, nemoclaw sandbox hardening, container security, docker capabilities, process limits." +license: "Apache-2.0" --- - - - # Deploy NemoClaw to a Remote GPU Instance ## Gotchas - The `nemoclaw deploy` command is deprecated. -- On Brev, set `CHAT_UI_URL` in the launchable environment configuration so it is available when the installer builds the sandbox image. +- On Brev, set `CHAT_UI_URL` in the launchable environment configuration so the installer can read it when it builds the sandbox image. ## Prerequisites @@ -23,20 +21,6 @@ description: "Explains how to run NemoClaw on a remote GPU instance, including t Run NemoClaw on a remote GPU instance through [Brev](https://brev.nvidia.com). The preferred path is to provision the VM, run the standard NemoClaw installer on that host, and then run `nemoclaw onboard`. 
-## Quick Start - -If your Brev instance is already up and has already been onboarded with a sandbox, start with the standard sandbox chat flow: - -```console -$ nemoclaw my-assistant connect -$ openclaw tui -``` - -This gets you into the sandbox shell first and opens the OpenClaw chat UI right away. -If the VM is fresh, run the standard installer on that host and then run `nemoclaw onboard` before trying `nemoclaw my-assistant connect`. - -If you are connecting from your local machine and still need to provision the remote VM, you can still use `nemoclaw deploy ` as the legacy compatibility path described below. - ## Deploy the Instance **Warning:** @@ -46,8 +30,8 @@ Prefer provisioning the remote host separately, then running the standard NemoCl Create a Brev instance and run the legacy compatibility flow: -```console -$ nemoclaw deploy +```bash +nemoclaw deploy ``` Replace `` with a name for your remote instance, for example `my-gpu-box`. @@ -60,7 +44,7 @@ The legacy compatibility flow performs the following steps on the VM: 1. Installs Docker and the NVIDIA Container Toolkit if a GPU is present. 2. Installs the OpenShell CLI. 3. Runs `nemoclaw onboard` (the setup wizard) to create the gateway, register providers, and launch the sandbox. -4. Starts optional host auxiliary services (for example the cloudflared tunnel) when `cloudflared` is available. Channel messaging is configured during onboarding and runs through OpenShell-managed processes, not through `nemoclaw tunnel start`. +4. Starts optional host auxiliary services, such as the cloudflared tunnel, when `cloudflared` is available. Onboarding configures channel messaging, and the channels run through OpenShell-managed processes, not through `nemoclaw tunnel start`. By default, the compatibility wrapper asks Brev to provision on `gcp`. Override this with `NEMOCLAW_BREV_PROVIDER` if you need a different Brev cloud provider. 
If you export `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN`, the wrapper forwards those values to the VM so remote setup can pull gated Hugging Face model repositories. @@ -70,47 +54,43 @@ If you export `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN`, the wrapper forwards those After deployment finishes, the deploy command opens an interactive shell inside the remote sandbox. To reconnect after closing the session, run the command again: -```console -$ nemoclaw deploy +```bash +nemoclaw deploy ``` ## Monitor the Remote Sandbox SSH to the instance and run the OpenShell TUI to monitor activity and approve network requests: -```console -$ ssh 'cd ~/nemoclaw && set -a && . .env && set +a && openshell term' +```bash +ssh 'cd ~/nemoclaw && set -a && . .env && set +a && openshell term' ``` ## Verify Inference Run a test agent prompt inside the remote sandbox: -```console -$ openclaw agent --agent main -m "Hello from the remote sandbox" --session-id test +```bash +openclaw agent --agent main -m "Hello from the remote sandbox" --session-id test ``` ## Remote Dashboard Access -The NemoClaw dashboard validates the browser origin against an allowlist baked -into the sandbox image at build time. By default the allowlist only contains -`http://127.0.0.1:18789`. When accessing the dashboard from a remote browser -(for example through a Brev public URL or an SSH port-forward), set -`CHAT_UI_URL` to the origin the browser will use **before** running setup: +The NemoClaw dashboard validates the browser origin against an allowlist baked into the sandbox image at build time. +By default, the allowlist only contains `http://127.0.0.1:18789`. 
+When you access the dashboard from a remote browser, for example through a Brev public URL or an SSH port-forward, set `CHAT_UI_URL` to the origin the browser uses before running setup: -```console -$ export CHAT_UI_URL="https://openclaw0-.brevlab.com" -$ nemoclaw deploy +```bash +export CHAT_UI_URL="https://openclaw0-.brevlab.com" +nemoclaw deploy ``` -For SSH port-forwarding, the origin is typically `http://127.0.0.1:18789` (the -default), so no extra configuration is needed. +For SSH port-forwarding, the origin is typically the default `http://127.0.0.1:18789`, so you do not need extra configuration. **Warning:** -On Brev, set `CHAT_UI_URL` in the launchable environment configuration so it is -available when the installer builds the sandbox image. If `CHAT_UI_URL` is not -set on a headless host, the compatibility wrapper prints a warning. +On Brev, set `CHAT_UI_URL` in the launchable environment configuration so the installer can read it when it builds the sandbox image. +If you do not set `CHAT_UI_URL` on a headless host, the compatibility wrapper prints a warning. `NEMOCLAW_DISABLE_DEVICE_AUTH` is also evaluated at image build time. When `CHAT_UI_URL` points at a non-loopback origin, NemoClaw disables OpenClaw device pairing in the generated sandbox configuration because browser-only remote users cannot complete terminal-based pairing. @@ -118,37 +98,37 @@ Any device that can reach the configured dashboard origin can connect without pa ## First-Run Readiness Budget -On a remote GPU host, the first `nemoclaw onboard` typically does the slowest work of the lifecycle: the sandbox image is built locally and uploaded into the OpenShell gateway, which can stream hundreds of MiB over the VM's link before the readiness wait even starts. 
-The post-create readiness wait defaults to 180 seconds (`NEMOCLAW_SANDBOX_READY_TIMEOUT`), which is sized for warm-cache, workstation-class onboarding and can be exceeded on: +On a remote GPU host, the first `nemoclaw onboard` typically does the slowest work of the lifecycle: the host builds the sandbox image locally and uploads it into the OpenShell gateway, which can stream hundreds of MiB over the VM's link before the readiness wait even starts. +The post-create readiness wait defaults to 180 seconds (`NEMOCLAW_SANDBOX_READY_TIMEOUT`), which fits warm-cache, workstation-class onboarding but can be too short for: -- DGX Station first runs with large quantised models (70B+ parameter footprints, NVFP4 weights). +- DGX Station first runs with large quantized models (70B+ parameter footprints, NVFP4 weights). - Cloud VMs where the local image-build cache is cold and the upload runs over the public network. - Hosts onboarding the Brave Web Search preset on the first run (the egress policy stack adds boot work). Raise the budget before re-running onboard: -```console -$ export NEMOCLAW_SANDBOX_READY_TIMEOUT=600 -$ nemoclaw onboard +```bash +export NEMOCLAW_SANDBOX_READY_TIMEOUT=600 +nemoclaw onboard ``` -If onboard ends with `Sandbox '' was created but did not become ready within 180s`, onboard deletes the partially-created sandbox first, so the next attempt with the raised budget starts from a clean state. -For the inference-probe budget that runs earlier in onboarding, see `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` (use the `nemoclaw-user-configure-inference` skill). +If onboard ends with `Sandbox '' was created but did not become ready within 180s`, onboard first deletes the partially created sandbox, so the next attempt with the raised budget starts from a clean state. +For the inference-probe budget that runs earlier in onboarding, refer to `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` (use the `nemoclaw-user-configure-inference` skill). 
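The readiness budget described above can also be scoped to a single run instead of being exported for the whole shell session. The following is a minimal sketch under stated assumptions: `run_with_ready_budget` is a hypothetical helper, not part of the NemoClaw CLI, and the digits-only check is an assumption about which values make sense for `NEMOCLAW_SANDBOX_READY_TIMEOUT`.

```bash
# Hypothetical helper: run one command with a per-invocation readiness budget.
# Assumption: the budget is a whole number of seconds, as in the 180 and 600 examples.
run_with_ready_budget() {
  local seconds="$1"
  shift
  case "$seconds" in
    ''|*[!0-9]*)
      echo "readiness budget must be a whole number of seconds" >&2
      return 2
      ;;
  esac
  # The assignment prefix applies only to the command that follows it,
  # so the surrounding shell environment stays unchanged.
  NEMOCLAW_SANDBOX_READY_TIMEOUT="$seconds" "$@"
}

# Example invocation (commented out so the sketch has no side effects):
# run_with_ready_budget 600 nemoclaw onboard
```

Scoping the variable this way keeps a large budget from leaking into later commands in the same shell, which may matter if you script several onboarding attempts with different values.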
## Proxy Configuration NemoClaw routes sandbox traffic through a gateway proxy that defaults to `10.200.0.1:3128`. If your network requires a different proxy, set `NEMOCLAW_PROXY_HOST` and `NEMOCLAW_PROXY_PORT` before onboarding: -```console -$ export NEMOCLAW_PROXY_HOST=proxy.example.com -$ export NEMOCLAW_PROXY_PORT=8080 -$ nemoclaw onboard +```bash +export NEMOCLAW_PROXY_HOST=proxy.example.com +export NEMOCLAW_PROXY_PORT=8080 +nemoclaw onboard ``` -These values are baked into the sandbox image at build time. -They are also forwarded into the runtime container during sandbox creation, so `/tmp/nemoclaw-proxy-env.sh` uses the same host and port that the image build used. -Only alphanumeric characters, dots, hyphens, and colons are accepted for the host. +NemoClaw bakes these values into the sandbox image at build time. +NemoClaw also forwards them into the runtime container during sandbox creation, so `/tmp/nemoclaw-proxy-env.sh` uses the same host and port that the image build used. +NemoClaw accepts only alphanumeric characters, dots, hyphens, and colons for the host. The port must be numeric (0-65535). Changing the proxy after onboarding requires re-running `nemoclaw onboard`. @@ -158,9 +138,9 @@ The deploy script uses the `NEMOCLAW_GPU` environment variable to select the GPU The default value is `a2-highgpu-1g:nvidia-tesla-a100:1`. 
Set this variable before running `nemoclaw deploy` to use a different GPU configuration: -```console -$ export NEMOCLAW_GPU="a2-highgpu-1g:nvidia-tesla-a100:2" -$ nemoclaw deploy +```bash +export NEMOCLAW_GPU="a2-highgpu-1g:nvidia-tesla-a100:2" +nemoclaw deploy ``` ## References @@ -173,4 +153,4 @@ $ nemoclaw deploy - `nemoclaw-user-manage-sandboxes` — Set Up Messaging Channels (use the `nemoclaw-user-manage-sandboxes` skill) to connect Telegram, Discord, or Slack through OpenShell-managed channel messaging - `nemoclaw-user-monitor-sandbox` — Monitor Sandbox Activity (use the `nemoclaw-user-monitor-sandbox` skill) for sandbox monitoring tools -- `nemoclaw-user-reference` — Commands (use the `nemoclaw-user-reference` skill) for the full `deploy` command reference +- `nemoclaw-user-reference` — `nemoclaw deploy` (use the `nemoclaw-user-reference` skill) for the full `deploy` command reference diff --git a/.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json b/.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json new file mode 100644 index 0000000000..3238159be1 --- /dev/null +++ b/.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json @@ -0,0 +1,11 @@ +[ + { + "id": "docs-deployment-deploy-to-remote-gpu-001", + "question": "I'm deploying NemoClaw to a remote GPU instance. Help me move the sandboxed assistant off my local machine so I can support persistent or GPU-backed operation.", + "expected_skill": "nemoclaw-user-deploy-remote", + "ground_truth": "A NemoClaw-specific answer that helps the user move the sandboxed assistant off my local machine and gives enough concrete guidance, decision criteria, verification steps, or risk framing to support persistent or GPU-backed operation.", + "expected_behavior": [ + "Uses the expected_skill and does not make up answers if it cannot find the answer from the skill." 
+ ] + } +] diff --git a/.agents/skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md b/.agents/skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md index 502e2f3143..f0059b3c54 100644 --- a/.agents/skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md +++ b/.agents/skills/nemoclaw-user-deploy-remote/references/brev-web-ui.md @@ -1,5 +1,3 @@ - - # Launch NemoClaw with the Brev Web UI Use the Brev web UI to launch a hosted NemoClaw sandbox from your browser. @@ -8,7 +6,7 @@ This flow provisions a remote VM, configures inference, starts OpenClaw inside a **Note:** Use this guide when you want to try NemoClaw without installing the CLI or using a local GPU. -If you want to manage the remote host from a terminal, see Deploy to a Remote GPU Instance (use the `nemoclaw-user-deploy-remote` skill). +If you want to manage the remote host from a terminal, see [Deploy to a Remote GPU Instance](../SKILL.md). ## What This Flow Creates @@ -29,7 +27,8 @@ You do not need to install local software for this flow. ## Get Your NVIDIA API Key -If you already have an NVIDIA API key skip this section. Otherwise, follow these steps to generate a new key: +If you already have an NVIDIA API key, skip this section. +Otherwise, follow these steps to generate a new key: 1. Go to [build.nvidia.com](https://build.nvidia.com). 2. Sign in or create an account. @@ -48,7 +47,7 @@ Use the [NemoClaw Brev launchable](https://brev.nvidia.com/launchable/deploy/now 2. Review the instance type, cloud provider, and estimated hourly cost on the NemoClaw setup page. 3. Click **Deploy NemoClaw**. -The right-side deployment panel shows progress while Brev deploys the CPU instance and prepares VM mode. +The deployment panel on the right shows progress while Brev deploys the CPU instance and prepares VM mode. Keep this page open until the deployment completes. When the panel shows the **NemoClaw** button, click it to open the agent setup page. 
@@ -98,7 +97,8 @@ Click **Chat With Agent** to open the OpenClaw dashboard. The dashboard might initially show a **Pairing required** warning. This means the gateway is still completing pairing in the background. -Wait for about a few minutes for pairing to finish automatically. Refresh the dashboard to see if the warning is resolved and the connection is established. +Wait a few minutes for pairing to finish automatically. +Refresh the dashboard to check whether the warning has cleared and the dashboard has connected. If pairing does not finish, go to the **Overview** page in the OpenClaw UI, find the **Gateway Access** panel, and click **Connect**. ## Start a Chat @@ -110,7 +110,7 @@ Hello! What can you do for me? What skills do you have available? ``` The agent reads its workspace files and introduces itself. -The starter workspace includes example skills such as: +The starter workspace includes these example skills: - **Weather** gets current weather and forecasts. - **Healthcheck** runs security audit and hardening checks. @@ -151,5 +151,5 @@ After your agent is running, explore these related tasks: - Set Up Messaging Channels (use the `nemoclaw-user-manage-sandboxes` skill) to learn how to connect Telegram, Slack, or Discord. - Switch Inference Providers (use the `nemoclaw-user-configure-inference` skill) to learn how to change the model provider after setup. - Monitor Sandbox Activity (use the `nemoclaw-user-monitor-sandbox` skill) to learn how to inspect sandbox health and logs. -- Deploy to a Remote GPU Instance (use the `nemoclaw-user-deploy-remote` skill) to learn how to deploy NemoClaw to a remote GPU instance using the CLI. +- [Deploy to a Remote GPU Instance](../SKILL.md) to learn how to deploy NemoClaw to a remote GPU instance using the CLI. - Troubleshooting (use the `nemoclaw-user-reference` skill) to learn how to fix common setup and runtime issues. 
diff --git a/.agents/skills/nemoclaw-user-deploy-remote/references/install-openclaw-plugins.md b/.agents/skills/nemoclaw-user-deploy-remote/references/install-openclaw-plugins.md index cc4edd05ac..4094f9a924 100644 --- a/.agents/skills/nemoclaw-user-deploy-remote/references/install-openclaw-plugins.md +++ b/.agents/skills/nemoclaw-user-deploy-remote/references/install-openclaw-plugins.md @@ -1,22 +1,18 @@ - - # Install OpenClaw Plugins -OpenClaw plugins extend the OpenClaw runtime with hooks, services, tools, or -provider integrations. They are different from NemoClaw-managed agent skills: +OpenClaw plugins extend the OpenClaw runtime with hooks, services, tools, or provider integrations. +They are different from NemoClaw-managed agent skills: - **Plugins** are code packages loaded by OpenClaw. - **Skills** are `SKILL.md` directories that teach an agent how to perform a task. - **Policy presets** are network-egress rules that control what sandboxed code can reach. -Today, the supported NemoClaw path for OpenClaw plugins is to bake the plugin -into a custom sandbox image and onboard from that Dockerfile. +The supported NemoClaw path for OpenClaw plugins is to bake the plugin into a custom sandbox image and onboard from that Dockerfile. ## Prepare a Build Directory Put the Dockerfile and everything it needs to `COPY` in one directory. -`nemoclaw onboard --from ` uses the Dockerfile's parent directory as -the Docker build context. +`nemoclaw onboard --from ` uses the Dockerfile's parent directory as the Docker build context. ```text my-plugin-sandbox/ @@ -28,8 +24,7 @@ my-plugin-sandbox/ ## Example Dockerfile -Use the custom image to copy the plugin into the OpenClaw extensions directory -and let OpenClaw refresh its config before NemoClaw starts the sandbox. +Use the custom image to copy the plugin into the OpenClaw extensions directory and let OpenClaw refresh its config before NemoClaw starts the sandbox. 
```dockerfile ARG SANDBOX_BASE=ghcr.io/nvidia/nemoclaw/sandbox-base:latest @@ -46,48 +41,37 @@ RUN mkdir -p /sandbox/.openclaw/extensions \ WORKDIR /opt/nemoclaw ``` -If the plugin needs configuration in `openclaw.json`, apply it after -`openclaw doctor --fix` so the base config exists first. +If the plugin needs configuration in `openclaw.json`, apply it after `openclaw doctor --fix` so the base config exists first. ## Create the Sandbox Point `nemoclaw onboard --from` at the Dockerfile in the build directory. -```console -$ nemoclaw onboard --from ./my-plugin-sandbox/Dockerfile +```bash +nemoclaw onboard --from ./my-plugin-sandbox/Dockerfile ``` -If you need a second sandbox alongside an existing one, use a dedicated build -directory and rerun onboarding with the sandbox name and ports you intend to -use. +If you need a second sandbox alongside an existing one, use a dedicated build directory and rerun onboarding with the sandbox name and ports you intend to use. ## Network Access -Plugins still run inside the sandbox policy boundary. If a plugin needs network -egress, add or update a policy preset for the required hostnames and binaries -before rebuilding the sandbox. +Plugins still run inside the sandbox policy boundary. +If a plugin needs network egress, add or update a policy preset for the required hostnames and binaries before rebuilding the sandbox. -For example, see Network Policies (use the `nemoclaw-user-reference` skill) for -policy concepts and Customize Network Policy (use the `nemoclaw-user-manage-policy` skill) -for custom preset workflows. +For policy concepts, refer to Network Policies (use the `nemoclaw-user-reference` skill). +For custom preset workflows, refer to Customize Network Policy (use the `nemoclaw-user-manage-policy` skill). ## Common Mistakes -These are the most common places where plugin installation gets mixed up with -other NemoClaw extension paths. 
+These are the most common places where plugin installation gets mixed up with other NemoClaw extension paths. -- Do not use `nemoclaw skill install` for OpenClaw plugins. That - command only installs `SKILL.md` agent skills. -- Do not put a Dockerfile in a broad directory such as `/tmp` unless you intend - to send that whole directory as the Docker build context. +- Do not use `nemoclaw skill install` for OpenClaw plugins. That command only installs `SKILL.md` agent skills. +- Do not put a Dockerfile in a broad directory such as `/tmp` unless you intend to send that whole directory as the Docker build context. - Keep plugin dependencies in the build stage or plugin directory; avoid copying unrelated host files into the sandbox image. ## Next Steps -- Review Sandbox Hardening (use the `nemoclaw-user-deploy-remote` skill) before adding plugin code to a - shared or long-lived sandbox. -- Review Network Policies (use the `nemoclaw-user-reference` skill) to plan plugin - egress rules. -- Follow Customize Network Policy (use the `nemoclaw-user-manage-policy` skill) - if the plugin needs a custom preset. +- Review [Sandbox Hardening](sandbox-hardening.md) before adding plugin code to a shared or long-lived sandbox. +- Review Network Policies (use the `nemoclaw-user-reference` skill) to plan plugin egress rules. +- Follow Customize Network Policy (use the `nemoclaw-user-manage-policy` skill) if the plugin needs a custom preset. diff --git a/.agents/skills/nemoclaw-user-deploy-remote/references/sandbox-hardening.md b/.agents/skills/nemoclaw-user-deploy-remote/references/sandbox-hardening.md index 669096f180..11937ecd8f 100644 --- a/.agents/skills/nemoclaw-user-deploy-remote/references/sandbox-hardening.md +++ b/.agents/skills/nemoclaw-user-deploy-remote/references/sandbox-hardening.md @@ -1,50 +1,45 @@ - - # Sandbox Image Hardening -The NemoClaw sandbox image applies several security measures to reduce attack -surface and limit the blast radius of untrusted workloads. 
+The NemoClaw sandbox image applies several security measures to reduce attack surface and limit the blast radius of untrusted workloads. ## Removed Unnecessary Tools -Build toolchains (`gcc`, `g++`, `make`) and network probes (`netcat`) are -explicitly purged from the runtime image. These tools are not needed at runtime -and would unnecessarily widen the attack surface. +NemoClaw explicitly purges build toolchains (`gcc`, `g++`, `make`) and network probes (`netcat`) from the runtime image. +These tools are not needed at runtime and would unnecessarily widen the attack surface. -The runtime image keeps a small set of operational utilities for normal sandbox -workflows, including `vi`, `jq`, and `dos2unix`. Use these for lightweight -inspection and file cleanup inside the sandbox, but make durable image or policy -changes in the NemoClaw source tree and rebuild the sandbox. +The runtime image keeps a small set of operational utilities for normal sandbox workflows, including `vi`, `jq`, and `dos2unix`. +Use these utilities for lightweight inspection and file cleanup inside the sandbox, but make durable image or policy changes in the NemoClaw source tree and rebuild the sandbox. -If you need a compiler during build, use the existing multi-stage build -(the `builder` stage has full Node.js tooling) and copy only artifacts into the -runtime stage. +If you need a compiler during build, use the existing multi-stage build. +The `builder` stage has full Node.js tooling. +Copy only artifacts into the runtime stage. ## Process Limits -The container ENTRYPOINT sets `ulimit -u 512` to cap the number of processes -a sandbox user can spawn. This mitigates fork-bomb attacks. The startup script -(`nemoclaw-start.sh`) applies the same limit. +The container ENTRYPOINT sets `ulimit -u 512` to cap the number of processes a sandbox user can spawn. +This mitigates fork-bomb attacks. +The startup script (`nemoclaw-start.sh`) applies the same limit. 
-Adjust the value via the `--ulimit nproc=512:512` flag if launching with -`docker run` directly. +Adjust the value with the `--ulimit nproc=512:512` flag if you launch with `docker run` directly. ## Dropping Linux Capabilities -The NemoClaw entrypoint drops dangerous capabilities from the process bounding -set before it starts agent services. +The NemoClaw entrypoint drops dangerous capabilities from the process bounding set before it starts agent services. It removes `CAP_SYS_ADMIN`, `CAP_SYS_PTRACE`, `CAP_NET_RAW`, `CAP_DAC_OVERRIDE`, `CAP_SYS_CHROOT`, `CAP_FSETID`, `CAP_SETFCAP`, `CAP_MKNOD`, `CAP_AUDIT_WRITE`, and `CAP_NET_BIND_SERVICE`. -When `setpriv` is available, the entrypoint also removes the remaining -privilege-separation capabilities during the switch from root to the -`sandbox` and `gateway` users. +When `setpriv` is available, the entrypoint also removes the remaining privilege-separation capabilities during the switch from root to the `sandbox` and `gateway` users. + +The bounding-set drop is best effort: if `capsh` or `CAP_SETPCAP` is unavailable the entrypoint logs a warning and continues with the runtime-provided capability set. +If `setpriv` is unavailable, the entrypoint falls back to `gosu`. +To make the drop fail-closed instead, set `NEMOCLAW_REQUIRE_CAP_DROP=1` in the entrypoint environment: the agent then refuses to start unless the agent process tree's bounding set is verified free of the dangerous capabilities. +This is opt-in because hosts that cannot drop capabilities (no `CAP_SETPCAP` — many cloud VMs, Docker Desktop, WSL) are common, and the check covers the agent process tree only. 
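To see which of the capabilities listed above are still present in a process's bounding set, you can decode the `CapBnd` mask that Linux exposes in `/proc/<pid>/status`. The following is a minimal sketch, not the check the NemoClaw entrypoint performs: the function names are hypothetical, and the bit numbers are the standard values from the Linux capability header.

```bash
# Hypothetical helper: succeed when capability bit $2 is set in hex mask $1.
cap_is_set() {
  local mask_hex="$1" cap_bit="$2"
  (( (16#$mask_hex >> cap_bit) & 1 ))
}

# Hypothetical helper: print each capability from the list above that is
# still present in the given hex mask, one name per line.
remaining_dangerous_caps() {
  local mask_hex="$1" entry
  for entry in \
    CAP_DAC_OVERRIDE:1 CAP_FSETID:4 CAP_NET_BIND_SERVICE:10 CAP_NET_RAW:13 \
    CAP_SYS_CHROOT:18 CAP_SYS_PTRACE:19 CAP_SYS_ADMIN:21 CAP_MKNOD:27 \
    CAP_AUDIT_WRITE:29 CAP_SETFCAP:31
  do
    if cap_is_set "$mask_hex" "${entry##*:}"; then
      printf '%s\n' "${entry%%:*}"
    fi
  done
}

# Hypothetical helper: read the bounding-set mask of the current process (Linux only).
current_bounding_set_hex() {
  awk '/^CapBnd:/ {print $2}' /proc/self/status
}

# Example invocation (commented out so the sketch has no side effects):
# remaining_dangerous_caps "$(current_bounding_set_hex)"
```

An empty result means none of the listed capabilities remain in the bounding set of the shell that ran the check. As with the opt-in verification described above, this only describes the process you inspect, not every process on the host.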
For defense-in-depth, also drop all Linux capabilities at the container runtime when you launch the image directly: -```console -$ docker run --rm \ +```bash +docker run --rm \ --cap-drop=ALL \ --ulimit nproc=512:512 \ nemoclaw-sandbox @@ -83,11 +78,15 @@ The agent's home directory (`/sandbox`) is writable by default: | Path | Access | Purpose | |------|--------|---------| -| `/sandbox` | read-write | Home directory — agents can create files and use standard home paths | +| `/sandbox` | read-write | Home directory where agents can create files and use standard home paths | | `/sandbox/.openclaw` | read-write | Agent config, state, workspace, plugins | -| `/sandbox/.nemoclaw` | read-write | Plugin state and config; blueprints within are DAC-protected (root-owned) | +| `/sandbox/.nemoclaw` | read-write (Landlock); DAC-restricted | Parent directory is `root:root` mode `1755`; the sandbox user can write only to `state/`, `migration/`, `snapshots/`, `staging/`, and `config.json`. `blueprints/` and the parent itself are root-owned to prevent tampering. | | `/tmp` | read-write | Temporary files and logs | +The `Access` column reflects the Landlock policy declaration only. +Actual write success additionally requires POSIX (DAC) ownership and permissions to allow it. +For example, Landlock lists `/sandbox/.nemoclaw` as writable, but the sandbox user cannot create files directly under it because the parent directory is root-owned; writes must target the sandbox-owned subdirectories listed above. + This writable default is intentional. Seeing the sandbox user create files under `/sandbox` or `/sandbox/.openclaw` in a fresh sandbox does not mean Landlock failed. Landlock still enforces the fixed read-only system paths below. @@ -99,7 +98,7 @@ System paths remain read-only to prevent agents from: - Tampering with libraries or shell configuration outside `/sandbox` The image build pre-creates locked shell init files `.bashrc` and `.profile` without proxy entries. 
-Runtime proxy configuration is sourced from system-wide shell hooks that read `/tmp/nemoclaw-proxy-env.sh`. +System-wide shell hooks that read `/tmp/nemoclaw-proxy-env.sh` source the runtime proxy configuration. ### Landlock Kernel Requirements @@ -111,8 +110,8 @@ Files outside the writable paths would be inaccessible to the agent regardless o Operators should verify Landlock availability: -```console -$ ls /sys/kernel/security/landlock +```bash +ls /sys/kernel/security/landlock ``` For production deployments, kernel 5.13+ with Landlock enabled is strongly recommended. diff --git a/.agents/skills/nemoclaw-user-get-started/SKILL.md b/.agents/skills/nemoclaw-user-get-started/SKILL.md index eec110b31a..0cc6f4005b 100644 --- a/.agents/skills/nemoclaw-user-get-started/SKILL.md +++ b/.agents/skills/nemoclaw-user-get-started/SKILL.md @@ -1,20 +1,24 @@ --- name: "nemoclaw-user-get-started" description: "Installs NemoClaw, launches a sandbox, and runs the first agent prompt. Use when onboarding, installing, or launching a NemoClaw sandbox for the first time. Trigger keywords - nemoclaw quickstart, install nemoclaw openclaw sandbox, nemohermes quickstart, hermes agent nemoclaw, run hermes openshell sandbox, nemoclaw prerequisites, nemoclaw supported platforms, nemoclaw hardware software, nemoclaw windows wsl2 setup, nemoclaw install windows docker desktop." +license: "Apache-2.0" --- - - - # NemoClaw Quickstart with OpenClaw Follow these steps to get started with NemoClaw and your first sandboxed OpenClaw agent. **Note:** -Make sure you have completed reviewing the Prerequisites (use the `nemoclaw-user-get-started` skill) before following this guide. +Review the [Prerequisites](references/prerequisites.md) before following this guide. + +**Use Agent Skills:** + +NemoClaw ships user skills for AI coding assistants. 
+Load them when you want your assistant to walk through installation, inference choices, policy approvals, monitoring, or troubleshooting with NemoClaw-specific guidance. +Refer to Agent Skills (use the `nemoclaw-user-agent-skills` skill). -## Install NemoClaw and Onboard OpenClaw Agent +## Install NemoClaw and Onboard an OpenClaw Agent Download and run the installer script. The script installs Node.js if it is not already present, then runs the guided onboard wizard to create a sandbox, configure inference, and apply security policies. @@ -27,34 +31,51 @@ NemoClaw creates a fresh OpenClaw instance inside the sandbox during the onboard curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash ``` -The piped installer prompts through your terminal. In headless scripts or CI, -pass explicit acceptance to the `bash` side of the pipe: +The third-party software notice runs before the installer installs Node.js or the NemoClaw CLI. +The piped installer can prompt through your terminal when a TTY is available. +In non-TTY contexts, such as CI, an SSH command with piped stdin, or a shell script, pass explicit acceptance to the `bash` side of the pipe: -```console -$ curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_NON_INTERACTIVE=1 NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 bash +```bash +curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 bash ``` +Or pass the installer flag through `bash -s`: + +```bash +curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash -s -- --yes-i-accept-third-party-software +``` + +To run both installation and onboarding without prompts, also set non-interactive mode and the provider variables your chosen inference path requires: + +```bash +curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_NON_INTERACTIVE=1 NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 bash +``` + +Do not place `NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1` before `curl`. +In `NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 curl ... 
| bash`, the variable applies only to `curl`, so the installer process cannot see the acceptance. + If you use nvm or fnm to manage Node.js, the installer might not update your current shell's PATH. If `nemoclaw` is not found after install, run `source ~/.bashrc` (or `source ~/.zshrc` for zsh) or open a new terminal. On Linux, the installer checks Docker before it installs NemoClaw. If Docker is missing, the installer downloads the official Docker convenience script, asks for `sudo`, installs Docker, and starts the Docker service when systemd is available. -If Docker is installed but your current shell cannot use the Docker socket yet, the installer adds your user to the `docker` group when needed and exits with a recovery command. +If you installed Docker but your current shell cannot use the Docker socket yet, the installer adds your user to the `docker` group when needed and exits with a recovery command. On macOS, the installer uses the Docker-driver OpenShell gateway path with Docker Desktop or Colima. -```console -$ newgrp docker -$ curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash +```bash +newgrp docker +curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash ``` On DGX Spark, DGX Station, and Windows WSL, an interactive installer offers express install after you accept the third-party software notice. Express install switches onboarding to non-interactive mode, allows `sudo` password prompts for required host changes, and selects the managed local inference path for that platform. Unless `NEMOCLAW_POLICY_TIER` is set, it applies sandbox policy in `suggested` mode with the `balanced` tier by default, using the base sandbox policy plus supported package, model, web-search, and local-inference presets. +On DGX Spark, express install uses `my-spark-assistant` as the sandbox name unless `NEMOCLAW_SANDBOX_NAME` is already set. On WSL, express install selects the Windows-host Ollama setup path. 
Set `NEMOCLAW_NO_EXPRESS=1` to skip the express prompt, or set `NEMOCLAW_PROVIDER` before launching the installer when you want to choose a provider yourself. -The installer auto-launches `nemoclaw onboard` when it can locate the freshly-installed binary. +The installer auto-launches `nemoclaw onboard` when it can locate the freshly installed binary. If it cannot locate the binary, or if blocking host preflight checks fail, it does not launch the wizard automatically. In that case, the installer prints the relevant diagnostics and a `To finish setup, run:` block with the explicit `nemoclaw onboard` command. @@ -66,7 +87,9 @@ If you export `NEMOCLAW_DISABLE_DEVICE_AUTH` after onboarding finishes, it has n ### Respond to the Onboard Wizard -After the installer launches `nemoclaw onboard`, the wizard runs preflight checks, starts or reuses the OpenShell gateway, and asks for an inference provider, sandbox name, optional web search, optional messaging channels, and network policy presets. +After the installer launches `nemoclaw onboard`, the wizard runs preflight checks, starts or reuses the OpenShell gateway, asks for an inference provider and model, collects any required credential, then asks for the sandbox name. +It prints a review summary before it registers the provider with OpenShell. +After you confirm, NemoClaw registers inference, prompts for optional web search and messaging channels, builds and starts the sandbox, sets up OpenClaw, then applies the selected network policy tier and presets. At any prompt, press Enter to accept the default shown in `[brackets]`, type `back` to return to the previous prompt, or type `exit` to quit. If existing sandbox sessions are running, the installer warns before onboarding because the setup can rebuild or upgrade sandboxes after the new sandbox launches. @@ -87,7 +110,7 @@ The inference provider prompt presents a numbered list. 
Pick the option that matches where you want inference traffic to go, then expand the matching helper below for the follow-up prompts and the API key environment variable to set. For the full list of providers and validation behavior, refer to Inference Options (use the `nemoclaw-user-configure-inference` skill). Local Ollama appears when NemoClaw detects a usable local Ollama path or can offer an install or start action for your platform. -The Model Router option appears when the blueprint router profile is enabled. +A configured blueprint router profile makes the Model Router option appear. **Tip:** @@ -95,138 +118,25 @@ Export the API key before launching the installer so the wizard does not have to For example, run `export NVIDIA_API_KEY=` before `curl ... | bash`. If you entered a key incorrectly, refer to Reset a Stored Credential (use the `nemoclaw-user-manage-sandboxes` skill) to clear and re-enter it. -**Option 1: NVIDIA Endpoints:** - -Routes inference to models hosted on [build.nvidia.com](https://build.nvidia.com). - -Use `NVIDIA_API_KEY` for the API key. Get one from the [NVIDIA build API keys page](https://build.nvidia.com/settings/api-keys). - -Respond to the wizard as follows. - -1. At the `Choose [1]:` prompt, press Enter (or type `1`) to select **NVIDIA Endpoints**. -2. At the `NVIDIA_API_KEY:` prompt, paste your key if it is not already exported. -3. At the `Choose model [1]:` prompt, pick a curated model from the list (for example, `Nemotron 3 Super 120B`, `GLM-5`, `MiniMax M2.7`, `GPT-OSS 120B`, or `DeepSeek V4 Pro`), or pick `Other...` to enter any model ID from the [NVIDIA Endpoints catalog](https://build.nvidia.com). - -NemoClaw validates the model against the catalog API before creating the sandbox. - -**Tip:** - -Use this option for Nemotron and other models hosted on `build.nvidia.com`. 
If you run NVIDIA Nemotron from a self-hosted NIM, an enterprise gateway, or any other endpoint, choose **Option 3** instead, since all Nemotron models expose OpenAI-compatible APIs. - -**Option 2: OpenAI:** - -Routes inference to the OpenAI API at `https://api.openai.com/v1`. - -Use `OPENAI_API_KEY` for the API key. Get one from the [OpenAI API keys page](https://platform.openai.com/api-keys). - -Respond to the wizard as follows. - -1. At the `Choose [1]:` prompt, type `2` to select **OpenAI**. -2. At the `OPENAI_API_KEY:` prompt, paste your key if it is not already exported. -3. At the `Choose model [1]:` prompt, pick a curated model (for example, `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.4-nano`, or `gpt-5.4-pro-2026-03-05`), or pick **Other...** to enter any OpenAI model ID. - -**Option 3: Other OpenAI-Compatible Endpoint:** - -Routes inference to any server that implements `/v1/chat/completions`, including OpenRouter, LocalAI, llama.cpp, vLLM behind a proxy, and any compatible gateway. - -Use `COMPATIBLE_API_KEY` for the API key. Set it to whatever credential your endpoint expects. If your endpoint does not require auth, use any non-empty placeholder. - -Respond to the wizard as follows. - -1. At the `Choose [1]:` prompt, type `3` to select **Other OpenAI-compatible endpoint**. -2. At the `OpenAI-compatible base URL` prompt, enter the provider's base URL. Find the exact value in your provider's API documentation. NemoClaw appends `/v1` automatically, so leave that suffix off. -3. At the `COMPATIBLE_API_KEY:` prompt, paste your key if it is not already exported. -4. At the `Other OpenAI-compatible endpoint model []:` prompt, enter the model ID exactly as it appears in your provider's model catalog. - -For example, when you use NVIDIA's OpenAI-compatible inference endpoint, enter `https://inference-api.nvidia.com` as the base URL and the model ID your endpoint exposes, such as `openai/openai/gpt-5.5`. 
- -NemoClaw sends a real inference request to validate the endpoint and model. -If the endpoint does not return the streaming events OpenClaw needs from the Responses API, NemoClaw falls back to the chat completions API and configures OpenClaw to use `openai-completions`. - -**Tip:** - -NVIDIA Nemotron models expose OpenAI-compatible APIs, so this option is the right choice for any Nemotron deployment that does not live on `build.nvidia.com`. Common examples include a self-hosted NIM container, an enterprise NVIDIA AI Enterprise gateway, or a vLLM/SGLang server running Nemotron weights. Point the base URL at your endpoint and enter the Nemotron model ID exactly as your server reports it. - -**Option 4: Anthropic:** - -Routes inference to the Anthropic Messages API at `https://api.anthropic.com`. - -Use `ANTHROPIC_API_KEY` for the API key. Get one from the [Anthropic console keys page](https://console.anthropic.com/settings/keys). - -Respond to the wizard as follows. - -1. At the `Choose [1]:` prompt, type `4` to select **Anthropic**. -2. At the `ANTHROPIC_API_KEY:` prompt, paste your key if it is not already exported. -3. At the `Choose model [1]:` prompt, pick a curated model (for example, `claude-sonnet-4-6`, `claude-haiku-4-5`, or `claude-opus-4-6`), or pick **Other...** to enter any Claude model ID. - -**Option 5: Other Anthropic-Compatible Endpoint:** - -Routes inference to any server that implements the Anthropic Messages API at `/v1/messages`, including Claude proxies, Bedrock-compatible gateways, and self-hosted Anthropic-compatible servers. - -Use `COMPATIBLE_ANTHROPIC_API_KEY` for the API key. Set it to whatever credential your endpoint expects. - -Respond to the wizard as follows. - -1. At the `Choose [1]:` prompt, type `5` to select **Other Anthropic-compatible endpoint**. -2. At the `Anthropic-compatible base URL` prompt, enter the proxy or gateway's base URL from its documentation. -3. 
At the `COMPATIBLE_ANTHROPIC_API_KEY:` prompt, paste your key if it is not already exported. -4. At the `Other Anthropic-compatible endpoint model []:` prompt, enter the model ID exactly as it appears in your gateway's model catalog. - -**Option 6: Google Gemini:** - -Routes inference to Google's OpenAI-compatible Gemini endpoint at `https://generativelanguage.googleapis.com/v1beta/openai/`. - -Use `GEMINI_API_KEY` for the API key. Get one from [Google AI Studio API keys](https://aistudio.google.com/app/apikey). - -Respond to the wizard as follows. - -1. At the `Choose [1]:` prompt, type `6` to select **Google Gemini**. -2. At the `GEMINI_API_KEY:` prompt, paste your key if it is not already exported. -3. At the `Choose model [5]:` prompt, pick a curated model (for example, `gemini-3.1-pro-preview`, `gemini-3.1-flash-lite-preview`, `gemini-3-flash-preview`, `gemini-2.5-pro`, `gemini-2.5-flash`, or `gemini-2.5-flash-lite`), or pick **Other...** to enter any Gemini model ID. - -**Option 7: Local Ollama:** - -Routes inference to a local Ollama instance. Depending on your platform, the wizard can use an existing daemon, start an installed daemon, or offer an install action. - -No API key is required. On non-WSL hosts, NemoClaw generates a token and starts an authenticated proxy so containers can reach Ollama without exposing the daemon directly to your network. -On WSL, NemoClaw can also use Ollama on the Windows host through `host.docker.internal`. - -Respond to the wizard as follows. - -1. At the `Choose [1]:` prompt, type `7` to select **Local Ollama**. -2. At the `Choose model [1]:` prompt, pick from **Ollama models** if any are already installed. If none are installed, pick a **starter model** to pull and load now, or pick **Other...** to enter any Ollama model ID. - -For setup details, including GPU recommendations and starter model choices, refer to Use a Local Inference Server (use the `nemoclaw-user-configure-inference` skill). 
- -**Option 8: Model Router:** - -Starts a host-side model router and routes sandbox inference through OpenShell to that router. -The router chooses from the model pool in `nemoclaw-blueprint/router/pool-config.yaml` for each request. - -Use `NVIDIA_API_KEY` for the model pool credentials. - -Respond to the wizard as follows. - -1. At the `Choose [1]:` prompt, type `8` to select **Model Router (experimental)**. -2. At the `NVIDIA_API_KEY:` prompt, paste your key if it is not already exported. -3. Review the configuration summary and continue with the sandbox build. - -For scripted setup, set: - -```console -$ NEMOCLAW_PROVIDER=routed NVIDIA_API_KEY= nemoclaw onboard --non-interactive -``` - -The router listens on the host at port `4000`. -The sandbox still calls `https://inference.local/v1`, so do not point in-sandbox tools at the host router port directly. +### Choose an Inference Provider -**Local NIM and Local vLLM:** +Pick the option that matches where you want inference traffic to go. +For full provider behavior, curated models, validation details, and local-runtime setup notes, refer to Inference Options (use the `nemoclaw-user-configure-inference` skill). +For Ollama, vLLM, NIM, and compatible local servers, refer to Use a Local Inference Server (use the `nemoclaw-user-configure-inference` skill). -- **Local NVIDIA NIM** appears when `NEMOCLAW_EXPERIMENTAL=1` is set and the host has a NIM-capable GPU. NemoClaw pulls and manages a NIM container. -- **Local vLLM (already running)** appears whenever NemoClaw detects a vLLM server on `localhost:8000`. No flag is required for the menu entry. NemoClaw auto-detects the loaded model. -- **Local vLLM (managed install/start)** appears by default on DGX Spark and DGX Station. Generic Linux NVIDIA GPU hosts require `NEMOCLAW_EXPERIMENTAL=1` or `NEMOCLAW_PROVIDER=install-vllm`. NemoClaw pulls and starts a vLLM container on supported hosts. 
+| Option | Use when | Credential variable | +|---|---|---| +| NVIDIA Endpoints | You want hosted models from `build.nvidia.com`, including hosted Nemotron models. | `NVIDIA_API_KEY` | +| OpenAI | You want the OpenAI API at `https://api.openai.com/v1`. | `OPENAI_API_KEY` | +| Other OpenAI-compatible endpoint | You have OpenRouter, LocalAI, llama.cpp, vLLM, NIM, SGLang, an enterprise gateway, or another `/v1/chat/completions` endpoint. | `COMPATIBLE_API_KEY` | +| Anthropic | You want the Anthropic Messages API. | `ANTHROPIC_API_KEY` | +| Other Anthropic-compatible endpoint | You have a Claude proxy, Bedrock-compatible gateway, or self-hosted `/v1/messages` endpoint. | `COMPATIBLE_ANTHROPIC_API_KEY` | +| Google Gemini | You want Google's OpenAI-compatible Gemini endpoint. | `GEMINI_API_KEY` | +| Local Ollama | You want a host-local Ollama model. | None | +| Model Router | You want NemoClaw to start the host-side model router. | `NVIDIA_API_KEY` | -For setup, refer to Use a Local Inference Server (use the `nemoclaw-user-configure-inference` skill). +Export the relevant key before launching the installer when possible. +If your compatible endpoint does not require authentication, set its credential variable to any non-empty placeholder. ### Review the Configuration Before the Sandbox Build @@ -239,8 +149,9 @@ For example, if you picked an OpenAI-compatible endpoint, the summary looks like ────────────────────────────────────────────────── Provider: compatible-endpoint Model: openai/openai/gpt-5.5 - API key: COMPATIBLE_API_KEY (staged for OpenShell gateway registration) + API key: configured for OpenShell gateway registration Web search: disabled + Managed tools: none Messaging: none Sandbox name: my-gpt-claw Note: Sandbox build typically takes 5–15 minutes on this host. @@ -249,7 +160,7 @@ For example, if you picked an OpenAI-compatible endpoint, the summary looks like Apply this configuration? [Y/n]: ``` -The default is `Y`, so you can press Enter once to continue. 
Answer `n` to abort cleanly, fix the entries, and re-run `nemoclaw onboard`. +The default is `Y`, so you can press Enter one time to continue. Answer `n` to abort cleanly, fix the entries, and re-run `nemoclaw onboard`. Non-interactive runs (`NEMOCLAW_NON_INTERACTIVE=1`) print the summary for log clarity but skip the prompt. @@ -261,6 +172,7 @@ If you enable it, enter a Brave Search API key when prompted. The wizard also offers messaging channels such as Telegram, Discord, Slack, WeChat, and WhatsApp. Press a channel number to toggle it, then press Enter to continue. +If you leave all channels unselected, pressing Enter skips messaging setup. If you select a channel, NemoClaw validates the token format before it bakes the channel configuration into the sandbox. For example, Slack bot tokens must start with `xoxb-`. WeChat and WhatsApp are experimental. @@ -269,6 +181,7 @@ Review Messaging Channels (use the `nemoclaw-user-manage-sandboxes` skill) befor ### Choose Network Policy Presets After the sandbox image builds and OpenClaw starts inside the sandbox, NemoClaw asks which network policy tier to apply. +Web search and messaging selections happen before this point so the sandbox image and the policy suggestions stay aligned. The default **Balanced** tier includes common development presets such as npm, PyPI, Hugging Face, Homebrew, and Brave Search when the selected agent supports web search. Use the arrow keys or `j` and `k` to move, Space to select, and Enter to confirm. @@ -277,7 +190,7 @@ Press `r` to toggle a selected preset between read-only and read-write when the When the install completes, a summary confirms the running environment. Before printing the summary, NemoClaw verifies that the sandbox gateway and dashboard port forward are reachable. -Inference route and messaging bridge checks are reported as warnings when they need more time or additional configuration. 
+NemoClaw reports inference route and messaging bridge checks as warnings when they need more time or additional configuration. The `Model` and provider line reflects the inference option you picked during onboarding. The example below shows the result if you picked an OpenAI-compatible endpoint during onboarding. @@ -323,7 +236,7 @@ You can chat with the agent from the terminal or the browser. The onboard wizard starts a background port forward to the sandbox dashboard, then prints the dashboard URL in the install summary. The default host port is `18789`. If that port is already taken, NemoClaw uses the next free dashboard port, such as `18790`, and prints that port in the final URL. -If the chosen port becomes occupied after the sandbox build starts, onboarding rolls back the newly-created sandbox and asks you to retry instead of printing an unreachable dashboard URL. +If the chosen port becomes occupied after the sandbox build starts, onboarding rolls back the newly created sandbox and asks you to retry instead of printing an unreachable dashboard URL. The install transcript does not print the gateway token. If the browser requires authentication, use the `dashboard-url --quiet` command to print a complete URL explicitly. @@ -354,3 +267,4 @@ openclaw tui ## Related Skills - `nemoclaw-user-overview` — NemoClaw Overview (use the `nemoclaw-user-overview` skill) to learn what NemoClaw is and its capabilities +- `nemoclaw-user-agent-skills` — Agent Skills (use the `nemoclaw-user-agent-skills` skill) to load NemoClaw guidance into an AI coding assistant diff --git a/.agents/skills/nemoclaw-user-get-started/evals/evals.json b/.agents/skills/nemoclaw-user-get-started/evals/evals.json new file mode 100644 index 0000000000..e4f3b9a98b --- /dev/null +++ b/.agents/skills/nemoclaw-user-get-started/evals/evals.json @@ -0,0 +1,11 @@ +[ + { + "id": "docs-get-started-prerequisites-001", + "question": "I'm checking prerequisites before installation. 
Help me verify my host has the required hardware, software, and platform support so I can avoid a failed first setup.", + "expected_skill": "nemoclaw-user-get-started", + "ground_truth": "A NemoClaw-specific answer that helps the user verify my host has the required hardware, software, and platform support and gives enough concrete guidance, decision criteria, verification steps, or risk framing to avoid a failed first setup.", + "expected_behavior": [ + "Uses the expected_skill and does not make up answers if it cannot find the answer from the skill." + ] + } +] diff --git a/.agents/skills/nemoclaw-user-get-started/references/prerequisites.md b/.agents/skills/nemoclaw-user-get-started/references/prerequisites.md index ba66a9b69f..4e7b25437f 100644 --- a/.agents/skills/nemoclaw-user-get-started/references/prerequisites.md +++ b/.agents/skills/nemoclaw-user-get-started/references/prerequisites.md @@ -1,8 +1,6 @@ - - # Prerequisites -Before getting started, check the prerequisites to ensure you have the necessary software and hardware to run NemoClaw. +Before you start, verify that your machine has the software and hardware needed to run NemoClaw. ## Hardware @@ -12,7 +10,11 @@ Before getting started, check the prerequisites to ensure you have the necessary | RAM | 8 GB | 16 GB | | Disk | 20 GB free | 40 GB free | -The sandbox image is approximately 2.4 GB compressed. During image push, the Docker daemon, k3s, and the OpenShell gateway run alongside the export pipeline. The pipeline buffers decompressed layers in memory. On machines with less than 8 GB of RAM, this combined usage can trigger the OOM killer. If you cannot add memory, configuring at least 8 GB of swap can work around the issue at the cost of slower performance. +The sandbox image is approximately 2.4 GB compressed. +During image push, the Docker daemon, k3s, and the OpenShell gateway run alongside the export pipeline. +The pipeline buffers decompressed layers in memory. 
+On machines with less than 8 GB of RAM, this combined usage can trigger the OOM killer. +If you cannot add memory, configure at least 8 GB of swap to work around the issue at the cost of slower performance. ## Software @@ -26,8 +28,9 @@ The sandbox image is approximately 2.4 GB compressed. During image push, the Doc On Linux, the installer can install Docker, start the Docker service, and add your user to the `docker` group. If the group change is not active in the current shell, the installer exits with `newgrp docker` guidance before it starts onboarding. If you choose the native Linux Ollama install path, the onboard wizard also requires `zstd` for Ollama archive extraction. +The installer also requires `strings` from `binutils` to verify the OpenShell binary before it continues with OpenShell install work. -**Docker group access:** +**Docker Group Access:** NemoClaw needs Docker access. On personal Linux development machines, adding your user to the `docker` group is the standard way to run Docker without sudo. @@ -35,6 +38,11 @@ Members of the `docker` group can control the daemon with root-level impact, so For background, review Docker's [daemon attack surface guidance](https://docs.docker.com/engine/security/#docker-daemon-attack-surface). On Debian and Ubuntu, NemoClaw installs `zstd` with `apt-get` if it is missing; on other Linux distributions, install `zstd` before onboarding. +If the installer reports that `strings` is missing, install `binutils` and rerun the installer: + +```bash +sudo apt-get install -y binutils +``` On macOS, NemoClaw uses the Docker-driver OpenShell gateway path with Docker Desktop or Colima. You do not need to install or sign a separate OpenShell VM driver helper for standard macOS onboarding. @@ -44,17 +52,17 @@ You do not need to install or sign a separate OpenShell VM driver helper for sta For NemoClaw-managed environments, use `nemoclaw onboard` when you need to create or recreate the OpenShell gateway or sandbox. 
Avoid `openshell self-update`, `npm update -g openshell`, `openshell gateway start --recreate`, or `openshell sandbox create` directly unless you intend to manage OpenShell separately and then rerun `nemoclaw onboard`. -**Docker storage driver:** +**Docker Storage Driver:** On Linux hosts running Docker 26 or later with the [containerd image store](https://docs.docker.com/engine/storage/containerd/) enabled (the install-time default for fresh `docker-ce` installations on Ubuntu 24.04 and similar distros), `nemoclaw onboard` transparently builds a `fuse-overlayfs`-enabled cluster image to bypass a kernel-level nested-overlay limitation in k3s. -No manual setup is required. -See the troubleshooting guide (use the `nemoclaw-user-reference` skill) for the override knobs and a manual `daemon.json` alternative. +You do not need manual setup. +Refer to the troubleshooting guide (use the `nemoclaw-user-reference` skill) for the override knobs and a manual `daemon.json` alternative. ## Platforms The following table lists tested platform and runtime combinations. Availability is not limited to these entries, but untested configurations can have issues. -The table is generated from [`ci/platform-matrix.json`](https://github.com/NVIDIA/NemoClaw/blob/main/ci/platform-matrix.json), the single source of truth kept in sync by CI and QA. +The table comes from [`ci/platform-matrix.json`](https://github.com/NVIDIA/NemoClaw/blob/main/ci/platform-matrix.json), the single source of truth kept in sync by CI and QA. | OS | Container runtime | Status | Notes | |----|-------------------|--------|-------| @@ -65,5 +73,6 @@ The table is generated from [`ci/platform-matrix.json`](https://github.com/NVIDI ## Next Steps -- Prepare Windows for NemoClaw (use the `nemoclaw-user-get-started` skill) if you are using Windows. -- Quickstart (use the `nemoclaw-user-get-started` skill) to install NemoClaw and launch your first sandbox. +- Prepare Windows for NemoClaw if you are using Windows. 
+- [Quickstart](../SKILL.md) to install NemoClaw and launch your first sandboxed agent. +- Agent Skills (use the `nemoclaw-user-agent-skills` skill) to load NemoClaw guidance into an AI coding assistant before setup. diff --git a/.agents/skills/nemoclaw-user-get-started/references/quickstart-hermes.md b/.agents/skills/nemoclaw-user-get-started/references/quickstart-hermes.md index b658f3679e..d0b15db496 100644 --- a/.agents/skills/nemoclaw-user-get-started/references/quickstart-hermes.md +++ b/.agents/skills/nemoclaw-user-get-started/references/quickstart-hermes.md @@ -1,16 +1,13 @@ - - # NemoClaw Quickstart with Hermes Use NemoHermes when you want NemoClaw to create an OpenShell sandbox that runs Hermes instead of the default OpenClaw agent. The `nemohermes` command is an alias for `nemoclaw` with the Hermes agent pre-selected. -**Experimental Feature:** - -The Hermes agent option is experimental. -Interfaces, defaults, and supported features may change without notice, and it is not recommended for production use. - -Review the Prerequisites (use the `nemoclaw-user-get-started` skill) before starting. +Review the [Prerequisites](prerequisites.md) before starting. +Install Docker, start it, and verify that the current shell can reach it before Hermes onboarding builds the sandbox image. +On Linux, the installer can install Docker, start the service, and add your user to the `docker` group. +If it changes group membership, run the printed `newgrp docker` recovery command before rerunning the installer. +On macOS, start Docker Desktop or Colima before you run the installer. The first Hermes build can take several minutes because NemoClaw builds the Hermes sandbox base image if it is not already cached. ## Install and Onboard @@ -18,20 +15,35 @@ The first Hermes build can take several minutes because NemoClaw builds the Herm Start the installer with `NEMOCLAW_AGENT=hermes` set in your shell. 
The installer installs the CLI, selects the `nemohermes` alias, and runs the guided onboarding flow. -```console -$ export NEMOCLAW_AGENT=hermes -$ curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash +```bash +export NEMOCLAW_AGENT=hermes +curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash +``` + +If a headless host needs to expose the Hermes API through a remote URL or tunnel, set `CHAT_UI_URL` before onboarding. +Use the externally reachable origin for port `8642`, without the `/v1` path. +NemoClaw derives the forwarded port from this value, binds the forward for remote access when the origin is non-loopback, and prints the final OpenAI-compatible base URL with `/v1` in the ready summary. + +```bash +export NEMOCLAW_AGENT=hermes +export CHAT_UI_URL="https://hermes.example.com:8642" +curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash ``` +For SSH local port forwarding to `127.0.0.1:8642`, leave `CHAT_UI_URL` unset. +Do not append an OpenClaw `#token=` fragment to the Hermes URL. +Hermes API clients authenticate with the bearer token from the generated Hermes environment instead of an OpenClaw dashboard URL token. + If NemoClaw is already installed, start Hermes onboarding directly. -```console -$ nemohermes onboard +```bash +nemohermes onboard ``` ## Respond to the Wizard -The onboard wizard asks for a sandbox name, inference provider, model, credentials, and network policy preset. +The onboard wizard asks for an inference provider, model, any required credential, and sandbox name before it prints the review summary. +After you confirm, NemoClaw registers inference, prompts for supported messaging channels, builds and starts the sandbox, sets up Hermes, then applies the selected network policy tier and presets. At any prompt, press Enter to accept the default shown in `[brackets]`, type `back` to return to the previous prompt, or type `exit` to quit. The default Hermes sandbox name is `hermes`. 
@@ -44,10 +56,10 @@ Sandbox name [hermes]: my-hermes Choose the inference provider that matches where you want Hermes model traffic to go. The provider options and credential environment variables are the same as the standard NemoClaw quickstart. -For provider-specific prompts, refer to the Respond to the Onboard Wizard (use the `nemoclaw-user-get-started` skill) section and the Inference Options (use the `nemoclaw-user-configure-inference` skill) page. +For provider-specific prompts, refer to the Inference Options (use the `nemoclaw-user-configure-inference` skill) page. The Hermes wizard does not ask for Brave Web Search because Hermes does not use NemoClaw's OpenClaw web-search configuration. -After provider and policy selection, review the summary and confirm the build. +After provider and model selection, review the summary and confirm the build. NemoClaw writes Hermes configuration into `/sandbox/.hermes`, routes model traffic through `inference.local`, and starts the Hermes gateway inside the sandbox. The Hermes image includes runtime dependencies for the supported NemoClaw messaging integrations, API service, and health endpoint. The base image does not include unsupported Hermes integrations. @@ -61,13 +73,13 @@ Hermes uses an agent-specific baseline policy that allows the Hermes binary and For CI or scripted installs, set the required environment variables before running the installer. The example below uses NVIDIA Endpoints and creates a sandbox named `my-hermes`. 
-```console -$ export NEMOCLAW_AGENT=hermes -$ export NEMOCLAW_NON_INTERACTIVE=1 -$ export NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 -$ export NEMOCLAW_SANDBOX_NAME=my-hermes -$ export NVIDIA_API_KEY= -$ curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash +```bash +export NEMOCLAW_AGENT=hermes +export NEMOCLAW_NON_INTERACTIVE=1 +export NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 +export NEMOCLAW_SANDBOX_NAME=my-hermes +export NVIDIA_API_KEY= +curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash ``` Use the provider variables from Inference Options (use the `nemoclaw-user-configure-inference` skill) when you choose a different provider. @@ -76,6 +88,17 @@ Use the provider variables from Inference Options (use the `nemoclaw-user-config When onboarding completes, NemoClaw prints the sandbox name, model, lifecycle commands, and Hermes API endpoint. Hermes exposes an OpenAI-compatible API on port `8642`, not a browser dashboard. +To also launch the native Hermes web dashboard, opt in before onboarding: + +```bash +export NEMOCLAW_HERMES_DASHBOARD=1 +nemohermes onboard +``` + +The dashboard uses port `9119` by default. +Set `NEMOCLAW_HERMES_DASHBOARD_PORT` before onboarding to choose a different port. +Set `NEMOCLAW_HERMES_DASHBOARD_TUI=1` to enable Hermes' optional in-browser TUI tab. +For upstream dashboard features, refer to the [Hermes web dashboard documentation](https://hermes-agent.nousresearch.com/docs/user-guide/features/web-dashboard). ```text ────────────────────────────────────────────────── @@ -90,6 +113,10 @@ Access Port 8642 must be forwarded before connecting. http://127.0.0.1:8642/v1 + Hermes Agent Web dashboard + Port 9119 must be forwarded before opening this URL. + http://127.0.0.1:9119/ + Terminal: nemohermes my-hermes connect @@ -107,14 +134,14 @@ To chat with the agent from a terminal, follow these steps: 1. Connect to the sandbox and start the Hermes CLI. - ```console - $ nemohermes my-hermes connect + ```bash + nemohermes my-hermes connect ``` 2. 
Inside the sandbox, run the Hermes CLI. - ```console - $ hermes + ```bash + hermes ``` ## Check the API Endpoint @@ -122,44 +149,58 @@ To chat with the agent from a terminal, follow these steps: The onboard flow starts the port forward automatically. Check the health endpoint from the host to confirm that the Hermes API is reachable. -```console -$ curl -sf http://127.0.0.1:8642/health +```bash +curl -sf http://127.0.0.1:8642/health ``` If the command cannot connect after a reboot or terminal restart, start the forward again. -```console -$ openshell forward start --background 8642 my-hermes +```bash +openshell forward start --background 8642 my-hermes ``` Configure an OpenAI-compatible client with the base URL `http://127.0.0.1:8642/v1`. Hermes uses API header authentication for client requests. Do not append an OpenClaw `#token=` URL fragment to the Hermes endpoint. +## Open the Optional Dashboard + +When you set `NEMOCLAW_HERMES_DASHBOARD=1` during onboarding, NemoClaw starts `hermes dashboard --no-open` inside the sandbox and forwards `http://127.0.0.1:9119/` on the host. +The API endpoint remains separate on `8642`. + +If the dashboard forward is missing after a reboot or terminal restart, start it again: + +```bash +openshell forward start --background 9119 my-hermes +``` + +Treat the dashboard as a local management UI. +Avoid exposing it on shared or public networks unless you put it behind your own access controls. + ## Manage the Sandbox Use the same lifecycle commands as a standard NemoClaw sandbox. The `nemohermes` alias keeps help text and recovery messages aligned with Hermes, while targeting the same registered sandbox. `nemoclaw list` shows the agent type for each sandbox so you can distinguish Hermes and OpenClaw entries. 
-```console -$ nemohermes my-hermes status -$ nemohermes my-hermes logs --follow -$ nemohermes my-hermes snapshot create --name before-change -$ nemohermes my-hermes rebuild +```bash +nemohermes my-hermes status +nemohermes my-hermes logs --follow +nemohermes my-hermes snapshot create --name before-change +nemohermes my-hermes rebuild ``` To change the active model or provider without rebuilding the sandbox, use `nemohermes inference set`. It updates the OpenShell inference route and patches `/sandbox/.hermes/config.yaml` without restarting Hermes. -```console -$ nemohermes inference set --model --provider +```bash +nemohermes inference set --model --provider ``` To remove the sandbox when you are done, destroy it explicitly. -```console -$ nemohermes my-hermes destroy +```bash +nemohermes my-hermes destroy ``` ## Next Steps diff --git a/.agents/skills/nemoclaw-user-get-started/references/windows-preparation.md b/.agents/skills/nemoclaw-user-get-started/references/windows-preparation.md index 2d09365550..95e0eec6c5 100644 --- a/.agents/skills/nemoclaw-user-get-started/references/windows-preparation.md +++ b/.agents/skills/nemoclaw-user-get-started/references/windows-preparation.md @@ -1,21 +1,35 @@ - - # Prepare Windows for NemoClaw +import { AgentOnly } from "../_components/AgentGuide"; + You can run NemoClaw inside Windows Subsystem for Linux (WSL 2) on Windows. -Complete these steps before following the Quickstart (use the `nemoclaw-user-get-started` skill). + +Complete these steps before following the Quickstart. + + +Complete these steps before following Quickstart with Hermes. + Linux and macOS users do not need this page and can go directly to the Quickstart. **Note:** -This guide has been tested on x86-64. +NVIDIA tested this guide on x86-64. ## Prerequisites Verify the following before you begin: - Windows 10 (build 19041 or later) or Windows 11. -- Hardware requirements are the same as the Quickstart (use the `nemoclaw-user-get-started` skill). 
+ + +- Hardware requirements are the same as the Quickstart. + + + + +- Hardware requirements are the same as Quickstart with Hermes. + + ## Option: Use the Bootstrap Script @@ -29,6 +43,8 @@ The command downloads the script to a temporary file before running it. `-ExecutionPolicy Bypass` applies only to that PowerShell process and avoids local policy blocking the downloaded script. Run it from Windows, not from inside WSL. The script requests Administrator privileges when needed, enables the required WSL 2 Windows features, installs or opens Ubuntu 24.04, and installs and starts Docker Desktop. +When Ubuntu needs first-run account setup, the script opens a handoff window and waits for that account to exist before it changes Docker settings. +It enables Docker Desktop WSL integration for the target distro, restarts Docker Desktop only when Docker was already running, and leaves your global default WSL distro unchanged. If the target Ubuntu distro is already registered, the script confirms it uses WSL 2, converts it from WSL 1 when needed, and verifies Docker is reachable from WSL. If Windows requires a reboot after enabling WSL features, the script prompts for the reboot and registers a one-time continuation for the next sign-in. If Docker Desktop shows first-run prompts, complete them and return to the PowerShell window. @@ -45,10 +61,11 @@ When Windows preparation is complete, it opens Ubuntu and prints the standard in curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash ``` -If the bootstrap script reports that Docker is not reachable from Ubuntu, open Docker Desktop Settings and confirm that WSL integration is enabled for Ubuntu (Settings > Resources > WSL integration), then rerun the script. 
+If the bootstrap script reports that Ubuntu cannot reach Docker, open Docker Desktop Settings and confirm that Docker Desktop enables WSL integration for Ubuntu (**Settings** > **Resources** > **WSL integration**), make sure Docker Desktop is running, then rerun the script. If the bootstrap script reports that `winget.exe` is not available (common on Windows Server or stripped Windows installs), install **App Installer** from the Microsoft Store (which provides `winget`), or download and install Docker Desktop manually from [docker.com](https://www.docker.com/products/docker-desktop/). -Rerun the bootstrap script after Docker Desktop is installed; the script skips the install step once it detects Docker Desktop is present. +After you install Docker Desktop, rerun the bootstrap script. +The script skips the install step after it detects Docker Desktop. The manual steps below describe the same Windows preparation pieces and are useful when you need to verify or repair WSL, Ubuntu, or Docker Desktop by hand. @@ -78,9 +95,9 @@ Let the distribution launch and complete first-run setup (pick a Unix username a Do not use the `--no-launch` flag. The `--no-launch` flag downloads the package but does not register the distribution with WSL. -Commands like `wsl -d Ubuntu-24.04` fail with "There is no distribution with the supplied name" until the distribution has been launched at least once. +Commands like `wsl -d Ubuntu-24.04` fail with "There is no distribution with the supplied name" until you launch the distribution at least one time. -Verify the distribution is registered and running WSL 2: +Verify that WSL registered the distribution and runs it with WSL 2: ```powershell wsl -l -v @@ -97,7 +114,7 @@ Expected output: Install [Docker Desktop](https://www.docker.com/products/docker-desktop/) with the WSL 2 backend (the default on Windows 11). 
-After installation, open Docker Desktop Settings and confirm that WSL integration is enabled for your Ubuntu distribution (Settings > Resources > WSL integration). +After installation, open Docker Desktop Settings and confirm that Docker Desktop enables WSL integration for your Ubuntu distribution (**Settings** > **Resources** > **WSL integration**). Open WSL from PowerShell: @@ -112,7 +129,7 @@ docker info ``` `docker info` prints server information. -If you see "Cannot connect to the Docker daemon", confirm that Docker Desktop is running and that WSL integration is enabled. +If you see "Cannot connect to the Docker daemon", confirm that Docker Desktop is running and that Docker Desktop enables WSL integration. ## Set Up Local Inference with Ollama (Optional) @@ -123,7 +140,7 @@ You can install Ollama inside WSL yourself: curl -fsSL https://ollama.com/install.sh | sh ``` -If Ollama is installed but not already running in WSL, the onboarding process starts it for you. +If you installed Ollama but it is not already running in WSL, onboarding starts it for you. You can also start it yourself beforehand with `ollama serve`. You can also use Ollama for Windows. @@ -137,10 +154,15 @@ Use one instance, or move one of them to a different port before running `nemocl Your Windows environment is ready. If you used the bootstrap script, follow the installer command it printed inside Ubuntu. -If you prepared Windows manually, open a WSL terminal (type `wsl` in PowerShell, or open Ubuntu from Windows Terminal) and continue with the Quickstart (use the `nemoclaw-user-get-started` skill) to install NemoClaw and launch your first sandbox. + +If you prepared Windows manually, open a WSL terminal (type `wsl` in PowerShell, or open Ubuntu from Windows Terminal) and continue with the Quickstart to install NemoClaw and launch your first sandbox. 
+ + +If you prepared Windows manually, open a WSL terminal (type `wsl` in PowerShell, or open Ubuntu from Windows Terminal) and continue with Quickstart with Hermes to install NemoClaw and launch your first Hermes sandbox. + All NemoClaw commands run inside WSL, not in PowerShell. ## Troubleshooting -For Windows-specific troubleshooting, refer to the Windows Subsystem for Linux section (use the `nemoclaw-user-reference` skill) in the Troubleshooting guide. +For Windows-specific troubleshooting, refer to the Windows Subsystem for Linux section in the Troubleshooting guide. diff --git a/.agents/skills/nemoclaw-user-manage-policy/SKILL.md b/.agents/skills/nemoclaw-user-manage-policy/SKILL.md index b375cd7e44..cb1e41036e 100644 --- a/.agents/skills/nemoclaw-user-manage-policy/SKILL.md +++ b/.agents/skills/nemoclaw-user-manage-policy/SKILL.md @@ -1,15 +1,14 @@ --- name: "nemoclaw-user-manage-policy" description: "Adds, removes, or modifies allowed endpoints in the sandbox policy. Use when customizing network policy, changing egress rules, or configuring sandbox endpoint access. Trigger keywords - customize nemoclaw network policy, sandbox egress policy configuration, nemoclaw integration policy examples, post-install policy setup, openshell approval workflow, policy preset, nemoclaw approve network requests, sandbox egress approval tui." +license: "Apache-2.0" --- - - - # Customize the Sandbox Network Policy ## Gotchas +- Adding a host to the egress policy permits the connection only after the endpoint, port, method, and binary rules match. - Custom preset hosts bypass NemoClaw's review process and can widen sandbox egress to arbitrary destinations. ## Prerequisites @@ -17,9 +16,11 @@ description: "Adds, removes, or modifies allowed endpoints in the sandbox policy - A running NemoClaw sandbox for dynamic changes, or the NemoClaw source repository for static changes. - The OpenShell CLI on your `PATH`. 
-Add, remove, or modify the endpoints that the sandbox is allowed to reach. +import { AgentOnly } from "../_components/AgentGuide"; + +Add, remove, or modify the endpoints the sandbox can reach. -The sandbox policy is defined in a declarative YAML file in the NemoClaw repository and enforced at runtime by [NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell). +The NemoClaw repository defines the sandbox policy in a declarative YAML file, and [NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell) enforces it at runtime. NemoClaw supports both static policy changes that persist across restarts and dynamic updates applied to a running sandbox through the OpenShell CLI. **Note:** @@ -29,18 +30,34 @@ Apply a custom NemoClaw preset with `nemoclaw policy-add --from-file`. Do not rely on `host.docker.internal` as a general host-service path because it bypasses the OpenShell policy path and may not be reachable in every sandbox runtime. See Agent cannot reach a host-side HTTP service (use the `nemoclaw-user-reference` skill). +**Warning:** + +Adding a host to the egress policy permits the connection only after the endpoint, port, method, and binary rules match. +OpenShell still applies SSRF protection separately, so a request can be denied if the final address resolves to a loopback, private, link-local, or otherwise blocked internal range. +If a package installer or browser runtime download still fails with an SSRF-style denial after you add the public host, install that binary into the sandbox image at build time with `nemoclaw onboard --from` (use the `nemoclaw-user-reference` skill) instead of relying on runtime egress. + ## Static Changes Static changes modify the baseline policy file and take effect after the next sandbox creation. ### Edit the Policy File + Open `nemoclaw-blueprint/policies/openclaw-sandbox.yaml` and add or modify endpoint entries. 
If you want a built-in preset to be part of the baseline policy, merge its `network_policies` entries into this file and re-run `nemoclaw onboard`. If you only need to apply a preset to a running sandbox, use `nemoclaw policy-add` under [Dynamic Changes](#dynamic-changes). That updates the live policy and does not edit `openclaw-sandbox.yaml`. + + +Open the Hermes policy additions and shared sandbox policy files under `agents/hermes/` and `nemoclaw-blueprint/policies/`, then add or modify endpoint entries. + +If you want a built-in preset to be part of the baseline policy, merge its `network_policies` entries into the appropriate policy file and re-run `nemoclaw onboard`. + +If you only need to apply a preset to a running sandbox, use `nemoclaw policy-add` under [Dynamic Changes](#dynamic-changes). +That updates the live policy and does not edit the baseline policy files. + Use a manual YAML edit when you need to allow custom hosts that are not covered by a preset, such as an internal API or a weather service. @@ -59,18 +76,18 @@ Each entry in the `network` section defines an endpoint group with the following Apply the updated policy by re-running the onboard wizard: -```console -$ nemoclaw onboard +```bash +nemoclaw onboard ``` -The wizard picks up the modified policy file and applies it to the sandbox. +The wizard reads the modified policy file and applies it to the sandbox. ### Verify the Policy Check that the sandbox is running with the updated policy: -```console -$ nemoclaw status +```bash +nemoclaw status ``` ### Add Blueprint Policy Additions @@ -85,7 +102,7 @@ Dynamic changes apply a policy update to a running sandbox without restarting it > [!WARNING] > `openshell policy set` **replaces** the sandbox's live policy with the contents of the file you provide; it does not merge. -> A running sandbox's live policy is the baseline from `openclaw-sandbox.yaml` plus every preset that was layered on during onboarding. 
+> A running sandbox's live policy is the baseline policy plus every preset that was layered on during onboarding. > Applying a file that contains only the baseline (or only a single preset) silently drops every other preset that was in effect. ### Option 1: Drop a Preset File and Use `policy-add` (Recommended) @@ -115,41 +132,46 @@ This is the non-destructive path and the only flow NemoClaw supports out of the 2. Apply it to the running sandbox: - ```console - $ nemoclaw my-assistant policy-add - ``` +```bash +nemoclaw my-assistant policy-add +``` - NemoClaw reads the live policy via `openshell policy get --full`, structurally merges your preset's `network_policies` into it, and writes the merged result back. - Existing presets and the baseline remain in place. - The preset file under `presets/` also persists across sandbox recreations. +NemoClaw reads the live policy via `openshell policy get --full`, structurally merges your preset's `network_policies` into it, and writes the merged result back. +Existing presets and the baseline remain in place. +The preset file under `presets/` also persists across sandbox recreations. -### Option 2: Snapshot, Edit, and Set via OpenShell +### Option 2: Snapshot, Edit, and Set with OpenShell Use this path only when you cannot add a file under the NemoClaw source tree. -You must start from the **live** policy, not from `openclaw-sandbox.yaml`, so the presets layered on at onboarding are preserved in the file you apply. +You must start from the **live** policy, not from a baseline policy file, so the presets layered on at onboarding are preserved in the file you apply. 
-```console -$ openshell policy get --full my-assistant > live-policy.yaml +```bash +openshell policy get --full my-assistant > live-policy.yaml ``` Edit `live-policy.yaml` to add your entries under `network_policies:`, keeping the existing `version` field intact, then apply: -```console -$ openshell policy set --policy live-policy.yaml my-assistant +```bash +openshell policy set --policy live-policy.yaml my-assistant ``` ### Scope of Dynamic Changes Dynamic changes apply only to the current session. -When the sandbox stops, the running policy resets to the baseline composed from `openclaw-sandbox.yaml` plus the presets recorded for the sandbox. -To make a custom policy survive a sandbox recreation, ship the preset file in the repository (Option 1 above — the file under `presets/` persists) or edit `openclaw-sandbox.yaml` and re-run `nemoclaw onboard`. +When the sandbox stops, the running policy resets to the baseline policy plus the presets recorded for the sandbox. + +To make a custom policy survive a sandbox recreation, ship the preset file in the repository (Option 1 above; the file under `presets/` persists) or edit `openclaw-sandbox.yaml` and re-run `nemoclaw onboard`. + + +To make a custom policy survive a sandbox recreation, ship the preset file in the repository (Option 1 above; the file under `presets/` persists) or edit the Hermes policy additions and re-run `nemoclaw onboard`. + ### Approve Requests Interactively For one-off access, you can approve blocked requests in the OpenShell TUI instead of editing the baseline policy: -```console -$ openshell term +```bash +openshell term ``` This is useful when you want to test a destination before deciding whether it belongs in a permanent preset or custom policy file. @@ -158,7 +180,7 @@ This is useful when you want to test a destination before deciding whether it be NemoClaw ships preset policy files for common integrations in `nemoclaw-blueprint/policies/presets/`. 
Apply a preset as-is or use it as a starting template for a custom policy. -For guided post-install examples, see Common Integration Policy Examples (use the `nemoclaw-user-manage-policy` skill). +For guided post-install examples, see [Common Integration Policy Examples](references/integration-policy-examples.md). During onboarding, the policy tier (use the `nemoclaw-user-reference` skill) you select determines which presets are enabled by default. You can add or remove individual presets in the interactive preset screen that follows tier selection. @@ -175,6 +197,7 @@ Available presets: | `jira` | Atlassian Jira API | | `local-inference` | Local Ollama and vLLM through the host gateway | | `npm` | npm and Yarn registries | +| `openclaw-pricing` | OpenClaw model-pricing reference fetch (LiteLLM and OpenRouter) | | `outlook` | Microsoft 365 and Outlook | | `pypi` | Python Package Index | | `slack` | Slack API and webhooks | @@ -184,8 +207,8 @@ Available presets: To apply a preset to a running sandbox: -```console -$ nemoclaw policy-add +```bash +nemoclaw policy-add ``` **Note:** @@ -195,29 +218,33 @@ Pass a preset name with `--yes` for scripted workflows. For example, to interactively add PyPI access to a running sandbox: -```console -$ nemoclaw my-assistant policy-add +```bash +nemoclaw my-assistant policy-add ``` To list which presets are applied to a sandbox: -```console -$ nemoclaw policy-list +```bash +nemoclaw policy-list ``` + To include a preset in the baseline, merge its entries into `openclaw-sandbox.yaml` and re-run `nemoclaw onboard`. + + +To include a preset in the baseline, merge its entries into the Hermes policy additions and re-run `nemoclaw onboard`. + **Note:** -The `openshell policy set --policy ` command operates on raw policy files and does not -accept the `preset:` metadata block used in preset YAML files. Use `nemoclaw policy-add` for -presets. 
+The `openshell policy set --policy ` command operates on raw policy files and does not accept the `preset:` metadata block used in preset YAML files. +Use `nemoclaw policy-add` for presets. For scripted workflows, `policy-add` and `policy-remove` accept the preset name as a positional argument: -```console -$ nemoclaw my-assistant policy-add pypi --yes -$ nemoclaw my-assistant policy-remove pypi --yes +```bash +nemoclaw my-assistant policy-add pypi --yes +nemoclaw my-assistant policy-remove pypi --yes ``` Set `NEMOCLAW_NON_INTERACTIVE=1` instead of `--yes` to drive the same flow from an environment variable. @@ -256,16 +283,16 @@ Rename `preset.name` if NemoClaw refuses to apply the file because of a collisio ### Apply a Single File -```console -$ nemoclaw my-assistant policy-add --from-file ./presets/my-internal-api.yaml +```bash +nemoclaw my-assistant policy-add --from-file ./presets/my-internal-api.yaml ``` Preview the endpoints without applying with `--dry-run`, and skip the confirmation prompt with `--yes` or by exporting `NEMOCLAW_NON_INTERACTIVE=1`. ### Apply Every File in a Directory -```console -$ nemoclaw my-assistant policy-add --from-dir ./presets/ --yes +```bash +nemoclaw my-assistant policy-add --from-dir ./presets/ --yes ``` Files are processed in lexicographic order. @@ -279,10 +306,11 @@ Review every host in a custom preset before applying it, especially when the fil ### Remove a Custom Preset -Custom presets applied with `--from-file` or `--from-dir` are recorded in the NemoClaw sandbox registry alongside their full YAML content, so they can be removed by name — the original file does not need to be kept on disk: +NemoClaw records custom presets applied with `--from-file` or `--from-dir` in the sandbox registry alongside their full YAML content. 
+You can remove them by name without keeping the original file on disk: -```console -$ nemoclaw my-assistant policy-remove my-internal-api --yes +```bash +nemoclaw my-assistant policy-remove my-internal-api --yes ``` `policy-remove` accepts both built-in and custom preset names. Run `nemoclaw policy-list` to see every preset currently applied to the sandbox. @@ -294,6 +322,8 @@ $ nemoclaw my-assistant policy-remove my-internal-api --yes ## Related Skills +- [Approve or Deny Agent Network Requests](references/approve-network-requests.md) for real-time operator approval. +- [Common Integration Policy Examples](references/integration-policy-examples.md) for maintained preset examples such as Outlook, messaging, GitHub, Jira, Brave Search, package managers, Hugging Face, and local inference. - `nemoclaw-user-reference` — Network Policies (use the `nemoclaw-user-reference` skill) for the full baseline policy reference - OpenShell [Policy Schema](https://docs.nvidia.com/openshell/latest/reference/policy-schema.html) for the full YAML policy schema reference. - OpenShell [Sandbox Policies](https://docs.nvidia.com/openshell/latest/sandboxes/policies.html) for applying, iterating, and debugging policies at the OpenShell layer. diff --git a/.agents/skills/nemoclaw-user-manage-policy/evals/evals.json b/.agents/skills/nemoclaw-user-manage-policy/evals/evals.json new file mode 100644 index 0000000000..2736fb87a6 --- /dev/null +++ b/.agents/skills/nemoclaw-user-manage-policy/evals/evals.json @@ -0,0 +1,11 @@ +[ + { + "id": "docs-network-policy-customize-network-policy-001", + "question": "I'm customizing sandbox network policy. 
Help me allow the agent to reach a required external service so I can enable the integration while preserving least privilege.", + "expected_skill": "nemoclaw-user-manage-policy", + "ground_truth": "A NemoClaw-specific answer that helps the user allow the agent to reach a required external service and gives enough concrete guidance, decision criteria, verification steps, or risk framing to enable the integration while preserving least privilege.", + "expected_behavior": [ + "Uses the expected_skill and does not make up answers if it cannot find the answer from the skill." + ] + } +] diff --git a/.agents/skills/nemoclaw-user-manage-policy/references/approve-network-requests.md b/.agents/skills/nemoclaw-user-manage-policy/references/approve-network-requests.md index c84bb1fe53..6c97de9f7c 100644 --- a/.agents/skills/nemoclaw-user-manage-policy/references/approve-network-requests.md +++ b/.agents/skills/nemoclaw-user-manage-policy/references/approve-network-requests.md @@ -1,5 +1,3 @@ - - # Approve or Deny Agent Network Requests Review and act on network requests that the agent makes to endpoints not listed in the sandbox policy. @@ -14,14 +12,14 @@ OpenShell intercepts these requests and presents them in the TUI for operator ap Start the OpenShell terminal UI to monitor sandbox activity: -```console -$ openshell term +```bash +openshell term ``` For a remote sandbox, pass the instance name: -```console -$ ssh my-gpu-box 'cd ~/nemoclaw && . .env && openshell term' +```bash +ssh my-gpu-box 'cd ~/nemoclaw && . .env && openshell term' ``` The TUI displays the sandbox state, active inference provider, and a live feed of network activity. @@ -44,14 +42,14 @@ The TUI presents an approval prompt for each blocked request. Approved endpoints remain in the running policy until the sandbox stops. They are not persisted to the baseline policy file. 
-To keep an endpoint allowed after a restart, update the policy YAML or apply a preset as described in Customize the Sandbox Network Policy (use the `nemoclaw-user-manage-policy` skill). +To keep an endpoint allowed after a restart, update the policy YAML or apply a preset as described in [Customize the Sandbox Network Policy](../SKILL.md). ## Run the Walkthrough From the NemoClaw repository root, run the walkthrough script after you have onboarded at least one sandbox and it is reachable: -```console -$ ./scripts/walkthrough.sh +```bash +./scripts/walkthrough.sh ``` This script opens a split tmux session with the TUI on the left and the agent on the right. @@ -59,6 +57,6 @@ The walkthrough requires tmux and the `NVIDIA_API_KEY` environment variable, and ## Related Topics -- Customize the Sandbox Network Policy (use the `nemoclaw-user-manage-policy` skill) to add endpoints permanently. +- [Customize the Sandbox Network Policy](../SKILL.md) to add endpoints permanently. - Network Policies (use the `nemoclaw-user-reference` skill) for the full baseline policy reference. - Monitor Sandbox Activity (use the `nemoclaw-user-monitor-sandbox` skill) for general sandbox monitoring. diff --git a/.agents/skills/nemoclaw-user-manage-policy/references/integration-policy-examples.md b/.agents/skills/nemoclaw-user-manage-policy/references/integration-policy-examples.md index 3a715b7863..18d77758ba 100644 --- a/.agents/skills/nemoclaw-user-manage-policy/references/integration-policy-examples.md +++ b/.agents/skills/nemoclaw-user-manage-policy/references/integration-policy-examples.md @@ -1,7 +1,7 @@ - - # Common NemoClaw Integration Policy Examples +import { AgentOnly } from "../_components/AgentGuide"; + Use these examples when a sandbox is already installed and an integration needs network access. This page covers only integrations that NemoClaw currently ships as maintained policy preset YAML under `nemoclaw-blueprint/policies/presets/`. 
Integration setup usually has two separate parts: @@ -18,19 +18,19 @@ Replace `my-assistant` with your sandbox name in the examples. Check the current policy state first: -```console -$ nemoclaw my-assistant policy-list +```bash +nemoclaw my-assistant policy-list ``` For a live view of blocked requests, open the OpenShell TUI in a separate host terminal: -```console -$ openshell term +```bash +openshell term ``` When the agent reaches an endpoint that is not in policy, the TUI shows the host, port, requesting binary, method, and path when available. Approve a request only when you understand why the integration needs it. -An approval updates the running policy, but it does not create a NemoClaw preset entry that can be reviewed and replayed like `policy-add`. +An approval updates the running policy, but it does not create a reviewable NemoClaw preset entry that `policy-add` can replay. ## Supported Integration Presets @@ -45,6 +45,7 @@ NemoClaw ships maintained policy presets for common services in `nemoclaw-bluepr | Hugging Face Hub and Inference API | `huggingface` | | Jira and Atlassian Cloud | `jira` | | Local Ollama or vLLM through the host gateway | `local-inference` | +| OpenClaw model-pricing reference fetch | `openclaw-pricing` | | npm and Yarn packages | `npm` | | Microsoft 365, Outlook, and Graph API | `outlook` | | Python Package Index | `pypi` | @@ -55,20 +56,20 @@ NemoClaw ships maintained policy presets for common services in `nemoclaw-bluepr Preview the endpoints before applying: -```console -$ nemoclaw my-assistant policy-add outlook --dry-run +```bash +nemoclaw my-assistant policy-add outlook --dry-run ``` Apply the preset: -```console -$ nemoclaw my-assistant policy-add outlook --yes +```bash +nemoclaw my-assistant policy-add outlook --yes ``` Remove it later if the sandbox no longer needs that access: -```console -$ nemoclaw my-assistant policy-remove outlook --yes +```bash +nemoclaw my-assistant policy-remove outlook --yes ``` ## Email and 
Calendar With Microsoft 365 @@ -76,9 +77,9 @@ $ nemoclaw my-assistant policy-remove outlook --yes Use the `outlook` preset for Microsoft 365 email and calendar workflows that use Microsoft Graph or Outlook endpoints. The preset allows `graph.microsoft.com`, Microsoft login, and Outlook service endpoints. -```console -$ nemoclaw my-assistant policy-add outlook --dry-run -$ nemoclaw my-assistant policy-add outlook --yes +```bash +nemoclaw my-assistant policy-add outlook --dry-run +nemoclaw my-assistant policy-add outlook --yes ``` Then configure the email or calendar tool credentials through the integration you are running in the sandbox. @@ -92,23 +93,23 @@ If the blocked endpoint is not covered by the maintained `outlook` preset, treat Telegram needs both channel configuration and egress policy. If you already enabled Telegram during onboarding but did not include the preset, add it to the running sandbox: -```console -$ nemoclaw my-assistant policy-add telegram --yes +```bash +nemoclaw my-assistant policy-add telegram --yes ``` To add Telegram after onboarding, set the token on the host, add the channel, rebuild so the image picks up the channel config, and make sure the policy preset is applied: -```console -$ export TELEGRAM_BOT_TOKEN= -$ NEMOCLAW_NON_INTERACTIVE=1 nemoclaw my-assistant channels add telegram -$ nemoclaw my-assistant rebuild -$ nemoclaw my-assistant policy-add telegram --yes +```bash +export TELEGRAM_BOT_TOKEN= +NEMOCLAW_NON_INTERACTIVE=1 nemoclaw my-assistant channels add telegram +nemoclaw my-assistant rebuild +nemoclaw my-assistant policy-add telegram --yes ``` If delivery fails, open the TUI and send a test message to the bot: -```console -$ openshell term +```bash +openshell term ``` The matching preset for each supported messaging channel is the channel name (`telegram`, `discord`, `slack`, `wechat`, or `whatsapp`). @@ -120,60 +121,61 @@ Use the matching policy preset after you configure the channel credentials. 
For Slack: -```console -$ export SLACK_BOT_TOKEN= -$ export SLACK_APP_TOKEN= -$ NEMOCLAW_NON_INTERACTIVE=1 nemoclaw my-assistant channels add slack -$ nemoclaw my-assistant rebuild -$ nemoclaw my-assistant policy-add slack --yes +```bash +export SLACK_BOT_TOKEN= +export SLACK_APP_TOKEN= +NEMOCLAW_NON_INTERACTIVE=1 nemoclaw my-assistant channels add slack +nemoclaw my-assistant rebuild +nemoclaw my-assistant policy-add slack --yes ``` For Discord: -```console -$ export DISCORD_BOT_TOKEN= -$ export DISCORD_SERVER_ID= -$ NEMOCLAW_NON_INTERACTIVE=1 nemoclaw my-assistant channels add discord -$ nemoclaw my-assistant rebuild -$ nemoclaw my-assistant policy-add discord --yes +```bash +export DISCORD_BOT_TOKEN= +export DISCORD_SERVER_ID= +NEMOCLAW_NON_INTERACTIVE=1 nemoclaw my-assistant channels add discord +nemoclaw my-assistant rebuild +nemoclaw my-assistant policy-add discord --yes ``` If you enabled Slack or Discord during onboarding, apply only the matching preset: -```console -$ nemoclaw my-assistant policy-add slack --yes -$ nemoclaw my-assistant policy-add discord --yes +```bash +nemoclaw my-assistant policy-add slack --yes +nemoclaw my-assistant policy-add discord --yes ``` ## WeChat or WhatsApp Messaging (Experimental) WeChat and WhatsApp are experimental. -Both rely on QR-based pairing flows that are more fragile than token-based bots, and the upstream client libraries can change behavior without notice. +Both rely on QR-based pairing flows that are more fragile than token-based bots. +The upstream client libraries can change behavior without notice. WeChat uses Tencent's iLink Bot API for personal accounts. The bot token is captured by a host-side QR scan during onboarding rather than pasted from a developer portal. 
Add the channel interactively and apply the preset: -```console -$ nemoclaw my-assistant channels add wechat -$ nemoclaw my-assistant rebuild -$ nemoclaw my-assistant policy-add wechat --yes +```bash +nemoclaw my-assistant channels add wechat +nemoclaw my-assistant rebuild +nemoclaw my-assistant policy-add wechat --yes ``` -WhatsApp Web pairs entirely inside the sandbox via QR scan, so `channels add` does not collect a host-side token. +WhatsApp Web pairs entirely inside the sandbox through QR scan, so `channels add` does not collect a host-side token. Apply the preset and complete the in-sandbox pairing after the rebuild: -```console -$ NEMOCLAW_NON_INTERACTIVE=1 nemoclaw my-assistant channels add whatsapp -$ nemoclaw my-assistant rebuild -$ nemoclaw my-assistant policy-add whatsapp --yes +```bash +NEMOCLAW_NON_INTERACTIVE=1 nemoclaw my-assistant channels add whatsapp +nemoclaw my-assistant rebuild +nemoclaw my-assistant policy-add whatsapp --yes ``` If you enabled WeChat or WhatsApp during onboarding, apply only the matching preset: -```console -$ nemoclaw my-assistant policy-add wechat --yes -$ nemoclaw my-assistant policy-add whatsapp --yes +```bash +nemoclaw my-assistant policy-add wechat --yes +nemoclaw my-assistant policy-add whatsapp --yes ``` ## GitHub and Jira @@ -183,37 +185,38 @@ Use `jira` when the agent needs Atlassian Jira access. Preview first: -```console -$ nemoclaw my-assistant policy-add github --dry-run -$ nemoclaw my-assistant policy-add jira --dry-run +```bash +nemoclaw my-assistant policy-add github --dry-run +nemoclaw my-assistant policy-add jira --dry-run ``` Apply the preset that matches the workflow: -```console -$ nemoclaw my-assistant policy-add github --yes -$ nemoclaw my-assistant policy-add jira --yes +```bash +nemoclaw my-assistant policy-add github --yes +nemoclaw my-assistant policy-add jira --yes ``` The `jira` preset intentionally allows Node.js access to Atlassian Cloud and does not allow `curl`. 
When validating it manually, avoid plain `curl -s` against `auth.atlassian.com`. Atlassian can return an empty redirect body even when the request succeeds. -Use an explicit status probe instead: +Use a body-visible API probe instead: -```console -$ node -e "require('https').get('https://api.atlassian.com', r => console.log(r.statusCode))" -$ curl -sS -o /dev/null -w '%{http_code}' --max-time 10 https://auth.atlassian.com +```bash +node -e "require('https').get('https://api.atlassian.com', r => console.log(r.statusCode))" +curl -sS --max-time 10 -w '\n%{http_code}\n' https://api.atlassian.com/oauth/token/accessible-resources ``` Before approval, the curl probe should report `000` or a local policy denial. -After approving the blocked request in OpenShell, it should report an HTTP -status such as `301` or `200`. +After explicitly approving curl for `api.atlassian.com` in OpenShell, it should return Atlassian's unauthenticated `401` JSON response. +That `401` is the expected success signal for this manual probe. +This manual probe proves curl reached Atlassian, but no Jira credentials were supplied. Remove access when the task is done: -```console -$ nemoclaw my-assistant policy-remove github --yes -$ nemoclaw my-assistant policy-remove jira --yes +```bash +nemoclaw my-assistant policy-remove github --yes +nemoclaw my-assistant policy-remove jira --yes ``` ## Brave Search @@ -221,9 +224,9 @@ $ nemoclaw my-assistant policy-remove jira --yes The default Balanced policy tier includes `brave`. If you chose Restricted during onboarding or removed the preset later, add it before enabling Brave Search workflows: -```console -$ nemoclaw my-assistant policy-add brave --dry-run -$ nemoclaw my-assistant policy-add brave --yes +```bash +nemoclaw my-assistant policy-add brave --dry-run +nemoclaw my-assistant policy-add brave --yes ``` The Brave Search API key is still configured separately during onboarding or through the web search setup flow. 
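The preview-then-apply pattern used for the presets in this guide can be wrapped in a small shell function. This is a minimal sketch, not part of NemoClaw: `preview_then_apply` is a hypothetical helper name, and it assumes the CLI returns a non-zero exit status when the `--dry-run` preview fails. The `policy-add`, `--dry-run`, and `--yes` usages are the ones shown in this guide.

```bash
#!/usr/bin/env bash
# Sketch only: preview a maintained preset, then apply it if the preview succeeds.
# `preview_then_apply` is a hypothetical helper, not a NemoClaw command.

preview_then_apply() {
  local sandbox="$1" preset="$2"

  # Both the sandbox name and the preset name are required.
  if [ -z "$sandbox" ] || [ -z "$preset" ]; then
    echo "usage: preview_then_apply <sandbox> <preset>" >&2
    return 2
  fi

  # Show the endpoints the preset would allow; stop here if the preview fails
  # (assumes a non-zero exit status on failure).
  nemoclaw "$sandbox" policy-add "$preset" --dry-run || return 1

  # Apply the preset without the confirmation prompt.
  nemoclaw "$sandbox" policy-add "$preset" --yes
}
```

For example, `preview_then_apply my-assistant brave` would run the two Brave Search commands above in order. Keeping the preview as a separate, visible step leaves room to review the endpoint list before a scripted apply.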
@@ -235,75 +238,106 @@ Use these presets when an agent workflow installs packages or downloads model as | Workflow | Preset | |----------|--------| | npm or Yarn packages | `npm` | -| Python packages from PyPI | `pypi` | +| Python packages from PyPI with `pip`, Python, or `uv` | `pypi` | | Homebrew packages | `brew` | | Hugging Face model or dataset access | `huggingface` | Add only the preset required for the task: -```console -$ nemoclaw my-assistant policy-add npm --yes -$ nemoclaw my-assistant policy-add pypi --yes -$ nemoclaw my-assistant policy-add brew --yes -$ nemoclaw my-assistant policy-add huggingface --yes +```bash +nemoclaw my-assistant policy-add npm --yes +nemoclaw my-assistant policy-add pypi --yes +nemoclaw my-assistant policy-add brew --yes +nemoclaw my-assistant policy-add huggingface --yes ``` Remove package access after a one-time setup task if the sandbox no longer needs it: -```console -$ nemoclaw my-assistant policy-remove npm --yes -$ nemoclaw my-assistant policy-remove pypi --yes -$ nemoclaw my-assistant policy-remove brew --yes -$ nemoclaw my-assistant policy-remove huggingface --yes +```bash +nemoclaw my-assistant policy-remove npm --yes +nemoclaw my-assistant policy-remove pypi --yes +nemoclaw my-assistant policy-remove brew --yes +nemoclaw my-assistant policy-remove huggingface --yes ``` +The `pypi` preset allows Python, `pip`, virtual-environment Python and `pip`, and `/usr/local/bin/uv` to reach PyPI endpoints. +If `uv` is installed somewhere else in the sandbox, add a custom preset for that binary path instead of broadening the maintained preset locally. + ### Homebrew Specifics The sandbox base image includes Homebrew (Linuxbrew), so applying the `brew` preset is the only step needed before installing a formula. 
-A `/usr/local/bin/brew` symlink puts the entry point on the sandbox `PATH`, so the agent can run `brew install ` directly: +A `/usr/local/bin/brew` wrapper puts the entry point on the sandbox `PATH` while delegating to the Linuxbrew prefix. +Installed formula commands are available from the Linuxbrew bin directory in sandbox shell sessions: -```console -$ nemoclaw my-assistant policy-add brew --yes -$ nemoclaw my-assistant exec -- brew install +```bash +nemoclaw my-assistant policy-add brew --yes +nemoclaw my-assistant exec -- brew install +nemoclaw my-assistant exec -- bash -lc '' ``` You do not need to bootstrap Homebrew, install build dependencies, or source `brew shellenv` inside the sandbox. +## Model Pricing + + + +OpenClaw's gateway fetches reference pricing from LiteLLM and OpenRouter on every start to populate `usage.cost` in session JSONL records. +The default-strict egress policy denies both hosts. +The fetch fails closed, the gateway logs `[gateway/model-pricing] LiteLLM pricing fetch failed: TypeError: fetch failed` (and the matching OpenRouter line) on every startup, and every session record records `usage.cost = 0` even though the input and output token counts populate correctly. +Tools that read the session log to display per-turn cost (audit dashboards, compliance review surfaces) cannot distinguish a real free run from this silent failure. + +Apply the `openclaw-pricing` preset to allow both pricing endpoints. +The preset pins each host to a single read-only path so it does not widen egress beyond the pricing fetch: + +```bash +nemoclaw my-assistant policy-add openclaw-pricing --dry-run +nemoclaw my-assistant policy-add openclaw-pricing --yes +``` + +After the next gateway restart, the WARN entries stop and `usage.cost` populates from the fetched pricing tables. + + + + +Hermes does not use OpenClaw's model-pricing reference fetch. 
+ + + ## Local Inference Use `local-inference` when the sandbox needs access to host-side local inference services such as Ollama or vLLM through the OpenShell host gateway. Onboarding auto-suggests this preset when you choose a local provider. If you need to add it after onboarding: -```console -$ nemoclaw my-assistant policy-add local-inference --dry-run -$ nemoclaw my-assistant policy-add local-inference --yes +```bash +nemoclaw my-assistant policy-add local-inference --dry-run +nemoclaw my-assistant policy-add local-inference --yes ``` Then verify the sandbox status: -```console -$ nemoclaw my-assistant status +```bash +nemoclaw my-assistant status ``` ## Inspect or Replace the Live Policy Use `policy-list` for normal preset state: -```console -$ nemoclaw my-assistant policy-list +```bash +nemoclaw my-assistant policy-list ``` Use OpenShell when you need the full enforced YAML: -```console -$ openshell policy get --full my-assistant > live-policy.yaml +```bash +openshell policy get --full my-assistant > live-policy.yaml ``` If you must replace the live policy, edit the full policy file and set it back: -```console -$ openshell policy set --policy live-policy.yaml my-assistant --wait +```bash +openshell policy set --policy live-policy.yaml my-assistant --wait ``` `openshell policy set` replaces the live policy with the file you provide. @@ -312,7 +346,7 @@ Use `nemoclaw my-assistant policy-add` for maintained NemoClaw presets. ## Next Steps -- Approve or Deny Agent Network Requests (use the `nemoclaw-user-manage-policy` skill) for the interactive OpenShell TUI flow. -- Customize the Sandbox Network Policy (use the `nemoclaw-user-manage-policy` skill) for static policy edits and raw OpenShell policy files. +- [Approve or Deny Agent Network Requests](approve-network-requests.md) for the interactive OpenShell TUI flow. +- [Customize the Sandbox Network Policy](../SKILL.md) for static policy edits and raw OpenShell policy files. 
- Messaging Channels (use the `nemoclaw-user-manage-sandboxes` skill) for Telegram, Discord, Slack, WeChat, and WhatsApp channel configuration. - Commands (use the `nemoclaw-user-reference` skill) for the full `policy-add`, `policy-list`, `policy-remove`, and `channels` command reference. diff --git a/.agents/skills/nemoclaw-user-manage-sandboxes/SKILL.md b/.agents/skills/nemoclaw-user-manage-sandboxes/SKILL.md index dbed6690e0..7802405ca6 100644 --- a/.agents/skills/nemoclaw-user-manage-sandboxes/SKILL.md +++ b/.agents/skills/nemoclaw-user-manage-sandboxes/SKILL.md @@ -1,74 +1,89 @@ --- name: "nemoclaw-user-manage-sandboxes" -description: "Explains operational tasks after the quickstart: listing sandboxes, status and health checks, logs, diagnostics, port forwards, multiple sandboxes, credential reset, rebuilds, network presets, upgrades, and uninstall. Trigger keywords - manage nemoclaw sandboxes, nemoclaw status, nemoclaw list, nemoclaw dashboard port, nemoclaw rebuild, nemoclaw upgrade sandboxes, nemoclaw uninstall, sandbox mutability, sandbox runtime configuration, sandbox rebuild, nemoclaw backup, nemoclaw restore, workspace backup, openshell sandbox download upload, nemoclaw messaging channels, nemoclaw telegram, nemoclaw discord, nemoclaw slack, nemoclaw wechat, nemoclaw whatsapp, openshell channel messaging, nemoclaw workspace files, soul.md, user.md, identity.md, agents.md, sandbox persistence." +description: "Explains operational tasks after the quickstart: listing sandboxes, status and health checks, logs, diagnostics, port forwards, multiple sandboxes, credential reset, rebuilds, network presets, upgrades, and uninstall. 
Trigger keywords - manage nemoclaw sandboxes, nemoclaw status, nemoclaw list, nemoclaw dashboard port, nemoclaw rebuild, nemoclaw upgrade sandboxes, nemoclaw uninstall, sandbox mutability, sandbox runtime configuration, sandbox rebuild, nemoclaw backup, nemoclaw restore, workspace backup, openshell sandbox download upload, nemoclaw messaging channels, nemoclaw telegram, nemoclaw discord, nemoclaw slack, nemoclaw wechat, nemoclaw whatsapp, openshell channel messaging, install hermes plugins, hermes plugins nemoclaw, nemoclaw hermes plugins, nemoclaw workspace files, soul.md, user.md, identity.md, agents.md, sandbox persistence." +license: "Apache-2.0" --- - - - # Manage Sandbox Lifecycle +import { AgentOnly } from "../_components/AgentGuide"; + + Use this guide after you finish the OpenClaw quickstart (use the `nemoclaw-user-get-started` skill). + + +Use this guide after you finish Quickstart with Hermes (use the `nemoclaw-user-get-started` skill). + It covers day-two sandbox operations such as listing sandboxes, checking health, managing ports, rebuilding safely, upgrading, and uninstalling. + When a workflow uses the lower-level OpenShell CLI, see CLI Selection Guide (use the `nemoclaw-user-reference` skill) for the boundary between `nemoclaw` and `openshell`. + + +When a workflow uses the lower-level OpenShell CLI, see CLI Selection Guide (use the `nemoclaw-user-reference` skill) for the boundary between `nemoclaw`, `nemoclaw`, and `openshell`. + ## List Sandboxes List every sandbox registered on this host: -```console -$ nemoclaw list +```bash +nemoclaw list ``` -The list shows each sandbox's model, provider, policy presets, active SSH session indicator, and dashboard URL when a dashboard port is recorded. +The list shows each sandbox's model, provider, policy presets, active SSH session indicator, and dashboard URL when NemoClaw records a dashboard port. 
Use JSON output for scripts: -```console -$ nemoclaw list --json +```bash +nemoclaw list --json ``` ## Check Sandbox Health Check a specific sandbox's health, inference route, active connections, live policy, update status, and messaging-channel overlap warnings: -```console -$ nemoclaw my-assistant status +```bash +nemoclaw my-assistant status ``` Use the host-level status command when you want the sandbox inventory plus host auxiliary service state, such as cloudflared: -```console -$ nemoclaw status +```bash +nemoclaw status ``` ## Inspect Logs View recent sandbox logs: -```console -$ nemoclaw my-assistant logs +```bash +nemoclaw my-assistant logs ``` Stream logs while you reproduce a problem: -```console -$ nemoclaw my-assistant logs --follow +```bash +nemoclaw my-assistant logs --follow ``` + The log command reads both OpenClaw gateway output and OpenShell audit events, so policy denials appear beside gateway logs. + + +The log command reads both Hermes gateway output and OpenShell audit events, so policy denials appear beside gateway logs. + ## Collect Diagnostics Collect diagnostics for bug reports or support handoff: -```console -$ nemoclaw debug --sandbox my-assistant --output nemoclaw-debug.tar.gz +```bash +nemoclaw debug --sandbox my-assistant --output nemoclaw-debug.tar.gz ``` Use `--quick` for a smaller local summary: -```console -$ nemoclaw debug --quick --sandbox my-assistant +```bash +nemoclaw debug --quick --sandbox my-assistant ``` The debug command gathers system information, Docker state, gateway logs, and sandbox status. @@ -77,37 +92,44 @@ The debug command gathers system information, Docker state, gateway logs, and sa If the forward stopped, or the installer reported that no active forward was found and the URL does not load, restart it manually with the port from the install summary. 
-```console -$ openshell forward start --background my-gpt-claw +```bash +openshell forward start --background my-gpt-claw ``` To list active forwards across all sandboxes, run the following command. -```console -$ openshell forward list +```bash +openshell forward list ``` ## Run Multiple Sandboxes Each sandbox needs its own dashboard port, since `openshell forward` refuses to bind a port that another sandbox is already using. + When the default port is already held by another sandbox, `nemoclaw onboard` scans ports `18789` through `18799` and uses the next free port. + + +When the default API port is already held by another sandbox, `nemoclaw onboard` scans for the next free port and records it for the sandbox. + +If you intentionally run separate OpenShell gateways on the same host, set a different `NEMOCLAW_GATEWAY_PORT` before each onboarding run. +NemoClaw isolates the gateway name and local state by port so one port-specific gateway does not replace another. -```console -$ nemoclaw onboard # first sandbox uses 18789 -$ nemoclaw onboard # second sandbox uses the next free port, such as 18790 +```bash +nemoclaw onboard # first sandbox uses 18789 +nemoclaw onboard # second sandbox uses the next free port, such as 18790 ``` To choose a specific port, pass `--control-ui-port`: -```console -$ nemoclaw onboard --control-ui-port 19000 +```bash +nemoclaw onboard --control-ui-port 19000 ``` You can also set `CHAT_UI_URL` or `NEMOCLAW_DASHBOARD_PORT` before onboarding: -```console -$ CHAT_UI_URL=http://127.0.0.1:19000 nemoclaw onboard -$ NEMOCLAW_DASHBOARD_PORT=19000 nemoclaw onboard +```bash +CHAT_UI_URL=http://127.0.0.1:19000 nemoclaw onboard +NEMOCLAW_DASHBOARD_PORT=19000 nemoclaw onboard ``` For full details on port conflicts and overrides, refer to Port already in use (use the `nemoclaw-user-reference` skill). 
@@ -120,18 +142,23 @@ Recover from a misconfigured sandbox without re-running the full onboard wizard Change the active model or provider at runtime without rebuilding the sandbox: -```console -$ nemoclaw inference set --model --provider +```bash +nemoclaw inference set --model --provider ``` Refer to Switch Inference Providers (use the `nemoclaw-user-configure-inference` skill) for provider-specific model IDs and API compatibility notes. ### Restart the Gateway and Port Forward + If `nemoclaw status` reports the sandbox is alive but the gateway is not running, run the recover command instead of opening a shell. + + +If `nemoclaw status` reports the sandbox is alive but the Hermes gateway is not running, run the recover command instead of opening a shell. + -```console -$ nemoclaw recover +```bash +nemoclaw recover ``` The command restarts the in-sandbox gateway and re-establishes the dashboard port-forward in one step. @@ -140,22 +167,27 @@ Refer to `nemoclaw recover` (use the `nemoclaw-user-reference` skill) for ### Reset a Stored Credential -If a provider credential was entered incorrectly during onboarding, clear the gateway-registered value and re-enter it on the next onboard run: +If you entered a provider credential incorrectly during onboarding, clear the gateway-registered value and re-enter it on the next onboard run: -```console -$ nemoclaw credentials list # see which providers are registered -$ nemoclaw credentials reset # clear a single provider, for example nvidia-prod -$ nemoclaw onboard # re-run to re-enter the cleared provider +```bash +nemoclaw credentials list # see which providers are registered +nemoclaw credentials reset # clear a single provider, for example nvidia-prod +nemoclaw onboard # re-run to re-enter the cleared provider ``` -The credentials command is documented in full at `nemoclaw credentials reset ` (use the `nemoclaw-user-reference` skill). 
+The command reference documents `nemoclaw credentials reset ` (use the `nemoclaw-user-reference` skill) in full. ### Rebuild a Sandbox While Preserving Workspace State + If you changed the underlying Dockerfile, upgraded OpenClaw, or want to pick up a new base image without losing your sandbox's workspace files, use `rebuild` instead of destroying and recreating: + + +If you changed the underlying Dockerfile, upgraded Hermes, or want to pick up a new base image without losing your sandbox's state files, use `rebuild` instead of destroying and recreating: + -```console -$ nemoclaw rebuild +```bash +nemoclaw rebuild ``` Rebuild preserves the mounted workspace and registered policies while recreating the container. @@ -166,8 +198,8 @@ Refer to `nemoclaw rebuild` (use the `nemoclaw-user-reference` skill) for Apply an additional preset, such as Telegram or GitHub, to a running sandbox without re-onboarding: -```console -$ nemoclaw policy-add +```bash +nemoclaw policy-add ``` Refer to `nemoclaw policy-add` (use the `nemoclaw-user-reference` skill) for usage details and flags. @@ -176,69 +208,30 @@ Non-interactive re-onboards in the default `suggested` policy mode preserve pres To make a re-onboard authoritative, set `NEMOCLAW_POLICY_MODE=custom` and provide `NEMOCLAW_POLICY_PRESETS` with the exact list to apply; onboarding removes anything else. See `NEMOCLAW_POLICY_MODE` (use the `nemoclaw-user-reference` skill) for the full table. -## Update to the Latest Version +## Update to the Maintained Version -When a new NemoClaw release becomes available, update the `nemoclaw` CLI on your host and check existing sandboxes for stale agent/runtime versions. +When a maintained NemoClaw release becomes available, update the host CLI and then check whether existing sandboxes need rebuilds. +The standard installer follows the admin-promoted `lkg` release tag by default. +If you need a specific release, set `NEMOCLAW_INSTALL_TAG` on the `bash` side of the install pipeline. 
-### Update the NemoClaw CLI - -Re-run the installer. -Before it onboards anything, the installer calls `nemoclaw backup-all` (use the `nemoclaw-user-reference` skill) automatically, storing a snapshot of each running sandbox in `~/.nemoclaw/rebuild-backups/` as a safety net. -If your existing gateway is from OpenShell earlier than `0.0.37`, the installer prompts before it runs the new automatic gateway upgrade path. -The automatic path is offered only when the existing `nemoclaw` CLI supports `backup-all`; older installs must preserve sandbox state manually before retiring the gateway. -For unattended installs, set `NEMOCLAW_ACCEPT_EXPERIMENTAL_OPENSHELL_UPGRADE=1`, or manually run `nemoclaw backup-all` and `openshell gateway destroy -g nemoclaw || openshell gateway destroy` before rerunning the installer as `curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_OPENSHELL_UPGRADE_PREPARED=1 bash`. - -```console -$ curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash -``` - -### Upgrade Sandboxes with Stale Agent and Runtime Versions - -The installer checks registered sandboxes after onboarding succeeds and runs `nemoclaw upgrade-sandboxes --auto` for stale running sandboxes. -Use `upgrade-sandboxes` directly to verify the result, rebuild when you skipped the installer or onboarding step, or handle sandboxes that were stopped or could not be version-checked. -The upgrade flow is non-destructive by default because NemoClaw preserves manifest-defined workspace state, but a manual snapshot before any major upgrade gives you a state restore point. 
- -```console -$ nemoclaw snapshot create --name pre-upgrade # optional, recommended -$ nemoclaw update --yes # updates CLI through the maintained installer flow -$ nemoclaw upgrade-sandboxes --check # verify or list remaining stale/unknown sandboxes -$ nemoclaw upgrade-sandboxes # manually rebuild remaining stale running sandboxes +```bash +curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_INSTALL_TAG=v0.0.56 bash +nemoclaw upgrade-sandboxes --check ``` -`nemoclaw update` is the CLI wrapper around the same installer path as `curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash`. -Use `nemoclaw update --check` when you only want to inspect version state and see the maintained update command. - -For scripted manual rebuilds, use `nemoclaw upgrade-sandboxes --auto` to skip the confirmation prompt. +Before upgrade work, the installer runs `nemoclaw backup-all` when the installed CLI supports it. +For manual upgrade flows, create a snapshot first and then run the update or rebuild command you need: -If the upgraded sandbox needs its workspace state reverted, restore the pre-upgrade snapshot into the running sandbox. -This restores saved state directories only; it does not downgrade the sandbox image or agent/runtime: - -```console -$ nemoclaw snapshot restore pre-upgrade +```bash +nemoclaw snapshot create --name pre-upgrade +nemoclaw update --yes +nemoclaw upgrade-sandboxes --check ``` -### What Changes During a Rebuild - -Each rebuild destroys the existing container and creates a new one. -NemoClaw protects your data through the same backup-and-restore flow as `nemoclaw rebuild` (use the `nemoclaw-user-reference` skill): - -- NemoClaw preserves manifest-defined workspace state. Before deleting the old container, NemoClaw snapshots the state directories and durable state files defined in the agent manifest, typically `/sandbox/.openclaw/workspace/`; for Hermes this also includes `SOUL.md` and the SQLite database behind `.hermes/state.db`. 
Stored credentials (`~/.nemoclaw/credentials.json`) and registered policy presets live on the host and are re-applied to the new sandbox automatically. -- NemoClaw does not preserve runtime changes outside the workspace state directories. This includes packages installed inside the running container with `apt` or `pip`, files in non-workspace paths, and in-memory or process state. If you have customized the running container at runtime, capture that as `Dockerfile` changes for `nemoclaw onboard --from` or a manual `openshell sandbox download` before the rebuild starts. - -Aborts before the destroy step are non-destructive. -The flow refuses to proceed past preflight if a credential is missing or past backup if required manifest-defined state cannot be copied, so a failed run leaves the original sandbox intact and ready to retry. -When a backup command reports partial archive output, NemoClaw keeps the usable entries and reports only the manifest-defined paths that could not be archived. - -See Backup and Restore (use the `nemoclaw-user-manage-sandboxes` skill) for the full list of state-preservation guarantees, snapshot retention, and instructions for manual backups when the auto-flow is not enough. - -**If the rebuild aborts with `Missing credential: `:** - -The rebuild preflight reads the provider credential recorded by your last `nemoclaw onboard` session. -If you have switched providers since onboarding, for example from a remote API to a local Ollama setup, the preflight may still reference the old key and fail before any destroy step runs. - -To recover, re-run `nemoclaw onboard` and select your current provider. -This refreshes the session metadata. -Your existing container keeps serving traffic until the new image is ready. +Each rebuild destroys the old container and creates a new one, while preserving the manifest-defined workspace or agent state that NemoClaw knows how to snapshot. 
+Runtime changes outside those state paths, such as packages installed manually in the running container, are not preserved. +For the full state-preservation contract, snapshot restore behavior, and manual backup workflow, refer to [Backup and Restore](references/backup-restore.md). +For command flags, refer to `nemoclaw update` (use the `nemoclaw-user-reference` skill), `nemoclaw upgrade-sandboxes` (use the `nemoclaw-user-reference` skill), and `nemoclaw rebuild` (use the `nemoclaw-user-reference` skill). ## Uninstall @@ -254,9 +247,17 @@ nemoclaw uninstall | `--keep-openshell` | Leave OpenShell binaries installed. | | `--delete-models` | Also remove NemoClaw-pulled Ollama models. | -`nemoclaw uninstall` runs the version-pinned `uninstall.sh` that shipped with your installed CLI, so it does not fetch anything over the network at uninstall time. +**Note:** + +The uninstall command preserves `~/.nemoclaw/rebuild-backups/` (host-side snapshots that snapshot and `backup-all` commands write), `~/.nemoclaw/backups/` (workspace backups that `scripts/backup-workspace.sh` writes), and `~/.nemoclaw/sandboxes.json` (the sandbox registry) by default. +Uninstall removes every other entry under `~/.nemoclaw/`. +Interactive runs prompt before they remove the preserved entries; the default answer keeps them. +For non-interactive runs (`--yes`, `NEMOCLAW_NON_INTERACTIVE=1`, or a non-TTY shell), set `NEMOCLAW_UNINSTALL_DESTROY_USER_DATA=1` to acknowledge data loss and remove the preserved entries as well. +See the Commands reference (use the `nemoclaw-user-reference` skill) for the full preservation contract. + +The CLI uninstall command runs the version-pinned `uninstall.sh` that shipped with your installed CLI, so it does not fetch anything over the network at uninstall time. 
-If the `nemoclaw` CLI is missing or broken, fall back to the hosted script: +If the CLI is missing or broken, fall back to the hosted script: ```bash curl -fsSL https://raw.githubusercontent.com/NVIDIA/NemoClaw/refs/heads/main/uninstall.sh | bash @@ -268,15 +269,19 @@ The same `--yes`, `--keep-openshell`, and `--delete-models` flags listed above a curl -fsSL https://raw.githubusercontent.com/NVIDIA/NemoClaw/refs/heads/main/uninstall.sh | bash -s -- --yes --delete-models ``` -For a full comparison of the two forms, including what they fetch, what they trust, and when to prefer each, see `nemoclaw uninstall` vs. the hosted `uninstall.sh` (use the `nemoclaw-user-reference` skill). +For a full comparison of the two forms, including what they fetch, what they trust, and when to prefer each, refer to `nemoclaw uninstall` vs. the hosted `uninstall.sh` (use the `nemoclaw-user-reference` skill). ## References - **[references/runtime-controls.md](references/runtime-controls.md)** — Single page that answers what can change at runtime versus what requires a rebuild for NemoClaw sandboxes. - **Load [references/backup-restore.md](references/backup-restore.md)** when downloading workspace files from a sandbox, uploading restored files into a new sandbox, or preserving sandbox state across rebuilds. Backs up and restores OpenClaw workspace files before destructive operations such as sandbox rebuilds. - **Load [references/messaging-channels.md](references/messaging-channels.md)** when setting up messaging channels, chat interfaces, or integrations without relying on nemoclaw tunnel start for bridges. Explains how Telegram, Discord, Slack, WeChat, and WhatsApp reach sandboxed OpenClaw and Hermes agents through OpenShell-managed processes and NemoClaw channel commands. +- **[references/install-plugins-hermes.md](references/install-plugins-hermes.md)** — Explains how to install Hermes plugins in NemoClaw-managed sandboxes. 
- **Load [references/workspace-files.md](references/workspace-files.md)** when users ask about `SOUL.md`, `USER.md`, `IDENTITY.md`, `AGENTS.md`, or other workspace files, or when preparing to back up or restore workspace state. Explains what workspace personality and configuration files are, where they live, and how they persist across sandbox restarts. ## Related Skills +- [Set Up Messaging Channels](references/messaging-channels.md) to connect Telegram, Discord, or Slack. +- [Workspace Files](references/workspace-files.md) for persistent OpenClaw files inside the sandbox. +- [Backup and Restore](references/backup-restore.md) for snapshot and restore workflows. - `nemoclaw-user-monitor-sandbox` — Monitor Sandbox Activity (use the `nemoclaw-user-monitor-sandbox` skill) for observability tools diff --git a/.agents/skills/nemoclaw-user-manage-sandboxes/evals/evals.json b/.agents/skills/nemoclaw-user-manage-sandboxes/evals/evals.json new file mode 100644 index 0000000000..e4d2e3d9c0 --- /dev/null +++ b/.agents/skills/nemoclaw-user-manage-sandboxes/evals/evals.json @@ -0,0 +1,11 @@ +[ + { + "id": "docs-manage-sandboxes-lifecycle-001", + "question": "I'm managing a NemoClaw sandbox. Help me check status, health, logs, ports, providers, upgrades, and uninstall paths so I can operate the sandbox safely after quickstart.", + "expected_skill": "nemoclaw-user-manage-sandboxes", + "ground_truth": "A NemoClaw-specific answer that helps the user check status, health, logs, ports, providers, upgrades, and uninstall paths and gives enough concrete guidance, decision criteria, verification steps, or risk framing to operate the sandbox safely after quickstart.", + "expected_behavior": [ + "Uses the expected_skill and does not make up answers if it cannot find the answer from the skill." 
+ ] + } +] diff --git a/.agents/skills/nemoclaw-user-manage-sandboxes/references/backup-restore.md b/.agents/skills/nemoclaw-user-manage-sandboxes/references/backup-restore.md index 55ee3c5fa0..ccebd8b589 100644 --- a/.agents/skills/nemoclaw-user-manage-sandboxes/references/backup-restore.md +++ b/.agents/skills/nemoclaw-user-manage-sandboxes/references/backup-restore.md @@ -1,103 +1,157 @@ - - # Backup and Restore Workspace Files -Workspace files define your agent's personality, memory, and user context. -They persist across sandbox restarts but are **permanently deleted** when you run `nemoclaw destroy`. +import { AgentOnly } from "../_components/AgentGuide"; + +Workspace and state files define your agent's personality, memory, user context, and durable runtime state. +They persist across sandbox restarts but are **permanently deleted** when you destroy the sandbox. This guide covers snapshot commands, manual backup with CLI commands, and an automated script. ## When to Back Up -- **Before running `nemoclaw destroy`** + + +- Before running `nemoclaw destroy` - Before major NemoClaw version upgrades - Periodically, if you've invested time customizing your agent + + + +- Before running `nemoclaw destroy` +- Before major NemoClaw version upgrades +- Periodically, if you've invested time customizing your agent or paired messaging channels + + + ## Snapshot Commands The fastest way to back up and restore sandbox state is with the built-in snapshot commands. Snapshots capture all workspace state directories defined in the agent manifest and store them in `~/.nemoclaw/rebuild-backups//`. -Agent manifests may also declare durable top-level state files. For Hermes, -snapshots include `SOUL.md` and the SQLite database behind `.hermes/state.db` -using SQLite's online backup API, then restore that database through SQLite -instead of copying a live raw database file. 
-Treat snapshot directories as private local data: the Hermes database can -contain session metadata and message history needed for a faithful restore. - -```console -$ nemoclaw my-assistant snapshot create -$ nemoclaw my-assistant snapshot list -$ nemoclaw my-assistant snapshot restore +Agent manifests can also declare durable top-level state files. +For Hermes, snapshots include `SOUL.md` and the SQLite database behind `.hermes/state.db` using SQLite's online backup API, then restore that database through SQLite instead of copying a live raw database file. +Treat snapshot directories as private local data: the Hermes database can contain session metadata and message history needed for a faithful restore. + +```bash +nemoclaw my-assistant snapshot create +nemoclaw my-assistant snapshot list +nemoclaw my-assistant snapshot restore ``` -`snapshot list` prints a table of version, name, timestamp, and path. Versions (`v1`, `v2`, ..., `vN`) are computed from the timestamp order, so `vN` is always the newest snapshot. +`snapshot list` prints a table of version, name, timestamp, and path. +NemoClaw computes versions (`v1`, `v2`, ..., `vN`) from timestamp order, so `vN` is always the newest snapshot. To tag a snapshot with a human-readable label, pass `--name`: -```console -$ nemoclaw my-assistant snapshot create --name before-upgrade +```bash +nemoclaw my-assistant snapshot create --name before-upgrade ``` To restore a specific snapshot instead of the latest, pass a version, name, or timestamp prefix: -```console -$ nemoclaw my-assistant snapshot restore v3 -$ nemoclaw my-assistant snapshot restore before-upgrade -$ nemoclaw my-assistant snapshot restore 2026-04-14T +```bash +nemoclaw my-assistant snapshot restore v3 +nemoclaw my-assistant snapshot restore before-upgrade +nemoclaw my-assistant snapshot restore 2026-04-14T ``` To clone a snapshot into a different sandbox name, pass `--to `. 
If the destination sandbox already exists, NemoClaw refuses to overwrite it unless you pass `--force`: -```console -$ nemoclaw my-assistant snapshot restore before-upgrade --to my-assistant-clone -$ nemoclaw my-assistant snapshot restore before-upgrade --to my-assistant-clone --force --yes +```bash +nemoclaw my-assistant snapshot restore before-upgrade --to my-assistant-clone +nemoclaw my-assistant snapshot restore before-upgrade --to my-assistant-clone --force --yes ``` + + +The `nemoclaw rebuild` command uses the same snapshot mechanism automatically. +Snapshot restore performs a targeted repair for legacy `.openclaw-data` symlinks that older images created. +NemoClaw rejects unsafe symlinks and hard links inside sandbox state during backup creation before they can enter a snapshot. + + + + The `nemoclaw rebuild` command uses the same snapshot mechanism automatically. -Snapshot restore performs a targeted repair for legacy `.openclaw-data` symlinks that were created by older images. -Unsafe symlinks and hard links inside sandbox state are rejected during backup creation before they can enter a snapshot. +NemoClaw rejects unsafe symlinks and hard links inside sandbox state during backup creation before they can enter a snapshot. Credential-bearing Hermes files such as `auth.json` are intentionally excluded from snapshots. NemoClaw-regenerated Hermes config files (`config.yaml` and `.env`) are also excluded; model/provider and messaging credentials are recreated from host-side onboarding and OpenShell provider state during rebuild. + + For full details, see the Commands reference (use the `nemoclaw-user-reference` skill). ## Manual Backup Use `openshell sandbox download` to copy files from the sandbox to your host. 
-```console -$ SANDBOX=my-assistant -$ BACKUP_DIR=~/.nemoclaw/backups/$(date +%Y%m%d-%H%M%S) -$ mkdir -p "$BACKUP_DIR" - -$ openshell sandbox download "$SANDBOX" /sandbox/.openclaw/workspace/SOUL.md "$BACKUP_DIR/" -$ openshell sandbox download "$SANDBOX" /sandbox/.openclaw/workspace/USER.md "$BACKUP_DIR/" -$ openshell sandbox download "$SANDBOX" /sandbox/.openclaw/workspace/IDENTITY.md "$BACKUP_DIR/" -$ openshell sandbox download "$SANDBOX" /sandbox/.openclaw/workspace/AGENTS.md "$BACKUP_DIR/" -$ openshell sandbox download "$SANDBOX" /sandbox/.openclaw/workspace/MEMORY.md "$BACKUP_DIR/" -$ openshell sandbox download "$SANDBOX" /sandbox/.openclaw/workspace/memory/ "$BACKUP_DIR/memory/" + + +```bash +SANDBOX=my-assistant +BACKUP_DIR=~/.nemoclaw/backups/$(date +%Y%m%d-%H%M%S) +mkdir -p "$BACKUP_DIR" + +openshell sandbox download "$SANDBOX" /sandbox/.openclaw/workspace/SOUL.md "$BACKUP_DIR/" +openshell sandbox download "$SANDBOX" /sandbox/.openclaw/workspace/USER.md "$BACKUP_DIR/" +openshell sandbox download "$SANDBOX" /sandbox/.openclaw/workspace/IDENTITY.md "$BACKUP_DIR/" +openshell sandbox download "$SANDBOX" /sandbox/.openclaw/workspace/AGENTS.md "$BACKUP_DIR/" +openshell sandbox download "$SANDBOX" /sandbox/.openclaw/workspace/MEMORY.md "$BACKUP_DIR/" +openshell sandbox download "$SANDBOX" /sandbox/.openclaw/workspace/memory/ "$BACKUP_DIR/memory/" ``` + + + +```bash +SANDBOX=my-hermes +BACKUP_DIR=~/.nemoclaw/backups/$(date +%Y%m%d-%H%M%S) +mkdir -p "$BACKUP_DIR" + +openshell sandbox download "$SANDBOX" /sandbox/SOUL.md "$BACKUP_DIR/" +openshell sandbox download "$SANDBOX" /sandbox/.hermes/state.db "$BACKUP_DIR/" +openshell sandbox download "$SANDBOX" /sandbox/.hermes/platforms/ "$BACKUP_DIR/platforms/" +``` + + + ## Manual Restore Use `openshell sandbox upload` to push files back into a sandbox. 
-```console -$ SANDBOX=my-assistant -$ BACKUP_DIR=~/.nemoclaw/backups/20260320-120000 # pick a timestamp - -$ openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/SOUL.md" /sandbox/.openclaw/workspace/ -$ openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/USER.md" /sandbox/.openclaw/workspace/ -$ openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/IDENTITY.md" /sandbox/.openclaw/workspace/ -$ openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/AGENTS.md" /sandbox/.openclaw/workspace/ -$ openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/MEMORY.md" /sandbox/.openclaw/workspace/ -$ openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/memory/" /sandbox/.openclaw/workspace/memory/ + + +```bash +SANDBOX=my-assistant +BACKUP_DIR=~/.nemoclaw/backups/20260320-120000 # pick a timestamp + +openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/SOUL.md" /sandbox/.openclaw/workspace/ +openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/USER.md" /sandbox/.openclaw/workspace/ +openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/IDENTITY.md" /sandbox/.openclaw/workspace/ +openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/AGENTS.md" /sandbox/.openclaw/workspace/ +openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/MEMORY.md" /sandbox/.openclaw/workspace/ +openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/memory/" /sandbox/.openclaw/workspace/memory/ ``` + + + +```bash +SANDBOX=my-hermes +BACKUP_DIR=~/.nemoclaw/backups/20260320-120000 # pick a timestamp + +openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/SOUL.md" /sandbox/ +openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/state.db" /sandbox/.hermes/ +openshell sandbox upload "$SANDBOX" "$BACKUP_DIR/platforms/" /sandbox/.hermes/platforms/ +``` + + + ## Using the Backup Script + + The repository includes a convenience script at `scripts/backup-workspace.sh`. 
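As a rough illustration of what a manual backup involves, the following sketch prints the `openshell sandbox download` commands from the Manual Backup section for an OpenClaw sandbox instead of running them. The helper name `print_backup_commands` is hypothetical, and this is not the content of `scripts/backup-workspace.sh`; it only reuses the file list and paths shown earlier.

```bash
#!/usr/bin/env bash
# Sketch only: prints the download commands from the Manual Backup section
# instead of running them. `print_backup_commands` is a hypothetical helper,
# not part of scripts/backup-workspace.sh.

print_backup_commands() {
  local sandbox="$1" backup_dir="$2"
  local workspace="/sandbox/.openclaw/workspace"
  local file

  # Top-level workspace files listed in the manual backup example.
  for file in SOUL.md USER.md IDENTITY.md AGENTS.md MEMORY.md; do
    printf 'openshell sandbox download "%s" %s/%s "%s/"\n' \
      "$sandbox" "$workspace" "$file" "$backup_dir"
  done

  # The memory directory is downloaded into its own subdirectory.
  printf 'openshell sandbox download "%s" %s/memory/ "%s/memory/"\n' \
    "$sandbox" "$workspace" "$backup_dir"
}
```

Calling `print_backup_commands my-assistant "$BACKUP_DIR"` prints six commands, one per workspace item, so you can review them before running any of them.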
### Backup @@ -112,14 +166,14 @@ Backup saved to /home/user/.nemoclaw/backups/20260320-120000/ (6 items) Restore from the most recent backup: -```console -$ ./scripts/backup-workspace.sh restore my-assistant +```bash +./scripts/backup-workspace.sh restore my-assistant ``` Restore from a specific timestamp: -```console -$ ./scripts/backup-workspace.sh restore my-assistant 20260320-120000 +```bash +./scripts/backup-workspace.sh restore my-assistant 20260320-120000 ``` ## Verifying a Backup @@ -136,31 +190,47 @@ USER.md memory/ ``` + + + +For Hermes, prefer the built-in snapshot commands for faithful restore of `state.db`. +Use manual `openshell sandbox download` / `openshell sandbox upload` only when you need to inspect or transfer a specific file. + + + + + ## Multi-Agent Deployments -When OpenClaw is configured with multiple named agents, each agent has its own -workspace directory (`workspace-main/`, `workspace-support/`, `workspace-ops/`, -and so on — see Multi-Agent Deployments (use the `nemoclaw-user-manage-sandboxes` skill)). +When you configure OpenClaw with multiple named agents, each agent has its own workspace directory (`workspace-main/`, `workspace-support/`, `workspace-ops/`, and so on). +Refer to [Multi-Agent Deployments](workspace-files.md#multi-agent-deployments). -`nemoclaw snapshot create` automatically discovers every `workspace-*/` -directory under the sandbox state tree and includes it in the snapshot bundle -alongside the default `workspace/`. `snapshot restore` re-applies the full -per-agent set. No manual per-workspace backup pattern is needed. +`nemoclaw snapshot create` automatically discovers every `workspace-*/` directory under the sandbox state tree and includes it in the snapshot bundle alongside the default `workspace/`. +`snapshot restore` reapplies the full per-agent set. +You do not need a manual per-workspace backup pattern. 
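The discovery pattern described above can be approximated with a shell glob. This is a minimal sketch, not NemoClaw's implementation: the function name `list_agent_workspaces` and the idea of pointing it at a local copy of the state tree are assumptions made for illustration.

```bash
# Sketch only: lists `workspace/` plus every `workspace-*/` directory under a
# state tree. The function name and the local state-tree argument are
# assumptions, not part of the NemoClaw CLI.

list_agent_workspaces() {
  local state_root="$1"
  local dir

  for dir in "$state_root"/workspace "$state_root"/workspace-*; do
    # An unmatched glob stays literal, so skip anything that is not a directory.
    [ -d "$dir" ] || continue
    basename "$dir"
  done
}
```

Run against a directory that contains `workspace/`, `workspace-main/`, and `workspace-ops/`, the function prints those three names and ignores plain files or unrelated directories.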
-The sandbox entrypoint ensures every per-agent workspace lives directly under -the persistent `.openclaw/` tree, so state also survives `openshell sandbox restart`. +The sandbox entrypoint ensures every per-agent workspace lives directly under the persistent `.openclaw/` tree, so state also survives `openshell sandbox restart`. ### Shared files across agents Files that operators typically want consistent across every per-agent workspace (`AGENTS.md`, shared skills, common templates) are **not** synced automatically. -Each workspace is independent; changes in one don't propagate. Operators that -need this either copy the shared files explicitly to each workspace after -editing, or maintain a host-side sync layer. Tracking shared-file tooling -(shared mount, `workspaces list` command) in -[#1260](https://github.com/NVIDIA/NemoClaw/issues/1260). +Each workspace is independent, and changes in one do not propagate. +Operators that need this either copy the shared files explicitly to each workspace after editing or maintain a host-side sync layer. +NVIDIA tracks shared-file tooling (shared mount, `workspaces list` command) in [#1260](https://github.com/NVIDIA/NemoClaw/issues/1260). + + + + +## Hermes State + +Hermes does not use OpenClaw per-agent workspace directories. +NemoClaw snapshots preserve the Hermes manifest-defined state tree and durable top-level files instead. +Refer to [Workspace Files](workspace-files.md) for the Hermes state layout. + + ## Next Steps -- Workspace Files overview (use the `nemoclaw-user-manage-sandboxes` skill) to learn what each file does +- [Workspace Files overview](workspace-files.md) to learn what each file does. 
- Commands reference (use the `nemoclaw-user-reference` skill) diff --git a/.agents/skills/nemoclaw-user-manage-sandboxes/references/install-plugins-hermes.md b/.agents/skills/nemoclaw-user-manage-sandboxes/references/install-plugins-hermes.md new file mode 100644 index 0000000000..31f1903af6 --- /dev/null +++ b/.agents/skills/nemoclaw-user-manage-sandboxes/references/install-plugins-hermes.md @@ -0,0 +1,114 @@ +# Install Hermes Plugins + +Hermes plugins extend the Hermes runtime inside a NemoClaw-managed sandbox. +They are different from NemoClaw skills and from OpenClaw plugins, so install them through the Hermes plugin path instead of `skill install`. + +## How Hermes Loads Plugins + +NemoClaw sets `HERMES_HOME` to `/sandbox/.hermes` when it starts the Hermes gateway. +Hermes plugin directories live under `/sandbox/.hermes/plugins/`. +NemoClaw uses the same mechanism for its built-in Hermes integration, which the sandbox image bakes into `/sandbox/.hermes/plugins/nemoclaw`. + +The built-in NemoClaw Hermes plugin provides sandbox status tools, skill reload support, managed-tool broker patches, and runtime grounding for the OpenShell sandbox. +Do not replace or remove `/sandbox/.hermes/plugins/nemoclaw` when you add your own plugin. + +## Choose an Install Path + +Today, the supported path for custom Hermes plugins is to bake the plugin into a custom sandbox image and onboard from that Dockerfile. +Use this path when the plugin adds Python code, runtime hooks, or dependencies that Hermes must see at gateway startup. + +`nemohermes skill install ` is only for `SKILL.md` agent skills. +It uploads skill instructions and refreshes skill discovery, but it does not install Hermes runtime plugins. + +## Prepare a Build Directory + +Put the custom Dockerfile and everything it needs to `COPY` in one directory. +`nemohermes onboard --from ` sends the Dockerfile's parent directory as the Docker build context. 
+ +```text +my-hermes-plugin-sandbox/ +├── Dockerfile +└── my-hermes-plugin/ + ├── __init__.py + └── requirements.txt +``` + +If you start from the stock NemoClaw Hermes Dockerfile, keep the NemoClaw Hermes image contract intact. +The image must still include the generated Hermes config, NemoClaw Hermes plugin, blueprint files, and `nemoclaw-start` entrypoint. + +**Warning:** + +A custom `--from` Dockerfile replaces the normal NemoClaw Hermes Dockerfile. + Starting from `ghcr.io/nvidia/nemoclaw/hermes-sandbox-base:latest` alone is not enough unless your Dockerfile also preserves the NemoClaw Hermes layers from `agents/hermes/Dockerfile`. + +## Install the Plugin in the Image + +Add your plugin after the Dockerfile has created `/sandbox/.hermes`. +The example below shows the layer that copies a plugin directory into the Hermes plugin tree. + +```dockerfile +COPY my-hermes-plugin/ /opt/my-hermes-plugin/ + +USER root +RUN mkdir -p /sandbox/.hermes/plugins/my-hermes-plugin \ + && cp -a /opt/my-hermes-plugin/. /sandbox/.hermes/plugins/my-hermes-plugin/ \ + && if [ -f /opt/my-hermes-plugin/requirements.txt ]; then \ + /opt/hermes/.venv/bin/python -m pip install --no-cache-dir -r /opt/my-hermes-plugin/requirements.txt; \ + fi \ + && chown -R sandbox:sandbox /sandbox/.hermes/plugins/my-hermes-plugin \ + && chmod -R a+rX /sandbox/.hermes/plugins/my-hermes-plugin + +USER sandbox +WORKDIR /sandbox +``` + +Keep plugin code and dependency files inside the build directory. +Avoid copying host credentials, local caches, or broad home-directory contents into the image. + +## Create the Sandbox + +Run onboarding with the custom Dockerfile and an explicit sandbox name. +NemoClaw requires a name for `--from` builds so a custom image cannot silently replace the default sandbox. + +```bash +nemohermes onboard --name my-hermes-build --from ./my-hermes-plugin-sandbox/Dockerfile +``` + +For non-interactive onboarding, set the same values through environment variables. 
+ +```bash +NEMOCLAW_NON_INTERACTIVE=1 \ +NEMOCLAW_SANDBOX_NAME=my-hermes-build \ +NEMOCLAW_FROM_DOCKERFILE=./my-hermes-plugin-sandbox/Dockerfile \ +nemohermes onboard +``` + +If you resume an interrupted onboarding run, use the same Dockerfile path that started the session. +NemoClaw records the custom Dockerfile path and rejects a resume that points at a different image source. + +## Network Access + +Hermes plugins still run inside the OpenShell sandbox boundary. +If a plugin calls an external API at runtime, add a policy preset for the required hostnames and binaries before you recreate the sandbox. + +Hermes uses Python for plugin execution, so policy entries usually need to allow the Hermes Python runtime, such as `/opt/hermes/.venv/bin/python`, in addition to any command-line wrapper your plugin starts. +For package downloads during sandbox runtime, use the `pypi` preset or a custom preset that allows the package hosts you need. + +For policy concepts, refer to Network Policies (use the `nemoclaw-user-reference` skill). +For custom preset workflows, refer to Customize Network Policy (use the `nemoclaw-user-manage-policy` skill). + +## Common Mistakes + +These are the most common places where Hermes plugin installation gets mixed up with other NemoClaw extension paths. + +- Do not use `skill install` for Hermes runtime plugins. +- Do not install Hermes plugins into `/sandbox/.openclaw/extensions`; that path is for OpenClaw plugins. +- Do not remove `/sandbox/.hermes/plugins/nemoclaw`; NemoClaw depends on that plugin for managed Hermes behavior. +- Do not put the Dockerfile in a broad directory unless you intend to send that whole directory as the Docker build context. +- Do not assume OpenShell policy allows Python package downloads during runtime by default. + +## Next Steps + +- Review NemoHermes Command Reference (use the `nemoclaw-user-reference` skill) for `nemohermes onboard --from` details. 
+- Review Customize Network Policy (use the `nemoclaw-user-manage-policy` skill) if the plugin needs runtime network egress. +- Review [Runtime Controls](runtime-controls.md) before changing shields or mutability settings for a plugin-enabled sandbox. diff --git a/.agents/skills/nemoclaw-user-manage-sandboxes/references/messaging-channels.md b/.agents/skills/nemoclaw-user-manage-sandboxes/references/messaging-channels.md index 38114460ad..978fd2fc2b 100644 --- a/.agents/skills/nemoclaw-user-manage-sandboxes/references/messaging-channels.md +++ b/.agents/skills/nemoclaw-user-manage-sandboxes/references/messaging-channels.md @@ -1,24 +1,36 @@ - - # Messaging Channels +import { AgentOnly } from "../_components/AgentGuide"; + Telegram, Discord, Slack, WeChat, and WhatsApp reach your OpenClaw or Hermes agent through OpenShell-managed processes and gateway constructs. For token-based channels, NemoClaw registers credentials with OpenShell providers. WeChat captures a token through a host-side QR scan during onboarding. -WhatsApp pairs inside the sandbox via QR scan and intentionally stores mutable session state there. +WhatsApp pairs inside the sandbox through a QR scan and intentionally stores mutable session state there. NemoClaw bakes the selected channel configuration into the sandbox image and keeps runtime delivery under OpenShell control. **Experimental Channels:** WeChat and WhatsApp are experimental. Both rely on QR-based pairing flows that are more fragile than token-based bots, and the upstream client libraries can change behavior without notice. -Interfaces, defaults, and supported features may change, and these channels are not recommended for production use. +Interfaces, defaults, and supported features can change, and NVIDIA does not recommend these channels for production use. + +You can enable channels during `nemoclaw onboard` or add them later with host-side `nemoclaw channels` commands. 
+Do not run agent-specific channel mutation commands such as `openclaw channels add` or `openclaw channels remove` inside the sandbox because NemoClaw generates `/sandbox/.openclaw/openclaw.json` at image build time, and changes inside the running container do not persist across rebuilds. + + You can enable channels during `nemoclaw onboard` or add them later with host-side `nemoclaw channels` commands. -Do not run agent-specific channel mutation commands such as `openclaw channels add` or `openclaw channels remove` inside the sandbox because NemoClaw generates `/sandbox/.openclaw/openclaw.json` for OpenClaw and `/sandbox/.hermes/.env` for Hermes at image build time, and changes inside the running container do not persist across rebuilds. +Do not mutate messaging configuration directly inside the sandbox because NemoClaw generates `/sandbox/.hermes/.env` and Hermes config at image build time, and changes inside the running container do not persist across rebuilds. + + `nemoclaw tunnel start` does not start Telegram, Discord, Slack, or other chat bridges. It only starts optional host services such as the cloudflared tunnel when that binary is present. (`nemoclaw start` is kept as a deprecated alias.) + + +`nemoclaw tunnel start` does not start Telegram, Discord, Slack, or other chat bridges. +It only starts optional host services such as the cloudflared tunnel when that binary is present. + For details, refer to Commands (use the `nemoclaw-user-reference` skill). ## Prerequisites @@ -34,14 +46,16 @@ For details, refer to Commands (use the `nemoclaw-user-reference` skill). 
| Telegram | `TELEGRAM_BOT_TOKEN` | `TELEGRAM_ALLOWED_IDS` for DM allowlisting, `TELEGRAM_REQUIRE_MENTION` for group-chat replies | | Discord | `DISCORD_BOT_TOKEN` | `DISCORD_SERVER_ID`, `DISCORD_USER_ID`, `DISCORD_REQUIRE_MENTION` | | Slack | `SLACK_BOT_TOKEN`, `SLACK_APP_TOKEN` | `SLACK_ALLOWED_USERS` for DM and channel `@mention` user allowlisting, `SLACK_ALLOWED_CHANNELS` for channel ID allowlisting | -| WeChat (experimental) | None. Captured via host-side QR scan during `nemoclaw onboard` | `WECHAT_ALLOWED_IDS` for DM allowlisting | -| WhatsApp (experimental) | None. Pair via QR after rebuild | None | +| WeChat (experimental) | None. Captured through host-side QR scan during `nemoclaw onboard` | `WECHAT_ALLOWED_IDS` for DM allowlisting | +| WhatsApp (experimental) | None. Pair through QR after rebuild | None | Telegram uses a bot token from [BotFather](https://t.me/BotFather). Open Telegram, send `/newbot` to [@BotFather](https://t.me/BotFather), follow the prompts, and copy the token. For Telegram group chats, disable privacy mode before testing group replies: in @BotFather, run `/setprivacy`, choose the bot, then choose **Disable**. After changing privacy mode, remove the bot from each Telegram group and add it back so Telegram applies the new delivery setting to that group. -`TELEGRAM_ALLOWED_IDS` is a comma-separated list of Telegram user IDs for DM access. +`TELEGRAM_ALLOWED_IDS` is a comma-separated list of Telegram user or private-chat IDs for DM access. +For compatibility with older QA scripts, NemoClaw also treats `TELEGRAM_AUTHORIZED_CHAT_IDS` and `TELEGRAM_CHAT_ID` as aliases, but new automation should use `TELEGRAM_ALLOWED_IDS`. +Keep these aliases until QA automation and public repro templates have stopped exporting them for at least one full release. Group chats stay open by default so rebuilt sandboxes do not silently drop Telegram group messages because of an empty group allowlist. 
Set `TELEGRAM_REQUIRE_MENTION=1` to make the bot reply in Telegram groups only when users mention it. Pairing and `TELEGRAM_ALLOWED_IDS` still govern direct messages. @@ -54,23 +68,25 @@ Set `DISCORD_USER_ID` to restrict access to one user; otherwise, any member of t Slack uses Socket Mode and requires two tokens. Use `SLACK_BOT_TOKEN` for the bot user OAuth token (`xoxb-...`) and `SLACK_APP_TOKEN` for the app-level Socket Mode token (`xapp-...`). +NemoClaw validates both tokens before it saves Slack credentials or enables the channel. Set `SLACK_ALLOWED_USERS` to comma-separated Slack member IDs to authorize those users for DMs and for channel `@mention` events in channels where the Slack app is present. Set `SLACK_ALLOWED_CHANNELS` to comma-separated Slack channel IDs to restrict channel `@mention` handling to those channels. When both Slack allowlists are set, NemoClaw requires the mention to come from one of the allowed channels and one of the allowed members. Channel messages still require an explicit bot mention. +During sandbox startup, NemoClaw normalizes OpenShell credential placeholders into the environment shape expected by the Slack runtime, so post-rebuild Slack starts use the gateway-managed tokens instead of literal placeholder strings. -WeChat (experimental) delivers messages over Tencent's iLink gateway via the upstream `@tencent-weixin/openclaw-weixin` plugin baked into the sandbox base image and the built-in Hermes iLink WeChat adapter. +WeChat (experimental) delivers messages over Tencent's iLink gateway through the upstream `@tencent-weixin/openclaw-weixin` plugin baked into the sandbox base image and the built-in Hermes iLink WeChat adapter. The supported mode in this release is **personal WeChat** (`bot_type=3`). WeChat Official Account and WeCom/Enterprise WeChat are not wired up. Because the bot token only exists after a successful iLink QR handshake, NemoClaw runs the QR login on the host during `nemoclaw onboard`. 
You scan the QR with WeChat on your phone (Discover → Scan), confirm the login, and NemoClaw captures the token, `accountId`, `baseUrl`, and `userId` from the iLink response. NemoClaw registers the token as the `-wechat-bridge` OpenShell provider and substitutes the `openshell:resolve:env:WECHAT_BOT_TOKEN` placeholder for it inside the sandbox, so the token never lands in the image or on disk inside the running container. -The non-secret per-account metadata (`WECHAT_ACCOUNT_ID`, `WECHAT_BASE_URL`, `WECHAT_USER_ID`) is baked into the sandbox image so the in-sandbox bridge can pre-seed the per-account context tokens without re-running the QR handshake. +NemoClaw bakes the non-secret per-account metadata (`WECHAT_ACCOUNT_ID`, `WECHAT_BASE_URL`, `WECHAT_USER_ID`) into the sandbox image so the in-sandbox bridge can pre-seed the per-account context tokens without re-running the QR handshake. WeChat is DM-only (`allowIdsMode: "dm"`). NemoClaw adds the operator who scanned the QR to `WECHAT_ALLOWED_IDS` automatically, and you can append more comma-separated WeChat user IDs through the same env var. -You can silence the host-side `[wechat]` diagnostic lines (poll status, IDC redirects, swallowed gateway errors) by exporting `NEMOCLAW_WECHAT_QUIET=1` once the flow is stable in your environment. +You can silence the host-side `[wechat]` diagnostic lines (poll status, IDC redirects, swallowed gateway errors) by exporting `NEMOCLAW_WECHAT_QUIET=1` after the flow is stable in your environment. Tencent's iLink gateway is a third-party service. Review your organization's terms-of-service, compliance, and data-residency constraints before enabling WeChat. @@ -78,14 +94,17 @@ Review your organization's terms-of-service, compliance, and data-residency cons WhatsApp (experimental) Web does not use a host-side token or OpenShell credential provider. NemoClaw advertises WhatsApp for both OpenClaw and Hermes sandboxes, and each agent completes pairing with its own in-sandbox command. 
Pairing happens inside the sandbox after the rebuild completes and creates mutable session credentials there. -Run `openshell term` and then use the agent-specific pairing command to render the QR code in the terminal: +Connect to the sandbox and then use the agent-specific pairing command to render the QR code in the terminal: -```console -$ openclaw channels login --channel whatsapp # OpenClaw sandboxes -$ hermes whatsapp # Hermes sandboxes +```bash +openclaw channels login --channel whatsapp # OpenClaw sandboxes +hermes whatsapp # Hermes sandboxes ``` -Session credentials are generated and stored inside durable agent state (`whatsapp` for OpenClaw, `platforms/whatsapp` for Hermes), so they survive rebuilds without re-pairing. +For OpenClaw sandboxes, NemoClaw validates the gateway URL before pairing and renders the WhatsApp QR code in a compact terminal form so it fits in smaller terminal windows. +If pairing exits with a gateway close such as `1008`, rerun the login command one time and then check `nemoclaw channels status --channel whatsapp` so you can diagnose the gateway/session path separately from QR rendering. + +The sandbox generates and stores session credentials inside durable agent state (`whatsapp` for OpenClaw, `platforms/whatsapp` for Hermes), so they survive rebuilds without re-pairing. This is the runtime tradeoff of enabling WhatsApp without a host bridge: a paired sandbox can use that WhatsApp account until you unpair it or clear the durable state. NemoClaw cannot detect cross-sandbox WhatsApp conflicts the way it does for token-based channels. Pair only one sandbox per WhatsApp account at a time. @@ -94,6 +113,7 @@ Pair only one sandbox per WhatsApp account at a time. When the wizard reaches **Messaging channels**, it lists Telegram, Discord, Slack, WeChat, and WhatsApp. Press a channel number to toggle it on or off, then press **Enter** when done. +If you select no channels, pressing **Enter** skips messaging setup. 
If a token-based channel token is not already in the environment or credential store, the wizard prompts for it and saves it. If you enable WeChat (experimental), the wizard does not prompt for a paste token. @@ -107,15 +127,15 @@ NemoClaw also selects the matching network policy preset during policy setup so For scripted setup, export the credentials and optional settings for the channels you want to enable before you run onboarding: -```console -$ export TELEGRAM_BOT_TOKEN= -$ export TELEGRAM_REQUIRE_MENTION=1 -$ export DISCORD_BOT_TOKEN= -$ export DISCORD_SERVER_ID= -$ export SLACK_BOT_TOKEN= -$ export SLACK_APP_TOKEN= -$ export SLACK_ALLOWED_USERS= -$ export SLACK_ALLOWED_CHANNELS= +```bash +export TELEGRAM_BOT_TOKEN= +export TELEGRAM_REQUIRE_MENTION=1 +export DISCORD_BOT_TOKEN= +export DISCORD_SERVER_ID= +export SLACK_BOT_TOKEN= +export SLACK_APP_TOKEN= +export SLACK_ALLOWED_USERS= +export SLACK_ALLOWED_CHANNELS= ``` This release does not support non-interactive WeChat configuration because the iLink QR handshake requires a human to scan the QR on a paired phone. @@ -123,8 +143,8 @@ Run `nemoclaw onboard` interactively when you want to enable WeChat. Then run onboarding: -```console -$ nemoclaw onboard +```bash +nemoclaw onboard ``` Complete the rest of the wizard so the blueprint can create OpenShell providers where needed (for example `-telegram-bridge` or `-wechat-bridge`), bake channel configuration into the image (`NEMOCLAW_MESSAGING_CHANNELS_B64`), and start the sandbox. @@ -134,47 +154,56 @@ Complete the rest of the wizard so the blueprint can create OpenShell providers Run channel commands from the host, not from inside the sandbox. 
Use `channels list` to see the supported channel names: -```console -$ nemoclaw my-assistant channels list +```bash +nemoclaw my-assistant channels list ``` Add the channel you want: -```console -$ nemoclaw my-assistant channels add telegram -$ nemoclaw my-assistant channels add discord -$ nemoclaw my-assistant channels add slack -$ nemoclaw my-assistant channels add wechat -$ nemoclaw my-assistant channels add whatsapp +```bash +nemoclaw my-assistant channels add telegram +nemoclaw my-assistant channels add discord +nemoclaw my-assistant channels add slack +nemoclaw my-assistant channels add wechat +nemoclaw my-assistant channels add whatsapp ``` `channels add` collects whatever each channel needs. It prompts for Telegram, Discord, and Slack tokens, runs an interactive host-side QR scan for WeChat, and collects nothing for WhatsApp because pairing happens in-sandbox after rebuild. -It registers bridge providers with the OpenShell gateway when tokens were captured, records the channel in the sandbox registry, and asks whether to rebuild immediately. +It registers bridge providers with the OpenShell gateway when it captures tokens, records the channel in the sandbox registry, and asks whether to rebuild immediately. The command accepts mixed-case input such as `Telegram`, then stores and prints the canonical lowercase channel name. -If a matching built-in network policy preset exists, `channels add` applies it to the sandbox automatically before the rebuild so the bridge has egress to its upstream API. -If applying the preset fails, NemoClaw warns and tells you to re-apply manually with `nemoclaw policy-add ` after the rebuild. +`channels add` requires the matching built-in network policy preset YAML to be present. +A missing or malformed preset YAML (no `network_policies:` section) aborts the command before any token prompt, registry write, or rebuild prompt, so the sandbox never advertises a channel without a matching network policy. 
+With the preset file in place, `channels add` applies it to the sandbox before the rebuild so the bridge has egress to its upstream API. +When the apply step itself fails after the registry write on a fresh add, NemoClaw attempts to roll back the bridge providers, the `messagingChannels` entry, and any staged environment credentials, then exits without prompting for a rebuild; if any gateway-side step (provider detach or delete) fails, the rollback continues and prints a `Rollback could not fully clean ` warning so the operator can clean up manually. +When the same failure happens on a re-add of an already-enabled channel, NemoClaw restores the prior `messagingChannels` entry, restores staged environment credentials when available, restores registry credential hashes, and attempts to re-upsert the prior bridge providers. +It flags `gateway-providers` as residual because the in-flight upsert can leave the gateway with the new token. +Verify the gateway bridge before relying on the channel. +Restore the preset YAML and re-run `nemoclaw channels add `. Choose the rebuild so the running sandbox image picks up the new channel. +For Telegram, Discord, and Slack, `channels add` also checks the rebuilt runtime for the selected bridge and reports startup, credential, or missing-plugin warnings before returning. If you need optional channel settings such as `TELEGRAM_ALLOWED_IDS`, `TELEGRAM_REQUIRE_MENTION`, `DISCORD_SERVER_ID`, `DISCORD_USER_ID`, `DISCORD_REQUIRE_MENTION`, `SLACK_ALLOWED_USERS`, or `SLACK_ALLOWED_CHANNELS`, export them before the rebuild starts. +Telegram Bot API `sendMessage` calls prove outbound delivery from the bot; to test inbound agent replies, send a message from the Telegram client as an allowed user. +For a repeatable live Telegram reply check, run `test/e2e/test-messaging-providers.sh` with `TELEGRAM_BOT_TOKEN_REAL`, `TELEGRAM_AUTHORIZED_CHAT_IDS` or `TELEGRAM_CHAT_ID`, and `NEMOCLAW_TELEGRAM_INBOUND_REPLY_E2E=1`.
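The preset precondition described earlier in this section can be approximated on the host before you run `channels add`. The sketch below is an assumption-laden illustration: `preset_has_network_policies` is a hypothetical helper, the caller supplies the preset path because this document does not state where preset files live, and treating `network_policies:` as a top-level key at the start of a line is a guess about the YAML layout rather than NemoClaw's actual validation.

```bash
# Sketch only: a host-side approximation of the preset precondition.
# `preset_has_network_policies` is hypothetical; the real check inside
# `channels add` may differ.

preset_has_network_policies() {
  local preset_file="$1"

  # A missing file fails the check.
  [ -f "$preset_file" ] || return 1

  # Assumption: the section appears as a top-level key at the start of a line.
  grep -q '^network_policies:' "$preset_file"
}
```

A wrapper script could call this helper first and stop with its own message when it returns non-zero, which keeps the failure visible before any token prompt appears.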
If you defer the rebuild, apply the change later: -```console -$ nemoclaw my-assistant rebuild +```bash +nemoclaw my-assistant rebuild ``` In non-interactive mode, set the required environment variables before running `channels add`. Missing credentials fail fast, and the command queues the change for a manual rebuild: -```console -$ NEMOCLAW_NON_INTERACTIVE=1 TELEGRAM_BOT_TOKEN= \ +```bash +NEMOCLAW_NON_INTERACTIVE=1 TELEGRAM_BOT_TOKEN= \ nemoclaw my-assistant channels add telegram -$ nemoclaw my-assistant rebuild +nemoclaw my-assistant rebuild ``` For Discord server access after onboarding, include the server settings when you add the channel and rebuild: -```console -$ DISCORD_BOT_TOKEN= \ +```bash +DISCORD_BOT_TOKEN= \ DISCORD_SERVER_ID= \ DISCORD_REQUIRE_MENTION=1 \ nemoclaw my-assistant channels add discord @@ -185,15 +214,15 @@ $ DISCORD_BOT_TOKEN= \ `channels add wechat` (experimental) follows the same shape as the other channels with two differences driven by the iLink QR handshake. First, the command does not prompt for a paste token. -Instead, it renders a QR code in your terminal, polls Tencent's iLink gateway, and captures both the bot token and the per-account metadata (`accountId`, `baseUrl`, `userId`) once you scan the QR with WeChat on your phone (Discover → Scan). +Instead, it renders a QR code in your terminal, polls Tencent's iLink gateway, and captures both the bot token and the per-account metadata (`accountId`, `baseUrl`, `userId`) after you scan the QR with WeChat on your phone (**Discover** > **Scan**). The login has an eight-minute deadline and refreshes the QR up to three times on expiry. Keep the terminal in the foreground until you see `✓ WeChat login confirmed`. Second, the command requires an interactive terminal. Non-interactive mode (`NEMOCLAW_NON_INTERACTIVE=1`) fails fast with a clear error because the QR handshake needs a paired phone. 
-```console -$ nemoclaw my-assistant channels add wechat +```bash +nemoclaw my-assistant channels add wechat ``` If `WECHAT_BOT_TOKEN` is already cached for this sandbox (the operator onboarded with WeChat earlier), `channels add wechat` reuses the cached token and skips the QR scan to keep the upstream plugin's existing iLink session intact. @@ -209,9 +238,9 @@ Rebuild the sandbox after the update so the image reflects the current channel s To remove a channel and clear its stored credentials, run: -```console -$ nemoclaw my-assistant channels remove telegram -$ nemoclaw my-assistant channels remove wechat +```bash +nemoclaw my-assistant channels remove telegram +nemoclaw my-assistant channels remove wechat ``` `channels remove wechat` clears the bot token, deletes the `-wechat-bridge` OpenShell provider, and drops `wechat` from the sandbox's enabled-channel set. @@ -223,21 +252,27 @@ The cleanup tries `openshell sandbox exec` and falls back to SSH if that does no If neither transport can reach a running sandbox for a QR-paired channel, the command exits non-zero and asks you to start the sandbox and re-run. NemoClaw deliberately leaves the registry, policy preset, and `session.policyPresets` unchanged on that failure path, so a follow-up re-run completes the removal cleanly. -`channels remove whatsapp` clears the client-side Baileys session inside the sandbox; it cannot deregister the linked device with WhatsApp's servers because that requires an active Baileys connection to issue the logout RPC, which we no longer have once the session files are gone. +`channels remove whatsapp` clears the client-side Baileys session inside the sandbox. +It cannot deregister the linked device with WhatsApp's servers because that requires an active Baileys connection to issue the logout RPC, and the command no longer has that connection after it removes the session files. 
The phone account will continue to list the sandbox as a Linked Device until you remove it manually from your phone (Settings → Linked Devices → tap the entry → Log out) or until WhatsApp's 14-day inactivity timeout expires. -Removing the entry from the phone is recommended if you plan to re-pair the same phone with a different sandbox. +Remove the entry from the phone if you plan to re-pair the same phone with a different sandbox. Use `channels stop` when you want to pause a bridge without deleting credentials: -```console -$ nemoclaw my-assistant channels stop telegram -$ nemoclaw my-assistant channels start telegram +```bash +nemoclaw my-assistant channels stop telegram +nemoclaw my-assistant channels start telegram -$ nemoclaw my-assistant channels stop wechat -$ nemoclaw my-assistant channels start wechat +nemoclaw my-assistant channels stop wechat +nemoclaw my-assistant channels start wechat ``` + For WeChat specifically, `channels stop wechat` followed by a rebuild keeps the per-account state files under `/sandbox/.openclaw/openclaw-weixin/accounts/` intact even though the bridge is no longer wired up in `openclaw.json`. + + +For WeChat specifically, `channels stop wechat` followed by a rebuild keeps the per-account state files under `/sandbox/.hermes/` intact even though the bridge is no longer wired up in Hermes config. + A subsequent `channels start wechat` plus rebuild revives the bridge against the same iLink account without a fresh QR scan. The bot token is held by the OpenShell provider across the stop/start cycle. @@ -254,7 +289,12 @@ Re-run `channels add ` with the intended token to refresh the stored no ## Stop Messaging Delivery Use `channels stop` when you want to pause one bridge and keep the sandbox running. + Use `nemoclaw tunnel stop` or its deprecated alias `nemoclaw stop` when you want to stop host auxiliary services and also ask NemoClaw to stop the OpenClaw gateway inside the selected sandbox. 
+ + +Use `nemoclaw tunnel stop` when you want to stop host auxiliary services and also ask NemoClaw to stop the Hermes gateway inside the selected sandbox. + Stopping the in-sandbox gateway stops Telegram, Discord, Slack, WeChat, and WhatsApp polling for that sandbox until you restart the sandbox or gateway. ## Confirm Delivery @@ -265,17 +305,29 @@ Use the matching policy preset (`telegram`, `discord`, `slack`, `wechat`, or `wh ## Tunnel Command + When the host has `cloudflared`, `nemoclaw tunnel start` starts a cloudflared tunnel that can expose the dashboard with a public URL. + + +When the host has `cloudflared`, `nemoclaw tunnel start` starts a cloudflared tunnel that can expose the forwarded Hermes endpoint with a public URL. + Set `CLOUDFLARE_TUNNEL_TOKEN` before running the command when you want to use a Cloudflare named tunnel instead of a generated quick-tunnel URL. + `nemoclaw tunnel stop` stops the tunnel and asks NemoClaw to stop the in-sandbox gateway for the selected or default sandbox. The older `nemoclaw start` still works as a deprecated alias. + + +`nemoclaw tunnel stop` stops the tunnel and asks NemoClaw to stop the in-sandbox gateway for the selected or default sandbox. + -```console -$ nemoclaw tunnel start +```bash +nemoclaw tunnel start ``` ## Related Topics + - Deploy NemoClaw to a Remote GPU Instance (use the `nemoclaw-user-deploy-remote` skill) for remote deployment with messaging. + - Architecture (use the `nemoclaw-user-reference` skill) for how providers, the gateway, and the sandbox fit together. - Commands (use the `nemoclaw-user-reference` skill) for `channels add`, `channels remove`, `channels start`, `channels stop`, `tunnel start`, `tunnel stop`, and `status`. 
diff --git a/.agents/skills/nemoclaw-user-manage-sandboxes/references/runtime-controls.md b/.agents/skills/nemoclaw-user-manage-sandboxes/references/runtime-controls.md index 63689cb3fb..4e13098c3e 100644 --- a/.agents/skills/nemoclaw-user-manage-sandboxes/references/runtime-controls.md +++ b/.agents/skills/nemoclaw-user-manage-sandboxes/references/runtime-controls.md @@ -1,41 +1,81 @@ - - # Runtime Controls and Sandbox Mutability +import { AgentOnly } from "../_components/AgentGuide"; + This page explains which parts of a running NemoClaw sandbox can change immediately and which changes require a rebuild or re-onboard. -## What you can change at runtime +## What You Can Change at Runtime -NemoClaw applies its security posture in three layers — what is baked into the sandbox image at onboard, what is hot-reloadable on the running sandbox, and what requires a rebuild or re-onboard. +NemoClaw applies its security posture in three layers: what onboarding bakes into the sandbox image, what the running sandbox can hot-reload, and what requires a rebuild or re-onboard. The table below maps each commonly changed item to the layer that owns it and the command that changes it. 
+ + | Item | When the change takes effect | How to change it | |---|---|---| -| Inference provider (cloud, NVIDIA Endpoints, local Ollama / vLLM, compatible-endpoint, …) | Rebuild required (`openclaw.json` is locked at sandbox creation) | `nemoclaw rebuild` after picking a different provider via `nemoclaw inference set` | +| Inference provider (cloud, NVIDIA Endpoints, local Ollama / vLLM, compatible-endpoint, …) | Rebuild required (`openclaw.json` is locked at sandbox creation) | `nemoclaw rebuild` after picking a different provider with `nemoclaw inference set` | | Inference model on the current provider | Rebuild required for OpenClaw; hot-reloadable for managed routers | `nemoclaw rebuild` (OpenClaw) or `nemoclaw inference set` (router-based) | | Sub-agent (Hermes / OpenClaw / …) | Re-onboard required (the sub-agent and its workspace are baked at onboard) | `nemoclaw onboard --recreate-sandbox` | -| Network policy preset (slack, discord, telegram, brave, …) | Runtime — applies on the next request; rebuild only required if the preset adds bind-mounted secrets | `nemoclaw policy-add ` / `policy-remove ` | -| Network allow-list (custom hosts) | Runtime — picks up at next request | `openshell policy set` or interactive approval prompt at the gateway | +| Network policy preset (slack, discord, telegram, brave, …) | Runtime. Applies on the next request; rebuild only required if the preset adds bind-mounted secrets | `nemoclaw policy-add ` / `policy-remove ` | +| Network allow-list (custom hosts) | Runtime. 
Picks up at next request | `openshell policy set` or interactive approval prompt at the gateway | | Channel tokens (Slack / Discord / Telegram bot credentials) | Rebuild required (tokens are baked into the sandbox image at onboard so they never leave the host clear-text) | `nemoclaw channels add ` then accept the rebuild prompt | | Channel enable/disable (turn a configured channel off without removing the token) | Rebuild required (`openclaw.json` is the source of truth at runtime, see #3453) | `nemoclaw channels stop ` then rebuild | -| Dashboard forward port | Runtime — port is re-resolved on next `connect` | `NEMOCLAW_DASHBOARD_PORT= nemoclaw connect` | -| Dashboard bind address (loopback vs all interfaces) | Runtime — applies on next `connect` | `NEMOCLAW_DASHBOARD_BIND=0.0.0.0 nemoclaw connect` (see #3259) | -| Web search backend (Brave, Tavily, etc.) | Runtime via `web.backend` config flag; rebuild only if `web.fetchEnabled` flips | `nemoclaw config set --key web.backend --value tavily` | -| Filesystem layout (Landlock zones, read-only mounts, container caps) | **Locked at creation** — no runtime change | Re-onboard with `nemoclaw onboard --recreate-sandbox` | +| Dashboard forward port | Runtime. Port is re-resolved on next `connect` | `NEMOCLAW_DASHBOARD_PORT= nemoclaw connect` | +| Dashboard bind address (loopback compared to all interfaces) | Runtime. Applies on next `connect` | `NEMOCLAW_DASHBOARD_BIND=0.0.0.0 nemoclaw connect` (see #3259) | +| Web search backend (Brave, Tavily, and so on) | Runtime through `web.backend` config flag; rebuild only if `web.fetchEnabled` flips | `nemoclaw config set --key web.backend --value tavily` | +| Filesystem layout (Landlock zones, read-only mounts, container caps) | **Locked at creation**. 
No runtime change | Re-onboard with `nemoclaw onboard --recreate-sandbox` | | Sandbox name | **Locked at creation** | Re-onboard with a different `--name` | | GPU passthrough enable / device selector | **Locked at creation** | Re-onboard with `--gpu` / `--sandbox-gpu-device` | -| Agents allow-list (`agents.list` in `openclaw.json`) | Runtime — hot-reloaded by OpenClaw on config change | Prefer agent or NemoClaw commands that keep host and sandbox state aligned | -| `openclaw.json` keys (general — model, agents.list, web.backend, channel config, etc.) | Mixed. Individual keys still follow the rebuild rules in the rows above, such as provider switch requiring rebuild even after editing the JSON. | Prefer NemoClaw host commands so the host registry and rebuilt image stay aligned | +| Agents allow-list (`agents.list` in `openclaw.json`) | Runtime. OpenClaw hot-reloads on config change | Prefer agent or NemoClaw commands that keep host and sandbox state aligned | +| `openclaw.json` keys (general: model, agents.list, web.backend, channel config, and so on) | Mixed. Individual keys still follow the rebuild rules in the rows above, such as provider switch requiring rebuild even after editing the JSON. | Prefer NemoClaw host commands so the host registry and rebuilt image stay aligned | If a row above conflicts with what you observe, the runtime source of truth inside the sandbox is `/opt/nemoclaw/openclaw.json`; the host registry caches metadata but the image and OpenClaw read from the in-sandbox file. 
-## See also + + + +| Item | When the change takes effect | How to change it | +|---|---|---| +| Inference provider (cloud, NVIDIA Endpoints, local Ollama / vLLM, compatible-endpoint, …) | Runtime route changes apply immediately; rebuild if you need to rebake model metadata into the image | `nemoclaw inference set` for route changes, or `nemoclaw rebuild` after changing build-time settings | +| Inference model on the current provider | Hot-reloadable through the Hermes config sync path | `nemoclaw inference set` | +| Agent runtime (Hermes compared to OpenClaw) | Re-onboard required (the agent and its state layout are baked at onboard) | `nemoclaw onboard --recreate-sandbox` or `nemoclaw onboard --agent openclaw --recreate-sandbox` | +| Network policy preset (slack, discord, telegram, brave, …) | Runtime. Applies on the next request; rebuild only required if the preset adds bind-mounted secrets | `nemoclaw policy-add ` / `policy-remove ` | +| Network allow-list (custom hosts) | Runtime. Picks up at next request | `openshell policy set` or interactive approval prompt at the gateway | +| Channel tokens (Slack / Discord / Telegram bot credentials) | Rebuild required (tokens are baked into the sandbox image at onboard so they never leave the host clear-text) | `nemoclaw channels add ` then accept the rebuild prompt | +| Channel enable/disable (turn a configured channel off without removing the token) | Rebuild required (`/sandbox/.hermes/.env` and Hermes config are baked at image build time) | `nemoclaw channels stop ` then rebuild | +| API/dashboard forward port | Runtime. Port is re-resolved on next `connect` | `nemoclaw connect` or `openshell forward start` | +| Filesystem layout (Landlock zones, read-only mounts, container caps) | **Locked at creation**. 
No runtime change | Re-onboard with `nemoclaw onboard --recreate-sandbox` | +| Sandbox name | **Locked at creation** | Re-onboard with a different `--name` | +| GPU passthrough enable / device selector | **Locked at creation** | Re-onboard with `--gpu` / `--sandbox-gpu-device` | +| Hermes `config.yaml` keys | Mixed. Inference keys can be patched by `nemoclaw inference set`; image, policy, and channel changes still require rebuild. | Prefer NemoClaw host commands so the host registry and rebuilt image stay aligned | + +If a row above conflicts with what you observe, the runtime source of truth for +Hermes is `/sandbox/.hermes/config.yaml` plus `/sandbox/.hermes/.env`; the host +registry caches metadata but the image and Hermes runtime read from the +in-sandbox files. + + + +## See Also The mutability table above is a consolidated index of information that lives in more detail on per-topic pages: -- Manage Sandbox Lifecycle (use the `nemoclaw-user-manage-sandboxes` skill) — full rebuild / re-onboard / upgrade workflow. -- Switch Inference Providers (use the `nemoclaw-user-configure-inference` skill) — the rebuild path for provider and model changes. -- Customize Network Policy (use the `nemoclaw-user-manage-policy` skill) and Approve Network Requests (use the `nemoclaw-user-manage-policy` skill) — runtime policy editing and operator approval flow. -- Security Best Practices (use the `nemoclaw-user-configure-security` skill) — the per-attack-surface posture table that this page complements. -- OpenClaw Security Controls (use the `nemoclaw-user-configure-security` skill) — application-layer controls that operate independently of NemoClaw. -- CLI Commands Reference (use the `nemoclaw-user-reference` skill) — full flag surface for every `nemoclaw` command, including the env vars that affect runtime behavior. + + +- [Manage Sandbox Lifecycle](../SKILL.md) for the full rebuild, re-onboard, and upgrade workflow. 
+- Switch Inference Providers (use the `nemoclaw-user-configure-inference` skill) for the rebuild path for provider and model changes. +- Customize Network Policy (use the `nemoclaw-user-manage-policy` skill) and Approve Network Requests (use the `nemoclaw-user-manage-policy` skill) for runtime policy editing and operator approval flow. +- Security Best Practices (use the `nemoclaw-user-configure-security` skill) for the per-attack-surface posture table that this page complements. +- OpenClaw Security Controls (use the `nemoclaw-user-configure-security` skill) for application-layer controls that operate independently of NemoClaw. +- CLI Commands Reference (use the `nemoclaw-user-reference` skill) for the full flag surface for every `nemoclaw` command, including the environment variables that affect runtime behavior. + + + + +- [Manage Sandbox Lifecycle](../SKILL.md) for the full rebuild, re-onboard, and upgrade workflow. +- Switch Inference Providers (use the `nemoclaw-user-configure-inference` skill) for the runtime route and rebuild paths for provider and model changes. +- Customize Network Policy (use the `nemoclaw-user-manage-policy` skill) and Approve Network Requests (use the `nemoclaw-user-manage-policy` skill) for runtime policy editing and operator approval flow. +- Security Best Practices (use the `nemoclaw-user-configure-security` skill) for the per-attack-surface posture table that this page complements. +- CLI Commands Reference (use the `nemoclaw-user-reference` skill) for the full flag surface for every `nemoclaw` command, including the environment variables that affect runtime behavior.
+ + diff --git a/.agents/skills/nemoclaw-user-manage-sandboxes/references/workspace-files.md b/.agents/skills/nemoclaw-user-manage-sandboxes/references/workspace-files.md index f1a96b913f..a6d1a13b62 100644 --- a/.agents/skills/nemoclaw-user-manage-sandboxes/references/workspace-files.md +++ b/.agents/skills/nemoclaw-user-manage-sandboxes/references/workspace-files.md @@ -1,7 +1,9 @@ - - # Workspace Files +import { AgentOnly } from "../_components/AgentGuide"; + + + OpenClaw stores its personality, user context, and behavioral configuration in a set of Markdown files inside the sandbox. These files live at `/sandbox/.openclaw/workspace/` and are collectively called **workspace files**. @@ -11,7 +13,7 @@ These files live at `/sandbox/.openclaw/workspace/` and are collectively called |---|---| | `SOUL.md` | Defines the agent's persona, tone, and communication style. | | `USER.md` | Stores information about the human the agent assists. | -| `IDENTITY.md` | Short identity card — name, language, emoji, creature type. | +| `IDENTITY.md` | Short identity card with name, language, emoji, and creature type. | | `AGENTS.md` | Behavioral rules, memory conventions, safety guidelines, and session workflow. | | `MEMORY.md` | Curated long-term memory distilled from daily notes. | | `memory/` | Directory of daily note files (`YYYY-MM-DD.md`) for session continuity. | @@ -35,7 +37,7 @@ All workspace files reside inside the sandbox filesystem: ## Multi-Agent Deployments A single NemoClaw sandbox can host more than one OpenClaw agent. 
-When OpenClaw is configured with multiple named agents (e.g., a shared `main` agent +When you configure OpenClaw with multiple named agents (for example, a shared `main` agent plus per-user agents for a Teams-integrated deployment), each agent gets its own workspace directory alongside the default `workspace/`: @@ -49,27 +51,23 @@ workspace directory alongside the default `workspace/`: Each per-agent workspace contains the same Markdown file structure as the default (`SOUL.md`, `USER.md`, `IDENTITY.md`, `AGENTS.md`, `MEMORY.md`, `memory/`). -Files are per-agent — changes in `workspace-main/AGENTS.md` are not visible to +Files are per-agent. Changes in `workspace-main/AGENTS.md` are not visible to `workspace-support/`. -Persistence and snapshots are handled automatically for per-agent workspaces: -the sandbox entrypoint provisions each `workspace-/` directly under the -writable `.openclaw/` tree so state survives sandbox restart, and -`nemoclaw snapshot create` discovers every `workspace-/` directory -and includes it in the snapshot bundle alongside the default `workspace/`. +NemoClaw handles persistence and snapshots automatically for per-agent workspaces: +the sandbox entrypoint provisions each `workspace-/` directly under the writable `.openclaw/` tree so state survives sandbox restart, and `nemoclaw snapshot create` discovers every `workspace-/` directory and includes it in the snapshot bundle alongside the default `workspace/`. **Note:** Files that operators typically want consistent across every agent workspace (`AGENTS.md`, shared skills, common templates) are not synced automatically. -Each workspace is independent; changes in one don't propagate. Tracking -shared-file tooling (shared mount, `workspaces list` command) in -[#1260](https://github.com/NVIDIA/NemoClaw/issues/1260). +Each workspace is independent, and changes in one do not propagate. 
+NVIDIA tracks shared-file tooling (shared mount, `workspaces list` command) in [#1260](https://github.com/NVIDIA/NemoClaw/issues/1260). ## Persistence Behavior Workspace files live in the sandbox's persistent state volume, not in the container image. -This means they survive normal container restarts, but they are deleted when you destroy the sandbox. +They survive normal container restarts, but NemoClaw deletes them when you destroy the sandbox. ### Preserved During Restart, Rebuild, and Upgrade @@ -83,12 +81,12 @@ It does not continue with a partial backup. ### Deleted During Sandbox Destroy Running `nemoclaw destroy` deletes the sandbox and its persistent state volume. -Workspace files are removed from the sandbox unless you created a snapshot or backup first. +NemoClaw removes workspace files from the sandbox unless you created a snapshot or backup first. **Warning:** Back up your workspace files before running `nemoclaw destroy`. -See Backup and Restore (use the `nemoclaw-user-manage-sandboxes` skill) for instructions. +See [Backup and Restore](backup-restore.md) for instructions. ## Editing Workspace Files @@ -101,5 +99,28 @@ You can edit them in two ways: ## Next Steps - Set Up Task-Specific Sub-Agents (use the `nemoclaw-user-configure-inference` skill) -- Backup and Restore workspace files (use the `nemoclaw-user-manage-sandboxes` skill) +- [Backup and Restore workspace files](backup-restore.md) - Commands reference (use the `nemoclaw-user-reference` skill) + + + + +Hermes stores durable agent state under `/sandbox/.hermes/` instead of the OpenClaw workspace directory. +The main Hermes configuration lives in `/sandbox/.hermes/config.yaml`, environment settings live in `/sandbox/.hermes/.env`, and runtime state such as logs, memory, platform sessions, and the SQLite state database lives under the same `.hermes` tree. 
+ +## Important Hermes State + +| Path | Purpose | +|---|---| +| `/sandbox/.hermes/config.yaml` | NemoClaw-generated Hermes runtime configuration. | +| `/sandbox/.hermes/.env` | NemoClaw-generated environment and messaging placeholders. | +| `/sandbox/.hermes/state.db` | Hermes SQLite state database. | +| `/sandbox/.hermes/platforms/` | Messaging platform state, including QR-paired sessions such as WhatsApp. | +| `/sandbox/.hermes/logs/` | Hermes runtime logs. | +| `/sandbox/SOUL.md` | Durable top-level Hermes persona file preserved by NemoClaw snapshots. | + +## Editing State + +Prefer NemoClaw host commands for generated configuration such as model, provider, messaging, and policy settings. +Direct edits to `/sandbox/.hermes/config.yaml` or `/sandbox/.hermes/.env` can be overwritten by rebuilds. +Use `nemoclaw connect` when you need to inspect runtime files interactively, or use `openshell sandbox download` and `openshell sandbox upload` for manual file transfer. diff --git a/.agents/skills/nemoclaw-user-monitor-sandbox/SKILL.md b/.agents/skills/nemoclaw-user-monitor-sandbox/SKILL.md index a8a11c631e..1915fb8619 100644 --- a/.agents/skills/nemoclaw-user-monitor-sandbox/SKILL.md +++ b/.agents/skills/nemoclaw-user-monitor-sandbox/SKILL.md @@ -1,11 +1,9 @@ --- name: "nemoclaw-user-monitor-sandbox" description: "Inspects sandbox health, traces agent behavior, and diagnoses problems. Use when monitoring a running sandbox, debugging agent issues, or checking sandbox logs. Trigger keywords - monitor nemoclaw sandbox, debug nemoclaw agent issues." +license: "Apache-2.0" --- - - - # Monitor Sandbox Activity and Debug Issues ## Prerequisites @@ -13,25 +11,27 @@ description: "Inspects sandbox health, traces agent behavior, and diagnoses prob - A running NemoClaw sandbox. - The OpenShell CLI on your `PATH`. 
+import { AgentOnly } from "../_components/AgentGuide"; + Use the NemoClaw status, logs, and TUI tools together to inspect sandbox health, trace agent behavior, and diagnose problems. ## Check Sandbox Health Run the status command to view the sandbox state, gateway health, and active inference configuration: -```console -$ nemoclaw status +```bash +nemoclaw status ``` For local Ollama and local vLLM routes, `nemoclaw status` also probes the host-side health endpoint directly. -This catches a stopped local backend before you retry `inference.local` from inside the sandbox. +This check catches a stopped local backend before you retry `inference.local` from inside the sandbox. -Key fields in the output include the following: +Key output fields include: -- Sandbox details, which show the configured model, provider, GPU mode, and applied policy presets. -- Gateway and process health, which show whether NemoClaw can still reach the OpenShell gateway and whether the in-sandbox agent process is running. -- Inference health for local Ollama and local vLLM, which shows `healthy` or `unreachable` together with the probed local URL. -- NIM status, which shows whether a NIM container is running and healthy when that path is in use. +- Sandbox details show the configured model, provider, GPU mode, and applied policy presets. +- Gateway and process health show whether NemoClaw can still reach the OpenShell gateway and whether the in-sandbox agent process is running. +- Inference health for local Ollama and local vLLM shows `healthy` or `unreachable` together with the probed local URL. +- NIM status shows whether a NIM container is running and healthy when that path is in use. Run `nemoclaw status` on the host to check sandbox state. Use `openshell sandbox list` for the underlying sandbox details. @@ -40,22 +40,51 @@ Use `openshell sandbox list` for the underlying sandbox details. 
Stream the most recent log output from the blueprint runner and sandbox: -```console -$ nemoclaw logs +```bash +nemoclaw logs ``` To follow the log output in real time: +```bash +nemoclaw logs --follow +``` + +The `logs` command shows lifecycle and gateway output. +It does not export the structured per-session agent state that OpenClaw stores under `.openclaw/agents/`. + +## Inspect Agent Session State + +OpenClaw stores structured session state inside the sandbox. +Use these files when you need an audit trail, a compliance review surface, or replay tooling that includes assistant messages and tool activity. + +| File | Purpose | +|---|---| +| `/sandbox/.openclaw/agents/main/sessions/.jsonl` | Per-session event log. Use this file for audit trails and compliance dashboards. Records can include assistant messages, `thinking` blocks, tool calls, tool results, token usage, and cost metadata. | +| `/sandbox/.openclaw/agents/main/sessions/.trajectory.jsonl` | Lower-level trajectory data for fine-grained replay. This file can be large, so avoid using it for routine audit summaries. | +| `/sandbox/.openclaw/agents/main/sessions/sessions.json` | Session index that maps known session keys to their persisted state. | + +To inspect the session directory from the host, run a sandbox command: + +```console +$ nemoclaw sandbox exec -- ls -lh /sandbox/.openclaw/agents/main/sessions +``` + +To copy a session log for offline review, use the OpenShell sandbox download command: + ```console -$ nemoclaw logs --follow +$ openshell sandbox download /sandbox/.openclaw/agents/main/sessions/.jsonl . ``` +Treat exported session logs as sensitive data. +They can contain prompts, tool inputs, tool outputs, file paths, and cost metadata from the agent run. 
+ ## Monitor Network Activity in the TUI Open the OpenShell terminal UI for a live view of sandbox network activity and egress requests: -```console -$ openshell term +```bash +openshell term ``` For a remote sandbox, SSH to the instance and run `openshell term` there. @@ -72,10 +101,18 @@ Refer to Approve or Deny Agent Network Requests (use the `nemoclaw-user-manage-p Run a test inference request to verify that the provider is responding: -```console -$ nemoclaw my-assistant connect -$ openclaw agent --agent main -m "Test inference" --session-id debug + +```bash +nemoclaw my-assistant connect +openclaw agent --agent main -m "Test inference" --session-id debug +``` + + +```bash +nemoclaw my-hermes connect +hermes ``` + If the request fails, check the following: diff --git a/.agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json b/.agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json new file mode 100644 index 0000000000..f322b351d7 --- /dev/null +++ b/.agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json @@ -0,0 +1,11 @@ +[ + { + "id": "docs-monitoring-monitor-sandbox-activity-001", + "question": "I'm monitoring sandbox activity. Help me understand what the agent and sandbox are doing now so I can detect unhealthy or unexpected behavior early.", + "expected_skill": "nemoclaw-user-monitor-sandbox", + "ground_truth": "A NemoClaw-specific answer that helps the user understand what the agent and sandbox are doing now and gives enough concrete guidance, decision criteria, verification steps, or risk framing to detect unhealthy or unexpected behavior early.", + "expected_behavior": [ + "Uses the expected_skill and does not make up answers if it cannot find the answer from the skill." 
+ ] + } +] diff --git a/.agents/skills/nemoclaw-user-overview/SKILL.md b/.agents/skills/nemoclaw-user-overview/SKILL.md index 99a57de8ff..89f0056373 100644 --- a/.agents/skills/nemoclaw-user-overview/SKILL.md +++ b/.agents/skills/nemoclaw-user-overview/SKILL.md @@ -1,16 +1,15 @@ --- name: "nemoclaw-user-overview" -description: "Explains how OpenClaw, OpenShell, and NemoClaw form the ecosystem, NemoClaw's position in the stack, what NemoClaw adds beyond the community sandbox, and when to prefer NemoClaw versus integrating OpenShell and OpenClaw directly. Use when users ask about the relationship between OpenClaw, OpenShell, and NemoClaw, or when to use NemoClaw versus OpenShell. Trigger keywords - nemoclaw ecosystem, openclaw openshell, nemoclaw vs openshell, sandboxed openclaw, how nemoclaw works, nemoclaw sandbox lifecycle blueprint, nemoclaw overview, openclaw always-on assistants, nvidia openshell, nvidia nemotron, nemoclaw release notes, nemoclaw changelog." +description: "Explains what NemoClaw covers: onboarding, lifecycle management, and agent operations within OpenShell containers, plus capabilities and why it exists. Use when users ask what NemoClaw is or what the project provides. For ecosystem placement or OpenShell-only paths, use the Ecosystem page; for internal mechanics, use How It Works. Trigger keywords - nemoclaw overview, openclaw always-on assistants, hermes agent, nvidia openshell, nvidia nemotron, nemoclaw ecosystem, nemohermes, nemoclaw vs openshell, run hermes openshell sandbox, openclaw openshell, sandboxed openclaw, how nemoclaw works, nemoclaw sandbox lifecycle blueprint, nemoclaw release notes, nemoclaw changelog." +license: "Apache-2.0" --- - - - -# Ecosystem +# NemoClaw User Overview ## References +- **Load [references/overview.md](references/overview.md)** when users ask what NemoClaw is or what the project provides. For ecosystem placement or OpenShell-only paths, use the Ecosystem page; for internal mechanics, use How It Works. 
Explains what NemoClaw covers: onboarding, lifecycle management, and agent operations within OpenShell containers, plus capabilities and why it exists. +- **Load [references/ecosystem-hermes.md](references/ecosystem-hermes.md)** when users ask about Hermes, OpenShell, and NemoClaw together, or when to use NemoClaw versus OpenShell for Hermes. Explains how Hermes, OpenShell, and NemoClaw form the ecosystem, NemoClaw's position in the stack, what NemoClaw adds beyond integrating OpenShell yourself, and when to prefer NemoHermes versus OpenShell. - **Load [references/ecosystem.md](references/ecosystem.md)** when users ask about the relationship between OpenClaw, OpenShell, and NemoClaw, or when to use NemoClaw versus OpenShell. Explains how OpenClaw, OpenShell, and NemoClaw form the ecosystem, NemoClaw's position in the stack, what NemoClaw adds beyond the community sandbox, and when to prefer NemoClaw versus integrating OpenShell and OpenClaw directly. - **Load [references/how-it-works.md](references/how-it-works.md)** for sandbox lifecycle and architecture mechanics; not for product definition (Overview) or multi-project placement (Ecosystem). Describes how NemoClaw works internally: CLI, plugin, blueprint runner, OpenShell orchestration, inference routing, and protection layers. -- **Load [references/overview.md](references/overview.md)** when users ask what NemoClaw is or what the project provides. For ecosystem placement or OpenShell-only paths, use the Ecosystem page; for internal mechanics, use How It Works. Explains what NemoClaw covers: onboarding, lifecycle management, and OpenClaw operations within OpenShell containers, plus capabilities and why it exists. - **Load [references/release-notes.md](references/release-notes.md)** when users ask about recent changes, the release cadence, or where to track versioned assets on GitHub. Includes the NemoClaw release notes. 
diff --git a/.agents/skills/nemoclaw-user-overview/evals/evals.json b/.agents/skills/nemoclaw-user-overview/evals/evals.json new file mode 100644 index 0000000000..e8fb4f52de --- /dev/null +++ b/.agents/skills/nemoclaw-user-overview/evals/evals.json @@ -0,0 +1,11 @@ +[ + { + "id": "docs-index-001", + "question": "I'm first arriving at the NemoClaw docs. Help me understand what NemoClaw helps me run and why it exists so I can decide whether it is worth installing before I spend time on setup.", + "expected_skill": "nemoclaw-user-overview", + "ground_truth": "A NemoClaw-specific answer that helps the user understand what NemoClaw helps me run and why it exists and gives enough concrete guidance, decision criteria, verification steps, or risk framing to decide whether it is worth installing before I spend time on setup.", + "expected_behavior": [ + "Uses the expected_skill and does not make up answers if it cannot find the answer from the skill." + ] + } +] diff --git a/.agents/skills/nemoclaw-user-overview/references/ecosystem-hermes.md b/.agents/skills/nemoclaw-user-overview/references/ecosystem-hermes.md new file mode 100644 index 0000000000..ae660f0ad6 --- /dev/null +++ b/.agents/skills/nemoclaw-user-overview/references/ecosystem-hermes.md @@ -0,0 +1,93 @@ +# Ecosystem + +NemoClaw provides onboarding, lifecycle management, and Hermes operations within OpenShell containers. +Use the `nemohermes` CLI alias when you work from the Hermes agent guide; it is equivalent to `nemoclaw` with the Hermes agent pre-selected. + +This page describes how these projects form the ecosystem, where NemoClaw sits relative to [OpenShell](https://github.com/NVIDIA/OpenShell) and [Hermes](https://hermes-agent.nousresearch.com/docs/), and how to choose between NemoHermes and OpenShell alone. + +## How the Stack Fits Together + +A NemoClaw for Hermes deployment combines three pieces with distinct scopes: Hermes, OpenShell, and NemoClaw. +The following diagram shows how they fit together. 
+ +```mermaid +flowchart TB + NC["🦞 NVIDIA NemoClaw<br/>CLI, blueprint"] + OS["🐚 NVIDIA OpenShell<br/>Gateway, policy, inference routing"] + HM["Hermes<br/>Agent in sandbox"] + + NC -->|orchestrates| OS + OS -->|isolates and runs| HM + + classDef nv fill:#76b900,stroke:#333,color:#fff + classDef nvLight fill:#e6f2cc,stroke:#76b900,color:#1a1a1a + classDef nvDark fill:#333,stroke:#76b900,color:#fff + + class NC nv + class OS nv + class HM nvDark + + linkStyle 0 stroke:#76b900,stroke-width:2px + linkStyle 1 stroke:#76b900,stroke-width:2px +``` + +NemoClaw sits above OpenShell in the operator workflow. +It drives OpenShell APIs and CLI to create and configure the sandbox that runs Hermes. +Models and endpoints sit behind OpenShell's inference routing. +NemoClaw onboarding wires provider choice into that routing, including the Hermes Provider route when you onboard through `nemohermes`. + +The following table shows the scope of each component in the stack. + +| Project | Scope | +|---------|--------| +| [Hermes](https://hermes-agent.nousresearch.com/docs/) | The agent: runtime, tools, messaging adapters, and an OpenAI-compatible API inside the container. It does not define the sandbox or the host gateway. | +| [OpenShell](https://github.com/NVIDIA/OpenShell) | The execution environment: sandbox lifecycle, network, filesystem, and process policy, inference routing, and the operator-facing `openshell` CLI for those primitives. | +| NemoClaw | The NVIDIA reference stack on the host: `nemohermes` / `nemoclaw` CLI, versioned blueprint, channel messaging configured for OpenShell-managed delivery, and state migration helpers so Hermes runs inside OpenShell in a documented, repeatable way. | + +## NemoClaw Path versus OpenShell Path + +Both paths assume OpenShell can sandbox a workload. +The difference is who owns the integration work. + +| Path | What it means | +|------|---------------| +| **NemoClaw path** | You adopt the reference stack. NemoClaw's Hermes blueprint encodes a hardened image, default policies, and orchestration so `nemohermes onboard` can create a known-good Hermes-on-OpenShell setup with less custom glue. 
| +| **OpenShell path** | You use OpenShell as the platform and supply your own container, Hermes install steps, policy YAML, provider setup, and any host bridges. OpenShell stays the sandbox and policy engine; nothing requires NemoClaw's blueprint or CLI. | + +## What NemoClaw Adds Beyond Custom OpenShell + +You can run Hermes inside OpenShell without NemoClaw by building your own image, writing policy YAML, registering providers, and wiring inference routes yourself. +That path is valid when you need full control over the container layout. + +NemoClaw builds on OpenShell with additional security hardening, automation, and lifecycle tooling for Hermes. +The following table compares custom OpenShell integration with `nemohermes onboard`. + +| Capability | Custom OpenShell + Hermes | `nemohermes onboard` | +|---|---|---| +| Sandbox isolation | Yes, when you apply OpenShell seccomp, Landlock, network namespace isolation, and no-new-privileges enforcement through your policy. | Yes. NemoClaw applies these through the blueprint and layers a Hermes-specific restrictive policy on top. | +| Credential handling | You create OpenShell providers manually with `openshell provider create` and configure placeholder resolution at egress. | NemoClaw creates OpenShell providers during onboarding and filters sensitive host environment variables from the sandbox creation command to reduce accidental leakage through build args. | +| Image hardening | Depends on your base image and install steps. | NemoClaw strips build toolchains (`gcc`, `g++`, `make`) and network probes (`netcat`) from the runtime image to reduce attack surface. | +| Filesystem policy | You define read-only and read-write paths in policy YAML. | NemoClaw defines a targeted layout: system paths (`/usr`, `/lib`, `/etc`) are read-only; `/sandbox` and `/sandbox/.hermes` are writable for agent state and configuration. | +| Inference setup | You configure OpenShell inference routing and Hermes `config.yaml` manually. 
| NemoClaw validates credentials from the host, configures the OpenShell route, and bakes model settings into `/sandbox/.hermes/config.yaml`. Hermes Provider onboarding is available through `nemohermes`. | +| Channel messaging | OpenShell delivers channel tokens through its provider system and L7 proxy; you configure Hermes platform adapters manually. | NemoClaw automates supported channel setup during onboarding and bakes Hermes env/config with placeholder tokens that OpenShell resolves at egress. | +| Blueprint versioning | No NemoClaw blueprint; your image tag is whatever you built locally. | NemoClaw downloads the blueprint artifact, checks version compatibility, and verifies its digest before applying. Running `nemohermes onboard` on different machines produces the same sandbox. | +| State migration | Not included unless you build it. | NemoClaw migrates agent state across machines with credential stripping and integrity verification. | +| Process count limits | You set process count limits manually with `--ulimit` or orchestrator config. | NemoClaw applies `ulimit -u 512` in the container entrypoint on top of OpenShell's seccomp and privilege dropping. | + +## When to Use Which + +Use the following table to decide when to use NemoHermes versus OpenShell alone. + +| Situation | Prefer | +|-----------|--------| +| You want Hermes with minimal assembly, NVIDIA defaults, and the documented install and onboard flow. | NemoClaw (`nemohermes`) | +| You need maximum flexibility for custom images, a layout that does not match the NemoClaw Hermes blueprint, or a workload outside this reference stack. | OpenShell with your own integration | +| You are standardizing on the NVIDIA reference for always-on Hermes agents with policy and inference routing. | NemoClaw (`nemohermes`) | +| You are building internal platform abstractions where the NemoClaw CLI or blueprint is not the right fit. 
| OpenShell (and your orchestration) | + +## Related Topics + +- [Overview](overview.md) describes what NemoClaw is, including capabilities, benefits, and use cases. +- [How It Works](how-it-works.md) describes how NemoClaw runs, the blueprint, sandbox creation, routing, and protection layers for Hermes. +- Architecture (use the `nemoclaw-user-reference` skill) shows the repository structure and technical diagrams. +- Quickstart with Hermes (use the `nemoclaw-user-get-started` skill) installs NemoClaw and launches your first Hermes sandbox. diff --git a/.agents/skills/nemoclaw-user-overview/references/ecosystem.md b/.agents/skills/nemoclaw-user-overview/references/ecosystem.md index 1fc6ea0025..a5a11e289c 100644 --- a/.agents/skills/nemoclaw-user-overview/references/ecosystem.md +++ b/.agents/skills/nemoclaw-user-overview/references/ecosystem.md @@ -1,14 +1,12 @@ - - # Ecosystem NemoClaw provides onboarding, lifecycle management, and OpenClaw operations within OpenShell containers. -This page describes how the ecosystem is formed across projects, where NemoClaw sits relative to [OpenShell](https://github.com/NVIDIA/OpenShell) and [OpenClaw](https://openclaw.ai), and how to choose between NemoClaw and OpenShell. +This page describes how these projects form the ecosystem, where NemoClaw sits relative to [OpenShell](https://github.com/NVIDIA/OpenShell) and [OpenClaw](https://openclaw.ai), and how to choose between NemoClaw and OpenShell. ## How the Stack Fits Together -There are three pieces that are put together in a NemoClaw deployment: OpenClaw, OpenShell, and NemoClaw, each with a distinct scope. +A NemoClaw for OpenClaw deployment combines three pieces with distinct scopes: OpenClaw, OpenShell, and NemoClaw. The following diagram shows how they fit together. ```mermaid @@ -52,7 +50,7 @@ The difference is who owns the integration work. | Path | What it means | |------|---------------| -| **NemoClaw path** | You adopt the reference stack. 
NemoClaw's blueprint encodes a hardened image, default policies, and orchestration so `nemoclaw onboard` can stand up a known-good OpenClaw-on-OpenShell setup with less custom glue. | +| **NemoClaw path** | You adopt the reference stack. NemoClaw's blueprint encodes a hardened image, default policies, and orchestration so `nemoclaw onboard` can create a known-good OpenClaw-on-OpenShell setup with less custom glue. | | **OpenShell path** | You use OpenShell as the platform and supply your own container, install steps for OpenClaw, policy YAML, provider setup, and any host bridges. OpenShell stays the sandbox and policy engine; nothing requires NemoClaw's blueprint or CLI. | ## What NemoClaw Adds Beyond the OpenShell Community Sandbox @@ -70,7 +68,7 @@ The following table compares the two paths. | Credential handling | OpenShell's provider system replaces real credentials with placeholder tokens in the sandbox environment. The L7 proxy resolves placeholders to real values at egress. You create providers manually with `openshell provider create`. | NemoClaw creates OpenShell providers automatically during onboarding. It also filters sensitive host environment variables (provider API keys, `DISCORD_BOT_TOKEN`, `SLACK_BOT_TOKEN`, `TELEGRAM_BOT_TOKEN`) from the sandbox creation command to prevent accidental leakage through build args. | | Image hardening | The community image includes standard system tools for general-purpose use. | NemoClaw strips build toolchains (`gcc`, `g++`, `make`) and network probes (`netcat`) from the runtime image to reduce attack surface. | | Filesystem policy | The community sandbox bundles a policy for OpenClaw. | NemoClaw defines a targeted read-only and read-write layout. System paths (`/usr`, `/lib`, `/etc`) are read-only. The agent's home directory (`/sandbox`) and config directory (`/sandbox/.openclaw`) are writable by default so the agent can manage config, install skills, and write to standard paths natively. 
| -| Inference setup | The community sandbox includes an `openclaw-start` script that runs OpenClaw's onboarding wizard inside the sandbox. You can also create providers and configure OpenShell inference routing manually from the host. | NemoClaw's onboarding wizard validates your credential from the host, lets you select a provider (NVIDIA Endpoints, OpenAI, Anthropic, Google Gemini, Ollama, and compatible endpoints), and configures OpenShell's inference routing automatically. Credentials stay on the host and are delivered through OpenShell's provider system. | +| Inference setup | The community sandbox includes an `openclaw-start` script that runs OpenClaw's onboarding wizard inside the sandbox. You can also create providers and configure OpenShell inference routing manually from the host. | NemoClaw's onboarding wizard validates your credential from the host, lets you select a provider (NVIDIA Endpoints, OpenAI, Anthropic, Google Gemini, Ollama, and compatible endpoints), and configures OpenShell's inference routing automatically. Credentials stay on the host, and OpenShell's provider system delivers them. | | Channel messaging | OpenShell provides the credential provider system and L7 proxy that delivers channel tokens securely (including path-based resolution for Telegram's `/bot/` URL pattern). You create providers and configure OpenClaw's channel settings manually. | NemoClaw automates channel setup during onboarding: it collects bot tokens, registers them as OpenShell providers, and bakes OpenClaw channel config with placeholder tokens that OpenShell's proxy resolves at egress. No separate bridge process runs on the host. | | Blueprint versioning | No blueprint. The community sandbox uses whatever image version is currently published. | NemoClaw downloads the blueprint artifact, checks version compatibility, and verifies its digest before applying. Running `nemoclaw onboard` on different machines produces the same sandbox. 
| | State migration | Not included. | NemoClaw migrates agent state across machines with credential stripping and integrity verification. | @@ -87,8 +85,8 @@ Use the following table to decide when to use NemoClaw versus OpenShell. | You are standardizing on the NVIDIA reference for always-on assistants with policy and inference routing. | NemoClaw | | You are building internal platform abstractions where the NemoClaw CLI or blueprint is not the right fit. | OpenShell (and your orchestration) | -## Related topics +## Related Topics -- Overview (use the `nemoclaw-user-overview` skill) contains what NemoClaw is, capabilities, benefits, and use cases. -- How It Works (use the `nemoclaw-user-overview` skill) describes how NemoClaw runs, plugin, blueprint, sandbox creation, routing, protection layers. +- [Overview](overview.md) describes what NemoClaw is, including capabilities, benefits, and use cases. +- [How It Works](how-it-works.md) describes how NemoClaw runs, including the plugin, blueprint, sandbox creation, routing, and protection layers. - Architecture (use the `nemoclaw-user-reference` skill) shows the repository structure and technical diagrams. diff --git a/.agents/skills/nemoclaw-user-overview/references/how-it-works.md b/.agents/skills/nemoclaw-user-overview/references/how-it-works.md index 8305f2c2aa..e21062559f 100644 --- a/.agents/skills/nemoclaw-user-overview/references/how-it-works.md +++ b/.agents/skills/nemoclaw-user-overview/references/how-it-works.md @@ -1,11 +1,17 @@ - - # NemoClaw Architecture Overview -This page explains how NemoClaw runs OpenClaw inside an OpenShell sandbox and how the gateway connects the agent to inference, integrations, and policy. +import { AgentCli, AgentOnly } from "../_components/AgentGuide"; -NemoClaw does not replace OpenClaw or OpenShell. -It packages them into a repeatable setup with a host CLI, a versioned blueprint, default policies, inference setup, plugin configuration, and state helpers. 
+This page explains how NemoClaw runs supported agents inside an OpenShell sandbox and how the gateway connects the agent to inference, integrations, and policy. + +NemoClaw does not replace OpenShell or your chosen agent runtime. +It packages them into a repeatable setup with a host CLI, a versioned blueprint, default policies, inference setup, and state helpers. + +OpenClaw sandboxes also load the NemoClaw plugin for managed inference metadata and the `/nemoclaw` slash command. + + +Hermes sandboxes receive agent configuration under `/sandbox/.hermes` during onboarding instead of the OpenClaw plugin path. + You can use that setup directly or adapt it for your own OpenShell integration. ## High-Level Flow @@ -23,7 +29,7 @@ The diagram has the following components: | Users and operators | Start from the CLI, installer, dashboard, or an end-user channel. | | NemoClaw control | Collects configuration, runs onboarding, prepares the blueprint, and asks OpenShell to create or update resources. | | OpenShell gateway | Owns sandbox lifecycle, networking, policy enforcement, inference routing, and integration egress. | -| NemoClaw sandbox | Runs OpenClaw with the NemoClaw plugin, the selected blueprint contents, and supporting tools. | +| NemoClaw sandbox | Runs the onboarded agent with the selected blueprint contents and supporting tools. | | Inference | Receives model requests through the gateway, using NVIDIA endpoints, NIM, or compatible APIs. | | Integrations | Reach messaging services, MCP servers, GitHub, package indexes, or model hubs through gateway-managed egress. | | State and artifacts | Store configuration, credentials, logs, workspace files, policies, and transcripts outside the running agent process. | @@ -32,39 +38,48 @@ For repository layout, file paths, and deeper diagrams, see Architecture (use th ## Design Principles -NemoClaw architecture follows the following principles. +NemoClaw follows these architecture principles. 
-Thin plugin, versioned blueprint -: The sandbox plugin stays small and stable. Host-side orchestration uses a versioned blueprint and runner that can evolve on its own release cadence. +Versioned blueprint +: Host-side orchestration uses a versioned blueprint and runner that can evolve on its own release cadence. + The OpenClaw sandbox plugin stays small and stable inside the container. Respect CLI boundaries -: The `nemoclaw` CLI is the primary interface for sandbox management. +: The CLI is the primary interface for sandbox management. Supply chain safety : Blueprint artifacts are immutable, versioned, and digest-verified before execution. OpenShell-backed lifecycle -: NemoClaw orchestrates OpenShell resources under the hood, but `nemoclaw onboard` - is the supported operator entry point for creating or recreating NemoClaw-managed sandboxes. +: NemoClaw orchestrates OpenShell resources under the hood, but onboard is the supported operator entry point for creating or recreating NemoClaw-managed sandboxes. Reproducible setup : Running setup again recreates the sandbox from the same blueprint and policy definitions. ## CLI, Plugin, and Blueprint -NemoClaw is split into three integration pieces: +NemoClaw is split into integration pieces on the host and in the sandbox image: - The _host CLI_ runs onboarding, validates provider choices, stores configuration, and calls OpenShell commands for gateway, provider, sandbox, and policy operations. + + - The _plugin_ is a TypeScript package that runs with OpenClaw inside the sandbox. It registers the managed inference provider metadata, the `/nemoclaw` slash command, and runtime context hooks. + + + + +- NemoClaw writes Hermes runtime configuration into `/sandbox/.hermes` during onboarding, including `config.yaml`, environment files, and platform adapter settings for supported messaging channels. + + - The _blueprint_ is a versioned YAML package with the sandbox image, policy, inference profile, and supporting assets. 
The runner resolves and verifies the blueprint before applying it through OpenShell. -This separation keeps the sandbox plugin small while allowing host orchestration and blueprint contents to evolve on their own release cadence. +This separation keeps agent-specific sandbox assets focused while allowing host orchestration and blueprint contents to evolve on their own release cadence. ## Sandbox Creation -When you run `nemoclaw onboard`, NemoClaw creates an OpenShell sandbox that runs OpenClaw in an isolated container. +When you run onboard, NemoClaw creates an OpenShell sandbox that runs your selected agent in an isolated container. The host CLI and blueprint runner orchestrate this process through the OpenShell CLI: 1. NemoClaw resolves the blueprint, checks version compatibility, and verifies the digest. @@ -80,6 +95,9 @@ OpenShell intercepts every inference call and routes it to the configured provid During onboarding, NemoClaw validates the selected provider and model, configures the OpenShell route, and bakes the matching model reference into the sandbox image. The sandbox then talks to `inference.local`, while the host owns the actual provider credential and upstream endpoint. If you select the Model Router provider, `inference.local` routes to a host-side router that chooses from the configured NVIDIA model pool for each request. + +For Hermes, runtime model switches through inference set update `/sandbox/.hermes/config.yaml` without rebuilding the sandbox. + ## Protection Layers @@ -94,11 +112,24 @@ The sandbox starts with a default policy that controls network egress, filesyste When the agent tries to reach an unlisted host, OpenShell blocks the request and surfaces it in the TUI for operator approval. Approved endpoints persist for the current session but are not saved to the baseline policy file. -For details on the baseline rules, refer to Network Policies (use the `nemoclaw-user-reference` skill). 
For container-level hardening, refer to Sandbox Hardening (use the `nemoclaw-user-deploy-remote` skill). - ## Next Steps -- Read Ecosystem (use the `nemoclaw-user-overview` skill) for stack-level relationships and NemoClaw versus OpenShell-only paths. -- Follow the Quickstart (use the `nemoclaw-user-get-started` skill) to launch your first sandbox. + + +- Read [Ecosystem](ecosystem.md) for stack-level relationships and NemoClaw versus OpenShell-only paths. +- Follow Quickstart with OpenClaw (use the `nemoclaw-user-get-started` skill) to launch your first sandbox. +- Refer to the Architecture (use the `nemoclaw-user-reference` skill) for the full technical structure, including file layouts and the blueprint lifecycle. +- Refer to Inference Options (use the `nemoclaw-user-configure-inference` skill) for detailed provider configuration. +- For details on the baseline rules, refer to Network Policies (use the `nemoclaw-user-reference` skill). +- For container-level hardening, refer to Sandbox Hardening. + + + + +- Read [Ecosystem](ecosystem.md) for stack-level relationships and NemoClaw versus OpenShell-only paths. +- Follow Quickstart with Hermes (use the `nemoclaw-user-get-started` skill) to launch your first sandbox. - Refer to the Architecture (use the `nemoclaw-user-reference` skill) for the full technical structure, including file layouts and the blueprint lifecycle. - Refer to Inference Options (use the `nemoclaw-user-configure-inference` skill) for detailed provider configuration. +- For details on the baseline rules, refer to Network Policies (use the `nemoclaw-user-reference` skill). 
+ + diff --git a/.agents/skills/nemoclaw-user-overview/references/overview.md b/.agents/skills/nemoclaw-user-overview/references/overview.md index 330cc0c740..a5dc5d525f 100644 --- a/.agents/skills/nemoclaw-user-overview/references/overview.md +++ b/.agents/skills/nemoclaw-user-overview/references/overview.md @@ -1,18 +1,19 @@ - - # Overview of NVIDIA NemoClaw -NVIDIA NemoClaw is an open-source reference stack that simplifies running [OpenClaw](https://openclaw.ai) always-on assistants more safely. -NemoClaw provides onboarding, lifecycle management, and OpenClaw operations within OpenShell containers. -It incorporates policy-based privacy and security guardrails, giving you control over your agents’ behavior and data handling. -This enables self-evolving claws to run more safely in clouds, on prem, RTX PCs and DGX Spark. +import { AgentCli, AgentOnly } from "../_components/AgentGuide"; + +NVIDIA NemoClaw is an open-source reference stack for running always-on AI agents more safely inside OpenShell containers. +NemoClaw provides onboarding, lifecycle management, and agent operations for supported runtimes in OpenShell sandboxes. +It incorporates policy-based privacy and security guardrails, giving you control over your agents' behavior and data handling. +These controls help self-evolving agents run more safely in clouds, on-premises environments, RTX PCs, and DGX Spark. NemoClaw pairs hosted models on inference providers or local endpoints with a hardened sandbox, routed inference, and declarative egress policy so deployment stays safer and more repeatable. -The sandbox runtime comes from [NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell); NemoClaw adds the blueprint, `nemoclaw` CLI, onboarding, and related tooling as the reference way to run OpenClaw there. +The sandbox runtime comes from [NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell). 
+NemoClaw adds the blueprint, CLI, onboarding, and related tooling as the reference way to run supported agents there. | Capability | Description | |-------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------| -| Sandbox OpenClaw | Creates an OpenShell sandbox pre-configured for OpenClaw, with filesystem and network policies applied from the first boot. | +| Sandbox supported agents | Creates an OpenShell sandbox pre-configured for your selected agent, with filesystem and network policies applied from the first boot. | | Route inference | Configures OpenShell inference routing so agent traffic goes to the provider and model you chose during onboarding (NVIDIA Endpoints, OpenAI, Anthropic, Gemini, compatible endpoints, local Ollama, and others). The agent uses `inference.local` inside the sandbox; credentials stay on the host. | | Manage the lifecycle | Handles blueprint versioning, digest verification, and sandbox setup. | @@ -23,6 +24,7 @@ NemoClaw provides the following product capabilities. | Feature | Description | |---------|-------------| | Guided onboarding | Validates credentials, selects providers, and creates a working sandbox in one command. | +| Agent skills | Packages NemoClaw documentation as user skills so AI coding assistants can guide setup, inference configuration, policy management, monitoring, deployment, security review, and troubleshooting. | | Hardened blueprint | A security-first Dockerfile with capability drops, least-privilege network rules, and declarative policy. | | State management | Safe migration of agent state across machines with credential stripping and integrity verification. | | Messaging channels | OpenShell-managed processes connect Telegram, Discord, Slack, and similar platforms to the sandboxed agent. 
NemoClaw configures channels during onboarding; OpenShell supplies the native constructs, credential flow, and runtime supervision. | @@ -37,19 +39,19 @@ NemoClaw provides the following benefits to mitigate these risks. | Benefit | Description | |----------------------------|------------------------------------------------------------------------------------------------------------------------| -| Sandboxed execution | Every agent runs inside an OpenShell sandbox with Landlock, seccomp, and network namespace isolation. No access is granted by default. | -| Routed inference | Model traffic is routed through the OpenShell gateway to your selected provider, transparent to the agent. You can switch providers or models. Refer to Inference Options (use the `nemoclaw-user-configure-inference` skill). | -| Declarative network policy | Egress rules are defined in YAML. Unknown hosts are blocked and surfaced to the operator for approval. | -| Single CLI | The `nemoclaw` command orchestrates the full stack: gateway, sandbox, inference provider, and network policy. | +| Sandboxed execution | Every agent runs inside an OpenShell sandbox with Landlock, seccomp, and network namespace isolation. The sandbox grants no access by default. | +| Routed inference | The OpenShell gateway routes model traffic to your selected provider, transparent to the agent. You can switch providers or models. Refer to Inference Options (use the `nemoclaw-user-configure-inference` skill). | +| Declarative network policy | YAML defines egress rules. OpenShell blocks unknown hosts and surfaces them to the operator for approval. | +| Single CLI | The command orchestrates the full stack: gateway, sandbox, inference provider, and network policy. | | Blueprint lifecycle | Versioned blueprints handle sandbox creation, digest verification, and reproducible setup. | ## Use Cases -You can use NemoClaw for various use cases including the following. +You can use NemoClaw for use cases such as the following. 
| Use Case | Description | |---------------------------|----------------------------------------------------------------------------------------------| -| Always-on assistant | Run an OpenClaw assistant with controlled network access and operator-approved egress. | +| Always-on assistant | Run a sandboxed agent with controlled network access and operator-approved egress. | | Sandboxed testing | Test agent behavior in a locked-down environment before granting broader permissions. | | Remote GPU deployment | Deploy a sandboxed agent to a remote GPU instance for persistent operation. | @@ -57,7 +59,21 @@ You can use NemoClaw for various use cases including the following. Navigate to the following topics to learn more about NemoClaw and how to install and use it. -- Architecture Overview (use the `nemoclaw-user-overview` skill) to understand how NemoClaw works. -- Ecosystem (use the `nemoclaw-user-overview` skill) to understand how OpenClaw, OpenShell, and NemoClaw relate in the wider stack, and when to use NemoClaw versus OpenShell. -- Quickstart (use the `nemoclaw-user-get-started` skill) to install NemoClaw and run your first sandboxed agent. + + +- [Architecture Overview](how-it-works.md) to understand how NemoClaw works. +- [Ecosystem](ecosystem.md) to understand how your agent, OpenShell, and NemoClaw relate in the wider stack, and when to use NemoClaw versus OpenShell. +- Quickstart with OpenClaw (use the `nemoclaw-user-get-started` skill) to install NemoClaw and run your first OpenClaw sandbox. +- Agent Skills (use the `nemoclaw-user-agent-skills` skill) to load NemoClaw guidance into an AI coding assistant. +- Inference Options (use the `nemoclaw-user-configure-inference` skill) to check the inference providers that NemoClaw supports and how inference routing works. + + + + +- [Architecture Overview](how-it-works.md) to understand how NemoClaw works. 
+- [Ecosystem](ecosystem.md) to understand how Hermes, OpenShell, and NemoClaw relate in the wider stack, and when to use NemoClaw versus OpenShell. +- Quickstart with Hermes (use the `nemoclaw-user-get-started` skill) to install NemoClaw and run your first Hermes sandbox with `nemoclaw`. +- Agent Skills (use the `nemoclaw-user-agent-skills` skill) to load NemoClaw guidance into an AI coding assistant. - Inference Options (use the `nemoclaw-user-configure-inference` skill) to check the inference providers that NemoClaw supports and how inference routing works. + + diff --git a/.agents/skills/nemoclaw-user-overview/references/release-notes.md b/.agents/skills/nemoclaw-user-overview/references/release-notes.md index 45c25b6c9c..72f6d092d7 100644 --- a/.agents/skills/nemoclaw-user-overview/references/release-notes.md +++ b/.agents/skills/nemoclaw-user-overview/references/release-notes.md @@ -1,8 +1,74 @@ - - # Release Notes -NVIDIA NemoClaw is available in early preview starting March 16, 2026. Use this page to track changes. +NVIDIA NemoClaw is available in early preview starting March 16, 2026. +Use this page to track the highlights of the latest release. +For more detailed release notes, refer to the [NemoClaw GitHub announcements](https://github.com/NVIDIA/NemoClaw/discussions/categories/announcements?discussions_q=is%3Aopen+category%3AAnnouncements). + +## v0.0.58 + +NemoClaw v0.0.58 improves GPU proof reporting, local-inference metadata, policy failure handling, Hermes messaging reliability, OpenClaw diagnostics, and release-prep documentation: + +- GPU and local-inference setup report more accurate state. 
WSL Docker Desktop on ARM64 can accept a reported NVIDIA GPU only after a bounded Docker CUDA proof succeeds, `nemoclaw status` shows whether sandbox CUDA usability is verified, unverified, or failed, managed vLLM uses runtime `max_model_len` metadata for the baked context window when available, and DeepSeek managed-vLLM startup receives the runtime keyword arguments it expects. For more information, refer to Use a Local Inference Server (use the `nemoclaw-user-configure-inference` skill). +- Onboarding and installer failures stop earlier with clearer recovery guidance. The installer checks for `strings` from `binutils` before clone, build, or OpenShell download work; Docker-driver gateway startup fails fast when Docker is unreachable; WSL Docker Desktop diagnostics explain unsupported native Docker-in-WSL routes; Windows-host Ollama detection also checks the installed Windows process when the daemon is stopped; and custom proxy host and port settings are forwarded into the runtime container. For more information, refer to Prerequisites (use the `nemoclaw-user-get-started` skill). +- Policy and sandbox hardening paths avoid misleading success. `policy-add` refuses to merge a preset when the live policy read returns unparseable output, custom preset application reports when the gateway accepted a preset but the sandbox registry could not record it, and `NEMOCLAW_REQUIRE_CAP_DROP=1` lets operators make entrypoint capability dropping fail closed. For more information, refer to NemoClaw CLI Commands Reference (use the `nemoclaw-user-reference` skill). +- OpenClaw runtime diagnostics can export conversation traces through the `diagnostics-otel` plugin. Set `NEMOCLAW_OPENCLAW_OTEL=1` before onboarding or rebuilding an OpenClaw sandbox to bake the plugin config and apply the local OTLP policy preset. For more information, refer to NemoClaw CLI Commands Reference (use the `nemoclaw-user-reference` skill). 
+- Hermes sandboxes are more reliable across messaging, inference, and startup repair paths. Slack channel rebuilds enable the Hermes Slack platform block, `inference.local` routes include the placeholder API key LiteLLM expects, Telegram pseudo-tool text is normalized only for the active chat platform, the messaging response patch preserves Hermes method binding, retry markers are cleared before explicit command dispatch, and Hermes state repair preserves writable history and background dispatcher behavior in locked runtime state. For more information, refer to Messaging Channels (use the `nemoclaw-user-manage-sandboxes` skill). + +## v0.0.57 + +NemoClaw v0.0.57 improves multi-agent command workflows, local inference setup, messaging channel reliability, sandbox diagnostics, policy persistence, and installer pinning: + +- OpenClaw sandboxes can manage conversation sessions and secondary agents from the host CLI. Use `nemoclaw sessions` to list sessions, reset a session key through the OpenClaw gateway, or delete a non-main session, and use `nemoclaw agents add` or `nemoclaw agents delete` to invoke the in-sandbox OpenClaw agent commands. Build-time config also accepts `NEMOCLAW_EXTRA_AGENTS_JSON` so operators can bake validated secondary-agent entries into `agents.list` without replacing the primary `main` agent. For more information, refer to NemoClaw CLI Commands Reference (use the `nemoclaw-user-reference` skill). +- Local inference setup is more observable and more resilient. Managed vLLM on DGX Spark defaults to `nvidia/Qwen3.6-35B-A3B-NVFP4`, streams Hugging Face model-download progress, polls `/v1/models` for readiness, and uses a progress-aware Docker pull watchdog. Local Ollama routes request streaming usage metadata so OpenClaw token counters can update, and `connect` warns when the recorded inference route diverges from the live gateway route instead of reverting silently. 
For more information, refer to Use a Local Inference Server (use the `nemoclaw-user-configure-inference` skill). +- Onboarding and re-onboarding preserve more operator intent. Linux Docker-driver onboarding can auto-apply a narrow UFW rule for the sandbox-to-gateway bridge when `NEMOCLAW_AUTO_FIX_FIREWALL=1`, verifies host-network local-inference reachability before reporting success, reuses healthy containerized gateways, binds gateway state by port, rolls back a freshly-created sandbox when setup is cancelled at the policy preset step, and carries finalized policy preset selections across later re-onboard runs. For more information, refer to NemoClaw CLI Commands Reference (use the `nemoclaw-user-reference` skill). +- Messaging channel setup fails earlier and leaves fewer partial changes. Slack setup validates both Socket Mode tokens before saving credentials, `channels add` checks the matching built-in policy preset before prompting or persisting channel state, failed preset application rolls back staged bridge changes when possible, WhatsApp pairing renders a compact QR code with clearer gateway diagnostics, and Slack runtime placeholders are normalized before OpenClaw starts. For more information, refer to Messaging Channels (use the `nemoclaw-user-manage-sandboxes` skill). +- Sandbox status and repair output are more actionable. `nemoclaw status` reports Docker daemon, stopped-container, dashboard-port-conflict, and paused-container layers without running misleading inference probes, `doctor` skips stale Kubernetes-only gateway container checks on Docker-driver installs, and stale local registry entries are preserved so the suggested `rebuild --yes` recovery path still has the metadata it needs. For more information, refer to NemoClaw CLI Commands Reference (use the `nemoclaw-user-reference` skill). +- Installer and policy guidance tightened. 
Piped installs show the correct `NEMOCLAW_INSTALL_TAG` placement and fail clearly when a requested ref is unavailable, the `pypi` preset allows the `uv` package manager binary, and Jira validation now uses a body-visible Atlassian API probe so operators can distinguish blocked and approved curl traffic. For more information, refer to Common NemoClaw Integration Policy Examples (use the `nemoclaw-user-manage-policy` skill). + +## v0.0.56 + +NemoClaw v0.0.56 improves install safety, local-inference validation, messaging diagnostics, sandbox lifecycle reporting, and day-two command behavior: + +- Public installer and `nemoclaw update` flows now follow the admin-promoted `lkg` release tag by default, so curl-piped installs and update checks target the maintained build while validation catches up to newer semver tags. Non-interactive Linux installs can also reactivate Docker group membership through `sg docker` and continue in the same installer run when that path is available. For more information, refer to Manage Sandbox Lifecycle (use the `nemoclaw-user-manage-sandboxes` skill). +- `nemoclaw status`, `nemoclaw connect`, and `nemoclaw upgrade-sandboxes` now probe the live sandbox agent version before deciding whether a rebuild is needed, instead of trusting stale host metadata. Status output reports when the version cannot be verified and points at rebuild when the running agent may predate the current install. For more information, refer to NemoClaw CLI Commands Reference (use the `nemoclaw-user-reference` skill). +- GPU Docker-driver local-inference onboarding now verifies that host-network sandboxes can reach the selected Ollama or vLLM health endpoint before onboarding reports success. Failures now include the provider endpoint, container network mode, and recovery guidance, which avoids discovering the broken route only after the first agent prompt. For more information, refer to Use a Local Inference Server (use the `nemoclaw-user-configure-inference` skill). 
+- Messaging setup is more diagnosable. Slack setup validates both required Slack credentials before enabling the channel, WhatsApp pairing renders a compact scan-friendly QR for OpenClaw sandboxes and separates gateway close errors from QR rendering, and Telegram DM allowlist aliases continue to work for existing automation. For more information, refer to Messaging Channels (use the `nemoclaw-user-manage-sandboxes` skill). +- Command ergonomics are clearer for common day-two paths. `nemoclaw inference set` without both `--provider` and `--model` now points users to the underlying `openshell inference set` command, `nemoclaw skill remove ` removes uploaded skills by `SKILL.md` name, `nemoclaw status --json` supports per-sandbox automation, and `nemoclaw debug --sandbox` validates explicit sandbox names before writing diagnostics. For more information, refer to NemoClaw CLI Commands Reference (use the `nemoclaw-user-reference` skill). +- Policy and sandbox base-image compatibility improved. The `pypi` preset allows the `uv` package manager binary, the sandbox base image includes `tmux` for OpenClaw's bundled tmux-session flow, and Jira preset validation docs now use observable status probes. For more information, refer to Common NemoClaw Integration Policy Examples (use the `nemoclaw-user-manage-policy` skill). +- Uninstall, rebuild, and snapshot flows protect user state more consistently. `nemoclaw uninstall` preserves host-side backups and the sandbox registry by default, rebuilds preserve explicit CPU-only sandbox intent, and snapshot restore blocks ambiguous existing-destination rollbacks unless you opt in with `--force`. For more information, refer to Manage Sandbox Lifecycle (use the `nemoclaw-user-manage-sandboxes` skill). 
+ +## v0.0.55 + +NemoClaw v0.0.55 improves local Ollama onboarding reliability, plugin secret-scanner resilience, and messaging-channel prompt clarity: + +- Local Ollama validation retries host-side curl process timeouts with a larger timeout before failing, and Docker runtime detection retries `docker info` before choosing the local inference route. For more information, refer to Use a Local Inference Server (use the `nemoclaw-user-configure-inference` skill). +- The NemoClaw OpenClaw plugin keeps the memory secret scanner active when OpenClaw runs in embedded fallback mode without a usable path resolver. The scanner falls back to literal memory and workspace-relative paths instead of crashing before the first write-tool call. For more information, refer to Security Best Practices (use the `nemoclaw-user-configure-security` skill). +- The onboarding messaging-channel picker now states that pressing Enter with no channels selected skips messaging setup. For more information, refer to Messaging Channels (use the `nemoclaw-user-manage-sandboxes` skill). + +## v0.0.54 + +NemoClaw v0.0.54 updates messaging activation, Windows WSL onboarding, NemoHermes dashboard access, and sandbox repair paths: + +- Generated OpenClaw config now marks Telegram, Discord, Slack, and WhatsApp as enabled at the channel level. Selected messaging plugins are pinned during the image build, and `channels add` verifies Telegram, Discord, and Slack bridge startup after the rebuild instead of leaving silent channel failures for later debugging. For more information, refer to Messaging Channels (use the `nemoclaw-user-manage-sandboxes` skill). +- The Windows bootstrap flow waits for Ubuntu account creation before touching Docker settings, enables Docker Desktop WSL integration for the target distro, avoids changing the global WSL default distro, and adds WSL-specific Docker reachability hints during onboarding. For more information, refer to Prepare Windows for NemoClaw. 
+- Windows-host Ollama setup inside WSL now requires the Docker Desktop WSL integration path. NemoClaw still shows Windows-host Ollama options when it detects them, but labels the Docker Desktop requirement and blocks unsupported native Docker-in-WSL selections before it tries to start or install Ollama. For more information, refer to Use a Local Inference Server (use the `nemoclaw-user-configure-inference` skill). +- NemoHermes can expose the optional native Hermes web dashboard separately from the OpenAI-compatible API. Set `NEMOCLAW_HERMES_DASHBOARD=1` before onboarding to start and forward the dashboard on port `9119`, with `NEMOCLAW_HERMES_DASHBOARD_PORT` and `NEMOCLAW_HERMES_DASHBOARD_TUI` available for port and TUI tab control. For more information, refer to NemoClaw Quickstart with Hermes. +- Onboarding diagnostics include more copy-paste-ready recovery hints. Invalid sandbox names now include a `Try: ` line when NemoClaw can derive a valid name, and non-interactive NVIDIA Endpoints setup prints the exact `export NVIDIA_API_KEY=nvapi-...` shape when the key is missing. For more information, refer to NemoClaw CLI Commands Reference (use the `nemoclaw-user-reference` skill). +- Homebrew stays on the Linuxbrew prefix while exposing installed formula commands in sandbox shell sessions, the `/nemoclaw` slash command activates at OpenClaw startup again, Hermes rebuilds tolerate older release tarballs that lack optional UI package lockfiles, and device scope-upgrade approvals recover without being pinned to the old gateway-scoped request. For more information, refer to Common NemoClaw Integration Policy Examples (use the `nemoclaw-user-manage-policy` skill). +- The host-gateway allowance for OpenClaw `web_fetch` is confined to the trusted proxy path, while strict and direct paths continue to block host-gateway names. Hermes Provider onboarding skips the host-side smoke probe only for OAuth-backed setup and keeps direct validation for Nous API key setup. 
For more information, refer to NemoClaw Inference Options (use the `nemoclaw-user-configure-inference` skill). + +## v0.0.53 + +NemoClaw v0.0.53 focuses on safer sandbox recreation, stricter onboarding preflight defaults, local inference reliability, policy coverage, and day-two repair workflows: + +- `nemoclaw onboard` backs up workspace state before deleting an existing sandbox during recreation, including sandboxes that are registered but not ready. If the backup is partial or fails, onboarding aborts before delete so workspace, skills, extensions, identity, memory, messaging state, and credentials are not silently dropped. Set `NEMOCLAW_RECREATE_WITHOUT_BACKUP=1` only when you intentionally want a fresh workspace. +- Under-provisioned container-runtime warnings now default to abort in interactive onboarding. Pressing Enter at the warning stops the run so you can resize Docker Desktop or Colima before the sandbox build stalls. Non-interactive runs continue with a warning, and `NEMOCLAW_IGNORE_RUNTIME_RESOURCES=1` still suppresses the check when you have already accepted the resource trade-off. +- OpenClaw sandboxes can use the new `openclaw-pricing` policy preset for model-pricing reference fetches from LiteLLM and OpenRouter. NemoClaw suggests this preset during OpenClaw onboarding so session JSONL records can populate `usage.cost` without widening egress beyond the two read-only pricing endpoints. +- Local Ollama onboarding is more accurate. NemoClaw validates the `/api/tags` response body through the authenticated proxy, honors accepted no-tools overrides through validation and proxy setup, and uses Ollama's reported runtime context length for `contextWindow` unless you set `NEMOCLAW_CONTEXT_WINDOW`. +- Onboarding and gateway reuse recover from more host-runtime drift. 
NemoClaw recovers stopped gateways before preserving PVC-backed state, verifies gateway containers before reusing port-conflict state, defers Docker-driver gateway teardown until step `[2/8]`, records Docker-driver sandboxes on macOS, and uses Docker `--gpus` rather than CDI repair on WSL Docker Desktop. +- The sandbox and integration paths handle more common failures cleanly, including Brave Search credential rewrite through OpenShell providers, Telegram placeholder repair, host-gateway `web_fetch` routing, read-only host targets for `share mount`, live gateway drift in `list`, host-alias Kubernetes invocations, Jetson bridge DNS preflight failures, and non-ready sandboxes during maintenance backups. +- Hermes startup no longer treats a fresh root-entrypoint layout as locked state, which avoids false locked-layout detection during sandbox boot. +- Maintainer tooling can export a signed skills catalog, detect untracked files during skills refresh diffs, and run the stale-issue verification workflow added for maintainers. ## v0.0.52 @@ -204,20 +270,20 @@ NemoClaw v0.0.38 improves several day-two workflows: Starting with NemoClaw v0.0.34, the `curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash` installer pipeline no longer auto-accepts the third-party software notice when stdin is piped and `/dev/tty` is unavailable (for example, deeply detached SSH sessions or some container shells). 
In environments without a TTY, accept upfront in the pipe: -```console -$ curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 bash +```bash +curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 bash ``` Or pass the flag through to the installer: -```console -$ curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash -s -- --yes-i-accept-third-party-software +```bash +curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash -s -- --yes-i-accept-third-party-software ``` Or re-run from a terminal with a controlling TTY: -```console -$ bash <(curl -fsSL https://www.nvidia.com/nemoclaw.sh) +```bash +bash <(curl -fsSL https://www.nvidia.com/nemoclaw.sh) ``` The installer error message in v0.0.35+ surfaces all three invocations directly so users can copy-paste a recovery without leaving the terminal. diff --git a/.agents/skills/nemoclaw-user-reference/SKILL.md b/.agents/skills/nemoclaw-user-reference/SKILL.md index 4a6d03e571..25cb810e52 100644 --- a/.agents/skills/nemoclaw-user-reference/SKILL.md +++ b/.agents/skills/nemoclaw-user-reference/SKILL.md @@ -1,17 +1,15 @@ --- name: "nemoclaw-user-reference" -description: "Describes the NemoClaw plugin and blueprint architecture and how they orchestrate the OpenClaw sandbox. Use when looking up architecture, plugin structure, or blueprint design. Trigger keywords - nemoclaw architecture, nemoclaw plugin blueprint structure, nemoclaw vs openshell, which cli, nemoclaw cli, openshell cli, sandbox commands, nemoclaw cli commands, nemoclaw command reference, nemoclaw network policy, sandbox egress control operator approval, nemoclaw troubleshooting, nemoclaw debug sandbox issues." +description: "Describes the NemoClaw integration layer and blueprint architecture and how they orchestrate compatible agent sandboxes. Use when looking up architecture, agent integration, plugin structure, or blueprint design. 
Trigger keywords - nemoclaw architecture, nemoclaw agent architecture, nemoclaw plugin blueprint structure, nemoclaw vs openshell, which cli, nemoclaw cli, openshell cli, sandbox commands, nemoclaw cli commands, nemoclaw command reference, nemoclaw network policy, sandbox egress control operator approval, nemoclaw troubleshooting, nemoclaw debug sandbox issues." +license: "Apache-2.0" --- - - - -# Architecture Details +# NemoClaw User Reference ## References -- **Load [references/architecture.md](references/architecture.md)** when looking up architecture, plugin structure, or blueprint design. Describes the NemoClaw plugin and blueprint architecture and how they orchestrate the OpenClaw sandbox. -- **[references/cli-selection-guide.md](references/cli-selection-guide.md)** — Explains when to use `nemoclaw` versus `openshell` for NemoClaw-managed sandboxes, including lifecycle, inference, policy, monitoring, file transfer, and gateway operations. -- **Load [references/commands.md](references/commands.md)** when looking up a specific `nemoclaw` or `/nemoclaw` subcommand, flag, argument, or exit code. Includes the full CLI reference for slash commands and standalone NemoClaw commands. +- **Load [references/architecture.md](references/architecture.md)** when looking up architecture, agent integration, plugin structure, or blueprint design. Describes the NemoClaw integration layer and blueprint architecture and how they orchestrate compatible agent sandboxes. +- **[references/cli-selection-guide.md](references/cli-selection-guide.md)** — Explains when to use `nemoclaw` versus `openshell` for NemoClaw-managed sandboxes, including lifecycle, inference, policy, monitoring, file transfer, and gateway operations. +- **Load [references/commands.md](references/commands.md)** when looking up a specific `nemoclaw`, `nemohermes`, or `/nemoclaw` subcommand, flag, argument, or exit code.
Includes the full CLI reference for standalone NemoClaw commands and agent-specific in-sandbox commands. - **Load [references/network-policies.md](references/network-policies.md)** when looking up a specific default endpoint, filesystem path, or the runtime approval sequence NemoClaw applies on blocked requests. Covers the baseline network policy, filesystem rules, and operator approval flow. - **Load [references/troubleshooting.md](references/troubleshooting.md)** when diagnosing a reported NemoClaw error, a failed onboard, or unexpected sandbox behavior. Lists fixes for common installation, onboarding, and runtime issues. diff --git a/.agents/skills/nemoclaw-user-reference/evals/evals.json b/.agents/skills/nemoclaw-user-reference/evals/evals.json new file mode 100644 index 0000000000..b7d114b097 --- /dev/null +++ b/.agents/skills/nemoclaw-user-reference/evals/evals.json @@ -0,0 +1,11 @@ +[ + { + "id": "docs-reference-architecture-001", + "question": "I'm using the architecture reference. Help me verify implementation and operations details so I can make changes or debug behavior from the right mental model.", + "expected_skill": "nemoclaw-user-reference", + "ground_truth": "A NemoClaw-specific answer that helps the user verify implementation and operations details and gives enough concrete guidance, decision criteria, verification steps, or risk framing to make changes or debug behavior from the right mental model.", + "expected_behavior": [ + "Uses the expected_skill and does not make up answers if it cannot find the answer from the skill." 
+ ] + } +] diff --git a/.agents/skills/nemoclaw-user-reference/references/architecture.md b/.agents/skills/nemoclaw-user-reference/references/architecture.md index 07e193434a..07bbfc2ba9 100644 --- a/.agents/skills/nemoclaw-user-reference/references/architecture.md +++ b/.agents/skills/nemoclaw-user-reference/references/architecture.md @@ -1,12 +1,10 @@ - - # Architecture Details -NemoClaw combines a host CLI, a TypeScript plugin that runs with OpenClaw inside the sandbox, and a versioned YAML blueprint that defines the sandbox image, policies, and inference profiles applied through OpenShell. +NemoClaw combines a host CLI, an in-sandbox integration layer, and a versioned YAML blueprint that defines the sandbox image, policies, and inference profiles applied through OpenShell. ## System Overview -NVIDIA OpenShell is a general-purpose agent runtime. It provides sandbox containers, a credential-storing gateway, inference proxying, and policy enforcement, but has no opinions about what runs inside. NemoClaw is an opinionated reference stack built on OpenShell that handles what goes in the sandbox and makes the setup accessible. +NVIDIA OpenShell is a general-purpose agent runtime. It provides sandbox containers, a credential-storing gateway, inference proxying, and policy enforcement, but has no opinions about what runs inside. NemoClaw is an opinionated reference stack built on OpenShell that handles what goes in the sandbox, prepares agent-specific integration, and makes the setup accessible. ```mermaid graph LR @@ -42,8 +40,8 @@ graph LR subgraph SANDBOX["Sandbox Container 🔒"] direction TB - AGENT["Agent
OpenClaw or any
compatible agent
"]:::agent - PLUG["NemoClaw Plugin
Extends agent with
managed configuration
"]:::sandbox + AGENT["Compatible Agent
OpenClaw, Hermes,
or another supported runtime
"]:::agent + PLUG["NemoClaw Integration
Managed configuration
and runtime context
"]:::sandbox end end end @@ -91,7 +89,7 @@ graph TB subgraph DOCKER["Docker daemon"] direction TB - SANDBOX["Sandbox container 🔒
Landlock + seccomp + netns
OpenClaw agent + NemoClaw plugin
"]:::sandbox + SANDBOX["Sandbox container 🔒
Landlock + seccomp + netns
Compatible agent + NemoClaw integration
"]:::sandbox end end @@ -115,44 +113,33 @@ Layering from top to bottom: | Host CLI | Host process (`nemoclaw` on Node.js) | Orchestrates OpenShell via `openshell` CLI calls. | | OpenShell gateway | Host process by default; optional Linux compatibility container when the gateway binary needs a newer host ABI | Hosts the credential store, owns sandbox lifecycle coordination, and provides the L7 proxy. | | Docker daemon | Host service | Runs the Docker-driver sandbox container and, on affected Linux hosts, the optional gateway compatibility container. | -| Sandbox container | Docker container | Runs the OpenClaw agent and the NemoClaw plugin under Landlock + seccomp + netns. | +| Sandbox container | Docker container | Runs the selected compatible agent and NemoClaw integration under Landlock + seccomp + netns. | | OpenShell L7 proxy | Gateway process | Intercepts agent egress and rewrites `Authorization` headers (Bearer/Bot) and URL-path segments to inject the real credential at the network boundary. | NemoClaw never gives the sandbox a raw provider key. At onboard time it registers credentials with OpenShell's provider/placeholder system, and the L7 proxy substitutes the real value into outbound requests at egress. -The CLI helper `isInferenceRouteReady` (in `src/lib/onboard.ts`) is a host-side readiness check used by the resume flow to decide whether the active route already covers the chosen provider and model — it is not a runtime component. +The CLI helper `isInferenceRouteReady` (in `src/lib/onboard.ts`) is a host-side readiness check used by the resume flow to decide whether the active route already covers the chosen provider and model. +It is not a runtime component. For the DGX Spark-specific variant of this topology (cgroup v2, aarch64, unified memory), refer to the [NVIDIA Spark playbook](https://build.nvidia.com/spark/nemoclaw). 
-## NemoClaw Plugin +## NemoClaw Agent Integration -The plugin is a thin TypeScript package that registers an inference provider and the `/nemoclaw` slash command. -It runs in-process with the OpenClaw gateway inside the sandbox. -It also registers runtime hooks that keep the agent aware of its environment. -Before an agent turn starts, the plugin prepends a short context block with the active sandbox name, sandbox phase, network policy summary, and filesystem policy summary. +NemoClaw integrates with each supported agent through a runtime layer that adapts the agent to OpenShell-managed providers, policies, and sandbox state. +The concrete files differ by agent because each runtime has its own plugin system, config format, state layout, and startup command. + +| Agent | Integration files | Runtime behavior | +|---|---|---| +| OpenClaw | `nemoclaw/openclaw.plugin.json`, `nemoclaw/src/runtime-context.ts`, and the TypeScript package under `nemoclaw/src/` | Registers the `/nemoclaw` slash command, adds the NemoClaw inference provider, and injects sandbox and policy context into OpenClaw turns. | +| Hermes | `agents/hermes/manifest.yaml`, `agents/hermes/plugin/plugin.yaml`, `agents/hermes/generate-config.ts`, `agents/hermes/config/`, and `agents/hermes/start.sh` | Declares the Hermes agent contract, installs the NemoClaw Hermes plugin, writes `/sandbox/.hermes/config.yaml` and `/sandbox/.hermes/.env`, and launches `hermes gateway run` behind the OpenShell proxy. | + +The OpenClaw integration is a thin TypeScript plugin that runs in-process with the OpenClaw gateway inside the sandbox. +Before an OpenClaw turn starts, the plugin prepends a short context block with the active sandbox name, sandbox phase, network policy summary, and filesystem policy summary. When the policy or phase changes during a session, the plugin sends a smaller update block instead of repeating the full context. 
-```text -nemoclaw/ -├── src/ -│ ├── index.ts Plugin entry: registers all commands -│ ├── cli.ts Commander.js subcommand wiring -│ ├── runtime-context.ts Sandbox and policy context injection -│ ├── commands/ -│ │ ├── launch.ts Fresh install into OpenShell -│ │ ├── connect.ts Interactive shell into sandbox -│ │ ├── status.ts Blueprint run state + sandbox health -│ │ ├── logs.ts Stream blueprint and sandbox logs -│ │ └── slash.ts /nemoclaw chat command handler -│ └── blueprint/ -│ ├── resolve.ts Version resolution, cache management -│ ├── fetch.ts Download blueprint from OCI registry -│ ├── verify.ts Digest verification, compatibility checks -│ ├── exec.ts Subprocess execution of blueprint runner -│ └── state.ts Persistent state (run IDs) -├── openclaw.plugin.json Plugin manifest -└── package.json Commands declared under openclaw.extensions -``` +The Hermes integration follows the generic agent-manifest path instead of the OpenClaw plugin package path. +The manifest declares Hermes' binary, health probe, config directory, state directories, messaging support, and OpenAI-compatible API endpoint. +The build-time config generator turns NemoClaw onboarding choices into Hermes YAML and environment files, and the Hermes plugin manifest exposes NemoClaw tools and an `on_session_start` hook. ## NemoClaw Blueprint @@ -166,10 +153,13 @@ nemoclaw-blueprint/ ├── model-specific-setup/ Agent-scoped model/provider compatibility manifests ├── router/ Model Router config and routing engine ├── policies/ -│ └── openclaw-sandbox.yaml Default network + filesystem policy +│ └── openclaw-sandbox.yaml Default network + filesystem policy for the OpenClaw profile ``` -The blueprint runtime (TypeScript) lives in the plugin source tree: +Hermes keeps its agent-owned image, plugin, config, entrypoint, and policy additions under `agents/hermes/`. +The default Hermes policy starts from `agents/hermes/policy-additions.yaml`. 
+ +The current blueprint runner implementation lives in the `nemoclaw/` TypeScript package: ```text nemoclaw/src/blueprint/ @@ -189,8 +179,8 @@ flowchart LR D --> E[status] ``` -1. Resolve. The plugin locates the blueprint artifact and checks the version against `min_openshell_version` and `min_openclaw_version` constraints in `blueprint.yaml`. -2. Verify. The plugin checks the artifact digest against the expected value. +1. Resolve. The integration layer locates the blueprint artifact and checks the version against the OpenShell and agent runtime constraints in `blueprint.yaml`. +2. Verify. The integration layer checks the artifact digest against the expected value. 3. Plan. The runner determines what OpenShell resources to create or update, such as the gateway, providers, sandbox, inference route, and policy. 4. Apply. The runner executes the plan by calling `openshell` CLI commands. 5. Status. The runner reports current state. @@ -203,11 +193,11 @@ base image and layers the NemoClaw runtime Dockerfile on top. The direct bluepri runner still carries a pinned OpenShell Community OpenClaw image for legacy `openshell sandbox create --from` compatibility. Inside the sandbox: -- OpenClaw runs with the NemoClaw plugin pre-installed. +- The selected compatible agent runs with the NemoClaw integration layer installed or generated for that agent. - Inference calls are routed through OpenShell to the configured provider. -- Network egress is restricted by the baseline policy in `openclaw-sandbox.yaml`. +- Network egress is restricted by the baseline policy for the selected agent profile. - Filesystem access is confined to `/sandbox` and `/tmp` for read-write access, with system paths read-only. -- The NemoClaw plugin injects sandbox and policy context into agent turns so the agent can report policy blocks accurately. 
+- NemoClaw injects sandbox and policy context into agent turns when the selected agent supports runtime context hooks, so the agent can report policy blocks accurately. - The image exposes a Docker health check that probes the in-sandbox gateway, so container runtimes can report whether the agent service is responding. - The image includes common runtime compatibility helpers such as Homebrew and a `python` to `python3` symlink for tools that still invoke `python`. @@ -217,14 +207,14 @@ Inference requests from the agent never leave the sandbox directly. OpenShell intercepts them and routes to the configured provider: ```text -Agent (sandbox) ──▶ OpenShell gateway ──▶ NVIDIA Endpoint (build.nvidia.com) +Compatible agent (sandbox) ──▶ OpenShell gateway ──▶ Provider endpoint ``` When you select the Model Router provider, the OpenShell gateway routes to a host-side router process instead of a single upstream model. The router selects from the configured pool, then calls the upstream NVIDIA endpoint with the credential held outside the sandbox. Some model and provider combinations need agent-specific compatibility setup. -NemoClaw keeps those declarations under `nemoclaw-blueprint/model-specific-setup//` so OpenClaw and Hermes fixes can be tested and reviewed independently. +NemoClaw keeps those declarations under `nemoclaw-blueprint/model-specific-setup//` so fixes for each supported agent can be tested and reviewed independently. Refer to Inference Options (use the `nemoclaw-user-configure-inference` skill) for provider configuration details. 
diff --git a/.agents/skills/nemoclaw-user-reference/references/cli-selection-guide.md b/.agents/skills/nemoclaw-user-reference/references/cli-selection-guide.md index 7b96be6d39..3614b9fafc 100644 --- a/.agents/skills/nemoclaw-user-reference/references/cli-selection-guide.md +++ b/.agents/skills/nemoclaw-user-reference/references/cli-selection-guide.md @@ -1,6 +1,4 @@ - - -# CLI Selection Guide +# Choose Between NemoClaw and OpenShell CLIs NemoClaw uses two host-side CLIs. Use `nemoclaw` for NemoClaw-managed workflows. @@ -21,52 +19,52 @@ Use `nemoclaw` for operations where NemoClaw adds product-specific state, safety - Install, onboard, or recreate a NemoClaw sandbox: - ```console - $ nemoclaw onboard - $ nemoclaw onboard --resume --recreate-sandbox + ```bash + nemoclaw onboard + nemoclaw onboard --resume --recreate-sandbox ``` - List, connect to, check, or delete NemoClaw-managed sandboxes: - ```console - $ nemoclaw list - $ nemoclaw my-assistant connect - $ nemoclaw my-assistant status - $ nemoclaw my-assistant logs --follow - $ nemoclaw my-assistant destroy + ```bash + nemoclaw list + nemoclaw my-assistant connect + nemoclaw my-assistant status + nemoclaw my-assistant logs --follow + nemoclaw my-assistant destroy ``` - Rebuild or upgrade while preserving workspace state: - ```console - $ nemoclaw my-assistant rebuild - $ nemoclaw upgrade-sandboxes --check + ```bash + nemoclaw my-assistant rebuild + nemoclaw upgrade-sandboxes --check ``` - Snapshot, restore, or mount sandbox state: - ```console - $ nemoclaw my-assistant snapshot create --name before-change - $ nemoclaw my-assistant snapshot restore before-change - $ nemoclaw my-assistant share mount + ```bash + nemoclaw my-assistant snapshot create --name before-change + nemoclaw my-assistant snapshot restore before-change + nemoclaw my-assistant share mount ``` - Add or remove NemoClaw policy presets: - ```console - $ nemoclaw my-assistant policy-add pypi --yes - $ nemoclaw my-assistant policy-list - $ nemoclaw 
my-assistant policy-remove pypi --yes + ```bash + nemoclaw my-assistant policy-add pypi --yes + nemoclaw my-assistant policy-list + nemoclaw my-assistant policy-remove pypi --yes ``` - Manage NemoClaw messaging channels, credentials, diagnostics, and cleanup: - ```console - $ nemoclaw my-assistant channels add slack - $ nemoclaw credentials list - $ nemoclaw credentials reset nvidia-prod - $ nemoclaw debug --sandbox my-assistant - $ nemoclaw gc --dry-run + ```bash + nemoclaw my-assistant channels add slack + nemoclaw credentials list + nemoclaw credentials reset nvidia-prod + nemoclaw debug --sandbox my-assistant + nemoclaw gc --dry-run ``` ## Use `openshell` For OpenShell Operations @@ -75,40 +73,40 @@ Use `openshell` when the docs explicitly call for a live OpenShell gateway opera - Open the OpenShell TUI for network approvals and live activity: - ```console - $ openshell term + ```bash + openshell term ``` - Manage dashboard or service port forwards: - ```console - $ openshell forward start --background - $ openshell forward list + ```bash + openshell forward start --background + openshell forward list ``` - Inspect the underlying sandbox state: - ```console - $ openshell sandbox list - $ openshell sandbox get - $ openshell logs -n 20 - $ openshell doctor check + ```bash + openshell sandbox list + openshell sandbox get + openshell logs -n 20 + openshell doctor check ``` -- Run one-off commands or move files without starting a NemoClaw chat session: +- Move files, or run raw one-off commands when you intentionally want to bypass NemoClaw's sandbox registry and wrappers: - ```console - $ openshell sandbox exec -n -- ls -la /sandbox - $ openshell sandbox upload ./local-file /sandbox/ - $ openshell sandbox download /sandbox/output ./output + ```bash + openshell sandbox upload ./local-file /sandbox/ + openshell sandbox download /sandbox/output ./output + openshell sandbox exec -n -- env | grep '^HOME=' ``` - Inspect or replace raw OpenShell policy: - ```console - $ 
openshell policy get --full > live-policy.yaml - $ openshell policy update --add-endpoint api.example.com:443:read-only:rest:enforce - $ openshell policy set --policy live-policy.yaml + ```bash + openshell policy get --full > live-policy.yaml + openshell policy update --add-endpoint api.example.com:443:read-only:rest:enforce + openshell policy set --policy live-policy.yaml ``` `openshell policy update` merges specific endpoint and rule changes into the live sandbox policy. @@ -134,10 +132,18 @@ It waits for readiness, handles stale SSH host keys after gateway restarts, and Use `openshell sandbox connect ` only when you intentionally want the raw OpenShell connection path. -For a one-off command, use `openshell sandbox exec` instead of opening an interactive shell. +For a one-off command in a NemoClaw-managed sandbox, use `nemoclaw exec` instead of opening an interactive shell. +It resolves the sandbox by its NemoClaw registry name and runs through the standard NemoClaw CLI surface. +The command executes as the sandbox user with `HOME=/sandbox` inside the provisioned sandbox, where the agent configuration, inference routing, and policy state are already in place. -```console -$ openshell sandbox exec -n my-assistant -- cat /tmp/gateway.log +```bash +nemoclaw my-assistant exec -- cat /tmp/gateway.log +``` + +Use `openshell sandbox exec` for the raw OpenShell execution path, for example when addressing a sandbox by its gateway name or intentionally bypassing the NemoClaw CLI and registry. 
+ +```bash +openshell sandbox exec -n my-assistant -- cat /tmp/gateway.log ``` ### Check Health or Logs @@ -159,27 +165,27 @@ Approved endpoints are session-scoped unless you also add them to the policy thr Use the NemoClaw commands for model or provider inspection and switches so the OpenShell route and the running agent config stay consistent: -```console -$ nemoclaw inference get -$ nemoclaw inference set --provider nvidia-prod --model nvidia/nemotron-3-super-120b-a12b +```bash +nemoclaw inference get +nemoclaw inference set --provider nvidia-prod --model nvidia/nemotron-3-super-120b-a12b ``` For Hermes sandboxes, use the alias; it updates the route and `/sandbox/.hermes/config.yaml` without a rebuild or restart: -```console -$ nemohermes inference set --provider hermes-provider --model openai/gpt-5.4-mini +```bash +nemohermes inference set --provider hermes-provider --model openai/gpt-5.4-mini ``` For a build-time agent setting change, rerun onboarding so the sandbox configuration is recreated consistently: -```console -$ nemoclaw onboard --resume --recreate-sandbox +```bash +nemoclaw onboard --resume --recreate-sandbox ``` Verify either path with: -```console -$ nemoclaw status +```bash +nemoclaw status ``` ### Update Network Policy @@ -198,7 +204,7 @@ Use `openshell sandbox upload` and `openshell sandbox download` for manual file ## Related Topics -- Commands (use the `nemoclaw-user-reference` skill) for the full NemoClaw command reference. +- [Commands](commands.md) for the full NemoClaw command reference. - Manage Sandbox Lifecycle (use the `nemoclaw-user-manage-sandboxes` skill) for day-two operations. - Switch Inference Models (use the `nemoclaw-user-configure-inference` skill) for inference route examples. - Customize the Network Policy (use the `nemoclaw-user-manage-policy` skill) for persistent network access changes. 
diff --git a/.agents/skills/nemoclaw-user-reference/references/commands.md b/.agents/skills/nemoclaw-user-reference/references/commands.md index fa2a81367c..4a632f4f6a 100644 --- a/.agents/skills/nemoclaw-user-reference/references/commands.md +++ b/.agents/skills/nemoclaw-user-reference/references/commands.md @@ -1,12 +1,50 @@ - - # NemoClaw CLI Commands Reference +import { AgentOnly } from "../_components/AgentGuide"; + + + The `nemoclaw` CLI is the primary interface for managing NemoClaw sandboxes. It is installed automatically by the installer (`curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash`). -For guidance on when to use `nemoclaw` versus the underlying `openshell` CLI, see CLI Selection Guide (use the `nemoclaw-user-reference` skill). +For guidance on when to use `nemoclaw` versus the underlying `openshell` CLI, see [CLI Selection Guide](cli-selection-guide.md). + + + + +The `nemohermes` alias is the primary interface for managing Hermes sandboxes through NemoClaw. +It is installed automatically by the installer (`curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_AGENT=hermes bash`). +Most commands in this reference use the same arguments and subcommands across agent variants. +Use `nemohermes` when you want Hermes selected by default. +For guidance on choosing between the agent CLIs and the underlying `openshell` CLI, see [CLI Selection Guide](cli-selection-guide.md). + + -## `/nemoclaw` Slash Command +## Agent Selection + + + +Use `nemoclaw` for the OpenClaw variant. +OpenClaw is the default agent for `nemoclaw onboard` unless you pass `--agent hermes` or set `NEMOCLAW_AGENT=hermes`. +OpenClaw-specific sections below describe the `/nemoclaw` slash command, the OpenClaw dashboard URL, the OpenClaw gateway token, and OpenClaw config paths under `/sandbox/.openclaw`. + + + + +Use `nemohermes` for the Hermes variant. +It selects Hermes by default during onboarding and for other commands. 
+Use `--agent hermes` during onboarding or set `NEMOCLAW_AGENT=hermes` when you need the same selection through another entry point. +Hermes-specific sections below describe the OpenAI-compatible API endpoint, optional Hermes dashboard, Hermes config under `/sandbox/.hermes`, and provider updates that patch `config.yaml`. + +```bash +nemohermes onboard # selects Hermes by default +nemohermes my-sandbox connect # connects to a Hermes sandbox +``` + + + +## In-Sandbox Commands + + The `/nemoclaw` slash command is available inside the OpenClaw chat interface for quick actions: @@ -17,25 +55,34 @@ The `/nemoclaw` slash command is available inside the OpenClaw chat interface fo | `/nemoclaw onboard` | Show onboarding status and reconfiguration guidance | | `/nemoclaw eject` | Show rollback instructions for returning to the host installation | + + + +Hermes does not use the OpenClaw chat slash command. +Use the host-side `nemohermes` commands for lifecycle, status, policy, and inference operations. +The in-sandbox Hermes integration installs the NemoClaw Hermes plugin, which exposes tools for status, environment information, and skill reload support, plus an `on_session_start` hook. + + + ## Standalone Host Commands -The `nemoclaw` binary handles host-side operations that run outside the OpenClaw plugin context. +The CLI handles host-side operations that run outside the selected agent runtime. ### `nemoclaw help`, `nemoclaw --help`, `nemoclaw -h` Show the top-level usage summary and command groups. Running `nemoclaw` with no arguments shows the same help output. -```console -$ nemoclaw help +```bash +nemoclaw help ``` ### `nemoclaw --version`, `nemoclaw -v` Print the installed NemoClaw CLI version. -```console -$ nemoclaw --version +```bash +nemoclaw --version ``` ### `nemoclaw resources` @@ -43,8 +90,8 @@ $ nemoclaw --version Display host hardware inventory and configured sandbox resource profiles. 
Use `--json` for machine-readable CPU, memory, GPU, Kubernetes allocatable-capacity, and profile data. -```console -$ nemoclaw resources [--json] +```bash +nemoclaw resources [--json] ``` If the gateway is not running, Kubernetes allocatable fields are omitted and host CPU/RAM totals are still shown. @@ -55,33 +102,58 @@ Run the interactive setup wizard (recommended for new installs). The wizard creates an OpenShell gateway, registers inference providers, builds the sandbox image, and creates the sandbox. Use this command for new installs and for recreating a sandbox after changes to policy or configuration. -```console -$ nemoclaw onboard [--non-interactive] [--resume | --fresh] [--recreate-sandbox] [--gpu | --no-gpu] [--from ] [--name ] [--sandbox-gpu | --no-sandbox-gpu] [--sandbox-gpu-device ] [--agent ] [--control-ui-port ] [--yes | -y] [--no-ollama-autostart] [--yes-i-accept-third-party-software] +```bash +nemoclaw onboard [--non-interactive] [--resume | --fresh] [--recreate-sandbox] [--gpu | --no-gpu] [--from ] [--name ] [--sandbox-gpu | --no-sandbox-gpu] [--sandbox-gpu-device ] [--agent ] [--control-ui-port ] [--yes | -y] [--no-ollama-autostart] [--yes-i-accept-third-party-software] ``` + + +For Hermes, use the alias or pass the agent explicitly: + +```bash +nemohermes onboard [options] +nemoclaw onboard --agent hermes [options] +``` + + + +#### `--resume` and `--fresh` + +NemoClaw records onboarding progress so interrupted runs can continue. +Use `--resume` to continue a resumable onboarding session with the provider, model, sandbox name, agent, and custom Dockerfile path recorded by the original run. +If the recorded session conflicts with flags you pass on the recovery run, NemoClaw exits and tells you to either rerun with the original settings or start over. + +Use `--fresh` to discard the saved onboarding session and start the wizard from the beginning. +This clears stale or failed session state before NemoClaw creates a new session record. 
+The installer also accepts `--fresh` and forwards it to `nemoclaw onboard`, which skips automatic resume detection. +`--resume` and `--fresh` are mutually exclusive. + **Warning:** For NemoClaw-managed environments, use `nemoclaw onboard` when you need to create or recreate the OpenShell gateway or sandbox. Avoid `openshell self-update`, `npm update -g openshell`, `openshell gateway start --recreate`, or `openshell sandbox create` directly unless you intend to manage OpenShell separately and then rerun `nemoclaw onboard`. +Use `--fresh` to ignore any saved onboarding session and restart the wizard from scratch. This is useful after an interrupted `nemoclaw onboard` run when you want to discard saved state instead of continuing it with `--resume`. + The installer detects existing sandbox sessions before onboarding and prints a warning if any are found. To make the installer abort instead of continuing, set `NEMOCLAW_SINGLE_SESSION=1`: -```console -$ NEMOCLAW_SINGLE_SESSION=1 curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash +```bash +NEMOCLAW_SINGLE_SESSION=1 curl -fsSL https://www.nvidia.com/nemoclaw.sh | bash ``` When existing sandboxes were created with OpenShell earlier than `0.0.37`, the installer prompts before running the new automatic gateway upgrade path. For scripted installs, set `NEMOCLAW_ACCEPT_EXPERIMENTAL_OPENSHELL_UPGRADE=1` to allow the installer to back up registered sandbox state, retire the old gateway, install the current supported OpenShell release, and restore state during onboarding. The automatic path is disabled if the existing `nemoclaw` CLI does not advertise `backup-all`; preserve sandbox state manually before retiring the old gateway in that case. -To perform those steps manually, run `nemoclaw backup-all`, retire the old gateway with `openshell gateway destroy -g nemoclaw || openshell gateway destroy`, then rerun the installer as `curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_OPENSHELL_UPGRADE_PREPARED=1 bash`. 
+To perform those steps manually, run `nemoclaw backup-all`, retire the old gateway registration with `openshell gateway remove nemoclaw || openshell gateway destroy -g nemoclaw || openshell gateway destroy` (both verbs are tried so the right one runs on either OpenShell release), stop any remaining privileged host gateway with `sudo pkill -f openshell-gateway`, then rerun the installer as `curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_OPENSHELL_UPGRADE_PREPARED=1 bash`. The wizard prompts for a provider first, then collects the provider credential if needed. Supported non-experimental choices include NVIDIA Endpoints, OpenAI, Anthropic, Google Gemini, and compatible OpenAI or Anthropic endpoints. Credentials are registered with the OpenShell gateway and never persisted to host disk. See Credential Storage (use the `nemoclaw-user-configure-security` skill) for details on inspection, rotation, and migration from earlier releases. The legacy `nemoclaw setup` command is deprecated; use `nemoclaw onboard` instead. -After provider selection, the wizard prompts for a **policy tier** that controls the default set of network policy presets applied to the sandbox. +After provider selection, the wizard reviews the provider, model, credential state, and sandbox name before registering inference. +It then prompts for optional web search and messaging channels, builds and starts the sandbox, and asks for a **policy tier** that controls the default set of network policy presets applied to the sandbox. Three tiers are available: | Tier | Description | @@ -91,12 +163,14 @@ Three tiers are available: | Open | Broad access across third-party services including messaging and productivity. Agent-specific unsupported presets are filtered out. | After selecting a tier, the wizard shows a combined preset and access-mode screen where you can include or exclude individual presets and toggle each between read and read-write access. 
-For details on tiers and the presets each includes, see Network Policies (use the `nemoclaw-user-reference` skill). +For details on tiers and the presets each includes, see [Network Policies](network-policies.md#policy-tiers). +When you finish the policy step, NemoClaw records the finalized built-in preset selection for that sandbox. +Later re-onboard runs seed from that finalized selection, so presets you intentionally removed stay removed unless you select them again or override the policy mode. In non-interactive mode, set the tier with `NEMOCLAW_POLICY_TIER` (default: `balanced`): -```console -$ NEMOCLAW_POLICY_TIER=restricted nemoclaw onboard --non-interactive --yes-i-accept-third-party-software +```bash +NEMOCLAW_POLICY_TIER=restricted nemoclaw onboard --non-interactive --yes-i-accept-third-party-software ``` `NEMOCLAW_POLICY_MODE` controls how non-interactive onboarding reconciles the tier-derived suggestions against the sandbox's currently-applied presets. @@ -113,35 +187,37 @@ NemoClaw filters tier suggestions and resume selections by active agent support, | `custom` | Apply exactly `NEMOCLAW_POLICY_PRESETS`. Previously-applied presets not in the list are removed. Alias: `list`. | | `skip` | Skip the policy step entirely. Aliases: `none`, `no`. | -If you enable Brave Search during onboarding, NemoClaw currently stores the Brave API key in the sandbox's OpenClaw configuration. -That means the OpenClaw agent can read the key. -NemoClaw explores an OpenShell-hosted credential path first, but the current OpenClaw Brave runtime does not consume that path end to end yet. + + +If you enable Brave Search during onboarding, NemoClaw registers a Brave Search OpenShell provider and keeps `openclaw.json` on an OpenShell credential placeholder. +At egress, OpenShell rewrites Brave's `X-Subscription-Token` header with the real `BRAVE_API_KEY`. Treat Brave Search as an explicit opt-in and use a dedicated low-privilege Brave key. 
+ For non-interactive onboarding, you must explicitly accept the third-party software notice: -```console -$ nemoclaw onboard --non-interactive --yes-i-accept-third-party-software +```bash +nemoclaw onboard --non-interactive --yes-i-accept-third-party-software ``` or: -```console -$ NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 nemoclaw onboard --non-interactive +```bash +NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 nemoclaw onboard --non-interactive ``` For scripted installer runs, pass explicit acceptance to the `bash` side of the installer pipe: -```console -$ curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_NON_INTERACTIVE=1 NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 bash +```bash +curl -fsSL https://www.nvidia.com/nemoclaw.sh | NEMOCLAW_NON_INTERACTIVE=1 NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 bash ``` If the installer cannot prompt for the notice in a terminal and no explicit acceptance is set, it exits before installing Node.js or the NemoClaw CLI. To enable Brave Search in non-interactive mode, set: -```console -$ BRAVE_API_KEY=... \ +```bash +BRAVE_API_KEY=... \ nemoclaw onboard --non-interactive ``` @@ -151,7 +227,8 @@ After fixing the key, re-enable web search with `nemoclaw config web-search`. The wizard prompts for a sandbox name. Names must be 1 to 63 characters, lowercase, start with a letter, contain only letters, numbers, and internal hyphens, and end with a letter or number. -Uppercase letters are automatically lowercased. +The CLI rejects names that do not match these rules. +It also prints a `Try: ` recovery line whenever it can derive a valid lowercase, hyphen-separated form from the input, so passing `--name MyAssistant` reports `Try: myassistant`. Names that match global CLI commands (`status`, `list`, `debug`, etc.) are rejected to avoid routing conflicts. Use `--agent ` to target a specific installed agent profile during onboarding. 
@@ -173,11 +250,32 @@ If you enable Telegram during onboarding, the wizard can also prompt for whether Set `TELEGRAM_REQUIRE_MENTION=1` for non-interactive onboarding when you want mention-only group replies. Pairing and `TELEGRAM_ALLOWED_IDS` still govern direct messages. -If you run onboarding again with the same sandbox name and choose a different inference provider or model, NemoClaw detects the drift and recreates the sandbox so the running OpenClaw UI matches your selection. +If you cancel a brand-new onboarding run at the policy preset step, NemoClaw rolls back the sandbox, registry entry, and onboarding session instead of leaving a default sandbox with unfinished policy state. +Existing live sandboxes are not deleted by this cancel rollback path. + +If you run onboarding again with the same sandbox name and choose a different inference provider or model, NemoClaw detects the drift and recreates the sandbox so the running agent config matches your selection. In interactive mode, the wizard asks for confirmation before delete and recreate. In non-interactive mode, NemoClaw recreates automatically when the stored selection is readable and differs; if NemoClaw cannot read the stored selection, NemoClaw reuses by default. Set `NEMOCLAW_RECREATE_SANDBOX=1` to force recreation even when no drift is detected. +Before deleting an existing sandbox during recreation, NemoClaw backs up the workspace state declared by the selected agent profile and restores it into the new sandbox once it is live. +This applies whether the existing sandbox is ready or marked not-ready, so cross-version upgrades that pass `NEMOCLAW_RECREATE_SANDBOX=1` no longer drop user files from the selected agent workspace. +The behaviour matches `nemoclaw rebuild --force`. +NemoClaw aborts the recreate when the backup cannot complete in full — including when individual state directories or files fail mid-backup — so failed entries are not silently dropped on delete. 
+Set `NEMOCLAW_RECREATE_WITHOUT_BACKUP=1` to skip the pre-recreate backup. +The destination sandbox starts with a fresh workspace. + + + +For OpenClaw, the backed-up paths include agents, extensions, workspace, skills, hooks, identity, devices, canvas, cron, memory, telegram, wechat, credentials, and `/sandbox/.openclaw/workspace/`. + + + + +For Hermes, the backed-up paths come from `agents/hermes/manifest.yaml`, including `/sandbox/.hermes` state such as memories, sessions, skills, plugins, cron, logs, plans, workspace, messaging platform state, and `runtime/state.db`. + + + Before creating the gateway, the wizard runs preflight checks. It verifies that Docker is reachable, warns on untested runtimes such as Podman, and prints host remediation guidance when prerequisites are missing. The preflight also enforces the OpenShell version range declared in the blueprint (`min_openshell_version` and `max_openshell_version`). @@ -187,8 +285,13 @@ If release metadata is unavailable, the installer uses its bundled fallback pin When NemoClaw finds an existing gateway to reuse, it probes the host gateway HTTP endpoint before declaring the gateway reusable. If the container is running but the upstream is still warming up (for example, immediately after a Docker daemon restart), NemoClaw rebuilds the gateway instead of trusting stale metadata. +On the Docker-driver gateway path, preflight stays read-only when it detects a stale gateway (for example, a Docker-driver runtime env hash drift). +It prints a `⚠ Gateway will be recreated when sandbox creation starts` notice and defers the actual teardown to step `[2/8] Starting OpenShell gateway`. +This means pressing `Ctrl+C` between preflight and step `[2/8]` leaves the running gateway and existing sandbox containers untouched, so `nemoclaw onboard` is safe to run just to check preflight output. 
For Linux Docker-driver gateways, onboarding also checks that a helper container on the OpenShell Docker network can reach `host.openshell.internal:`. -If a host firewall blocks that sandbox path, onboarding exits with a `sudo ufw allow from to any port proto tcp` command before it reports the gateway healthy. +If a host firewall blocks that sandbox path, onboarding exits with a `sudo ufw allow from to port proto tcp` command before it reports the gateway healthy. +Set `NEMOCLAW_AUTO_FIX_FIREWALL=1` to opt in to automatic UFW remediation for this specific failure: NemoClaw uses `sudo -n` only, validates the Docker bridge subnet/gateway/port, applies the narrow UFW rule only after a proven TCP reachability failure, and re-probes before continuing. +If passwordless sudo, UFW, or active UFW is unavailable, NemoClaw falls back to the manual guidance path without prompting for a password. Tune the wait via `NEMOCLAW_REUSE_HEALTH_POLL_COUNT` (default `6`) and `NEMOCLAW_REUSE_HEALTH_POLL_INTERVAL` (default `5` seconds). The poll count is clamped to a minimum of `1` so the probe always runs at least once, and the interval is clamped to a minimum of `0` (no sleep between attempts). @@ -202,8 +305,8 @@ Other build outputs such as `dist/`, `target/`, or `build/` are still included. If the staged context is larger than 100 MB, onboarding prints a warning before the Docker build starts. If the directory contains unreadable files (for example, Windows system files visible in WSL), onboarding exits with an error suggesting you move the Dockerfile to a dedicated directory. -```console -$ nemoclaw onboard --from path/to/Dockerfile +```bash +nemoclaw onboard --from path/to/Dockerfile ``` The Dockerfile path must exist. @@ -223,8 +326,8 @@ All NemoClaw build arguments (`NEMOCLAW_MODEL`, `NEMOCLAW_PROVIDER_KEY`, `NEMOCL In non-interactive mode, the path can also be supplied via the `NEMOCLAW_FROM_DOCKERFILE` environment variable. 
You must also supply a sandbox name via `--name ` or `NEMOCLAW_SANDBOX_NAME` so a `--from` build cannot silently clobber the default `my-assistant` sandbox. -```console -$ NEMOCLAW_NON_INTERACTIVE=1 NEMOCLAW_FROM_DOCKERFILE=path/to/Dockerfile NEMOCLAW_SANDBOX_NAME=my-build nemoclaw onboard +```bash +NEMOCLAW_NON_INTERACTIVE=1 NEMOCLAW_FROM_DOCKERFILE=path/to/Dockerfile NEMOCLAW_SANDBOX_NAME=my-build nemoclaw onboard ``` If a `--resume` is attempted with a different `--from` path than the original session, onboarding exits with a conflict error rather than silently building from the wrong image. @@ -235,8 +338,8 @@ Set the sandbox name without going through the interactive prompt. The same name format and reserved-name rules that the wizard enforces apply here too. Names must be 1 to 63 characters, lowercase, start with a letter, contain only letters, numbers, and internal hyphens, and end with a letter or number. Names that match a NemoClaw CLI command (`status`, `list`, `debug`, etc.) are rejected up front. -```console -$ nemoclaw onboard --non-interactive --name my-build --from path/to/Dockerfile +```bash +nemoclaw onboard --non-interactive --name my-build --from path/to/Dockerfile ``` The flag wins over `NEMOCLAW_SANDBOX_NAME`. @@ -249,13 +352,15 @@ Combining `--from ` with non-interactive onboarding requires one of Use a custom Dockerfile for the sandbox image. This variant of `nemoclaw onboard` accepts a `--from ` argument to build the sandbox from a user-supplied Dockerfile instead of the default NemoClaw image. -```console -$ nemoclaw onboard --from ./Dockerfile.custom +```bash +nemoclaw onboard --from ./Dockerfile.custom ``` ### GPU passthrough -When `nemoclaw onboard` detects an NVIDIA GPU on the host (`nvidia-smi` succeeds), it enables OpenShell GPU passthrough at both the gateway and sandbox level by default. +When `nemoclaw onboard` detects an NVIDIA GPU on the host, it enables OpenShell GPU passthrough at both the gateway and sandbox level by default. 
+Detection proceeds along two paths. The `nvidia-smi`-based paths (the primary `--query-gpu=name,memory.total,memory.free` probe and the unified-memory `--query-gpu=name` fallback) require `nvidia-smi` to succeed and, on hosts whose firmware does not classify as a known NVIDIA platform (DGX Spark, DGX Station, Jetson, or Tegra), additionally require that the GPU name does not match the placeholder family observed on the Windows-on-ARM WSL2 nvidia-smi shim (`JMJWOA-Generic-*`) and that either the host is not ARM64 Linux (the observed shim is Windows-on-ARM only) or the NVIDIA kernel driver is bound (`/proc/driver/nvidia/` present), so that placeholder shims on non-NVIDIA hardware are not mistaken for real GPUs. +Jetson/Tegra hosts that ship without `nvidia-smi` continue to be detected via the devicetree firmware fallback (`/sys/firmware/devicetree/base/model`) or the Tegra device-node fallback (`/dev/nvhost-gpu`, `/dev/nvhost-ctrl-gpu`, `/dev/nvhost-ctrl`, or `/dev/nvmap`); both bypass the trust-tier gate above. Use `--no-gpu` to opt out when you want host-side inference providers only and do not need direct GPU access inside the sandbox. Use `--gpu` to require GPU passthrough and fail fast if an NVIDIA GPU is not detected. Use `--sandbox-gpu` or `--no-sandbox-gpu` to control only direct NVIDIA GPU access inside the sandbox. @@ -265,7 +370,9 @@ If the patch fails, onboarding keeps diagnostics and prints a manual cleanup com Prerequisites: -- NVIDIA GPU drivers installed and working (`nvidia-smi` must succeed). +- Ensure NVIDIA GPU drivers are installed and working. + - On generic NVIDIA hosts, `nvidia-smi` must succeed. + - On Jetson/Tegra hosts shipping without `nvidia-smi`, the devicetree firmware fallback substitutes. - NVIDIA Container Toolkit configured for Docker. When GPU passthrough is enabled and a gateway already exists without it, onboarding first checks whether replacing the CPU-only gateway is safe. 
@@ -281,9 +388,9 @@ Pass `--json` for machine-readable output that includes a `schemaVersion`, the d Sandboxes with an active SSH session are marked with a `●` indicator so you can tell at a glance which sandbox you are already connected to in another terminal. When a sandbox has a recorded dashboard port, the output includes its local dashboard URL. -```console -$ nemoclaw list [--json] -$ nemoclaw list --json +```bash +nemoclaw list [--json] +nemoclaw list --json ``` ### `nemoclaw deploy` @@ -298,8 +405,8 @@ This command remains as a compatibility wrapper for the older Brev-specific boot The Brev instance name is the positional argument. The sandbox name comes from `NEMOCLAW_SANDBOX_NAME` and defaults to `my-assistant`; invalid sandbox names fail before Brev provisioning starts. -```console -$ nemoclaw deploy +```bash +nemoclaw deploy ``` ### `nemoclaw connect` @@ -314,12 +421,20 @@ Set `NEMOCLAW_NO_CONNECT_HINT=1` to suppress the hint in scripted workflows. If the sandbox is running an outdated agent version, a non-blocking warning prints before connecting with a `nemoclaw rebuild` hint. If another terminal is already connected to the sandbox, `connect` prints a note with the number of existing sessions before proceeding. Multiple concurrent sessions are allowed. +`connect` does not pull or serve a model itself, but it does inspect `NEMOCLAW_VLLM_MODEL` if you exported it for the managed-vLLM install path. +An unknown slug or a gated model (for example `deepseek-r1-distill-70b`) with no `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN` exits non-zero with the same error the installer would emit, before any sandbox readiness probe or SSH attach. +Unset the variable, or supply the missing token, before retrying. + +When the live OpenShell gateway inference route differs from the route recorded in the NemoClaw registry, `connect` prints an explicit warning and realigns the shared gateway to the recorded route. 
+Use `nemoclaw inference set --provider --model ` to make an intentional route change. +If the sandbox is registered locally but missing from a healthy gateway, `connect` preserves the registry entry and points you to `rebuild --yes`, `onboard`, or `destroy` instead of deleting the metadata needed for recovery. + After a host reboot, the OpenShell gateway rotates its SSH host keys. `connect` detects the resulting identity drift, prunes stale `openshell-*` entries from `~/.ssh/known_hosts`, and retries automatically. You no longer need to re-run `nemoclaw onboard` after a reboot in this case. -```console -$ nemoclaw my-assistant connect [--probe-only] +```bash +nemoclaw my-assistant connect [--probe-only] ``` The `--probe-only` flag verifies the sandbox is reachable over SSH and exits without opening a shell. @@ -328,14 +443,30 @@ Use it for health checks and scripted readiness probes. ### `nemoclaw exec` Run a single command non-interactively in a running sandbox via the OpenShell exec endpoint. -The command runs as the sandbox user with `HOME=/sandbox`, so in-sandbox tooling resolves NemoClaw-provisioned config under `/sandbox/.openclaw` the same way it does for `connect` and `openshell sandbox connect`. -This is the supported substitute for `docker exec` on the sandbox container; raw `docker exec` runs as root and lands on `HOME=/root`, where the agent config is not present and `openclaw agent` falls back to its built-in defaults. +The command runs as the sandbox user with `HOME=/sandbox`, so in-sandbox tooling resolves NemoClaw-provisioned config the same way it does for `connect` and `openshell sandbox connect`. +This is the supported substitute for `docker exec` on the sandbox container; raw `docker exec` runs as root and lands on `HOME=/root`, where the selected agent config is not present. -```console -$ nemoclaw my-assistant exec -- openclaw agent -m "What is 2+2?" 
-$ nemoclaw my-assistant exec --workdir /sandbox/workspace -- ls -la + + +OpenClaw config resolves under `/sandbox/.openclaw`. + +```bash +nemoclaw my-assistant exec -- openclaw agent -m "What is 2+2?" +nemoclaw my-assistant exec --workdir /sandbox/workspace -- ls -la ``` + + + +Hermes config resolves under `/sandbox/.hermes`. + +```bash +nemohermes my-assistant exec -- hermes --version +nemohermes my-assistant exec --workdir /sandbox/workspace -- ls -la +``` + + + Everything after `--` is forwarded verbatim to the sandbox command, including flags the inner command needs. The exit code is the remote command's exit code. @@ -354,14 +485,29 @@ Use this after a sandbox pod restart, a sandbox crash, or whenever `nemoclaw status` Show sandbox status, health, and inference configuration. +Pass `--json` to emit a structured per-sandbox report instead of the text renderer. +The JSON output includes at least `schemaVersion`, `name`, `found`, `model`, `provider`, `phase`, `gatewayState`, `inferenceHealth`, `rpcIssue`, `hostGpuDetected`, `sandboxGpuEnabled`, `sandboxGpuMode`, `sandboxGpuDevice`, `openshellDriver`, `openshellVersion`, `policies`, `failureLayer`, and `dockerPaused`. +`openshellDriver` and `openshellVersion` are always strings (falling back to `"unknown"` when the registry has no value), so consumers can rely on `typeof` checks. +`failureLayer` is `null` when no preflight failure was detected and otherwise one of `docker_unreachable`, `sandbox_container_stopped`, or `sandbox_dashboard_port_conflict`; when set, `inferenceHealth` is suppressed to `null` so automation does not see a stale remote-provider healthy status during a local outage. +`dockerPaused` is `true` when NemoClaw detects that the Docker-driver sandbox container is paused. +In that case, text output keeps OpenShell's authoritative phase but prints a `docker unpause ` recovery hint instead of sending you directly to rebuild. 
+The command exits non-zero when the sandbox is missing locally, the gateway state is not `present`, the gateway reports a schema/protobuf mismatch (mirrored as `rpcIssue`), or `failureLayer` is non-null. +The alias form `nemoclaw status --json` requires the sandbox to be registered locally; the canonical form `nemoclaw sandbox status --json` is the one to use from automation that may run against an unknown sandbox name, since it still emits a JSON document with `found: false` instead of a text error. + +```bash +nemoclaw my-assistant status +nemoclaw my-assistant status --json +nemoclaw sandbox status my-assistant --json +``` + The command probes every inference provider and reports one of three states on the `Inference` line: | State | Meaning | @@ -380,9 +526,19 @@ Use that line to distinguish a healthy backend from a broken proxy path that the For cloud-only providers, the output omits the NIM status line unless a NIM container is registered or an unexpected NIM container is running. +When the sandbox's recorded driver is `docker` and the host Docker daemon is not reachable, the command prints `Failure layer: docker_unreachable — Docker daemon is not reachable.` as the first line of stdout, suppresses the host-side `Inference` probe (which otherwise hits the remote provider directly and is misleading when the local stack is down), and exits with a non-zero status. + +When the host Docker daemon is reachable but the per-sandbox container is stopped, the command prints `Failure layer: sandbox_container_stopped — sandbox container exists but is not running.` as the first line of stdout, suppresses the host-side `Inference` probe, and exits with a non-zero status. +If the sandbox's recorded dashboard port is also held by a foreign listener, the header escalates to `Failure layer: sandbox_dashboard_port_conflict — sandbox container is stopped and the dashboard port is held by a foreign listener.` so the operator can recover the port before restarting the sandbox. 
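As a hedged illustration of the `status --json` contract described above, the sketch below shows one way automation might classify a parsed report. It is not NemoClaw code: `StatusReport` models only a subset of the documented fields, the type chosen for `rpcIssue` is an assumption, and `classifyStatus` and its return labels are hypothetical names invented for this example.

```typescript
// Documented values of `failureLayer` when a preflight failure is detected.
type FailureLayer =
  | "docker_unreachable"
  | "sandbox_container_stopped"
  | "sandbox_dashboard_port_conflict";

// Subset of the documented JSON fields; the real report contains more.
interface StatusReport {
  found: boolean;
  gatewayState: string;
  rpcIssue: string | null; // assumption: null when no schema/protobuf mismatch
  failureLayer: FailureLayer | null;
  dockerPaused: boolean;
  inferenceHealth: unknown; // documented as null whenever failureLayer is set
}

// Hypothetical helper: maps a report to a short label for scripts.
// The check order is a design choice of this sketch, not NemoClaw behavior.
function classifyStatus(report: StatusReport): string {
  if (!report.found) return "missing";
  if (report.failureLayer !== null) return `failure:${report.failureLayer}`;
  if (report.gatewayState !== "present") return "gateway-not-present";
  if (report.rpcIssue !== null) return "rpc-mismatch";
  // A paused container is recoverable with `docker unpause`, so it gets its own label.
  if (report.dockerPaused) return "paused";
  return "ok";
}

// Example input shaped like the documented output (values are made up).
const sample: StatusReport = JSON.parse(
  '{"found":true,"gatewayState":"present","rpcIssue":null,' +
    '"failureLayer":"docker_unreachable","dockerPaused":false,"inferenceHealth":null}',
);
console.log(classifyStatus(sample)); // prints "failure:docker_unreachable"
```

Checking `failureLayer` before `gatewayState` keeps the most specific diagnosis first, since a local Docker outage can also leave the gateway unverifiable. Because `inferenceHealth` is suppressed to `null` whenever a failure layer is set, a consumer written this way never reports a stale healthy provider during a local outage.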
+ If the sandbox or gateway cannot be verified, the command exits non-zero instead of reporting healthy inference from stale registry state. +When a locally registered sandbox is missing from the live gateway, status preserves the registry entry so the suggested `rebuild --yes` recovery can still find the sandbox metadata. Gateway and dashboard health checks treat HTTP `401` from device auth as a live service, not as an offline gateway. +When sandbox GPU passthrough is enabled, the `Sandbox GPU` line includes the last CUDA usability proof state. +It reports `(CUDA verified)`, `(CUDA unverified)`, or `(last CUDA proof failed: