NVIDIA · miyoungc · May 28, 2026
diff --git a/.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json b/.agents/skills/nemoclaw-user-deploy-remote/evals/evals.json
diff --git a/.agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json b/.agents/skills/nemoclaw-user-monitor-sandbox/evals/evals.json
@@ -3,42 +3,18 @@
     "id": "docs-monitoring-monitor-sandbox-activity-001",
     "question": "I'm monitoring sandbox activity. Help me understand what the agent and sandbox are doing now so I can detect unhealthy or unexpected behavior early.",
     "expected_skill": "nemoclaw-user-monitor-sandbox",
-    "ground_truth": "A NemoClaw-specific answer that helps the user understand what the agent and sandbox are doing now and gives enough concrete guidance, decision criteria, verification steps, or risk framing to detect unhealthy or unexpected behavior early.",
-    "expected_behavior": [
-      "The output directly addresses the user's situation: monitoring sandbox activity.",
-      "The AI coding assistant loads the expected_skill and SKILL.md",
-      "The output helps the user understand what the agent and sandbox are doing now with NemoClaw-specific guidance rather than generic advice.",
-      "The output gives enough concrete guidance, decision criteria, verification steps, or risk framing for the user to detect unhealthy or unexpected behavior early.",
-      "The output avoids inventing unsupported NemoClaw behavior.",
-      "The output follows progressive disclosure: it answers the current request without dumping unrelated details other than the expected_skill and the SKILL.md file."
-    ]
+    "ground_truth": "A NemoClaw-specific answer that helps the user understand what the agent and sandbox are doing now and gives enough concrete guidance, decision criteria, verification steps, or risk framing to detect unhealthy or unexpected behavior early."
   },
   {
     "id": "docs-monitoring-monitor-sandbox-activity-002",
     "question": "I'm diagnosing a runtime failure. Help me use health, logs, and traces to locate the failing layer so I can separate host, gateway, sandbox, policy, and inference issues.",
     "expected_skill": "nemoclaw-user-monitor-sandbox",
-    "ground_truth": "A NemoClaw-specific answer that helps the user use health, logs, and traces to locate the failing layer and gives enough concrete guidance, decision criteria, verification steps, or risk framing to separate host, gateway, sandbox, policy, and inference issues.",
-    "expected_behavior": [
-      "The output directly addresses the user's situation: diagnosing a runtime failure.",
-      "The AI coding assistant loads the expected_skill and SKILL.md",
-      "The output helps the user use health, logs, and traces to locate the failing layer with NemoClaw-specific guidance rather than generic advice.",
-      "The output gives enough concrete guidance, decision criteria, verification steps, or risk framing for the user to separate host, gateway, sandbox, policy, and inference issues.",
-      "The output avoids inventing unsupported NemoClaw behavior.",
-      "The output follows progressive disclosure: it answers the current request without dumping unrelated details other than the expected_skill and the SKILL.md file."
-    ]
+    "ground_truth": "A NemoClaw-specific answer that helps the user use health, logs, and traces to locate the failing layer and gives enough concrete guidance, decision criteria, verification steps, or risk framing to separate host, gateway, sandbox, policy, and inference issues."
   },
   {
     "id": "docs-monitoring-monitor-sandbox-activity-003",
     "question": "I'm collecting debugging evidence. Help me gather enough information without weakening controls so I can investigate safely and share useful diagnostics.",
     "expected_skill": "nemoclaw-user-monitor-sandbox",
-    "ground_truth": "A NemoClaw-specific answer that helps the user gather enough information without weakening controls and gives enough concrete guidance, decision criteria, verification steps, or risk framing to investigate safely and share useful diagnostics.",
-    "expected_behavior": [
-      "The output directly addresses the user's situation: collecting debugging evidence.",
-      "The AI coding assistant loads the expected_skill and SKILL.md",
-      "The output helps the user gather enough information without weakening controls with NemoClaw-specific guidance rather than generic advice.",
-      "The output gives enough concrete guidance, decision criteria, verification steps, or risk framing for the user to investigate safely and share useful diagnostics.",
-      "The output avoids inventing unsupported NemoClaw behavior.",
-      "The output follows progressive disclosure: it answers the current request without dumping unrelated details other than the expected_skill and the SKILL.md file."
-    ]
+    "ground_truth": "A NemoClaw-specific answer that helps the user gather enough information without weakening controls and gives enough concrete guidance, decision criteria, verification steps, or risk framing to investigate safely and share useful diagnostics."
   }
 ]
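
Note on the hunk above: after this change, each eval entry keeps four keys (`id`, `question`, `expected_skill`, `ground_truth`) and no longer carries `expected_behavior`. A minimal sketch of checking that shape follows. It assumes the file is a top-level JSON array of objects, as the hunk's indentation suggests. The function name `check_eval_entries` is hypothetical and not part of NemoClaw or NVSkills-Eval.

```python
import json

# Keys every entry is expected to keep after this diff (taken from the hunk above).
REQUIRED_KEYS = {"id", "question", "expected_skill", "ground_truth"}
# Key removed by this diff.
REMOVED_KEYS = {"expected_behavior"}


def check_eval_entries(raw_json):
    """Return a list of human-readable problems; an empty list means the shape is as expected."""
    problems = []
    for index, entry in enumerate(json.loads(raw_json)):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            problems.append(f"entry {index}: missing {sorted(missing)}")
        leftover = REMOVED_KEYS & entry.keys()
        if leftover:
            problems.append(f"entry {index}: still has {sorted(leftover)}")
    return problems
```

Passing the text of an `evals.json` file to `check_eval_entries` would list any entry that still carries the removed key or lacks one of the four retained keys.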
diff --git a/skills/nemoclaw-user-deploy-remote/BENCHMARK.md b/skills/nemoclaw-user-deploy-remote/BENCHMARK.md
@@ -0,0 +1,70 @@
+# Evaluation Report
+
+Evaluation of the `nemoclaw-user-deploy-remote` skill before publication through NVSkills-Eval.
+
+This benchmark summarizes the NVSkills-Eval 3-tier evaluation results for the skill. The goal is to document whether the skill is safe, discoverable, effective, and useful for agents before it is published for broader workflow use.
+
+## Evaluation Summary
+
+- Skill: `nemoclaw-user-deploy-remote`
+- Evaluation date: 2026-05-28
+- NVSkills-Eval profile: `external`
+- Overall verdict: FAIL
+- Tier 3 live agent evaluation: not available in this report
+
+## Agents Used
+
+- Tier 3 agent details were not available in this report.
+
+## Metrics Used
+
+Reported benchmark dimensions:
+
+- Security: checks whether skill-assisted execution avoids unsafe behavior such as secret leakage, destructive commands, or unauthorized access.
+- Correctness: checks whether the agent follows the expected workflow and produces the correct final output.
+- Discoverability: checks whether the agent loads the skill when relevant and avoids using it when irrelevant.
+- Effectiveness: checks whether the agent performs measurably better with the skill than without it.
+- Efficiency: checks whether the agent uses fewer tokens and avoids redundant work.
+
+Underlying evaluation signals used in this run:
+
+- No Tier 3 evaluation signal details were available in this report.
+
+## Test Tasks
+
+Tier 3 evaluation task details were not available in this report.
+
+## Results
+
+Tier 3 dimension rollup was not available in this report.
+
+## Tier 1: Static Validation Summary
+
+Tier 1 validation passed with observations. NVSkills-Eval ran 9 checks and found 13 total findings.
+
+Top findings:
+
+- MEDIUM QUALITY/quality_correctness: SKILL_SPEC recommended field missing: 'metadata.author' (`skills/nemoclaw-user-deploy-remote/SKILL.md`)
+- MEDIUM QUALITY/quality_correctness: SKILL_SPEC recommended field missing: 'metadata.tags' (`skills/nemoclaw-user-deploy-remote/SKILL.md`)
+- MEDIUM QUALITY/quality_efficiency: Deeply nested references in brev-web-ui.md (`skills/nemoclaw-user-deploy-remote/SKILL.md`)
+- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Instructions' (`skills/nemoclaw-user-deploy-remote/SKILL.md`)
+- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Examples' (`skills/nemoclaw-user-deploy-remote/SKILL.md`)
+
+## Tier 2: Deduplication Summary
+
+Tier 2 validation reported findings. NVSkills-Eval ran 2 checks and found 2 total findings.
+
+Top findings:
+
+- HIGH DUPLICATE/duplicate: Duplicate content found within references/install-openclaw-plugins.md:
+  "## Network Access" in references/install-openclaw-plugins.md (lines 64-73)
+  vs "## Next Steps" in references/install-openclaw-plugins.md (lines 86-93) (`references/install-openclaw-plugins.md:64`)
+- HIGH DUPLICATE/duplicate: Duplicate content found across SKILL.md and references/brev-web-ui.md and references/install-openclaw-plugins.md and references/sandbox-hardening.md:
+  "(preamble)" in SKILL.md (lines 1-3)
+  vs "(preamble)" in references/brev-web-ui.md (lines 1-2)
+  vs "(preamble)" in references/install-openclaw-plugins.md (lines 1-2)
+  vs "(preamble)" in references/sandbox-hardening.md (lines 1-2) (`SKILL.md:1`)
+
+## Publication Recommendation
+
+The skill should be reviewed before NVSkills-Eval publication. Skill owners should address the findings above and rerun NVSkills-Eval to refresh this benchmark.
diff --git a/skills/nemoclaw-user-deploy-remote/SKILL.md b/skills/nemoclaw-user-deploy-remote/SKILL.md
@@ -0,0 +1,177 @@
+---
+name: "nemoclaw-user-deploy-remote"
+description: "Explains how to run NemoClaw on a remote GPU instance, including the deprecated Brev compatibility path and the preferred installer plus onboard flow. Use when deploying NemoClaw to a remote VM, onboarding a Brev instance, or migrating away from the legacy `nemoclaw deploy` wrapper. Trigger keywords - deploy nemoclaw remote gpu, nemoclaw brev cloud deployment, nemoclaw plugins, openclaw plugins, install openclaw plugin, nemoclaw onboard from dockerfile, nemoclaw brev web ui, nemoclaw getting started, brev quickstart, nvidia nemotron agent, nemoclaw sandbox hardening, container security, docker capabilities, process limits."
+license: "Apache-2.0"
+---
+
+<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
+<!-- SPDX-License-Identifier: Apache-2.0 -->
+
+# Deploy NemoClaw to a Remote GPU Instance
+
+## Gotchas
+
+- The `nemoclaw deploy` command is deprecated.
+- On Brev, set `CHAT_UI_URL` in the launchable environment configuration so it is available when the installer builds the sandbox image.
+
+## Prerequisites
+
+- The [Brev CLI](https://brev.nvidia.com) installed and authenticated.
+- A provider credential for the inference backend you want to use during onboarding.
+- `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN` exported when your remote vLLM or Hugging Face workflow needs access to gated models.
+- NemoClaw installed locally if you plan to use the deprecated `nemoclaw deploy` wrapper. Otherwise, install NemoClaw directly on the remote host after provisioning it.
+
+Run NemoClaw on a remote GPU instance through [Brev](https://brev.nvidia.com).
+The preferred path is to provision the VM, run the standard NemoClaw installer on that host, and then run `nemoclaw onboard`.
+
+## Quick Start
+
+If your Brev instance is already up and has already been onboarded with a sandbox, start with the standard sandbox chat flow:
+
+```console
+$ nemoclaw my-assistant connect
+$ openclaw tui
+```
+
+The first command opens a shell inside the sandbox, and the second opens the OpenClaw chat UI from that shell.
+If the VM is fresh, run the standard installer on that host and then run `nemoclaw onboard` before trying `nemoclaw my-assistant connect`.
+
+If you are connecting from your local machine and still need to provision the remote VM, you can still use `nemoclaw deploy <instance-name>` as the legacy compatibility path described below.
+
+## Deploy the Instance
+
+**Warning:**
+
+The `nemoclaw deploy` command is deprecated.
+Prefer provisioning the remote host separately, then running the standard NemoClaw installer and `nemoclaw onboard` on that host.
+
+Create a Brev instance and run the legacy compatibility flow:
+
+```console
+$ nemoclaw deploy <instance-name>
+```
+
+Replace `<instance-name>` with a name for your remote instance, for example `my-gpu-box`.
+The sandbox created on the remote VM uses `NEMOCLAW_SANDBOX_NAME`, or `my-assistant` when the variable is unset.
+Sandbox names must be lowercase, start with a letter, contain only letters, numbers, and internal hyphens, and end with a letter or number.
+The deploy wrapper validates the sandbox name before it provisions the Brev instance, opens SSH, or starts the remote installer.
+
+The legacy compatibility flow performs the following steps on the VM:
+
+1. Installs Docker and the NVIDIA Container Toolkit if a GPU is present.
+2. Installs the OpenShell CLI.
+3. Runs `nemoclaw onboard` (the setup wizard) to create the gateway, register providers, and launch the sandbox.
+4. Starts optional host auxiliary services (for example the cloudflared tunnel) when `cloudflared` is available. Channel messaging is configured during onboarding and runs through OpenShell-managed processes, not through `nemoclaw tunnel start`.
+
+By default, the compatibility wrapper asks Brev to provision on `gcp`. Override this with `NEMOCLAW_BREV_PROVIDER` if you need a different Brev cloud provider.
+If you export `HF_TOKEN` or `HUGGING_FACE_HUB_TOKEN`, the wrapper forwards those values to the VM so remote setup can pull gated Hugging Face model repositories.
+
+## Connect to the Remote Sandbox
+
+After deployment finishes, the deploy command opens an interactive shell inside the remote sandbox.
+To reconnect after closing the session, run the command again:
+
+```console
+$ nemoclaw deploy <instance-name>
+```
+
+## Monitor the Remote Sandbox
+
+SSH to the instance and run the OpenShell TUI to monitor activity and approve network requests:
+
+```console
+$ ssh <instance-name> 'cd ~/nemoclaw && set -a && . .env && set +a && openshell term'
+```
+
+## Verify Inference
+
+Run a test agent prompt inside the remote sandbox:
+
+```console
+$ openclaw agent --agent main -m "Hello from the remote sandbox" --session-id test
+```
+
+## Remote Dashboard Access
+
+The NemoClaw dashboard validates the browser origin against an allowlist baked
+into the sandbox image at build time. By default the allowlist only contains
+`http://127.0.0.1:18789`. When accessing the dashboard from a remote browser
+(for example through a Brev public URL or an SSH port-forward), set
+`CHAT_UI_URL` to the origin the browser will use **before** running setup:
+
+```console
+$ export CHAT_UI_URL="https://openclaw0-<id>.brevlab.com"
+$ nemoclaw deploy <instance-name>
+```
+
+For SSH port-forwarding, the origin is typically `http://127.0.0.1:18789` (the
+default), so no extra configuration is needed.
+
+**Warning:**
+
+On Brev, set `CHAT_UI_URL` in the launchable environment configuration so it is
+available when the installer builds the sandbox image. If `CHAT_UI_URL` is not
+set on a headless host, the compatibility wrapper prints a warning.
+
+`NEMOCLAW_DISABLE_DEVICE_AUTH` is also evaluated at image build time.
+When `CHAT_UI_URL` points at a non-loopback origin, NemoClaw disables OpenClaw device pairing in the generated sandbox configuration because browser-only remote users cannot complete terminal-based pairing.
+Any device that can reach the configured dashboard origin can connect without pairing, so avoid exposing that origin on internet-reachable or shared-network deployments.
+
+## First-Run Readiness Budget
+
+On a remote GPU host, the first `nemoclaw onboard` typically does the slowest work of the lifecycle: the sandbox image is built locally and uploaded into the OpenShell gateway, which can stream hundreds of MiB over the VM's link before the readiness wait even starts.
+The post-create readiness wait defaults to 180 seconds (`NEMOCLAW_SANDBOX_READY_TIMEOUT`), which is sized for warm-cache, workstation-class onboarding and can be exceeded on:
+
+- DGX Station first runs with large quantised models (70B+ parameter footprints, NVFP4 weights).
+- Cloud VMs where the local image-build cache is cold and the upload runs over the public network.
+- Hosts onboarding the Brave Web Search preset on the first run (the egress policy stack adds boot work).
+
+Raise the budget before re-running onboard:
+
+```console
+$ export NEMOCLAW_SANDBOX_READY_TIMEOUT=600
+$ nemoclaw onboard
+```
+
+If onboard ends with `Sandbox '<name>' was created but did not become ready within 180s`, onboard deletes the partially created sandbox before it exits, so the next attempt with the raised budget starts from a clean state.
+For the inference-probe budget that runs earlier in onboarding, see `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` (use the `nemoclaw-user-configure-inference` skill).
+
+## Proxy Configuration
+
+NemoClaw routes sandbox traffic through a gateway proxy that defaults to `10.200.0.1:3128`.
+If your network requires a different proxy, set `NEMOCLAW_PROXY_HOST` and `NEMOCLAW_PROXY_PORT` before onboarding:
+
+```console
+$ export NEMOCLAW_PROXY_HOST=proxy.example.com
+$ export NEMOCLAW_PROXY_PORT=8080
+$ nemoclaw onboard
+```
+
+These values are baked into the sandbox image at build time.
+They are also forwarded into the runtime container during sandbox creation, so `/tmp/nemoclaw-proxy-env.sh` uses the same host and port that the image build used.
+Only alphanumeric characters, dots, hyphens, and colons are accepted for the host.
+The port must be numeric (0-65535).
+Changing the proxy after onboarding requires re-running `nemoclaw onboard`.
+
+## GPU Configuration
+
+The deploy script uses the `NEMOCLAW_GPU` environment variable to select the GPU type.
+The default value is `a2-highgpu-1g:nvidia-tesla-a100:1`.
+Set this variable before running `nemoclaw deploy` to use a different GPU configuration:
+
+```console
+$ export NEMOCLAW_GPU="a2-highgpu-1g:nvidia-tesla-a100:2"
+$ nemoclaw deploy <instance-name>
+```
+
+## References
+
+- **Load [references/install-openclaw-plugins.md](references/install-openclaw-plugins.md)** when users ask how to install, build, or configure OpenClaw plugins under NemoClaw. Explains the difference between OpenClaw plugins and agent skills, and shows the current Dockerfile-based workflow for baking a plugin into a NemoClaw sandbox.
+- **Load [references/brev-web-ui.md](references/brev-web-ui.md)** when a user wants to try NemoClaw without installing the CLI, or asks how to get started on Brev. Guides users through deploying NemoClaw with the Brev web UI.
+- **Load [references/sandbox-hardening.md](references/sandbox-hardening.md)** when reviewing sandbox image security controls, auditing capability drops, or looking up the runtime resource limits. Includes the sandbox container image hardening reference, covering Docker capabilities and process limits.
+
+## Related Skills
+
+- `nemoclaw-user-manage-sandboxes`: see Set Up Messaging Channels to connect Telegram, Discord, or Slack through OpenShell-managed channel messaging
+- `nemoclaw-user-monitor-sandbox`: see Monitor Sandbox Activity for sandbox monitoring tools
+- `nemoclaw-user-reference`: see Commands for the full `deploy` command reference
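
Note on the SKILL.md added above: the "Deploy the Instance" section states the sandbox-name rules in prose, and the "Proxy Configuration" section does the same for `NEMOCLAW_PROXY_HOST` and `NEMOCLAW_PROXY_PORT`. The sketch below restates those rules as Python checks so they can be tried locally before running `nemoclaw deploy` or `nemoclaw onboard`. It is an illustration only, not NemoClaw's actual validator. The function names are hypothetical, and the sketch assumes that consecutive internal hyphens and single-letter names are allowed, which the text does not say either way.

```python
import re

# Lowercase, starts with a letter, only letters/digits/internal hyphens,
# ends with a letter or digit. Assumption: no length limit, "a--b" allowed.
SANDBOX_NAME_RE = re.compile(r"[a-z](?:[a-z0-9-]*[a-z0-9])?")

# Alphanumeric characters, dots, hyphens, and colons only.
PROXY_HOST_RE = re.compile(r"[A-Za-z0-9.:-]+")


def is_valid_sandbox_name(name: str) -> bool:
    """Check a sandbox name against the rules described in SKILL.md."""
    return SANDBOX_NAME_RE.fullmatch(name) is not None


def is_valid_proxy(host: str, port: str) -> bool:
    """Check a proxy host and port against the rules described in SKILL.md."""
    if PROXY_HOST_RE.fullmatch(host) is None:
        return False
    # The port must be numeric and within 0-65535, the range SKILL.md gives.
    if not (port.isascii() and port.isdigit()):
        return False
    return 0 <= int(port) <= 65535
```

For example, `is_valid_sandbox_name("my-assistant")` accepts the default sandbox name, while `is_valid_sandbox_name("box-")` rejects a trailing hyphen. `is_valid_proxy("10.200.0.1", "3128")` accepts the default gateway proxy address.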