10 changes: 5 additions & 5 deletions .agents/skills/nemoclaw-user-agent-skills/SKILL.md
@@ -26,11 +26,11 @@ This page is for users who installed NemoClaw with the installer and do not have

Fetch only the skills from the NemoClaw repository without downloading the full source tree.

```console
$ git clone --filter=blob:none --no-checkout https://github.com/NVIDIA/NemoClaw.git
$ cd NemoClaw
$ git sparse-checkout set --no-cone '/.agents/skills/nemoclaw-user-*/**' '/.agents/skills/nemoclaw-skills-guide/**' '/.claude/**' '/AGENTS.md' '/CLAUDE.md'
$ git checkout
```bash
git clone --filter=blob:none --no-checkout https://github.com/NVIDIA/NemoClaw.git
cd NemoClaw
git sparse-checkout set --no-cone '/.agents/skills/nemoclaw-user-*/**' '/.agents/skills/nemoclaw-skills-guide/**' '/.claude/**' '/AGENTS.md' '/CLAUDE.md'
git checkout
```

Open the `NemoClaw` directory in your AI coding assistant.
335 changes: 252 additions & 83 deletions .agents/skills/nemoclaw-user-configure-inference/SKILL.md

@@ -2,9 +2,21 @@
<!-- SPDX-License-Identifier: Apache-2.0 -->
# NemoClaw Inference Options

import { AgentOnly } from "../_components/AgentGuide";

NemoClaw supports multiple inference providers.
During onboarding, the `nemoclaw onboard` wizard presents a numbered list of providers to choose from.
Your selection determines where the agent's inference traffic is routed.
During onboarding, the NemoClaw onboarding wizard presents a numbered list of providers to choose from.
Your selection determines where NemoClaw routes the agent's inference traffic.

<AgentOnly variant="openclaw">
For OpenClaw onboarding, use `nemoclaw onboard`.
The provider flow is the same, with the NVIDIA Endpoints route available for OpenClaw Agent.
</AgentOnly>

<AgentOnly variant="hermes">
For Hermes onboarding, use `nemohermes onboard`.
The provider flow is the same, with the Hermes Provider route available for Hermes Agent.
</AgentOnly>

## How Inference Routing Works

@@ -37,7 +49,7 @@ NemoClaw uses provider-specific local tokens for those routes, and rebuilds of l

The onboard wizard presents the following provider options by default.
The first six are always available.
Ollama appears when it is installed or running on the host.
Ollama appears when you have installed or started it on the host.
Local vLLM appears when NemoClaw detects a running vLLM server.
The managed install/start vLLM entry appears by default on DGX Spark and DGX Station, and appears on generic Linux NVIDIA GPU hosts after opt-in.

@@ -57,7 +69,7 @@ The managed install/start vLLM entry appears by default on DGX Spark and DGX Sta

NVIDIA Nemotron models expose OpenAI-compatible APIs across every supported deployment surface, so two onboarding options can route to Nemotron.

| Where Nemotron is hosted | Onboard wizard option | Why |
| Nemotron Host | Onboard Wizard Option | Why |
|---|---|---|
| `build.nvidia.com` (NVIDIA-hosted) | **Option 1: NVIDIA Endpoints** | NemoClaw sets the base URL to `https://integrate.api.nvidia.com/v1` for you and validates the model against the build catalog. |
| Self-hosted NIM container | **Option 3: Other OpenAI-compatible endpoint** | NIM exposes an OpenAI-compatible `/v1/chat/completions` route. Point the base URL at your NIM service and enter the Nemotron model ID. |
@@ -116,11 +128,19 @@ The sandbox never sees raw API keys.

To use the router in scripted setup, set:

```console
$ NEMOCLAW_PROVIDER=routed NVIDIA_API_KEY=<your-key> nemoclaw onboard --non-interactive
<AgentOnly variant="openclaw">
```bash
NEMOCLAW_PROVIDER=routed NVIDIA_API_KEY=<your-key> nemoclaw onboard --non-interactive
```
</AgentOnly>

### Host Python requirement
<AgentOnly variant="hermes">
```bash
NEMOCLAW_PROVIDER=routed NVIDIA_API_KEY=<your-key> nemohermes onboard --non-interactive
```
</AgentOnly>

### Host Python Requirement

The Model Router runs in a host-side virtual environment that NemoClaw creates during onboarding.
NemoClaw probes `python3.13`, `python3.12`, `python3.11`, `python3.10`, and bare `python3`, and adopts the first interpreter that satisfies both of:
@@ -131,20 +151,34 @@ NemoClaw probes `python3.13`, `python3.12`, `python3.11`, `python3.10`, and bare
If no candidate qualifies, onboarding aborts and prints the real failure for each candidate.
This surfaces issues like Homebrew `python@3.14` whose `pyexpat` extension fails to dlopen against the older system `libexpat` on macOS.
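
The first-match probe order described above can be sketched in shell. This is a hypothetical illustration, not NemoClaw's implementation: the function names `pick_first_python` and `can_build_venv` are invented here, and the example check (importing `venv` and `pyexpat`) is an assumption that stands in for the real qualification rules.

```bash
# Hypothetical sketch of a first-match interpreter probe (not NemoClaw code).
# Usage: pick_first_python CHECK CANDIDATE...
# Prints the first candidate for which CHECK succeeds; returns 1 if none qualifies.
pick_first_python() {
  local check="$1"
  shift
  local candidate
  for candidate in "$@"; do
    if "$check" "$candidate"; then
      printf '%s\n' "$candidate"
      return 0
    fi
    # Report every rejected candidate so a failure is never silent.
    printf 'rejected: %s\n' "$candidate" >&2
  done
  return 1
}

# Example check (an assumption): the interpreter is on PATH and can import
# the modules a virtual environment typically depends on.
can_build_venv() {
  command -v "$1" >/dev/null 2>&1 &&
    "$1" -c 'import venv, pyexpat' >/dev/null 2>&1
}
```

Calling `pick_first_python can_build_venv python3.13 python3.12 python3.11 python3.10 python3` would walk the candidates in the documented order. Passing the check as a parameter keeps the ordering logic separate from whatever the real qualification test is.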

<AgentOnly variant="openclaw">
To pin a specific interpreter, set `NEMOCLAW_MODEL_ROUTER_PYTHON` to its absolute path before running `nemoclaw onboard`:
</AgentOnly>
<AgentOnly variant="hermes">
To pin a specific interpreter, set `NEMOCLAW_MODEL_ROUTER_PYTHON` to its absolute path before running `nemohermes onboard`:
</AgentOnly>

<AgentOnly variant="openclaw">
```bash
NEMOCLAW_MODEL_ROUTER_PYTHON=/opt/homebrew/bin/python3.12 nemoclaw onboard
```
</AgentOnly>

```console
$ NEMOCLAW_MODEL_ROUTER_PYTHON=/opt/homebrew/bin/python3.12 nemoclaw onboard
<AgentOnly variant="hermes">
```bash
NEMOCLAW_MODEL_ROUTER_PYTHON=/opt/homebrew/bin/python3.12 nemohermes onboard
```
</AgentOnly>

The pin is strict.
NemoClaw probes only that interpreter and, if it does not qualify, aborts with the failure reason rather than silently falling back to a different Python on `PATH`.
Relative command names such as `python3.12` are rejected; use `command -v python3.12` to find the absolute path.
NemoClaw rejects relative command names such as `python3.12`.
Use `command -v python3.12` to find the absolute path.
If `python -m venv` itself fails for a probe-clean interpreter (for example, because of a corrupt ensurepip seed) and no pin is set, NemoClaw retries with the next healthy candidate.
With a pin set, the failure stops onboarding so you can fix or repoint the pinned Python.
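
Because the pin must be an absolute path, a small wrapper can normalize the value before onboarding. The helper below is a hypothetical sketch (the name `resolve_python_pin` is not part of NemoClaw): it passes absolute paths through unchanged and resolves bare command names with `command -v`.

```bash
# Hypothetical helper, not part of NemoClaw.
# Prints an absolute interpreter path, or fails if the name cannot be resolved.
resolve_python_pin() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;   # already absolute: pass through unchanged
    *)  command -v "$1" ;;      # bare name: look it up on PATH
  esac
}
```

One way to use it is `NEMOCLAW_MODEL_ROUTER_PYTHON="$(resolve_python_pin python3.12)" nemoclaw onboard`. The helper does not check that the interpreter qualifies; that remains NemoClaw's job during the probe.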

## Caveated Local Options

The following local inference options are caveated.
The following local inference options have caveats.
Local NIM and generic Linux managed vLLM install/start require `NEMOCLAW_EXPERIMENTAL=1`; DGX Spark and DGX Station managed vLLM entries appear by default.
An already-running vLLM server appears directly in the onboarding selection list.

Expand All @@ -159,20 +193,20 @@ For setup instructions, refer to [Use a Local Inference Server](../SKILL.md).

NemoClaw validates the selected provider and model before creating the sandbox.
If credential validation fails, the wizard asks whether to re-enter the API key, choose a different provider, retry, or exit.
Transient upstream validation failures are retried before the wizard reports a provider failure.
The wizard retries transient upstream validation failures before it reports a provider failure.
The `nvapi-` prefix check applies only to `NVIDIA_API_KEY`.
Other provider credentials, such as `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GEMINI_API_KEY`, and compatible endpoint keys, use provider-aware validation during retry.
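
For scripted setup, a preflight check can catch a malformed key before the wizard runs. The following is a minimal sketch with a hypothetical function name, `has_nvapi_prefix`. It checks only the `nvapi-` prefix described above and says nothing about whether the key is valid upstream.

```bash
# Hypothetical preflight check, not NemoClaw's validator.
# Succeeds only when the argument starts with the nvapi- prefix.
has_nvapi_prefix() {
  case "$1" in
    nvapi-*) return 0 ;;
    *)       return 1 ;;
  esac
}
```

A script might run `has_nvapi_prefix "${NVIDIA_API_KEY:-}" || echo "NVIDIA_API_KEY should start with nvapi-" >&2` before onboarding. Keys for other providers should not go through this check, since the prefix rule applies only to `NVIDIA_API_KEY`.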

| Provider type | Validation method |
|---|---|
| OpenAI | Tries `/responses` first, then `/chat/completions`. |
| NVIDIA Endpoints | Validates via `/v1/chat/completions` only; the `/v1/responses` probe is skipped because NVIDIA Build does not expose `/v1/responses` (returns 404 for every model). |
| Google Gemini | Validates via Gemini's OpenAI-compatible chat-completions path only; the `/v1/responses` probe is skipped because Gemini does not support the Responses API. |
| NVIDIA Endpoints | Validates through `/v1/chat/completions` only; NemoClaw skips the `/v1/responses` probe because NVIDIA Build does not expose `/v1/responses` (returns 404 for every model). |
| Google Gemini | Validates through Gemini's OpenAI-compatible chat-completions path only; NemoClaw skips the `/v1/responses` probe because Gemini does not support the Responses API. |
| Other OpenAI-compatible endpoint | Tries `/v1/responses` first with a tool-calling probe; falls back to `/v1/chat/completions`. Selected runtime API defaults to `/v1/chat/completions`; set `NEMOCLAW_PREFERRED_API=openai-responses` to allow `/v1/responses` at runtime when validation succeeds. |
| Anthropic-compatible | Tries `/v1/messages`. |
| NVIDIA Endpoints (manual model entry) | Validates the model name against the catalog API. |
| Compatible endpoints | Sends a real inference request because many proxies do not expose a `/models` endpoint. For OpenAI-compatible endpoints, the probe tries `/v1/responses` first then falls back to `/v1/chat/completions`; the selected runtime API defaults to `/v1/chat/completions`. Set `NEMOCLAW_PREFERRED_API=openai-responses` to allow `/v1/responses` at runtime when validation succeeds. |
| Local NVIDIA NIM | Validates via `/v1/chat/completions` only; the `/v1/responses` probe is skipped (same as NVIDIA Endpoints). |
| Local NVIDIA NIM | Validates through `/v1/chat/completions` only; NemoClaw skips the `/v1/responses` probe (same as NVIDIA Endpoints). |

## Next Steps

@@ -37,17 +37,17 @@ It keeps the primary `main` agent on the normal NemoClaw inference route and add
| Sub-agent model | `nvidia-omni/private/nvidia/nemotron-3-nano-omni-reasoning-30b-a3b` |
| Delegation tool | `sessions_spawn` |

Omni is used as the specialist model for image tasks.
The sub-agent uses Omni as the specialist model for image tasks.
The primary orchestration model remains responsible for conversation, planning, and deciding when to delegate.

## Update the Sandbox Config

Fetch the current OpenClaw config from the sandbox, patch it with your auxiliary provider and `agents.list` changes, then upload it back.

```console
$ export SANDBOX=my-assistant
$ export DOCKER_CTR=openshell-cluster-nemoclaw
$ docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- cat /sandbox/.openclaw/openclaw.json > /tmp/openclaw.json
```bash
export SANDBOX=my-assistant
export DOCKER_CTR=openshell-cluster-nemoclaw
docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- cat /sandbox/.openclaw/openclaw.json > /tmp/openclaw.json
```

Create `/tmp/openclaw.updated.json` with the OpenClaw sub-agent config.
@@ -56,13 +56,13 @@ For the Omni example, the demo provides `vlm-demo/vlm-subagent/openclaw-patch.py
Upload the patched config and refresh the hash.
In the default mutable state, this keeps the local hash consistent but does not make it tamper-proof.
If the sandbox should enforce config integrity at startup, make the config root-owned and read-only afterward.

```console
$ docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 644 /sandbox/.openclaw/openclaw.json
$ docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 644 /sandbox/.openclaw/.config-hash
$ cat /tmp/openclaw.updated.json | docker exec -i "$DOCKER_CTR" kubectl exec -i -n openshell "$SANDBOX" -c agent -- sh -c 'cat > /sandbox/.openclaw/openclaw.json'
$ docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- /bin/bash -c "cd /sandbox/.openclaw && sha256sum openclaw.json > .config-hash"
$ docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 444 /sandbox/.openclaw/openclaw.json
$ docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 444 /sandbox/.openclaw/.config-hash
```bash
docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 644 /sandbox/.openclaw/openclaw.json
docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 644 /sandbox/.openclaw/.config-hash
cat /tmp/openclaw.updated.json | docker exec -i "$DOCKER_CTR" kubectl exec -i -n openshell "$SANDBOX" -c agent -- sh -c 'cat > /sandbox/.openclaw/openclaw.json'
docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- /bin/bash -c "cd /sandbox/.openclaw && sha256sum openclaw.json > .config-hash"
docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 444 /sandbox/.openclaw/openclaw.json
docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chmod 444 /sandbox/.openclaw/.config-hash
```
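
Because `.config-hash` is written by `sha256sum`, the same tool can confirm that the file and hash still agree. The sketch below rehearses the refresh-and-verify cycle against any directory, for example a local scratch copy. The function names are hypothetical, and the sketch makes no claim about how the sandbox itself performs its startup check.

```bash
# Hypothetical helpers for rehearsing the hash refresh outside the sandbox.

# Rewrite .config-hash from the current openclaw.json in directory $1.
refresh_config_hash() {
  ( cd "$1" && sha256sum openclaw.json > .config-hash )
}

# Succeed only if openclaw.json in directory $1 still matches .config-hash.
verify_config_hash() {
  ( cd "$1" && sha256sum -c .config-hash >/dev/null 2>&1 )
}
```

Running `verify_config_hash` after an edit without a matching `refresh_config_hash` fails, which is the same mismatch the upload sequence above avoids by regenerating the hash.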

Check `/tmp/gateway.log` after upload and confirm the gateway hot-reloaded the provider or `agents.list` change.
@@ -77,10 +77,10 @@ For the Omni example:
```

Use the same provider ID that appears in `models.providers`, such as `nvidia-omni`.
After uploading the auth profile, make sure the sub-agent directory is owned by the sandbox user:
After uploading the auth profile, make sure the sandbox user owns the sub-agent directory:

```console
$ docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chown -R sandbox:sandbox /sandbox/.openclaw/agents/vision-operator
```bash
docker exec "$DOCKER_CTR" kubectl exec -n openshell "$SANDBOX" -c agent -- chown -R sandbox:sandbox /sandbox/.openclaw/agents/vision-operator
```

## Allow Auxiliary Provider Egress