Merged
61 changes: 22 additions & 39 deletions .agents/skills/nemoclaw-user-configure-inference/SKILL.md
@@ -78,6 +78,7 @@ When Ollama reports a loaded-model context length, NemoClaw uses that value for
If the selected model declares that it does not support tool calling, onboarding stops with guidance to choose a model whose `ollama show <model>` capabilities include `tools`.
The validation also requires structured chat-completions tool calls.
If the model leaks tool-call JSON as plain message text, onboarding stops so you can choose a model that returns tool calls in the expected response field.
If a host-side validation probe times out, NemoClaw retries the Ollama tool-call validation with a larger timeout before failing the setup.
On WSL, if you choose the Windows-host Ollama path, NemoClaw uses `host.docker.internal:11434` and pulls missing models through the Ollama HTTP API instead of requiring the `ollama` CLI inside WSL.
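
The retry behavior described in this paragraph can be sketched as a small POSIX shell function. This is a hypothetical illustration, not NemoClaw's implementation: the function name `probe_with_retry` and the timeout values are assumptions, and it relies on coreutils `timeout` exiting with status 124 when it has to stop a command.

```shell
#!/bin/sh
# Hypothetical sketch: retry a host-side probe once with a larger timeout,
# but only when the first attempt timed out. Timeout values are
# illustrative assumptions, not NemoClaw's defaults.

# probe_with_retry FIRST_SECS RETRY_SECS CMD [ARGS...]
probe_with_retry() {
  local first="$1" retry="$2" status
  shift 2
  # coreutils `timeout` exits with status 124 when it had to stop the command.
  timeout "$first" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "probe timed out after ${first}s; retrying with ${retry}s" >&2
    timeout "$retry" "$@"
    status=$?
  fi
  # Any other failure (for example an invalid response) is returned as-is.
  return "$status"
}
```

For example, `probe_with_retry 10 30 curl -fsS http://127.0.0.1:11434/api/tags` retries the Ollama tag listing once with a 30-second limit if the 10-second attempt times out, while a refused connection is reported immediately.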

### WSL with Windows-Host Ollama
@@ -108,53 +109,35 @@ JSON such as `{"name":"memory_search","arguments":{...}}` instead of running a
tool, switch to vLLM with `--enable-auto-tool-choice` and the correct
`--tool-call-parser`. See [Tool-Calling Reliability](references/tool-calling-reliability.md).
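
A rough way to recognize the leak described above is to test whether the assistant's message text is itself a JSON object with `name` and `arguments` keys. The sketch is a hypothetical heuristic (the function name and regex are assumptions); NemoClaw's onboarding validation may inspect the response differently.

```shell
#!/bin/sh
# Hypothetical heuristic: flag assistant message text that looks like a
# tool call serialized as JSON, e.g. {"name":"memory_search","arguments":{...}}.
# A parser-aware server returns such calls in a structured tool_calls field
# instead of in the message content.

looks_like_leaked_tool_call() {
  # Collapse the text to one line so the regex sees the whole object.
  printf '%s' "$1" | tr '\n' ' ' |
    grep -Eq '^[[:space:]]*\{[[:space:]]*"name"[[:space:]]*:[[:space:]]*"[^"]+"[[:space:]]*,[[:space:]]*"arguments"[[:space:]]*:'
}
```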

### Authenticated Reverse Proxy
### GPU Memory Cleanup

On non-WSL hosts, NemoClaw keeps Ollama bound to `127.0.0.1:11434` and starts a token-gated reverse proxy on `0.0.0.0:11435`.
The native install/start paths also reset NemoClaw-managed systemd launches to the loopback binding.
Containers and other hosts on the local network reach Ollama only through the
proxy, which validates a Bearer token before forwarding requests.
On that native path, NemoClaw never exposes Ollama without authentication.
When you switch away from Ollama, stop host services, or destroy an Ollama-backed sandbox, NemoClaw asks Ollama to unload currently loaded models from GPU memory.
The cleanup sends `keep_alive: 0` for each model reported by Ollama and runs on a best-effort basis, so shutdown continues if Ollama is already stopped.
This does not delete downloaded model files.
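
The cleanup above maps onto two documented Ollama endpoints: `GET /api/ps` lists loaded models and `POST /api/generate` with `"keep_alive": 0` unloads one. This sketch assumes those endpoints and the default `127.0.0.1:11434` address; the function names are hypothetical, and NemoClaw's own cleanup code is not shown here.

```shell
#!/bin/sh
# Sketch of a best-effort unload using Ollama's HTTP API (not NemoClaw's own
# code): GET /api/ps lists loaded models, and POST /api/generate with
# "keep_alive": 0 asks Ollama to unload one. Downloaded files are untouched.

# Print one model name per line from an /api/ps JSON body on stdin.
loaded_model_names() {
  grep -oE '"name"[[:space:]]*:[[:space:]]*"[^"]+"' |
    sed -E 's/.*"([^"]+)"$/\1/'
}

unload_ollama_models() {
  local base="${1:-http://127.0.0.1:11434}" ps_json
  # Ollama already stopped (or curl missing): nothing to unload, succeed.
  ps_json=$(curl -fsS --max-time 5 "$base/api/ps" 2>/dev/null) || return 0
  printf '%s\n' "$ps_json" | loaded_model_names |
    while IFS= read -r model; do
      # Errors are ignored so that shutdown can continue.
      curl -fsS --max-time 10 -H 'Content-Type: application/json' \
        -d "{\"model\":\"$model\",\"keep_alive\":0}" \
        "$base/api/generate" >/dev/null 2>&1 || true
    done
  return 0
}
```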

WSL Ollama paths do not use this proxy.
Windows-host Ollama uses the Windows daemon through `host.docker.internal`.

For non-WSL Ollama setups, the onboard wizard manages the proxy automatically:

- Generates a random 24-byte token on first run and stores it in
`~/.nemoclaw/ollama-proxy-token` with `0600` permissions.
- Starts the proxy after Ollama and verifies it before continuing.
- Cleans up stale proxy processes from previous runs.
- Probes the sandbox Docker network path to the proxy before committing the inference route.
- Stops matching proxy processes during uninstall before deleting NemoClaw state.
- Reuses the persisted token after a host reboot so you do not need to re-run
onboard.
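
The token handling in the list above (24 random bytes, `0600` permissions, reuse across reboots) can be approximated in shell. Hex encoding and the function name `ensure_proxy_token` are assumptions; NemoClaw may encode or write the token differently.

```shell
#!/bin/sh
# Hypothetical sketch of "generate once, then reuse" token handling.
# Hex encoding is an assumption; only the 24-byte size and 0600 mode
# come from the description above.

ensure_proxy_token() {
  local file="$1"
  if [ -s "$file" ]; then
    # Reuse the persisted token (for example after a host reboot).
    cat "$file"
    return 0
  fi
  mkdir -p "$(dirname "$file")"
  # umask 077 in a subshell so the new file is never group/world-readable.
  ( umask 077
    od -An -tx1 -N24 /dev/urandom | tr -d ' \n' > "$file" )
  # Also covers a pre-existing empty file, whose mode umask does not change.
  chmod 600 "$file"
  cat "$file"
}
```

Because the function only writes when the file is missing or empty, calling it again returns the same token.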

On native Linux hosts, a firewall can allow the host proxy health check while still blocking sandbox containers on the OpenShell Docker bridge.
When the sandbox-side proxy probe fails with a TCP error, onboarding exits before it saves the inference route and prints a command like:
### Non-Interactive Setup

```console
$ sudo ufw allow from <openshell-docker-subnet> to any port 11435 proto tcp
$ nemoclaw onboard
$ NEMOCLAW_PROVIDER=ollama \
NEMOCLAW_MODEL=qwen2.5:14b \
nemoclaw onboard --non-interactive --yes
```

If the probe cannot run, for example because Docker Desktop or WSL uses a different host routing model, onboarding continues and relies on the regular proxy health check.

The sandbox provider is configured to use proxy port `11435` with the generated
token as its `OPENAI_API_KEY` credential.
OpenShell's L7 proxy injects the token at egress, so the agent inside the
sandbox never sees the token directly.
If `NEMOCLAW_MODEL` is not set, NemoClaw selects a default model based on available memory.
If `NEMOCLAW_MODEL` names a known bootstrap model (for example `qwen3.6:35b`) that does not fit the host's currently available GPU memory, NemoClaw warns and falls back to the largest known model that does fit.
Unknown or custom tags (any value the bootstrap registry has not seen) are still passed through; the Ollama runner validates the choice itself.
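
The selection rules in this paragraph can be sketched as follows. The registry contents (`model-s`, `model-m`, `model-l` and their GiB sizes) are made-up placeholders, as is the behavior when no known model fits; only the pass-through and largest-fit rules come from the text.

```shell
#!/bin/sh
# Hypothetical registry of "tag size-in-GiB" entries; the real tags and
# memory requirements live in NemoClaw's bootstrap registry.
KNOWN_MODELS="model-s 8
model-m 16
model-l 40"

# pick_model REQUESTED_TAG AVAILABLE_GIB
pick_model() {
  local requested="$1" avail="$2" need
  need=$(printf '%s\n' "$KNOWN_MODELS" | awk -v m="$requested" '$1 == m { print $2 }')
  if [ -z "$need" ]; then
    # Unknown or custom tag: pass it through and let Ollama validate it.
    echo "$requested"; return 0
  fi
  if [ "$need" -le "$avail" ]; then
    echo "$requested"; return 0
  fi
  echo "warning: $requested needs ${need} GiB, only ${avail} GiB available" >&2
  # Largest known model that fits; no output and status 1 if none fits
  # (an assumption for this sketch).
  printf '%s\n' "$KNOWN_MODELS" |
    awk -v a="$avail" '$2 <= a && $2 > best { best = $2; tag = $1 }
      END { if (tag == "") exit 1; print tag }'
}
```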

All proxy endpoints require the Bearer token, including `GET /api/tags`.
Internal health and reachability checks that go through the proxy treat any HTTP
response (including `401`) as proof that the proxy is alive; they fail only
when nothing answers at all.
`--yes` (or `NEMOCLAW_YES=1`) authorises the Ollama model download without an interactive confirmation prompt.
Under `--non-interactive`, `--yes` (or `NEMOCLAW_YES=1`) is required to authorise the download — onboard exits otherwise, since it cannot prompt.
Run onboard without `--non-interactive` to get the interactive `[y/N]` prompt that shows the model size before downloading.
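
The confirmation rules for the model download reduce to three cases, sketched below. The function name and argument convention are hypothetical; in NemoClaw the inputs come from `--non-interactive`, `--yes`, and `NEMOCLAW_YES`.

```shell
#!/bin/sh
# Hypothetical sketch of the download-confirmation rules described above.
# authorize_download NON_INTERACTIVE(0|1) YES(0|1) SIZE_LABEL
authorize_download() {
  local non_interactive="$1" yes="$2" size="$3" answer
  if [ "$yes" = 1 ]; then
    return 0                      # --yes or NEMOCLAW_YES=1: no prompt
  fi
  if [ "$non_interactive" = 1 ]; then
    echo "error: download needs --yes (or NEMOCLAW_YES=1) in non-interactive mode" >&2
    return 1                      # cannot prompt, so refuse
  fi
  printf 'Download model (%s)? [y/N] ' "$size" >&2
  read -r answer || answer=""
  case "$answer" in
    y|Y|yes|YES) return 0 ;;
    *) return 1 ;;                # the default answer is No
  esac
}
```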

If Ollama is already running on a non-loopback address when you start onboard,
the wizard restarts it on `127.0.0.1:11434` so the proxy is the only network
path to the model server.
| Variable | Purpose |
|---|---|
| `NEMOCLAW_PROVIDER` | Set to `ollama`. |
| `NEMOCLAW_MODEL` | Ollama model tag to use. Optional. |
| `NEMOCLAW_YES` | Set to `1` to auto-accept the model-download confirmation prompt. Optional. |

Load [references/use-local-inference-details.md](references/use-local-inference-details.md) for detailed steps on GPU Memory Cleanup, Non-Interactive Setup.
Load [references/use-local-inference-details.md](references/use-local-inference-details.md) for detailed steps on Authenticated Reverse Proxy.

## OpenAI-Compatible Server

@@ -273,7 +256,7 @@ Load [references/use-local-inference-details.md](references/use-local-inference-
- **Load [references/set-up-sub-agent.md](references/set-up-sub-agent.md)** when users ask how to add a second model, configure a sub-agent model, use Omni for vision tasks, configure agents.list, or use sessions_spawn in NemoClaw. Shows the NemoClaw-specific file paths and update flow for adding an auxiliary OpenClaw sub-agent model.
- **[references/tool-calling-reliability.md](references/tool-calling-reliability.md)** — Explains Ollama tool-call leak symptoms, when vLLM with a tool-call parser is recommended, and how to repoint NemoClaw to a parser-aware local endpoint.
- **Load [references/inference-options.md](references/inference-options.md)** when explaining which providers are available, what the onboard wizard presents, or how inference routing works. Lists all inference providers offered during NemoClaw onboarding.
- **Load [references/use-local-inference-details.md](references/use-local-inference-details.md)** when you need detailed steps for GPU Memory Cleanup, Non-Interactive Setup, Selecting the API Path, and related details.
- **Load [references/use-local-inference-details.md](references/use-local-inference-details.md)** when you need detailed steps for Authenticated Reverse Proxy, Non-Interactive Setup, Selecting the API Path, and related details.

## Related Skills

@@ -2,33 +2,70 @@
<!-- SPDX-License-Identifier: Apache-2.0 -->
# Use a Local Inference Server: Details

## GPU Memory Cleanup
## Authenticated Reverse Proxy

When you switch away from Ollama, stop host services, or destroy an Ollama-backed sandbox, NemoClaw asks Ollama to unload currently loaded models from GPU memory.
The cleanup sends `keep_alive: 0` for each model reported by Ollama and runs on a best-effort basis, so shutdown continues if Ollama is already stopped.
This does not delete downloaded model files.
On non-WSL hosts, NemoClaw keeps Ollama bound to `127.0.0.1:11434` and starts a token-gated reverse proxy on `0.0.0.0:11435`.
The native install/start paths also reset NemoClaw-managed systemd launches to the loopback binding.
Containers and other hosts on the local network reach Ollama only through the
proxy, which validates a Bearer token before forwarding requests.
On that native path, NemoClaw never exposes Ollama without authentication.

### Non-Interactive Setup
WSL Ollama paths do not use this proxy.
Windows-host Ollama uses the Windows daemon through `host.docker.internal`.

For non-WSL Ollama setups, the onboard wizard manages the proxy automatically:

- Generates a random 24-byte token on first run and stores it in
`~/.nemoclaw/ollama-proxy-token` with `0600` permissions.
- Starts the proxy after Ollama and verifies it before continuing.
- Cleans up stale proxy processes from previous runs.
- Probes the sandbox Docker network path to the proxy before committing the inference route.
- Stops matching proxy processes during uninstall before deleting NemoClaw state.
- Reuses the persisted token after a host reboot so you do not need to re-run
onboard.

On native Linux hosts, a firewall can allow the host proxy health check while still blocking sandbox containers on the OpenShell Docker bridge.
When the sandbox-side proxy probe fails with a TCP error, onboarding exits before it saves the inference route and prints a command like:

```console
$ NEMOCLAW_PROVIDER=ollama \
NEMOCLAW_MODEL=qwen2.5:14b \
nemoclaw onboard --non-interactive --yes
$ sudo ufw allow from <openshell-docker-subnet> to any port 11435 proto tcp
$ nemoclaw onboard
```

If `NEMOCLAW_MODEL` is not set, NemoClaw selects a default model based on available memory.
If `NEMOCLAW_MODEL` names a known bootstrap model (for example `qwen3.6:35b`) that does not fit the host's currently available GPU memory, NemoClaw warns and falls back to the largest known model that does fit.
Unknown or custom tags (any value the bootstrap registry has not seen) are still passed through; the Ollama runner validates the choice itself.
If the probe cannot run, for example because Docker Desktop or WSL uses a different host routing model, onboarding continues and relies on the regular proxy health check.

The sandbox provider is configured to use proxy port `11435` with the generated
token as its `OPENAI_API_KEY` credential.
OpenShell's L7 proxy injects the token at egress, so the agent inside the
sandbox never sees the token directly.

All proxy endpoints require the Bearer token, including `GET /api/tags`.
Internal health and reachability checks that go through the proxy treat any HTTP
response (including `401`) as proof that the proxy is alive; they fail only
when nothing answers at all.
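
The liveness rule in this paragraph (any HTTP status, including `401`, counts; only silence fails) can be expressed with curl, which reports `000` as the status code when no HTTP response arrives. The function names are hypothetical and the 3-second limit is an assumption.

```shell
#!/bin/sh
# Sketch: any HTTP status (including 401) proves the proxy is up; only
# "no answer at all" counts as a failure. curl's %{http_code} is 000
# when it never received an HTTP response.

is_alive_status() {
  [ "$1" != "000" ] && [ -n "$1" ]
}

# proxy_alive URL [TOKEN]
proxy_alive() {
  local url="$1" token="${2:-}" code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 \
    -H "Authorization: Bearer $token" "$url" 2>/dev/null)
  is_alive_status "$code"
}
```

For example, `proxy_alive http://127.0.0.1:11435/api/tags` succeeds even without a token, because a `401` from the proxy still shows that it is listening.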

`--yes` (or `NEMOCLAW_YES=1`) authorises the Ollama model download without an interactive confirmation prompt.
Under `--non-interactive`, `--yes` (or `NEMOCLAW_YES=1`) is required to authorise the download — onboard exits otherwise, since it cannot prompt.
Run onboard without `--non-interactive` to get the interactive `[y/N]` prompt that shows the model size before downloading.
If Ollama is already running on a non-loopback address when you start onboard,
the wizard restarts it on `127.0.0.1:11434` so the proxy is the only network
path to the model server.
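
A check like the following could decide whether Ollama needs the restart described above. It is a hypothetical sketch that inspects an `OLLAMA_HOST`-style `host:port` value and assumes an empty value means Ollama's default loopback address; NemoClaw may instead inspect the listening socket.

```shell
#!/bin/sh
# Hypothetical check: is an Ollama bind address ("host:port", as in
# OLLAMA_HOST) limited to loopback?
is_loopback_bind() {
  case "$1" in
    "") return 0 ;;               # assumption: unset means 127.0.0.1:11434
    127.*|localhost|localhost:*) return 0 ;;
    ::1|"[::1]"|"[::1]":*) return 0 ;;
    *) return 1 ;;                # e.g. 0.0.0.0:11434 is reachable from the LAN
  esac
}
```

To apply the same fix by hand, stop the running Ollama server and start it again with `OLLAMA_HOST=127.0.0.1:11434 ollama serve`.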

### Non-Interactive Setup

Set the following environment variables for scripted or CI/CD deployments.

```console
$ NEMOCLAW_PROVIDER=custom \
NEMOCLAW_ENDPOINT_URL=http://localhost:8000/v1 \
NEMOCLAW_MODEL=meta-llama/Llama-3.1-8B-Instruct \
COMPATIBLE_API_KEY=dummy \
nemoclaw onboard --non-interactive
```

| Variable | Purpose |
|---|---|
| `NEMOCLAW_PROVIDER` | Set to `ollama`. |
| `NEMOCLAW_MODEL` | Ollama model tag to use. Optional. |
| `NEMOCLAW_YES` | Set to `1` to auto-accept the model-download confirmation prompt. Optional. |
| `NEMOCLAW_PROVIDER` | Set to `custom` for an OpenAI-compatible endpoint. |
| `NEMOCLAW_ENDPOINT_URL` | Base URL of the local server. |
| `NEMOCLAW_MODEL` | Model ID as reported by the server. |
| `COMPATIBLE_API_KEY` | API key for the endpoint. Use any non-empty value if authentication is not required. |

### Selecting the API Path

@@ -75,6 +112,26 @@ $ NEMOCLAW_PROVIDER=anthropicCompatible \
nemoclaw onboard --non-interactive
```

### Non-Interactive Setup

Use an already-running vLLM server:

```console
$ NEMOCLAW_PROVIDER=vllm \
nemoclaw onboard --non-interactive
```

Install or start managed vLLM when a supported profile is detected.
On DGX Spark and DGX Station, `NEMOCLAW_PROVIDER=install-vllm` is enough for non-interactive runs; add `NEMOCLAW_EXPERIMENTAL=1` on generic Linux NVIDIA GPU hosts.

```console
$ NEMOCLAW_PROVIDER=install-vllm \
nemoclaw onboard --non-interactive
```

NemoClaw records the model returned by vLLM's `/v1/models` endpoint.
Start vLLM with the model you want before onboarding if you manage the server yourself.
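
If you manage vLLM yourself, you can preview which model ID the server reports before onboarding. The sketch assumes the OpenAI-compatible `{"data":[{"id":...}]}` response shape and simply takes the first ID; the function name is hypothetical, and which entry NemoClaw records when several models are served is not covered here.

```shell
#!/bin/sh
# Sketch: print the first model id from an OpenAI-compatible /v1/models
# response read on stdin. Taking the first entry is an assumption.
first_model_id() {
  grep -oE '"id"[[:space:]]*:[[:space:]]*"[^"]+"' | head -n 1 |
    sed -E 's/.*"([^"]+)"$/\1/'
}
```

For example, `curl -s http://localhost:8000/v1/models | first_model_id`.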

### Override the Managed-vLLM Model

Managed vLLM serves the profile default unless you select a different registry entry.
@@ -120,6 +177,8 @@ This setting is baked into the sandbox at build time.
Changing it after onboarding requires re-running `nemoclaw onboard`.

`NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` only governs the inference-server validation probe.
During local Ollama setup, NemoClaw treats host-side curl process timeouts as retryable probe failures and retries with a larger timeout before it reports a validation failure.
NemoClaw also retries Docker runtime detection with a longer `docker info` timeout before it chooses the local inference route.
The post-create readiness wait (image build, gateway upload, in-sandbox boot) has its own budget, `NEMOCLAW_SANDBOX_READY_TIMEOUT`, also defaulting to 180 seconds.
On hosts where the sandbox image takes minutes to build or upload — large quantised models, DGX Station first runs, or remote VMs over a slow link — raise both together:

@@ -400,7 +400,7 @@ The scanner intercepts Write, Edit, and similar tool calls targeting memory and
| Aspect | Detail |
|---|---|
| Default | Enabled. The plugin registers a `before_tool_call` hook that scans for 14 high-confidence secret patterns. |
| What it covers | Examples include `.openclaw/memory/`, `.openclaw/workspace/`, `.openclaw/agents/`, `.openclaw/skills/`, `.openclaw/hooks/`, `.openclaw/credentials/`, `.openclaw/openclaw.json`, `.nemoclaw/`, and `MEMORY.md`; the exact coverage is defined by `MEMORY_PATH_SEGMENTS` and enforced through `isMemoryPath()`. |
| What it covers | Three classifiers, all enforced through `isMemoryPath()`: (1) absolute `MEMORY_PATH_SEGMENTS` such as `/.openclaw/memory/`, `/.openclaw/workspace/`, `/.openclaw/agents/`, `/.openclaw/skills/`, `/.openclaw/hooks/`, `/.openclaw/credentials/`, `/.openclaw/openclaw.json`, `/.nemoclaw/`; (2) canonical workspace basenames in `MEMORY_BASENAMES` (`IDENTITY.md`, `MEMORY.md`, `SOUL.md`, `USER.md`, `AGENTS.md`) matched regardless of the surrounding path; and (3) lexically normalized workspace-relative writes matching `MEMORY_RELATIVE_PREFIXES` (`.openclaw/`, `.nemoclaw/`, `memory/`) or named workspace daily memory paths, used in embedded-fallback mode, where the host's path resolver is unavailable. |
| What you can change | This is not a user-facing knob. The plugin enforces it automatically. |
| Risk if relaxed | Without scanning, the agent could persist API keys or tokens in memory files that survive across sessions and backups. |
| Recommendation | No action needed. If a write is blocked, the agent receives an actionable error listing the detected patterns. |
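
The three classifiers in the table can be illustrated with a shell approximation. This is not the plugin's `isMemoryPath()`: the patterns mirror the listed segments, basenames, and prefixes; the lexical normalization only drops `.` segments and resolves `..`; and the daily memory path rule is left out because its pattern is not given.

```shell
#!/bin/sh
# Hypothetical shell rendering of the three classifiers; the real check is
# the plugin's isMemoryPath(). Daily memory paths are omitted.

# Lexically normalize a relative path: drop "." segments, resolve "..".
normalize_rel() {
  printf '%s\n' "$1" | awk -F/ '{
    n = 0
    for (i = 1; i <= NF; i++) {
      if ($i == "" || $i == ".") continue
      if ($i == "..") { if (n > 0) n--; continue }
      out[++n] = $i
    }
    s = ""
    for (i = 1; i <= n; i++) s = s (i > 1 ? "/" : "") out[i]
    print s
  }'
}

is_memory_path() {
  local p="$1" base rel
  base=${p##*/}
  # (2) Canonical workspace basenames, wherever they appear.
  case "$base" in
    IDENTITY.md|MEMORY.md|SOUL.md|USER.md|AGENTS.md) return 0 ;;
  esac
  case "$p" in
    /*)
      # (1) Absolute paths containing a protected segment.
      case "$p" in
        */.openclaw/memory/*|*/.openclaw/workspace/*|*/.openclaw/agents/*) return 0 ;;
        */.openclaw/skills/*|*/.openclaw/hooks/*|*/.openclaw/credentials/*) return 0 ;;
        */.openclaw/openclaw.json|*/.nemoclaw/*) return 0 ;;
      esac
      return 1 ;;
    *)
      # (3) Workspace-relative writes after lexical normalization.
      rel=$(normalize_rel "$p")
      case "$rel" in
        .openclaw/*|.nemoclaw/*|memory/*) return 0 ;;
      esac
      return 1 ;;
  esac
}
```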
8 changes: 8 additions & 0 deletions .agents/skills/nemoclaw-user-get-started/SKILL.md
@@ -15,6 +15,12 @@ Follow these steps to get started with NemoClaw and your first sandboxed OpenCla

Make sure you have completed reviewing the [Prerequisites](references/prerequisites.md) before following this guide.

**Use Agent Skills:**

NemoClaw ships user skills for AI coding assistants.
Load them when you want your assistant to walk through installation, inference choices, policy approvals, monitoring, or troubleshooting with NemoClaw-specific guidance.
Refer to Agent Skills (use the `nemoclaw-user-agent-skills` skill).

## Install NemoClaw and Onboard OpenClaw Agent

Download and run the installer script.
@@ -115,6 +121,7 @@ If you enable it, enter a Brave Search API key when prompted.

The wizard also offers messaging channels such as Telegram, Discord, Slack, WeChat, and WhatsApp.
Press a channel number to toggle it, then press Enter to continue.
If you leave all channels unselected, pressing Enter skips messaging setup.
If you select a channel, NemoClaw validates the token format before it bakes the channel configuration into the sandbox.
For example, Slack bot tokens must start with `xoxb-`.
WeChat and WhatsApp are experimental.
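
The token-format check for Slack can be sketched as a prefix test that runs before anything is written into the sandbox configuration. Only the `xoxb-` rule comes from the text; the function name and the plain non-empty check for other channels are placeholder assumptions.

```shell
#!/bin/sh
# Hypothetical pre-flight check. Only the Slack "xoxb-" prefix comes from
# the text above; other channels get a non-empty check as a placeholder.
validate_channel_token() {
  local channel="$1" token="$2"
  if [ -z "$token" ]; then
    echo "error: empty $channel token" >&2
    return 1
  fi
  case "$channel" in
    slack)
      case "$token" in
        xoxb-*) return 0 ;;
        *) echo "error: Slack bot tokens must start with xoxb-" >&2; return 1 ;;
      esac ;;
    *) return 0 ;;
  esac
}
```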
@@ -212,3 +219,4 @@ openclaw tui
## Related Skills

- `nemoclaw-user-overview` — NemoClaw Overview (use the `nemoclaw-user-overview` skill) to learn what NemoClaw is and its capabilities
- `nemoclaw-user-agent-skills` — Agent Skills (use the `nemoclaw-user-agent-skills` skill) to load NemoClaw guidance into an AI coding assistant
@@ -67,3 +67,4 @@ The table is generated from [`ci/platform-matrix.json`](https://github.com/NVIDI

- [Prepare Windows for NemoClaw](windows-preparation.md) if you are using Windows.
- [Quickstart](../SKILL.md) to install NemoClaw and launch your first sandbox.
- Agent Skills (use the `nemoclaw-user-agent-skills` skill) to load NemoClaw guidance into an AI coding assistant before setup.
@@ -94,6 +94,7 @@ Pair only one sandbox per WhatsApp account at a time.

When the wizard reaches **Messaging channels**, it lists Telegram, Discord, Slack, WeChat, and WhatsApp.
Press a channel number to toggle it on or off, then press **Enter** when done.
If no channels are selected, pressing **Enter** skips messaging setup.
If a token-based channel token is not already in the environment or credential store, the wizard prompts for it and saves it.

If you enable WeChat (experimental), the wizard does not prompt for a paste token.
2 changes: 2 additions & 0 deletions .agents/skills/nemoclaw-user-overview/references/overview.md
@@ -23,6 +23,7 @@ NemoClaw provides the following product capabilities.
| Feature | Description |
|---------|-------------|
| Guided onboarding | Validates credentials, selects providers, and creates a working sandbox in one command. |
| Agent skills | Packages NemoClaw documentation as user skills so AI coding assistants can guide setup, inference configuration, policy management, monitoring, deployment, security review, and troubleshooting. |
| Hardened blueprint | A security-first Dockerfile with capability drops, least-privilege network rules, and declarative policy. |
| State management | Safe migration of agent state across machines with credential stripping and integrity verification. |
| Messaging channels | OpenShell-managed processes connect Telegram, Discord, Slack, and similar platforms to the sandboxed agent. NemoClaw configures channels during onboarding; OpenShell supplies the native constructs, credential flow, and runtime supervision. |
@@ -60,4 +61,5 @@ Navigate to the following topics to learn more about NemoClaw and how to install
- [Architecture Overview](how-it-works.md) to understand how NemoClaw works.
- [Ecosystem](ecosystem.md) to understand how OpenClaw, OpenShell, and NemoClaw relate in the wider stack, and when to use NemoClaw versus OpenShell.
- Quickstart (use the `nemoclaw-user-get-started` skill) to install NemoClaw and run your first sandboxed agent.
- Agent Skills (use the `nemoclaw-user-agent-skills` skill) to load NemoClaw guidance into an AI coding assistant.
- Inference Options (use the `nemoclaw-user-configure-inference` skill) to check the inference providers that NemoClaw supports and how inference routing works.