Skip to content

refactor(onboard): extract provider host state#5171

Merged
cv merged 7 commits into
mainfrom
codex/onboard-provider-host-state-flow
Jun 11, 2026
Merged

refactor(onboard): extract provider host state#5171
cv merged 7 commits into
mainfrom
codex/onboard-provider-host-state-flow

Conversation

@cv

@cv cv commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Summary

Extracts provider host-state detection from setupNim into a typed helper so the provider menu and dispatch branches consume one coherent local inference snapshot. This keeps the provider-selection behavior unchanged while moving Ollama, Windows-host Ollama, vLLM, and install-menu probes out of the large onboard entrypoint.

Related Issue

Refs #3802

Changes

  • Added src/lib/onboard/provider-host-state.ts to collect local Ollama, Windows-host Ollama, vLLM, Docker runtime, install-menu, and GPU capability state.
  • Rewired setupNim to consume the new host-state object before building the provider menu.
  • Added hermetic tests for local provider detection, WSL plus Windows-host Ollama reachability, mirrored-network duplicate-daemon suppression, and avoiding extra Windows-host probes when Ollama already resolves there.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Local verification run:

  • npx @biomejs/biome format --write src/lib/onboard.ts src/lib/onboard/provider-host-state.ts src/lib/onboard/provider-host-state.test.ts
  • npx @biomejs/biome lint src/lib/onboard.ts src/lib/onboard/provider-host-state.ts src/lib/onboard/provider-host-state.test.ts
  • npm run build:cli
  • npm run typecheck:cli
  • npx vitest run src/lib/onboard/provider-host-state.test.ts src/lib/onboard/provider-menu.test.ts src/lib/onboard/vllm-menu.test.ts src/lib/onboard/ollama-install-menu.test.ts test/onboard.test.ts
  • git diff --check

Known local environment note: full npx prek run --all-files and full npm test remain locally blocked by the unrelated test/release-latest-tag.test.ts fixture commit signing issue on this workstation. This PR does not touch that file.


Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

  • Refactor

    • Consolidated inference-provider detection during onboarding for more reliable vLLM and Ollama detection across Windows, WSL, and containers; better handling of duplicate/mirrored daemon scenarios and host reachability.
  • Tests

    • Added comprehensive tests covering Windows/WSL/host reachability, duplicate-daemon suppression, vLLM detection, and related onboarding edge cases.

@cv cv added refactor PR restructures code without intended behavior change area: cli Command line interface, flags, terminal UX, or output area: architecture Architecture, design debt, major refactors, or maintainability labels Jun 10, 2026
@cv cv self-assigned this Jun 10, 2026
@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f434061a-bceb-415f-b8ff-a4ff2f716548

📥 Commits

Reviewing files that changed from the base of the PR and between b4c0e8d and 5f57675.

📒 Files selected for processing (2)
  • src/lib/onboard/provider-host-state.test.ts
  • src/lib/onboard/provider-host-state.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/onboard/provider-host-state.ts

📝 Walkthrough

Walkthrough

A new module centralizes inference provider host-state detection (Ollama, vLLM, Windows-host reachability, WSL context) with injectable dependencies. The onboarding setupNim() flow is refactored to call detectInferenceProviderHostState and consume its consolidated state for provider menu construction and recorded-provider checks.

Changes

Inference Provider Host State Detection

Layer / File(s) Summary
Host-state interfaces and dependency wiring
src/lib/onboard/provider-host-state.ts
Introduce exported types for InferenceProviderHostState, DetectInferenceProviderHostStateInput, DetectInferenceProviderHostStateDeps, and InferenceProviderHostGpu; add dependency-builder utilities and default probe adapters.
Host probing helpers (vLLM, Windows-host reachability, duplicate warning)
src/lib/onboard/provider-host-state.ts
Implement vLLM /v1/models probing, Windows-host Ollama reachability checks from WSL via host.docker.internal, and conditional duplicate-daemon warning logic when Ollama runs on both WSL and Windows host.
Main detectInferenceProviderHostState orchestration
src/lib/onboard/provider-host-state.ts
Orchestrate WSL/context detection, Ollama installation/running/host-type checks, vLLM profile/image checks, Windows-host Docker requirement probing, derive ollamaInstallMenu and vllmEntries, and return a consolidated host-state object.
Vitest suite for host-state detection
src/lib/onboard/provider-host-state.test.ts
Add tests for local Ollama+vLLM detection, WSL+Windows-host coexistence and duplicate-daemon warnings, platform/env injection for WSL detection, mirrored-network suppression, and avoiding Windows-host probe when Ollama already resolves to Windows.
onboard.ts integration and setupNim() refactor
src/lib/onboard.ts
Wire in detectInferenceProviderHostState, remove obsolete inline helpers/imports (buildVllmMenuEntries, detectWindowsHostOllama, hostCommandExists, etc.), refactor setupNim() to consume the consolidated host-state (including vllmEntries, hasOllama, ollamaHost, isWslHost, and other flags).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5165: Builds the provider menu from Ollama/vLLM/Windows host-state results and complements this PR's host-state detection refactor.

Suggested reviewers

  • prekshivyas

Poem

🐰 A rabbit hops through the onboard code,
Pulling probes together on a tidy road,
Ollama and vLLM now speak as one,
WSL and Windows checks neatly done,
Hooray — the onboarding race is slowed no more!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'refactor(onboard): extract provider host state' accurately and concisely summarizes the main change—extracting provider host-state detection logic into a centralized module—which aligns with the primary objective of the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/onboard-provider-host-state-flow

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: cloud-inference-e2e, gpu-e2e
Optional E2E: wsl-e2e, gpu-double-onboard-e2e, gpu-repo-local-ollama-openclaw

Dispatch hint: cloud-inference-e2e,gpu-e2e

Auto-dispatched E2E: cloud-inference-e2e, gpu-e2e via nightly-e2e.yaml at 5f576757e49833e5aa624f6b2e678e6550a785b0nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • cloud-inference-e2e (medium; live NVIDIA API key and Docker sandbox, about 30 minutes timeout): Runs install.sh, performs cloud OpenClaw onboarding, creates a sandbox, and verifies live inference through inference.local. This catches regressions where the setupNim/provider-host-state refactor breaks the default cloud provider path or produces a bad route.
  • gpu-e2e (high; requires enabled GPU E2E runner and downloads/pulls local Ollama model): Directly exercises the local Ollama onboarding path affected by this PR: install Ollama, run NemoClaw with NEMOCLAW_PROVIDER=ollama, let onboard detect/start Ollama and the auth proxy, create the sandbox, and verify local inference. This is the highest-signal existing E2E for the changed provider-host-state logic.

Optional E2E

  • wsl-e2e (high; windows-latest WSL setup, up to 90 minutes, may skip full sandbox if Docker is unavailable): Useful platform confidence because the new helper moves WSL and Windows-host Ollama probing/warning logic. The existing WSL job runs a cloud full E2E in WSL and will at least execute the WSL-sensitive detection path, though it does not specifically validate Windows-host Ollama selection.
  • gpu-double-onboard-e2e (high; requires enabled GPU E2E runner and separate local Ollama onboarding run): Optional adjacent regression check for repeated Ollama onboarding and proxy-token consistency. The PR does not modify proxy persistence directly, but changed provider host detection can alter re-onboard provider availability decisions.
  • gpu-repo-local-ollama-openclaw (high; self-hosted GPU/Docker CDI scenario): Optional typed scenario equivalent for local Ollama OpenClaw onboarding with smoke, local-ollama-inference, and ollama-proxy suites. Run if scenario artifacts/state-validation are desired in addition to the script-based GPU E2E.

New E2E recommendations

  • wsl-windows-host-ollama (high): Existing WSL E2E does not install/start a Windows-host Ollama daemon or assert the start-windows-ollama/install-windows-ollama menu and switch behavior that this helper now centralizes.
    • Suggested test: Add a WSL live E2E scenario that provisions or mocks a reachable Windows-host Ollama on host.docker.internal, runs onboarding with the Windows-host Ollama provider choice, and verifies sandbox inference plus duplicate-daemon warning behavior.
  • vllm-local-provider-selection (medium): The PR moves vLLM running/profile/image detection and vLLM menu entry construction, but no existing live E2E was found that selects standalone vLLM and verifies inference through the created sandbox.
    • Suggested test: Add a local vLLM onboarding E2E that preconditions a supported vLLM profile or cached image, selects NEMOCLAW_PROVIDER=vllm, and verifies OpenShell route and inference.local chat from the sandbox.

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: cloud-inference-e2e,gpu-e2e

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: ubuntu-repo-cloud-openclaw
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

  • gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

  • ubuntu-repo-cloud-openclaw: Onboarding provider-host detection was refactored into src/lib/onboard/provider-host-state.ts and wired through src/lib/onboard.ts. The live-supported Ubuntu cloud OpenClaw scenario exercises the standard provider selection/onboarding path that now consumes this host-state snapshot before completing smoke, inference, and credential validation.
    • Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional Vitest E2E scenarios

  • None.

Relevant changed files

  • src/lib/onboard.ts
  • src/lib/onboard/provider-host-state.ts

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 0 still apply, 0 new items found

Consider writing more tests for
  • **Runtime validation** — Add a wrapper-level unit test that detectInferenceProviderHostState returns ollamaInstallMenu.hasUpgradableOllama === true and an install-ollama entry when the installed Ollama binary version is below MIN_OLLAMA_VERSION.. The unit tests and existing helper/menu tests provide good deterministic coverage, but the changed path depends on live host infrastructure such as Docker, WSL networking, curl, Ollama daemon/binary versions, vLLM endpoints, and full onboard provider-menu rendering.
  • **Runtime validation** — Add a wrapper-level unit test that detectInferenceProviderHostState returns an upgrade ollamaInstallMenu when the reachable local Ollama daemon version is stale while the installed binary is current.. The unit tests and existing helper/menu tests provide good deterministic coverage, but the changed path depends on live host infrastructure such as Docker, WSL networking, curl, Ollama daemon/binary versions, vLLM endpoints, and full onboard provider-menu rendering.
  • **Runtime validation** — Runtime-validate nemoclaw onboard provider-menu rendering on native Linux when neither Ollama nor vLLM is present, verifying the extracted snapshot still exposes the expected remote providers and Ollama install fallback.. The unit tests and existing helper/menu tests provide good deterministic coverage, but the changed path depends on live host infrastructure such as Docker, WSL networking, curl, Ollama daemon/binary versions, vLLM endpoints, and full onboard provider-menu rendering.
  • **Runtime validation** — Runtime-validate nemoclaw onboard provider-menu rendering on native Linux with local Ollama running on 127.0.0.1:11434, verifying the menu offers Local Ollama and does not show an install or upgrade entry when versions are current.. The unit tests and existing helper/menu tests provide good deterministic coverage, but the changed path depends on live host infrastructure such as Docker, WSL networking, curl, Ollama daemon/binary versions, vLLM endpoints, and full onboard provider-menu rendering.
  • **Runtime validation** — Runtime-validate nemoclaw onboard provider-menu rendering when vLLM is running on localhost:8000, verifying the menu selects the running vLLM path and the model-query branch still works.. The unit tests and existing helper/menu tests provide good deterministic coverage, but the changed path depends on live host infrastructure such as Docker, WSL networking, curl, Ollama daemon/binary versions, vLLM endpoints, and full onboard provider-menu rendering.
  • **Acceptance clause:** Refs Umbrella: refactor onboarding into a serializable FSM #3802 — add test evidence or identify existing coverage. The PR body references Umbrella: refactor onboarding into a serializable FSM #3802, but deterministic linked-issue context reported linkedIssues as empty and did not provide issue Umbrella: refactor onboarding into a serializable FSM #3802 body or comments, so no literal acceptance clauses were available to map. The changed files add provider host-state extraction and focused tests for local provider detection, WSL/Windows-host Ollama reachability, mirrored-network duplicate-daemon suppression, and Windows-host probe gating.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 27302116386
Target ref: d1e02e22ef437ecb5f1169bc585abceeba8a9687
Workflow ref: main
Requested jobs: cloud-onboard-e2e,gpu-e2e
Summary: 1 passed, 0 failed, 1 skipped

Job Result
cloud-onboard-e2e ✅ success
gpu-e2e ⏭️ skipped

@cv

cv commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

Validation update for the draft host-state slice at d1e02e22ef437ecb5f1169bc585abceeba8a9687:

  • Standard PR CI is green, including static checks, build/typecheck, CLI shards, plugin tests, installer integration, CodeQL, ShellCheck, macOS, DCO, and commit lint.
  • PR Review Advisor is clean: 0 needs attention, 0 worth checking, 0 nice ideas.
  • CodeRabbit skipped review because the PR is draft; its status context is successful.
  • Local validation passed: Biome format/lint on touched files, npm run build:cli, npm run typecheck:cli, targeted provider-host-state/provider-menu/vLLM/Ollama/onboard Vitest coverage, and git diff --check.
  • Live validation:

This leaves the PR in good draft-stack shape. The only missing live confidence is a real GPU local-provider run, which looks blocked by infrastructure rather than by this branch.

@cv

cv commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

Next draft stack slice opened: #5173 (refactor(onboard): extract provider selection resolver).

This branch is based on #5171 and keeps shrinking setupNim by moving non-interactive/resume provider-key resolution into src/lib/onboard/provider-selection.ts. The resolver returns structured selected/failure outcomes; setupNim still renders the existing user-facing errors for recorded-provider drift, WSL Ollama vs Windows-host Ollama, Hermes Provider gating, and unavailable requested providers.

Local validation before opening #5173:

  • npm run build:cli
  • npm run typecheck:cli
  • npx @biomejs/biome lint src/lib/onboard.ts src/lib/onboard/provider-selection.ts src/lib/onboard/provider-selection.test.ts
  • npx vitest run src/lib/onboard/provider-selection.test.ts src/lib/onboard/provider-key-fallback.test.ts src/lib/onboard/provider-menu.test.ts src/lib/onboard/provider-host-state.test.ts test/onboard.test.ts
  • git diff --check

Full npx prek run --all-files / npm test remain unchecked here because of the unrelated release-tag signing fixture issue we have been avoiding rather than editing in this stack.

cv added 2 commits June 10, 2026 23:48
…nu-merge-update

# Conflicts:
#	test/e2e/test-onboard-inference-smoke.sh
…' into codex/tmp-provider-host-state-merge-update
Base automatically changed from codex/onboard-provider-menu-flow to main June 11, 2026 07:40
@cv cv added the area: onboarding Onboarding FSM, provider setup, sandbox launch, or first-run flow label Jun 11, 2026
@cv cv marked this pull request as ready for review June 11, 2026 07:55

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/lib/onboard/provider-host-state.ts (1)

121-121: ⚡ Quick win

Avoid host literal drift; reuse OLLAMA_HOST_DOCKER_INTERNAL

Line 121 hardcodes host.docker.internal even though this module already imports OLLAMA_HOST_DOCKER_INTERNAL. Reusing the constant keeps probe logic aligned with the host-resolution contract.

Proposed fix
-      `http://host.docker.internal:${OLLAMA_PORT}/api/tags`,
+      `http://${OLLAMA_HOST_DOCKER_INTERNAL}:${OLLAMA_PORT}/api/tags`,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/provider-host-state.ts` at line 121, Replace the hardcoded
host literal in the tags probe URL with the existing OLLAMA_HOST_DOCKER_INTERNAL
constant: change the template literal
`http://host.docker.internal:${OLLAMA_PORT}/api/tags` to use
`OLLAMA_HOST_DOCKER_INTERNAL` (keeping the existing OLLAMA_PORT and path
`/api/tags`) so the probe uses the shared host-resolution constant; locate this
change in the code that constructs the probe URL in provider-host-state (look
for the string/variable that builds the tags endpoint).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard/provider-host-state.ts`:
- Around line 65-66: The code calls deps.isWsl() without passing the injected
runtime overrides, so input.platform and input.env are ignored; update the calls
to deps.isWsl() (the calls around the resolve/host-state logic at ~lines
147-153) to pass the overrides: deps.isWsl(input.platform, input.env) (or
equivalent parameter names) so the injected input.platform and input.env are
used when resolving WSL state; ensure every place that currently calls
deps.isWsl() in this module is updated to pass those two arguments.

---

Nitpick comments:
In `@src/lib/onboard/provider-host-state.ts`:
- Line 121: Replace the hardcoded host literal in the tags probe URL with the
existing OLLAMA_HOST_DOCKER_INTERNAL constant: change the template literal
`http://host.docker.internal:${OLLAMA_PORT}/api/tags` to use
`OLLAMA_HOST_DOCKER_INTERNAL` (keeping the existing OLLAMA_PORT and path
`/api/tags`) so the probe uses the shared host-resolution constant; locate this
change in the code that constructs the probe URL in provider-host-state (look
for the string/variable that builds the tags endpoint).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: dc6ef38d-d554-41e7-958d-f69dc6664355

📥 Commits

Reviewing files that changed from the base of the PR and between 9a7bcd5 and b4c0e8d.

📒 Files selected for processing (3)
  • src/lib/onboard.ts
  • src/lib/onboard/provider-host-state.test.ts
  • src/lib/onboard/provider-host-state.ts

Comment thread src/lib/onboard/provider-host-state.ts Outdated
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 27332576377
Target ref: b4c0e8da380cd67cc7e8150435894432aae2573a
Workflow ref: main
Requested jobs: cloud-onboard-e2e,gpu-e2e
Summary: 1 passed, 0 failed, 1 skipped

Job Result
cloud-onboard-e2e ✅ success
gpu-e2e ⏭️ skipped

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 27333190216
Target ref: 5f576757e49833e5aa624f6b2e678e6550a785b0
Workflow ref: main
Requested jobs: cloud-inference-e2e,gpu-e2e
Summary: 1 passed, 0 failed, 1 skipped

Job Result
cloud-inference-e2e ✅ success
gpu-e2e ⏭️ skipped

@cv cv merged commit 48bd12a into main Jun 11, 2026
34 checks passed
@cv cv deleted the codex/onboard-provider-host-state-flow branch June 11, 2026 08:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: architecture Architecture, design debt, major refactors, or maintainability area: cli Command line interface, flags, terminal UX, or output area: onboarding Onboarding FSM, provider setup, sandbox launch, or first-run flow refactor PR restructures code without intended behavior change v0.0.64 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants