fix(state): preserve reporter-owned model metadata on openclaw.json rebuild restore#5205
fix(state): preserve reporter-owned model metadata on openclaw.json rebuild restore#5205yimoj wants to merge 2 commits into
Conversation
…ebuild restore OpenClaw's openclaw.json is backed up and selectively merged on rebuild (NVIDIA#5027), but matching `models.providers` entries were replaced wholesale by the freshly generated provider block. After a version upgrade + rebuild the user's tuned model metadata reset to regenerated defaults: `reasoning` flipped to false, `cost` zeroed, `maxTokens` fell back to 4096, and `compat` disappeared. Reconcile matching provider entries by ownership instead: the fresh rebuild keeps runtime-owned routing/credential fields (baseUrl, api, apiKey) and a model's routing identity (id/name), while backed-up non-secret model tuning (reasoning, cost, contextWindow, maxTokens, compat, input, …) is restored. Provider/model sets and order stay authoritative from the fresh rebuild; backup-only and id-less stale models are not resurrected. No raw secrets are restored — only the already-sanitized backup is merged. Adds a reporter-shaped regression at the merge-helper level and through the real backupSandboxState/restoreSandboxState path (the functions `nemoclaw <name> rebuild` invokes), covering provider model metadata and top-level `mcp.servers`. The regression confirms `mcp.servers` is already preserved by the generic object merge (the fresh config emits no mcp section), so no restore/sanitization change was needed for it. Fixes NVIDIA#5202 Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughRefines OpenClaw config restore: updates ownership contract, adds provider/model reconciliation that enforces fresh runtime-owned fields while restoring backup-held model tuning and durable MCP server settings, exports/adjusts the state restore command for safe ChangesOpenClaw Config Restore Ownership and Merge
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
…ld restore Real-CLI E2E (onboard -> tune in-sandbox config -> `nemoclaw <name> rebuild`) revealed a second layer of NVIDIA#5202 that the merge fix alone did not cover: even when the restore writes the correctly merged config, OpenClaw's own config-integrity protection reverts it. OpenClaw guards openclaw.json with a `.last-good` recovery anchor. On its integrity check it archives any live config that differs from `.last-good` as `openclaw.json.clobbered.<ts>` and reverts to `.last-good`. The rebuild restore wrote openclaw.json (and `.config-hash`) but never refreshed `.last-good`, so OpenClaw reverted the restored user config to the freshly generated baseline captured at first boot. The archived `.clobbered` file held the correct merged config, confirming the merge worked but was undone downstream. `buildStateFileRestoreCommand` now refreshes the `.last-good` anchor for the openclaw.json merge restore, staged through a temp + atomic rename and failing closed BEFORE the live config is swapped, so OpenClaw's integrity watcher never observes a config that disagrees with its recovery anchor and a partial/failed anchor write can never leave a stale revert target. Verified through the real worktree CLI on a live OpenClaw sandbox: after `node ./bin/nemoclaw.js <name> rebuild`, /sandbox/.openclaw/openclaw.json keeps the reporter's reasoning/cost/maxTokens/compat/input and mcp.servers with no `.clobbered` revert, and survives a subsequent container restart. Adds a command-string unit test (anchor refresh + ordering + fail-closed) and extends the backup/restore integration test to assert the `.last-good` anchor. Fixes NVIDIA#5202 Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
Summary
After a NemoClaw upgrade +
nemoclaw <name> rebuild, an OpenClaw sandbox's reporter-tunedopenclaw.jsonmodel metadata reset to regenerated defaults. Fixing this required two layers: (1) the rebuild restore merge replaced matchingmodels.providersentries wholesale, and (2) even after a correct merge, OpenClaw's own config-integrity protection reverted the restored config to its.last-goodbaseline. This PR fixes both, verified end-to-end through the realnemoclawCLI on a live sandbox.Related Issue
Fixes #5202
Changes
src/lib/state/openclaw-config-merge.ts: reconcile matchingmodels.providersentries by ownership instead of replacing them wholesale. The fresh rebuild keeps runtime-owned routing/credentials (baseUrl,api,apiKey) and a model's identity (id,name); the backup restores user-owned non-secret tuning (reasoning,cost,contextWindow,maxTokens,compat,input). Fresh rebuild stays authoritative for the provider/model set and order; backup-only/id-less stale models are not resurrected; backup-only providers are inherited. No raw secrets restored.src/lib/state/sandbox.ts:buildStateFileRestoreCommandnow refreshes OpenClaw's.last-goodrecovery anchor for theopenclaw.jsonmerge restore — staged through a temp + atomic rename and failing closed before the live config is swapped. Without this, OpenClaw archives the restored config asopenclaw.json.clobbered.<ts>and reverts to the freshly generated baseline.openclaw-config-merge.test.ts), a command-string unit test for the anchor refresh + ordering + fail-closed (state-file-restore-command.test.ts), and an extended backup/restore integration test through the realbackupSandboxState()/restoreSandboxState()path asserting both the merged metadata,mcp.serversretention, and the.last-goodanchor (openclaw-config-snapshot.test.ts).Notes on
mcp.serversThe reporter also saw
mcp.serversbecome{}. The fresh generated config emits nomcpsection, so the backed-upmcpsurvives the generic object merge — confirmed live (fresh in-sandbox config showedmcp: null) and locked in by the tests. The merge layer needed no change for it; the.last-goodrevert was what dropped it end-to-end, which layer 2 fixes.Type of Change
Verification
node ./bin/nemoclaw.js onboard --non-interactive --yes --no-gpu --name i5202e2e— created a real OpenClaw sandbox (cloud inference, no GPU).node ./bin/nemoclaw.js i5202e2e exec -- ...— tuned the in-sandbox/sandbox/.openclaw/openclaw.jsonto the reporter state (reasoning:true, customcost,maxTokens:32768,compat,mcp.servers.filesystem).node ./bin/nemoclaw.js i5202e2e rebuild --yes— the user-facing operation under test (backup → recreate → restore-merge →.last-goodrefresh)./sandbox/.openclaw/openclaw.json: keptreasoning:true,maxTokens:32768,cost {0.5,1.5,0.1,0.2},compat,input ["text","image"],mcp.servers.filesystem; runtime fields fresh (baseUrl https://inference.local/v1,apiKey unused, freshid/name); zeroopenclaw.json.clobbered.*files. Survived a subsequent container restart (OpenClaw integrity check re-ran). Reproduced cleanly across two consecutive fixed rebuilds.reasoning:false/maxTokens:4096/nocompat/nomcp, with the correctly-merged config archived asopenclaw.json.clobbered.*— proving the merge worked but OpenClaw reverted it.npx vitest runon the affected unit + integration tests — passing.tsc -p tsconfig.cli.jsontypecheck passes.npx @biomejs/biome checkclean on changed files.codex review --uncommittedclean.Signed-off-by: Yimo Jiang yimoj@nvidia.com
Summary by CodeRabbit
Bug Fixes
Tests