Merged
6 changes: 3 additions & 3 deletions QUICKSTART.md
@@ -44,7 +44,7 @@ If your NVIDIA key changes later, run `npm run key`.

## Advanced: NVIDIA NIM (Recommended for Quality)

-NVIDIA hosted models like `nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16`, `z-ai/glm4.7`, and `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16`.
+NVIDIA hosted models like `nvidia/nemotron-3-super-120b-a12b`, `z-ai/glm4.7`, and `nvidia/nemotron-3-nano-30b-a3b`.
From the cloned repo root:

```sh
@@ -130,9 +130,9 @@ claudia-claude --model local-model
| `npm run release:check` | Release gate: typecheck + tests + build + package smoke |
| `npm run config` | Re-run the configuration wizard |
| `claudia-claude` | Launch Claude Code connected to the router |
-| `npm run claude:fast` | Default long-context model (nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16) |
+| `npm run claude:fast` | Default long-context model (nvidia/nemotron-3-super-120b-a12b) |
| `npm run claude:glm` | High-quality thinking model, slower on purpose (z-ai/glm4.7) |
-| `npm run claude:qwen` | Backup coding model, less consistent on complex code (nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16) |
+| `npm run claude:qwen` | Backup coding model, less consistent on complex code (nvidia/nemotron-3-nano-30b-a3b) |
| `npm run claude:smoke` | Quick smoke test only (nemotron-mini-4b) |

---
4 changes: 2 additions & 2 deletions README.md
@@ -125,7 +125,7 @@ npm run claude:fast -- --managed-auth

If you see a managed-login warning, remove `--managed-auth`. Claude managed credentials are sent only to the local router; your NVIDIA key is sent to NVIDIA by the router.

-The fast script and default wrapper route `claude-3-5-sonnet-latest` to NVIDIA `nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16`. Use `npm run claude:glm` for the slower thinking-heavy GLM quality profile, `npm run claude:qwen` for the Nano fallback, or `npm run claude:smoke` to test routing with the smallest configured model.
+The fast script and default wrapper route `claude-3-5-sonnet-latest` to NVIDIA `nvidia/nemotron-3-super-120b-a12b`. Use `npm run claude:glm` for the slower thinking-heavy GLM quality profile, `npm run claude:qwen` for the Nano fallback, or `npm run claude:smoke` to test routing with the smallest configured model.

Model tradeoffs:

@@ -176,7 +176,7 @@ LOG_LEVEL=info

2. Keep `defaultBackend` set to `nvidia` in `config.json`.

-3. Use a mapped Claude-style model alias such as `claude-3-5-sonnet-latest`, or send any model name and Claudia Router will use the NVIDIA backend default model (`nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16`).
+3. Use a mapped Claude-style model alias such as `claude-3-5-sonnet-latest`, or send any model name and Claudia Router will use the NVIDIA backend default model (`nvidia/nemotron-3-super-120b-a12b`).

If you want to switch providers later, use `npm run init -- --provider openrouter` or `npm run init -- --provider local`. Use `npm run config` if you prefer the interactive provider picker.
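Step 3 above describes a fallback: a mapped alias resolves to its configured model, and any other name lands on the NVIDIA backend's `defaultModel`. A minimal sketch of that lookup chain, assuming the router checks `modelProfiles` first, then `modelMap`, then the default backend; `resolveRoute` is a hypothetical name, and the real lookup order inside Claudia Router is not shown in this diff.

```javascript
// Hypothetical sketch: resolve a requested model name against a config.json
// shaped like config.example.json.
// Assumed lookup order: modelProfiles -> modelMap -> default backend model.
function resolveRoute(config, requestedModel) {
  const profile = config.modelProfiles?.[requestedModel];
  if (profile) {
    return { backend: profile.backend, model: profile.providerModel };
  }
  const mapped = config.modelMap?.[requestedModel];
  if (mapped) {
    return { backend: mapped.backend, model: mapped.model };
  }
  // Unknown names fall through to the default backend's defaultModel.
  const backend = config.defaultBackend;
  return { backend, model: config.backends[backend].defaultModel };
}

const config = {
  defaultBackend: "nvidia",
  backends: {
    nvidia: {
      baseUrl: "https://integrate.api.nvidia.com/v1",
      apiKeyEnv: "NVIDIA_API_KEY",
      defaultModel: "nvidia/nemotron-3-super-120b-a12b"
    }
  },
  modelProfiles: {
    "claude-3-5-sonnet-qwen": {
      backend: "nvidia",
      providerModel: "nvidia/nemotron-3-nano-30b-a3b"
    }
  },
  modelMap: {}
};

console.log(resolveRoute(config, "claude-3-5-sonnet-qwen").model); // nvidia/nemotron-3-nano-30b-a3b
console.log(resolveRoute(config, "some-unknown-model").model);     // nvidia/nemotron-3-super-120b-a12b
```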

22 changes: 18 additions & 4 deletions config.example.json
@@ -5,7 +5,7 @@
"nvidia": {
"baseUrl": "https://integrate.api.nvidia.com/v1",
"apiKeyEnv": "NVIDIA_API_KEY",
-"defaultModel": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
+"defaultModel": "nvidia/nemotron-3-super-120b-a12b"
},
"openrouter": {
"baseUrl": "https://openrouter.ai/api/v1",
@@ -21,9 +21,17 @@
"modelProfiles": {
"claude-3-5-sonnet-latest": {
"backend": "nvidia",
-"providerModel": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
+"providerModel": "nvidia/nemotron-3-super-120b-a12b",
"retryAttempts": 3,
"retryBaseDelayMs": 500,
+"extraBody": {
+"chat_template_kwargs": {
+"enable_thinking": false,
+"force_nonempty_content": true
+},
+"temperature": 1,
+"top_p": 0.95
+},
"notes": "Default long-context NVIDIA coding profile; stronger context window, slightly slower than smaller models",
"capabilities": {
"toolCalls": true,
@@ -66,9 +74,15 @@
},
"claude-3-5-sonnet-qwen": {
"backend": "nvidia",
-"providerModel": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
+"providerModel": "nvidia/nemotron-3-nano-30b-a3b",
"retryAttempts": 3,
"retryBaseDelayMs": 500,
+"extraBody": {
+"chat_template_kwargs": {
+"enable_thinking": false,
+"force_nonempty_content": true
+}
+},
"notes": "Nano fallback NVIDIA coding profile; useful as a backup, but lighter than the default",
"capabilities": {
"toolCalls": true,
@@ -90,7 +104,7 @@
"modelMap": {
"legacy-claude-3-5-sonnet-latest": {
"backend": "nvidia",
-"model": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
+"model": "nvidia/nemotron-3-super-120b-a12b"
}
}
}
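The `extraBody` objects added to the two NVIDIA profiles carry request fields that are not part of the profile schema itself (`chat_template_kwargs`, plus `temperature` and `top_p` on the fast profile). A minimal sketch of how a router could fold them into an OpenAI-compatible chat completions body, assuming a shallow merge in which `extraBody` keys win over fields sent by the client; `buildChatRequest` is hypothetical, and the router's actual merge rules are not part of this diff.

```javascript
// Hypothetical sketch: merge a profile's extraBody into a chat completions body.
// Assumes a shallow merge where extraBody keys override client-supplied fields.
function buildChatRequest(profile, messages, options = {}) {
  const body = {
    model: profile.providerModel,
    messages,
    ...options
  };
  // Profiles without extraBody pass the request through unchanged.
  return profile.extraBody ? { ...body, ...profile.extraBody } : body;
}

const fastProfile = {
  backend: "nvidia",
  providerModel: "nvidia/nemotron-3-super-120b-a12b",
  extraBody: {
    chat_template_kwargs: { enable_thinking: false, force_nonempty_content: true },
    temperature: 1,
    top_p: 0.95
  }
};

const request = buildChatRequest(
  fastProfile,
  [{ role: "user", content: "Hello" }],
  { temperature: 0.2, max_tokens: 256 }
);
console.log(JSON.stringify(request, null, 2));
```

With this ordering, the client's `temperature: 0.2` is replaced by the profile's `1`, while `max_tokens` passes through untouched. A router that wanted client values to win would spread `extraBody` first instead.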
4 changes: 2 additions & 2 deletions scripts/claudia-claude.mjs
@@ -91,9 +91,9 @@ Claudia Router Model Profiles

Shortcuts (use with --model or in npm scripts):

---model fast Default: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 (NVIDIA) — best long-context option, a bit slower
+--model fast Default: nvidia/nemotron-3-super-120b-a12b (NVIDIA) — best long-context option, a bit slower
--model glm Thinking-heavy: z-ai/glm4.7 (NVIDIA) — slower, but better on hard tasks
---model qwen Fallback: nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 (NVIDIA) — useful fallback, less consistent
+--model qwen Fallback: nvidia/nemotron-3-nano-30b-a3b (NVIDIA) — useful fallback, less consistent
--model smoke Lightweight: nvidia/nemotron-mini-4b-instruct (NVIDIA) — for quick checks only

Built-in npm scripts:
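The help text pairs each `--model` shortcut with a profile. A minimal sketch of that resolution, assuming `fast`, `glm`, and `qwen` map to the `claude-3-5-sonnet-latest`, `claude-3-5-sonnet-glm`, and `claude-3-5-sonnet-qwen` profile names used in config.example.json and the tests; `SHORTCUTS` and `resolveShortcut` are hypothetical, and `smoke` is omitted because its profile name does not appear in this diff.

```javascript
// Hypothetical shortcut table; profile names follow config.example.json and the tests.
const SHORTCUTS = {
  fast: "claude-3-5-sonnet-latest",
  glm: "claude-3-5-sonnet-glm",
  qwen: "claude-3-5-sonnet-qwen"
};

function resolveShortcut(modelProfiles, nameOrShortcut) {
  // Anything that is not a known shortcut is treated as a profile alias as-is.
  const alias = SHORTCUTS[nameOrShortcut] ?? nameOrShortcut;
  const profile = modelProfiles[alias];
  return { alias, providerModel: profile ? profile.providerModel : undefined };
}

const modelProfiles = {
  "claude-3-5-sonnet-latest": { backend: "nvidia", providerModel: "nvidia/nemotron-3-super-120b-a12b" },
  "claude-3-5-sonnet-glm": { backend: "nvidia", providerModel: "z-ai/glm4.7" },
  "claude-3-5-sonnet-qwen": { backend: "nvidia", providerModel: "nvidia/nemotron-3-nano-30b-a3b" }
};

console.log(resolveShortcut(modelProfiles, "qwen").providerModel); // nvidia/nemotron-3-nano-30b-a3b
```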
18 changes: 17 additions & 1 deletion scripts/presets.mjs
@@ -83,6 +83,16 @@ export function buildProfileModelProfiles(providerKey, provider) {
providerModel: provider.defaultModel,
retryAttempts: 3,
retryBaseDelayMs: 500,
+extraBody: providerKey === "nvidia"
+? {
+chat_template_kwargs: {
+enable_thinking: false,
+force_nonempty_content: true
+},
+temperature: 1.0,
+top_p: 0.95
+}
+: undefined,
notes: PROFILE_PRESETS.fast.notes
},
[PROFILE_PRESETS.smoke.model]: {
@@ -111,9 +121,15 @@ export function buildProfileModelProfiles(providerKey, provider) {

modelProfiles[PROFILE_PRESETS.qwen.model] = {
backend: providerKey,
-providerModel: "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
+providerModel: "nvidia/nemotron-3-nano-30b-a3b",
retryAttempts: 3,
retryBaseDelayMs: 500,
+extraBody: {
+chat_template_kwargs: {
+enable_thinking: false,
+force_nonempty_content: true
+}
+},
notes: PROFILE_PRESETS.qwen.notes
};
}
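In the fast profile, non-NVIDIA providers get `extraBody: undefined`. Assuming the generated profiles are written to config.json with `JSON.stringify` (the write step is not shown here), that key drops out of the file entirely, because `JSON.stringify` omits object properties whose value is `undefined`. The sketch below mirrors the ternary in a hypothetical `fastProfileFor` helper; `some/model` is a placeholder model ID.

```javascript
// Hypothetical helper mirroring the presets.mjs ternary: only the NVIDIA
// backend receives the chat_template_kwargs and sampling overrides.
function fastProfileFor(providerKey, defaultModel) {
  return {
    backend: providerKey,
    providerModel: defaultModel,
    retryAttempts: 3,
    retryBaseDelayMs: 500,
    extraBody: providerKey === "nvidia"
      ? {
          chat_template_kwargs: { enable_thinking: false, force_nonempty_content: true },
          temperature: 1.0,
          top_p: 0.95
        }
      : undefined
  };
}

// JSON.stringify skips properties whose value is undefined, so the
// OpenRouter profile is serialized without an extraBody key at all.
const nvidiaJson = JSON.stringify(fastProfileFor("nvidia", "nvidia/nemotron-3-super-120b-a12b"));
const openrouterJson = JSON.stringify(fastProfileFor("openrouter", "some/model"));

console.log(nvidiaJson.includes("extraBody"));     // true
console.log(openrouterJson.includes("extraBody")); // false
```

The same serialization turns `temperature: 1.0` into `1`, which matches the value written in config.example.json.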
2 changes: 1 addition & 1 deletion scripts/providers.mjs
@@ -4,7 +4,7 @@ export const PROVIDERS = {
name: "NVIDIA NIM",
baseUrl: "https://integrate.api.nvidia.com/v1",
apiKeyEnv: "NVIDIA_API_KEY",
-defaultModel: "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
+defaultModel: "nvidia/nemotron-3-super-120b-a12b",
smokeModel: "nvidia/nemotron-mini-4b-instruct",
requiresKey: true,
description: "Long-context and coding-capable models hosted by NVIDIA"
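The `backends.nvidia` block in config.example.json carries the same `baseUrl`, `apiKeyEnv`, and `defaultModel` values as this `PROVIDERS.nvidia` entry, so the model rename has to land in both files. A minimal sketch of deriving the config block from the provider entry, assuming only those three fields are copied; `toBackendConfig` is a hypothetical name, not a function from this repository.

```javascript
// Subset of the PROVIDERS.nvidia entry shown in scripts/providers.mjs.
const PROVIDERS = {
  nvidia: {
    name: "NVIDIA NIM",
    baseUrl: "https://integrate.api.nvidia.com/v1",
    apiKeyEnv: "NVIDIA_API_KEY",
    defaultModel: "nvidia/nemotron-3-super-120b-a12b",
    smokeModel: "nvidia/nemotron-mini-4b-instruct",
    requiresKey: true
  }
};

// Hypothetical: keep only the fields that appear under "backends" in config.json.
function toBackendConfig(provider) {
  const { baseUrl, apiKeyEnv, defaultModel } = provider;
  return { baseUrl, apiKeyEnv, defaultModel };
}

console.log(JSON.stringify(toBackendConfig(PROVIDERS.nvidia)));
```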
6 changes: 3 additions & 3 deletions scripts/release-smoke.mjs
@@ -145,17 +145,17 @@ function main() {
const nvidiaConfig = readJson(configPath);
assert(nvidiaConfig.defaultBackend === "nvidia", `Expected defaultBackend=nvidia, got ${nvidiaConfig.defaultBackend}`);
assert(
-nvidiaConfig.backends?.nvidia?.defaultModel === "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
+nvidiaConfig.backends?.nvidia?.defaultModel === "nvidia/nemotron-3-super-120b-a12b",
`Expected NVIDIA defaultModel to use the Nemotron Super model, got ${nvidiaConfig.backends?.nvidia?.defaultModel}`
);
assert(
nvidiaConfig.modelProfiles?.["claude-3-5-sonnet-latest"]?.providerModel ===
-"nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16",
+"nvidia/nemotron-3-super-120b-a12b",
"Expected fast profile to use the Nemotron Super model"
);
assert(
nvidiaConfig.modelProfiles?.["claude-3-5-sonnet-qwen"]?.providerModel ===
-"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
+"nvidia/nemotron-3-nano-30b-a3b",
"Expected qwen fallback profile to use the Nemotron Nano model"
);
const nvidiaEnvFile = fs.readFileSync(envPath, "utf8");
4 changes: 2 additions & 2 deletions tests/claudia-config.test.ts
@@ -73,11 +73,11 @@ test("configuration wizard awaits remote connectivity before completion", async
const config = JSON.parse(fs.readFileSync(path.join(cwd, "config.json"), "utf8"));
assert.equal(
config.modelProfiles["claude-3-5-sonnet-latest"]?.providerModel,
-"nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
+"nvidia/nemotron-3-super-120b-a12b"
);
assert.equal(config.modelProfiles["claude-3-5-sonnet-glm"]?.providerModel, "z-ai/glm4.7");
assert.equal(
config.modelProfiles["claude-3-5-sonnet-qwen"]?.providerModel,
-"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"
+"nvidia/nemotron-3-nano-30b-a3b"
);
});
6 changes: 3 additions & 3 deletions tests/profile.test.ts
@@ -23,22 +23,22 @@ function writeNvidiaConfig(cwd: string): void {
nvidia: {
baseUrl: "https://integrate.api.nvidia.com/v1",
apiKeyEnv: "NVIDIA_API_KEY",
-defaultModel: "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
+defaultModel: "nvidia/nemotron-3-super-120b-a12b"
}
},
modelMap: {},
modelProfiles: {
"claude-3-5-sonnet-latest": {
backend: "nvidia",
-providerModel: "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
+providerModel: "nvidia/nemotron-3-super-120b-a12b"
},
"claude-3-5-sonnet-glm": {
backend: "nvidia",
providerModel: "z-ai/glm4.7"
},
"claude-3-5-sonnet-qwen": {
backend: "nvidia",
-providerModel: "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"
+providerModel: "nvidia/nemotron-3-nano-30b-a3b"
},
"claude-3-haiku-latest": {
backend: "nvidia",
4 changes: 2 additions & 2 deletions tests/setup.test.ts
@@ -59,11 +59,11 @@ test("creates setup files, prompts for a missing key, and runs the NVIDIA smoke
};
assert.equal(
generatedConfig.modelProfiles["claude-3-5-sonnet-latest"]?.providerModel,
-"nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
+"nvidia/nemotron-3-super-120b-a12b"
);
assert.equal(
generatedConfig.modelProfiles["claude-3-5-sonnet-qwen"]?.providerModel,
-"nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"
+"nvidia/nemotron-3-nano-30b-a3b"
);
assert.doesNotMatch(result.output, /secret-test-key/);
assert.match(result.output, /Configuration complete!/);
2 changes: 1 addition & 1 deletion tests/status.test.ts
@@ -22,7 +22,7 @@ function createStatusDirectory(env = "NVIDIA_API_KEY=test-key\nCLAUDIA_CLAUDE_MO
nvidia: {
baseUrl: "https://example.invalid/v1",
apiKeyEnv: "NVIDIA_API_KEY",
-defaultModel: "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
+defaultModel: "nvidia/nemotron-3-super-120b-a12b"
}
},
modelMap: {},