82 changes: 33 additions & 49 deletions docs/customizer/about.mdx
Original file line number Diff line number Diff line change
@@ -147,7 +147,7 @@ Below are some examples of how you might format your dataset to perform a handfu

When testing models trained with prompt/completion datasets, use the `/v1/completions` endpoint instead of `/v1/chat/completions`.

For details, refer to the [Dataset Formatting tutorial](/documentation/fine-tune-models/tutorials/format-training-dataset#format-a-prompt-completion-dataset).
For details, refer to the [Dataset Formatting tutorial](/documentation/customizer-reference/tutorials/format-training-dataset#format-a-prompt-completion-dataset).

</Note>
#### Document Classification
@@ -197,31 +197,37 @@ completion: "<simple>"

Most models support Instruction Templates for training; the expected dataset conforms to the standard [OpenAI messages format](https://platform.openai.com/docs/guides/fine-tuning#multi-turn-chat-examples). Additionally, some models support tool calling, which adds two optional parameters: `tools` at the top level of each entry and `tool_calls` per message.

For more information refer to our [in-depth instructions](/documentation/fine-tune-models/tutorials/format-training-dataset#format-a-conversation-dataset).
For more information refer to our [in-depth instructions](/documentation/customizer-reference/tutorials/format-training-dataset#format-a-conversation-dataset).

## Hyperparameters

Hyperparameters are configuration settings used to control the training process. You'll set these values before training begins to optimize how the model learns from your data. While the model automatically learns its internal parameters during training, these hyperparameters help guide that learning process. The right values depend on your specific use case, dataset size, and computational resources.

| Hyperparameter | Description | Default |
|----------------|-------------|---------|
| `epochs` | Number of complete passes through the training dataset | Model-dependent |
| `batch_size` | Number of samples processed before updating model weights | Model-dependent |
| `learning_rate` | Step size for weight updates during training | Model-dependent |
| `training.type` | Training type: `"sft"` for supervised fine-tuning | `"sft"` |
| `training.peft.type` | PEFT method: `"lora"` for Low-Rank Adaptation | — |
| `training.peft.rank` | LoRA rank (lower = fewer parameters, higher = more expressive) | 8 |
| `training.peft.alpha` | LoRA scaling factor | 32 |
Common hyperparameters you'll tune include:

| Hyperparameter | Description |
|----------------|-------------|
| Epochs | Number of complete passes through the training dataset |
| Batch size | Number of samples processed before updating model weights |
| Learning rate | Step size for weight updates during training |
| LoRA rank | Low-rank dimension of the adapter (lower = fewer parameters, higher = more expressive) |
| LoRA alpha | LoRA scaling factor |

<Note>

NeMo Customizer offers **two training backends** — Automodel (multi-GPU) and Unsloth (single-GPU, quantized) — and each accepts its own job configuration. The exact field names, defaults, and available knobs differ between them. For the full per-backend hyperparameter reference, see [Training Configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration).

</Note>
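
As a rough illustration of the "lower = fewer parameters, higher = more expressive" trade-off in the table above, the sketch below assumes the standard LoRA formulation: a frozen `d_out x d_in` weight gains two trainable factors of shapes `d_out x rank` and `rank x d_in`, and the adapter update is scaled by `alpha / rank`. The function names and the 4096-wide layer are illustrative assumptions, not Customizer APIs, and a given backend may use a different scaling variant.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter on a d_out x d_in weight.

    Standard LoRA adds two factors, B (d_out x rank) and A (rank x d_in).
    """
    if min(d_in, d_out, rank) <= 0:
        raise ValueError("d_in, d_out, and rank must be positive")
    return rank * (d_in + d_out)


def lora_scale(alpha: float, rank: int) -> float:
    """Scaling applied to the adapter update in standard LoRA (alpha / rank)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return alpha / rank


# Illustrative numbers only: a hypothetical 4096 x 4096 projection matrix.
full_params = 4096 * 4096
for rank in (8, 16, 64):
    added = lora_trainable_params(4096, 4096, rank)
    share = added / full_params
    print(f"rank={rank:>2}  adapter params={added:>7}  share of full matrix={share:.2%}")
```

Doubling the rank doubles the adapter's parameter count, while raising `alpha` at a fixed rank increases how strongly the adapter update is weighted.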

## Parallelism

NeMo Platform Customizer supports various distributed training parallelization methods, which can be mixed together.
The Automodel backend supports several distributed training parallelization methods, which can be mixed together. (The Unsloth backend runs on a single GPU and does not use these settings.)

### Tensor Parallelism

[Tensor Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#tensor-parallelism) (TP) distributes the parameter tensor of an individual layer across GPUs. In addition to reducing model state memory usage, it also saves activation memory as the per-GPU tensor sizes shrink. The tradeoff is increased CPU overhead.

TP can be configured via `parallelism.tensor_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-jobs/training-configuration).
TP can be configured via `parallelism.tensor_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration).

<Note>

@@ -232,7 +238,7 @@ As of release 25.10.0, AutoModel engines including Phi-4, Qwen, and Gemma suppor

[Pipeline Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#pipeline-parallelism) (PP) distributes the layers of a neural network across GPUs. The GPUs then process the different layers sequentially.

PP can be configured via `parallelism.pipeline_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-jobs/training-configuration).
PP can be configured via `parallelism.pipeline_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration).

#### Configuration

@@ -246,11 +252,11 @@ PP can be configured via `parallelism.pipeline_parallel_size` in the [training c
- Smaller TP values generally have less communication overhead.
- Larger TP values provide more memory savings but increase communication costs.

### Sequence Parallelism
### Context Parallelism

[Sequence Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#sequence-parallelism) (SP) extends tensor-level model parallelism by distributing computing load and activation memory across multiple GPUs along the sequence dimension of transformer layers. This method is particularly useful when training on the datasets with longer sequences. It also benefits portions of the layer that have previously not been parallelized, enhancing overall model performance and efficiency.
[Context Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#context-parallelism) (CP) distributes activation memory along the sequence dimension across GPUs, which is particularly useful when training on datasets with very long sequences.

Sequence Parallelism can be enabled/disabled using `parallelism.sequence_parallel` in the [training configuration](/documentation/customizer-reference/manage-jobs/training-configuration).
Context Parallelism can be configured via `parallelism.context_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration).
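
Because these methods can be mixed, it helps to check how many GPUs each combination consumes. The sketch below assumes the common convention that the total GPU count equals the product of the tensor, pipeline, context, and data parallel sizes; the helper name is hypothetical, and whether Customizer validates settings exactly this way is an assumption.

```python
def data_parallel_size(num_gpus: int, tp: int = 1, pp: int = 1, cp: int = 1) -> int:
    """Number of data-parallel replicas left after model parallelism.

    Assumes num_gpus == tp * pp * cp * dp, where tp, pp, and cp stand in for
    parallelism.tensor_parallel_size, parallelism.pipeline_parallel_size, and
    parallelism.context_parallel_size.
    """
    if min(num_gpus, tp, pp, cp) <= 0:
        raise ValueError("all sizes must be positive")
    model_parallel = tp * pp * cp
    if num_gpus % model_parallel != 0:
        raise ValueError(
            f"{num_gpus} GPUs cannot be split evenly into groups of {model_parallel}"
        )
    return num_gpus // model_parallel


# 8 GPUs with TP=2 and PP=2 leave room for 2 data-parallel replicas.
print(data_parallel_size(8, tp=2, pp=2))
```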

## Sequence Packing

@@ -260,46 +266,24 @@ Sequence Parallelism can be enabled/disabled using `parallelism.sequence_paralle
- Maximize GPU compute efficiency
- Optimize GPU memory usage

When enabled, the `batch_size` and number of training steps update so that each gradient iteration sees, on average, the same number of tokens compared to running fine-tuning _without_ sequence packing.
When enabled, the effective batch size and number of training steps update so that each gradient iteration sees, on average, the same number of tokens compared to running fine-tuning _without_ sequence packing.

### Limitations
Sequence packing is enabled per backend:

- **Automodel**: set `batch.sequence_packing` to `true`.
- **Unsloth**: set `dataset.packing` to `true`.

- Sequence packing is an experimental feature only supported by the following models:
- meta/llama-3.1-8b-instruct
- meta/llama-3.1-70b-instruct
- meta/llama3-70b-instruct
- meta/llama-3.2-3b-instruct
- meta/llama-3.2-1b
- meta/llama-3.2-1b-instruct
See [Training Configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration) for the full batch and dataset options.
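
To make the "same number of tokens per gradient iteration" statement concrete, here is a back-of-the-envelope sketch. It assumes the adjustment simply keeps `batch_size x average sequence length` constant; the function name and the rounding rule are illustrative assumptions, and the backends' actual computation may differ.

```python
def packed_batch_size(batch_size: int, avg_seq_len: float, avg_pack_len: float) -> int:
    """Batch size, counted in packed rows, that keeps tokens per step roughly constant.

    batch_size   -- samples per step without packing
    avg_seq_len  -- average tokens per unpacked sample
    avg_pack_len -- average tokens per packed row
    """
    if min(batch_size, avg_seq_len, avg_pack_len) <= 0:
        raise ValueError("all inputs must be positive")
    tokens_per_step = batch_size * avg_seq_len
    # Never drop below one packed row per step.
    return max(1, round(tokens_per_step / avg_pack_len))


# 16 samples of ~512 tokens carry about as many tokens as 4 packed rows of ~2048.
print(packed_batch_size(16, 512, 2048))
```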

### Limitations

- Sequence packing is an experimental feature whose support varies by model and backend.
- Chat prompt templates do not have support for sequence packing.

<Note>

If `training.sequence_packing` is enabled when using a model that does not support sequence packing, the fine-tuning will proceed _without_ sequence packing and a warning will be returned in the API response.
If sequence packing is enabled for a model that does not support it, fine-tuning proceeds _without_ sequence packing and a warning is returned in the API response.

</Note>
### Example of using in the API

Example of creating a customization job with sequence packing enabled:

```python
job = client.customization.jobs.create(
    workspace="default",
    name="my-packed-job",
    spec={
        "model": "default/llama-3.1-8b-instruct",
        "dataset": "fileset://default/test-dataset",
        "training": {
            "type": "sft",
            "peft": {"type": "lora", "rank": 16},
            "sequence_packing": True,
            "epochs": 10,
            "batch_size": 16,
            "learning_rate": 0.00001,
        },
    },
)
```

Learn how to create a LoRA customization job with sequence packing by following the [Optimizing for Tokens/GPU](tutorials/optimize-throughput.ipynb) tutorial.


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace the relative tutorial link with a canonical Fern nav URL.

Line 289 links to tutorials/optimize-throughput.ipynb, which is a relative source-style path and can fail Fern link checks/build. Use the canonical /documentation/... route for that tutorial instead. As per coding guidelines, "Internal links must use canonical nav URLs like /documentation/get-started/core-concepts/workspaces, not relative .md/source paths. make docs-broken-links is the check."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/customizer/about.mdx` at line 289, Replace the relative source link
"tutorials/optimize-throughput.ipynb" in docs/customizer/about.mdx with the
canonical Fern nav URL for that tutorial (use
"/documentation/tutorials/optimize-throughput"), keeping the link text
unchanged; update the markdown link target where the phrase "Optimizing for
Tokens/GPU" is referenced so the href uses the canonical /documentation/...
route instead of the relative .ipynb path to satisfy the internal-linking
guidelines.

Source: Coding guidelines

84 changes: 84 additions & 0 deletions docs/customizer/cli.mdx
@@ -0,0 +1,84 @@
---
title: "Using the NeMo Customizer Skill"
description: ""
---
<a id="ft-customizer-skill"></a>

The `nemo-customizer` skill fine-tunes models on NeMo Platform from the command line. It drives the `nemo customization` CLI, which submits **SFT + LoRA** (as well as full-weight and distillation) training as GPU container jobs on the platform's Jobs service — training runs on the platform, not in your shell. Two backends ship in the repo: **`automodel`** (default, multi-GPU capable) and **`unsloth`** (single-GPU 4-bit LoRA). Both are `submit`-only.

<Note>

This page documents the plugin CLI workflow (`nemo customization automodel|unsloth submit`). The job JSON shape shown here (`training.training_type`, `training.finetuning_type`) is specific to these backends.

</Note>

## Prerequisites

- A NeMo Platform deployment with a GPU execution profile (check with `nemo jobs list-execution-profiles`).
- The `nemo-customizer` plugin and a backend (`nemo-automodel` or `nemo-unsloth`) installed.
- A base model (Hugging Face repo) and a training dataset in mind.

## Example: Fine-tune with Automodel

Run these commands from the `nemo-platform` repository root. Substitute your own model, dataset, and names.

### 1. Authenticate

```bash
uv run nemo auth login --unsigned-token --email admin@example.com
```

### 2. Upload the dataset as a fileset

```bash
uv run nemo files filesets create commonsense_qa --workspace default --purpose dataset --exist-ok
uv run nemo files upload /tmp/train.jsonl commonsense_qa --workspace default --remote-path train.jsonl
```

See [Manage Files](/documentation/get-started/core-concepts/manage-files) for dataset upload details.

### 3. Register the base model

```bash
uv run nemo files filesets create qwen3-1.7b --workspace default --purpose model --exist-ok \
--storage '{"type":"huggingface","repo_id":"Qwen/Qwen3-1.7B","repo_type":"model","revision":"main"}'
uv run nemo models create qwen3-1.7b --workspace default --exist-ok \
--input-data '{"name":"qwen3-1.7b","fileset":"default/qwen3-1.7b"}'
```

### 4. Define the job

Write `/tmp/job.json` describing an SFT + LoRA job:

```json
{
  "model": "default/qwen3-1.7b",
  "dataset": { "training": "default/commonsense_qa" },
  "training": {
    "training_type": "sft",
    "finetuning_type": "lora",
    "lora": { "rank": 16, "alpha": 32 },
    "max_seq_length": 2048
  },
  "schedule": { "epochs": 1 },
  "batch": { "global_batch_size": 4, "micro_batch_size": 1 },
  "optimizer": { "learning_rate": 5e-5 },
  "output": { "name": "qwen3-1.7b-commonsense-qa-lora" }
}
```

### 5. Submit and poll

```bash
uv run nemo customization automodel submit /tmp/job.json --workspace default
uv run nemo jobs get-status automodel-<job-id>
```

Read `<job-id>` from the `name` field in the submit output. The job is finished when its top-level `status` is `completed`, `error`, or `cancelled`.
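
The polling step can be scripted. The sketch below is a minimal, backend-agnostic loop that stops on the three terminal statuses named above. `wait_for_job` and its `get_status` callable are hypothetical helpers, not part of the `nemo` CLI or SDK; wire `get_status` to whatever returns the job's top-level `status` in your environment.

```python
import time
from typing import Callable

# Terminal statuses as listed in this guide.
TERMINAL_STATUSES = frozenset({"completed", "error", "cancelled"})


def wait_for_job(
    get_status: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status, then return it.

    Raises TimeoutError if no terminal status is seen within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout_s} seconds")
        sleep(interval_s)
```

Passing `get_status` and `sleep` as arguments keeps the loop independent of how the status is fetched and makes it easy to exercise without a live deployment.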

## Going Further

- Use the `unsloth` backend for single-GPU 4-bit LoRA: `uv run nemo customization unsloth submit /tmp/job.json --workspace default`.
- Print the live job schema: `uv run nemo customization automodel explain` (or `unsloth explain`).
- For hyperparameters, batch sizing, multi-GPU, and distillation, see [Training Configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration).
- The full skill, including dataset conversion and troubleshooting references, lives in the repository at `plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/SKILL.md`.
21 changes: 7 additions & 14 deletions docs/customizer/index.mdx
@@ -11,8 +11,8 @@ Learn how to fine-tune models by making requests to NVIDIA NeMo Customizer throu
At a high level, the fine-tuning workflow consists of the following steps:

1. [Create a Model Entity](/documentation/customizer-reference/manage-model-entities/overview) pointing to your base model checkpoint (stored as a FileSet).
1. Format a compatible [dataset](/documentation/fine-tune-models/tutorials/format-training-dataset).
1. [Create a customization job](/documentation/fine-tune-models/manage-customization-jobs) referencing the Model Entity.
1. Format a compatible [dataset](/documentation/customizer-reference/tutorials/format-training-dataset).
1. [Create a customization job](/documentation/customizer-reference/manage-customization-jobs) referencing the Model Entity.
1. Monitor the job until it completes.
1. The customization job automatically creates either:
- **LoRA jobs**: An adapter attached to the original Model Entity
@@ -49,7 +49,7 @@ View the available Phi models from Microsoft, designed for strong reasoning capa
View the available GPT-OSS models supported for Full SFT customization.

</Card>
<Card title="Embedding Models" href="/documentation/fine-tune-models/models/embedding">
<Card title="Embedding Models" href="/documentation/customizer-reference/models/embedding">

View the available embedding models for question-answering and retrieval tasks.

@@ -63,7 +63,7 @@ Perform common fine-tuning tasks.

<Cards>

<Card title="Manage Customization Jobs" href="/documentation/fine-tune-models/manage-customization-jobs">
<Card title="Manage Customization Jobs" href="/documentation/customizer-reference/manage-customization-jobs">

Create, list, view, and cancel customization jobs.

@@ -89,7 +89,7 @@ Follow these tutorials to learn how to accomplish common fine-tuning tasks.

<Cards>

<Card title="Format Training Datasets" href="/documentation/fine-tune-models/tutorials/format-training-dataset">
<Card title="Format Training Datasets" href="/documentation/customizer-reference/tutorials/format-training-dataset">

Learn how to format datasets for different model types.

@@ -109,13 +109,6 @@ Learn how to start a SFT customization job using a custom dataset.

<small><span class="md-tag">nemo-customizer</span></small>

</Card>
<Card title="Align a Model with DPO" href="tutorials/dpo-customization-job.ipynb">

Learn how to align a model with DPO (Direct Preference Optimization) using preference data.

<small><span class="md-tag">nemo-customizer</span> <span class="md-tag">dpo</span></small>

</Card>
<Card title="Distill a Model with Knowledge Distillation" href="tutorials/distillation-customization-job.ipynb">

@@ -124,7 +117,7 @@ Learn how to compress a larger teacher model into a smaller student model.
<small><span class="md-tag">nemo-customizer</span> <span class="md-tag">knowledge-distillation</span></small>

</Card>
<Card title="Check Customization Job Metrics" href="/documentation/fine-tune-models/tutorials/metrics">
<Card title="Check Customization Job Metrics" href="/documentation/customizer-reference/tutorials/metrics">

Learn how to check job metrics using MLFlow or Weights & Biases.

@@ -147,7 +140,7 @@ Learn how to optimize the token-per-GPU throughput for a LoRA optimization job.

<Cards>

<Card title="Hyperparameters" href="/documentation/customizer-reference/manage-jobs/training-configuration">
<Card title="Hyperparameters" href="/documentation/customizer-reference/manage-customization-jobs/training-configuration">

View the available hyperparameters and their valid options that you can set when creating a customization job.

38 changes: 20 additions & 18 deletions docs/customizer/manage-customization-jobs/cancel-job.mdx
@@ -18,9 +18,9 @@ export NMP_BASE_URL="https://your-nmp-base-url"

## To Cancel a Customization Job

Running jobs may be cancelled. A cancelled job does not upload checkpoints. You need the job's name and workspace; you can get these from [List Active Jobs](/documentation/customizer-reference/manage-jobs/list-active-jobs).
Running jobs may be cancelled. A cancelled job does not upload checkpoints. Customization jobs run on the platform's Jobs service, so you cancel them through that service (the same way for both backends) using the job's name and workspace. You can get these from [List Active Jobs](/documentation/customizer-reference/manage-customization-jobs/list-active-jobs).

Use the SDK to cancel a customization job:
Use the SDK to cancel a job:

```python
import os
@@ -32,10 +32,10 @@ client = NeMoPlatform(
workspace="default",
)

# Cancel a customization job (use the job name and workspace from List Active Jobs)
job_name = "my-sft-job"
# Cancel a job (use the job name and workspace from List Active Jobs)
job_name = "automodel-a1b2c3d4e5f6"
workspace = "default"
cancelled_job = client.customization.jobs.cancel(name=job_name, workspace=workspace)
cancelled_job = client.jobs.cancel(name=job_name, workspace=workspace)

print(f"Job {cancelled_job.name} has been cancelled")
print(f"Current status: {cancelled_job.status}")
@@ -48,23 +48,25 @@ print(f"Updated at: {cancelled_job.updated_at}")

```json
{
"name": "my-sft-job",
"name": "automodel-a1b2c3d4e5f6",
"workspace": "default",
"id": "job-abc123def456",
"id": "platform-job-2k8i3i1HqJHHPVB5M6Bk9Z",
"source": "automodel",
"status": "cancelled",
"spec": {
"model": "default/llama-3-2-1b",
"dataset": "fileset://default/my-training-dataset",
"model": "default/llama-3-2-1b-instruct",
"dataset": { "training": "default/my-training-dataset" },
"training": {
"type": "sft",
"batch_size": 16,
"epochs": 3,
"learning_rate": 1e-05,
"max_seq_length": 4096,
"parallelism": {
"num_gpus_per_node": 2,
"tensor_parallel_size": 2
}
"training_type": "sft",
"finetuning_type": "all_weights",
"max_seq_length": 4096
},
"schedule": { "epochs": 3 },
"batch": { "global_batch_size": 16, "micro_batch_size": 1 },
"optimizer": { "learning_rate": 1e-05 },
"parallelism": {
"num_gpus_per_node": 2,
"tensor_parallel_size": 2
},
"output": {
"name": "my-finetuned-llama",