NVIDIA-NeMo · gabwow · Jun 12, 2026 · Jun 13, 2026 · coderabbitai · Jun 13, 2026
@@ -147,7 +147,7 @@ Below are some examples of how you might format your dataset to perform a handfu
 
 When testing models trained with prompt/completion datasets, use the `/v1/completions` endpoint instead of `/v1/chat/completions`.
 
-For details, refer to the [Dataset Formatting tutorial](/documentation/fine-tune-models/tutorials/format-training-dataset#format-a-prompt-completion-dataset).
+For details, refer to the [Dataset Formatting tutorial](/documentation/customizer-reference/tutorials/format-training-dataset#format-a-prompt-completion-dataset).
 
 </Note>
 #### Document Classification
@@ -197,31 +197,37 @@ completion: "<simple>"
 
 Most of the models support Instruction Templates for training, the expected dataset conforms with the standard [OpenAI messages format](https://platform.openai.com/docs/guides/fine-tuning#multi-turn-chat-examples). Additionally, some models support tool calling which have additional optional parameters of `tools` at the top level of each entry and `tool_calls` per message.
 
-For more information refer to our [in-depth instructions](/documentation/fine-tune-models/tutorials/format-training-dataset#format-a-conversation-dataset).
+For more information refer to our [in-depth instructions](/documentation/customizer-reference/tutorials/format-training-dataset#format-a-conversation-dataset).
 
 ## Hyperparameters
 
 Hyperparameters are configuration settings used to control the training process. You'll set these values before training begins to optimize how the model learns from your data. While the model automatically learns its internal parameters during training, these hyperparameters help guide that learning process. The right values depend on your specific use case, dataset size, and computational resources.
 
-| Hyperparameter | Description | Default |
-|----------------|-------------|---------|
-| `epochs` | Number of complete passes through the training dataset | Model-dependent |
-| `batch_size` | Number of samples processed before updating model weights | Model-dependent |
-| `learning_rate` | Step size for weight updates during training | Model-dependent |
-| `training.type` | Training type: `"sft"` for supervised fine-tuning | `"sft"` |
-| `training.peft.type` | PEFT method: `"lora"` for Low-Rank Adaptation | — |
-| `training.peft.rank` | LoRA rank (lower = fewer parameters, higher = more expressive) | 8 |
-| `training.peft.alpha` | LoRA scaling factor | 32 |
+Common hyperparameters you'll tune include:
+
+| Hyperparameter | Description |
+|----------------|-------------|
+| Epochs | Number of complete passes through the training dataset |
+| Batch size | Number of samples processed before updating model weights |
+| Learning rate | Step size for weight updates during training |
+| LoRA rank | Low-rank dimension of the adapter (lower = fewer parameters, higher = more expressive) |
+| LoRA alpha | LoRA scaling factor |
+
+<Note>
+
+NeMo Customizer offers **two training backends** — Automodel (multi-GPU) and Unsloth (single-GPU, quantized) — and each accepts its own job configuration. The exact field names, defaults, and available knobs differ between them. For the full per-backend hyperparameter reference, see [Training Configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration).
+
+</Note>
 
 ## Parallelism
 
-NeMo Platform Customizer supports various distributed training parallelization methods, which can be mixed together.
+The Automodel backend supports several distributed training parallelization methods, which can be mixed together. (The Unsloth backend runs on a single GPU and does not use these settings.)
 
 ### Tensor Parallelism
 
 [Tensor Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#tensor-parallelism) (TP) distributes the parameter tensor of an individual layer across GPUs. In addition to reducing model state memory usage, it also saves activation memory as the per-GPU tensor sizes shrink. The tradeoff is increased CPU overhead.
 
-TP can be configured via `parallelism.tensor_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-jobs/training-configuration).
+TP can be configured via `parallelism.tensor_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration).
 
 <Note>
 
@@ -232,7 +238,7 @@ As of release 25.10.0, AutoModel engines including Phi-4, Qwen, and Gemma suppor
 
 [Pipeline Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#pipeline-parallelism) (PP) distributes the layers of a neural network across GPUs. The GPUs then process the different layers sequentially.
 
-PP can be configured via `parallelism.pipeline_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-jobs/training-configuration).
+PP can be configured via `parallelism.pipeline_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration).
 
 #### Configuration
 
@@ -246,11 +252,11 @@ PP can be configured via `parallelism.pipeline_parallel_size` in the [training c
  - Smaller TP values generally have less communication overhead.
  - Larger TP values provide more memory savings but increase communication costs.
 
-### Sequence Parallelism
+### Context Parallelism
 
-[Sequence Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#sequence-parallelism) (SP) extends tensor-level model parallelism by distributing computing load and activation memory across multiple GPUs along the sequence dimension of transformer layers. This method is particularly useful when training on the datasets with longer sequences. It also benefits portions of the layer that have previously not been parallelized, enhancing overall model performance and efficiency.
+[Context Parallelism](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/parallelisms.html#context-parallelism) (CP) distributes activation memory along the sequence dimension across GPUs, which is particularly useful when training on datasets with very long sequences.
 
-Sequence Parallelism can be enabled/disabled using `parallelism.sequence_parallel` in the [training configuration](/documentation/customizer-reference/manage-jobs/training-configuration).
+Context Parallelism can be configured via `parallelism.context_parallel_size` in the [training configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration).
 
 ## Sequence Packing
 
@@ -260,46 +266,24 @@ Sequence Parallelism can be enabled/disabled using `parallelism.sequence_paralle
 - Maximize GPU compute efficiency
 - Optimize GPU memory usage
 
-When enabled, the `batch_size` and number of training steps update so that each gradient iteration sees, on average, the same number of tokens compared to running fine-tuning _without_ sequence packing.
+When enabled, the effective batch size and number of training steps update so that each gradient iteration sees, on average, the same number of tokens compared to running fine-tuning _without_ sequence packing.
 
-### Limitations
+Sequence packing is enabled per backend:
+
+- **Automodel**: set `batch.sequence_packing` to `true`.
+- **Unsloth**: set `dataset.packing` to `true`.
 
-- Sequence packing is an experimental feature only supprted by the following models:
- - meta/llama-3.1-8b-instruct
- - meta/llama-3.1-70b-instruct
- - meta/llama3-70b-instruct
- - meta/llama-3.2-3b-instruct
- - meta/llama-3.2-1b
- - meta/llama-3.2-1b-instruct
+See [Training Configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration) for the full batch and dataset options.
 
+### Limitations
+
+- Sequence packing is an experimental feature whose support varies by model and backend.
 - Chat prompt templates do not have support for sequence packing.
 
 <Note>
 
-If `training.sequence_packing` is enabled when using a model that does not support sequence packing, the fine-tuning will proceed _without_ sequence packing and a warning will be returned in the API response.
+If sequence packing is enabled for a model that does not support it, fine-tuning proceeds _without_ sequence packing and a warning is returned in the API response.
 
 </Note>
-### Example of using in the API
-
-Example of creating a customization job with sequence packing enabled:
-
-```python
-job = client.customization.jobs.create(
-    workspace="default",
-    name="my-packed-job",
-    spec={
-        "model": "default/llama-3.1-8b-instruct",
-        "dataset": "fileset://default/test-dataset",
-        "training": {
-            "type": "sft",
-            "peft": {"type": "lora", "rank": 16},
-            "sequence_packing": True,
-            "epochs": 10,
-            "batch_size": 16,
-            "learning_rate": 0.00001,
-        },
-    },
-)
-```
 
 Learn how to create a LoRA customization job with sequence packing by following the [Optimizing for Tokens/GPU](tutorials/optimize-throughput.ipynb) tutorial.
@@ -0,0 +1,84 @@
+---
+title: "Using the NeMo Customizer Skill"
+description: ""
+---
+<a id="ft-customizer-skill"></a>
+
+The `nemo-customizer` skill fine-tunes models on NeMo Platform from the command line. It drives the `nemo customization` CLI, which submits **SFT + LoRA** (as well as full-weight and distillation) training as GPU container jobs on the platform's Jobs service — training runs on the platform, not in your shell. Two backends ship in the repo: **`automodel`** (default, multi-GPU capable) and **`unsloth`** (single-GPU 4-bit LoRA). Both are `submit`-only.
+
+<Note>
+
+This page documents the plugin CLI workflow (`nemo customization automodel|unsloth submit`). The job JSON shape shown here (`training.training_type`, `training.finetuning_type`) is specific to these backends.
+
+</Note>
+
+## Prerequisites
+
+- A NeMo Platform deployment with a GPU execution profile (check with `nemo jobs list-execution-profiles`).
+- The `nemo-customizer` plugin and a backend (`nemo-automodel` or `nemo-unsloth`) installed.
+- A base model (Hugging Face repo) and a training dataset in mind.
+
+## Example: Fine-tune with Automodel
+
+Run these commands from the `nemo-platform` repository root. Substitute your own model, dataset, and names.
+
+### 1. Authenticate
+
+```bash
+uv run nemo auth login --unsigned-token --email admin@example.com
+```
+
+### 2. Upload the dataset as a fileset
+
+```bash
+uv run nemo files filesets create commonsense_qa --workspace default --purpose dataset --exist-ok
+uv run nemo files upload /tmp/train.jsonl commonsense_qa --workspace default --remote-path train.jsonl
+```
+
+See [Manage Files](/documentation/get-started/core-concepts/manage-files) for dataset upload details.
+
+### 3. Register the base model
+
+```bash
+uv run nemo files filesets create qwen3-1.7b --workspace default --purpose model --exist-ok \
+  --storage '{"type":"huggingface","repo_id":"Qwen/Qwen3-1.7B","repo_type":"model","revision":"main"}'
+uv run nemo models create qwen3-1.7b --workspace default --exist-ok \
+  --input-data '{"name":"qwen3-1.7b","fileset":"default/qwen3-1.7b"}'
+```
+
+### 4. Define the job
+
+Write `/tmp/job.json` describing an SFT + LoRA job:
+
+```json
+{
+  "model": "default/qwen3-1.7b",
+  "dataset": { "training": "default/commonsense_qa" },
+  "training": {
+    "training_type": "sft",
+    "finetuning_type": "lora",
+    "lora": { "rank": 16, "alpha": 32 },
+    "max_seq_length": 2048
+  },
+  "schedule": { "epochs": 1 },
+  "batch": { "global_batch_size": 4, "micro_batch_size": 1 },
+  "optimizer": { "learning_rate": 5e-5 },
+  "output": { "name": "qwen3-1.7b-commonsense-qa-lora" }
+}
+```
+
+### 5. Submit and poll
+
+```bash
+uv run nemo customization automodel submit /tmp/job.json --workspace default
+uv run nemo jobs get-status automodel-<job-id>
+```
+
+Read `<job-id>` from the `name` field in the submit output. The job is finished when its top-level `status` is `completed`, `error`, or `cancelled`.
+
+## Going Further
+
+- Use the `unsloth` backend for single-GPU 4-bit LoRA: `uv run nemo customization unsloth submit /tmp/job.json --workspace default`.
+- Print the live job schema: `uv run nemo customization automodel explain` (or `unsloth explain`).
+- For hyperparameters, batch sizing, multi-GPU, and distillation, see [Training Configuration](/documentation/customizer-reference/manage-customization-jobs/training-configuration).
+- The full skill, including dataset conversion and troubleshooting references, lives in the repository at `plugins/nemo-customizer/src/nemo_customizer/skills/nemo-customizer/SKILL.md`.
@@ -11,8 +11,8 @@ Learn how to fine-tune models by making requests to NVIDIA NeMo Customizer throu
 At a high level, the fine-tuning workflow consists of the following steps:
 
 1. [Create a Model Entity](/documentation/customizer-reference/manage-model-entities/overview) pointing to your base model checkpoint (stored as a FileSet).
-1. Format a compatible [dataset](/documentation/fine-tune-models/tutorials/format-training-dataset).
-1. [Create a customization job](/documentation/fine-tune-models/manage-customization-jobs) referencing the Model Entity.
+1. Format a compatible [dataset](/documentation/customizer-reference/tutorials/format-training-dataset).
+1. [Create a customization job](/documentation/customizer-reference/manage-customization-jobs) referencing the Model Entity.
 1. Monitor the job until it completes.
 1. The customization job automatically creates either:
  - **LoRA jobs**: An adapter attached to the original Model Entity
@@ -49,7 +49,7 @@ View the available Phi models from Microsoft, designed for strong reasoning capa
 View the available GPT-OSS models supported for Full SFT customization.
 
 </Card>
-<Card title="Embedding Models" href="/documentation/fine-tune-models/models/embedding">
+<Card title="Embedding Models" href="/documentation/customizer-reference/models/embedding">
 
 View the available embedding models for question-answering and retrieval tasks.
 
@@ -63,7 +63,7 @@ Perform common fine-tuning tasks.
 
 <Cards>
 
-<Card title="Manage Customization Jobs" href="/documentation/fine-tune-models/manage-customization-jobs">
+<Card title="Manage Customization Jobs" href="/documentation/customizer-reference/manage-customization-jobs">
 
 Create, list, view, and cancel customization jobs.
 
@@ -89,7 +89,7 @@ Follow these tutorials to learn how to accomplish common fine-tuning tasks.
 
 <Cards>
 
-<Card title="Format Training Datasets" href="/documentation/fine-tune-models/tutorials/format-training-dataset">
+<Card title="Format Training Datasets" href="/documentation/customizer-reference/tutorials/format-training-dataset">
 
 Learn how to format datasets for different model types.
 
@@ -109,13 +109,6 @@ Learn how to start a SFT customization job using a custom dataset.
 
 <small><span class="md-tag">nemo-customizer</span></small>
 
-</Card>
-<Card title="Align a Model with DPO" href="tutorials/dpo-customization-job.ipynb">
-
-Learn how to align a model with DPO (Direct Preference Optimization) using preference data.
-
-<small><span class="md-tag">nemo-customizer</span> <span class="md-tag">dpo</span></small>
-
 </Card>
 <Card title="Distill a Model with Knowledge Distillation" href="tutorials/distillation-customization-job.ipynb">
 
@@ -124,7 +117,7 @@ Learn how to compress a larger teacher model into a smaller student model.
 <small><span class="md-tag">nemo-customizer</span> <span class="md-tag">knowledge-distillation</span></small>
 
 </Card>
-<Card title="Check Customization Job Metrics" href="/documentation/fine-tune-models/tutorials/metrics">
+<Card title="Check Customization Job Metrics" href="/documentation/customizer-reference/tutorials/metrics">
 
 Learn how to check job metrics using MLFlow or Weights & Biases.
 
@@ -147,7 +140,7 @@ Learn how to optimize the token-per-GPU throughput for a LoRA optimization job.
 
 <Cards>
 
-<Card title="Hyperparameters" href="/documentation/customizer-reference/manage-jobs/training-configuration">
+<Card title="Hyperparameters" href="/documentation/customizer-reference/manage-customization-jobs/training-configuration">
 
 View the available hyperparameters and their valid options that you can set when creating a customization job.
 

@@ -18,9 +18,9 @@ export NMP_BASE_URL="https://your-nmp-base-url"
 
 ## To Cancel a Customization Job
 
-Running jobs may be cancelled. A cancelled job does not upload checkpoints. You need the job's name and workspace; you can get these from [List Active Jobs](/documentation/customizer-reference/manage-jobs/list-active-jobs).
+Running jobs may be cancelled. A cancelled job does not upload checkpoints. Customization jobs run on the platform's Jobs service, so you cancel them through that service (the same way for both backends) using the job's name and workspace. You can get these from [List Active Jobs](/documentation/customizer-reference/manage-customization-jobs/list-active-jobs).
 
-Use the SDK to cancel a customization job:
+Use the SDK to cancel a job:
 
 ```python
 import os
@@ -32,10 +32,10 @@ client = NeMoPlatform(
     workspace="default",
 )
 
-# Cancel a customization job (use the job name and workspace from List Active Jobs)
-job_name = "my-sft-job"
+# Cancel a job (use the job name and workspace from List Active Jobs)
+job_name = "automodel-a1b2c3d4e5f6"
 workspace = "default"
-cancelled_job = client.customization.jobs.cancel(name=job_name, workspace=workspace)
+cancelled_job = client.jobs.cancel(name=job_name, workspace=workspace)
 
 print(f"Job {cancelled_job.name} has been cancelled")
 print(f"Current status: {cancelled_job.status}")
@@ -48,23 +48,25 @@ print(f"Updated at: {cancelled_job.updated_at}")
 
 ```json
 {
-  "name": "my-sft-job",
+  "name": "automodel-a1b2c3d4e5f6",
   "workspace": "default",
-  "id": "job-abc123def456",
+  "id": "platform-job-2k8i3i1HqJHHPVB5M6Bk9Z",
+  "source": "automodel",
   "status": "cancelled",
   "spec": {
-    "model": "default/llama-3-2-1b",
-    "dataset": "fileset://default/my-training-dataset",
+    "model": "default/llama-3-2-1b-instruct",
+    "dataset": { "training": "default/my-training-dataset" },
     "training": {
-      "type": "sft",
-      "batch_size": 16,
-      "epochs": 3,
-      "learning_rate": 1e-05,
-      "max_seq_length": 4096,
-      "parallelism": {
-        "num_gpus_per_node": 2,
-        "tensor_parallel_size": 2
-      }
+      "training_type": "sft",
+      "finetuning_type": "all_weights",
+      "max_seq_length": 4096
+    },
+    "schedule": { "epochs": 3 },
+    "batch": { "global_batch_size": 16, "micro_batch_size": 1 },
+    "optimizer": { "learning_rate": 1e-05 },
+    "parallelism": {
+      "num_gpus_per_node": 2,
+      "tensor_parallel_size": 2
     },
     "output": {
       "name": "my-finetuned-llama",