
Commit 7f82fe2

realAsma authored and claude committed
Refactor llm_qat example with YAML configs, DistillArguments, and ModelOpt argument parser
- Add qlora_nvfp4.yaml config and expand qat/qad/finetune YAML configs with full parameter sets
- Remove ptq_eval.yaml (superseded by unified quantize.py flow)
- Add DistillArguments class in modelopt/torch/distill with distill, teacher_model, criterion fields and to_distill_kwargs() helper
- Move --distill from TrainingArguments to DistillArguments
- Remove TrainModelArguments from train.py, use DistillArguments instead
- Reorder transformers.py: patching on top, arguments middle, training bottom
- Include QuantizeArguments in train.py parser for ARGUMENTS.md generation
- Rename output_dir -> quantize_output_dir in QuantizeArguments to avoid conflict with HF TrainingArguments.output_dir
- Regenerate ARGUMENTS.md with all argument classes
- Group distillation args in qad_nvfp4.yaml config
- Update README.md, tests, and quantize.py for renamed field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: realAsma <akuriparambi@nvidia.com>
1 parent df80a0f commit 7f82fe2

37 files changed

Lines changed: 2144 additions & 850 deletions

.pre-commit-config.yaml

Lines changed: 16 additions & 1 deletion
```diff
@@ -93,7 +93,7 @@ repos:
           examples/llm_eval/lm_eval_hf.py|
           examples/llm_eval/mmlu.py|
           examples/llm_eval/modeling.py|
-          examples/llm_qat/main.py|
+          examples/llm_qat/train.py|
           examples/llm_sparsity/weight_sparsity/finetune.py|
           examples/specdec_bench/specdec_bench/models/specbench_medusa.py|
           examples/speculative_decoding/main.py|
@@ -122,6 +122,21 @@ repos:
         args: ["-c", "pyproject.toml", "-q"]
         additional_dependencies: ["bandit[toml]"]
 
+  - repo: local
+    hooks:
+      - id: generate-arguments-md
+        name: Regenerate examples/llm_qat/ARGUMENTS.md
+        entry: bash -c 'python examples/llm_qat/train.py --generate_docs examples/llm_qat/ARGUMENTS.md'
+        language: system
+        files: >-
+          (?x)^(
+            examples/llm_qat/arguments\.py|
+            examples/llm_qat/train\.py|
+            modelopt/torch/opt/plugins/transformers\.py|
+            modelopt/torch/quantization/plugins/transformers_trainer\.py
+          )$
+        pass_filenames: false
+
   - repo: https://github.com/DavidAnson/markdownlint-cli2
     rev: v0.18.1
     hooks:
```
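The hook's `files:` pattern above is a verbose regex, so it only triggers regeneration when one of the four listed files changes. It can be checked directly with Python's `re` module:

```python
import re

# The hook's `files:` pattern, verbatim: (?x) enables verbose mode, so the
# whitespace inside the alternation group is ignored.
pattern = re.compile(
    r"""(?x)^(
        examples/llm_qat/arguments\.py|
        examples/llm_qat/train\.py|
        modelopt/torch/opt/plugins/transformers\.py|
        modelopt/torch/quantization/plugins/transformers_trainer\.py
    )$"""
)

print(bool(pattern.match("examples/llm_qat/train.py")))     # True
print(bool(pattern.match("examples/llm_qat/quantize.py")))  # False
```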

examples/llm_qad/README.md

Lines changed: 2 additions & 0 deletions
```diff
@@ -2,6 +2,8 @@
 
 Quantization-Aware Distillation (QAD) training scripts for language models using Megatron-LM. These scripts enable training quantized (e.g., NVFP4) student models with knowledge distillation from full-precision teacher models.
 
+> **Note:** For Hugging Face LLM QAD, see the [LLM QAT QAD section](../llm_qat/README.md#end-to-end-qad-example).
+
 ## Overview
 
 | Script | Purpose |
```

examples/llm_qat/.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+.cache/
```

examples/llm_qat/ARGUMENTS.md

Lines changed: 45 additions & 0 deletions
New file:

```markdown
# Argument Reference

_Auto-generated — do not edit by hand._

## DistillArguments

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--distill` | `bool` | `False` | Enable training with knowledge distillation. |
| `--teacher_model` | `str` | `None` | The name or path of the teacher model to use for distillation. |
| `--criterion` | `str` | `"logits_loss"` | Distillation loss criterion. Currently only 'logits_loss' is supported. |

## DataArguments

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--dataset_config` | `str` | `"configs/dataset/blend.yaml"` | Path to a dataset blend YAML config file. See configs/dataset/README.md for schema documentation. |
| `--train_samples` | `int` | `0` | Override train_samples from dataset config. 0 = use config value. |
| `--eval_samples` | `int` | `0` | Override eval_samples from dataset config. 0 = use config value. |

## ModelArguments

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--model_name_or_path` | `str` | `"meta-llama/Llama-2-7b-hf"` | |
| `--model_max_length` | `int` | `4096` | Maximum sequence length. Sequences will be right padded (and possibly truncated). |

## QuantizeArguments

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--recipe` | `str` | `None` | Path to a quantization recipe YAML file (built-in or custom). Built-in recipes can be specified by relative path, e.g. 'general/ptq/nvfp4_default-fp8_kv'. |
| `--calib_size` | `int` | `512` | Specify the calibration size for quantization. The calibration dataset is used to setup the quantization scale parameters. |
| `--calib_batch_size` | `int` | `1` | Batch size for calibration data during quantization. |
| `--compress` | `bool` | `False` | Whether to compress the model weights after quantization for QLoRA. This is useful for reducing the model size. |
| `--quantize_output_dir` | `str` | `"quantized_model"` | Directory to save the quantized model checkpoint. |

## TrainingArguments

Extends [HuggingFace TrainingArguments](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments). Only additional/overridden arguments are shown below.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--cache_dir` | `str` | `None` | |
| `--lora` | `bool` | `False` | Whether to add LoRA (Low-Rank Adaptation) adapter before training. When using real quantization, the LoRA adapter must be set, as quantized weights will be frozen during training. |
```
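ARGUMENTS.md is auto-generated from the argument dataclasses. A hedged sketch of how such a table could be produced from `dataclasses.field` metadata (an assumption for illustration, not the actual ModelOpt generator):

```python
from dataclasses import dataclass, field, fields


@dataclass
class ModelArguments:
    # A subset of the real class, for illustration.
    model_name_or_path: str = field(default="meta-llama/Llama-2-7b-hf")
    model_max_length: int = field(
        default=4096,
        metadata={
            "help": "Maximum sequence length. Sequences will be right padded (and possibly truncated)."
        },
    )


def to_markdown_table(cls) -> str:
    # Emit one table row per dataclass field, pulling the description from
    # the field's "help" metadata.
    rows = [
        "| Argument | Type | Default | Description |",
        "|----------|------|---------|-------------|",
    ]
    for f in fields(cls):
        rows.append(
            f"| `--{f.name}` | `{f.type.__name__}` | `{f.default!r}` | {f.metadata.get('help', '')} |"
        )
    return "\n".join(rows)


print(to_markdown_table(ModelArguments))
```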

examples/llm_qat/README.md

Lines changed: 179 additions & 264 deletions
Large diffs are not rendered by default.

examples/llm_qat/accelerate_config/fsdp1.yaml

Lines changed: 0 additions & 29 deletions
This file was deleted.

examples/llm_qat/arguments.py

Lines changed: 118 additions & 0 deletions
New file:

```python
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Shared argument dataclasses for llm_qat scripts (quantize.py, train.py)."""

from dataclasses import field

import transformers

from modelopt.torch.opt.plugins.transformers import ModelOptHFArguments


class ModelArguments(ModelOptHFArguments):
    model_name_or_path: str = field(default="meta-llama/Llama-2-7b-hf")
    model_max_length: int = field(
        default=4096,
        metadata={
            "help": (
                "Maximum sequence length. Sequences will be right padded (and possibly truncated)."
            )
        },
    )


class DataArguments(ModelOptHFArguments):
    dataset_config: str = field(
        default="configs/dataset/blend.yaml",
        metadata={
            "help": (
                "Path to a dataset blend YAML config file. "
                "See configs/dataset/README.md for schema documentation."
            )
        },
    )
    train_samples: int = field(
        default=0,
        metadata={"help": "Override train_samples from dataset config. 0 = use config value."},
    )
    eval_samples: int = field(
        default=0,
        metadata={"help": "Override eval_samples from dataset config. 0 = use config value."},
    )


class TrainingArguments(ModelOptHFArguments, transformers.TrainingArguments):
    cache_dir: str | None = field(default=None)
    dataloader_drop_last: bool = field(default=True)
    bf16: bool = field(default=True)
    lora: bool = field(
        default=False,
        metadata={
            "help": (
                "Whether to add LoRA (Low-Rank Adaptation) adapter before training. When using real quantization, "
                "the LoRA adapter must be set, as quantized weights will be frozen during training."
            )
        },
    )
    # Sensible defaults (previously set by launch.sh)
    eval_strategy: str = field(default="steps")
    load_best_model_at_end: bool = field(default=True)
    save_total_limit: int = field(default=2)
    warmup_ratio: float = field(default=0.1)
    logging_steps: int = field(default=1)
    report_to: str = field(default="tensorboard")
    do_eval: bool = field(default=True)
    eval_accumulation_steps: int = field(default=1)
    learning_rate: float = field(default=1e-4)


class QuantizeArguments(ModelOptHFArguments):
    recipe: str | None = field(
        default=None,
        metadata={
            "help": (
                "Path to a quantization recipe YAML file (built-in or custom). "
                "Built-in recipes can be specified by relative path, e.g. "
                "'general/ptq/nvfp4_default-fp8_kv'."
            ),
        },
    )
    calib_size: int = field(
        default=512,
        metadata={
            "help": (
                "Specify the calibration size for quantization. The calibration dataset is used to"
                " setup the quantization scale parameters."
            )
        },
    )
    calib_batch_size: int = field(
        default=1,
        metadata={"help": "Batch size for calibration data during quantization."},
    )
    compress: bool = field(
        default=False,
        metadata={
            "help": (
                "Whether to compress the model weights after quantization for QLoRA. "
                "This is useful for reducing the model size."
            )
        },
    )
    quantize_output_dir: str = field(
        default="quantized_model",
        metadata={"help": "Directory to save the quantized model checkpoint."},
    )
```
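These dataclasses are consumed by an HF-style argument parser that turns each field into a CLI flag. A minimal stand-in for that behavior using only the standard library (a sketch of the pattern, not the `transformers.HfArgumentParser` implementation):

```python
import argparse
from dataclasses import dataclass, field, fields


@dataclass
class QuantizeArguments:
    # A subset of the real class, with abbreviated help strings.
    calib_size: int = field(default=512, metadata={"help": "Calibration set size."})
    compress: bool = field(default=False, metadata={"help": "Compress weights after quantization."})
    quantize_output_dir: str = field(
        default="quantized_model", metadata={"help": "Output directory."}
    )


def build_parser(cls) -> argparse.ArgumentParser:
    # Map each dataclass field to a flag: booleans become store_true switches,
    # everything else keeps its annotated type and default.
    parser = argparse.ArgumentParser()
    for f in fields(cls):
        if f.type is bool:
            parser.add_argument(f"--{f.name}", action="store_true", help=f.metadata.get("help", ""))
        else:
            parser.add_argument(
                f"--{f.name}", type=f.type, default=f.default, help=f.metadata.get("help", "")
            )
    return parser


ns = build_parser(QuantizeArguments).parse_args(["--calib_size", "256", "--compress"])
print(ns.calib_size, ns.compress, ns.quantize_output_dir)  # 256 True quantized_model
```

Note how the renamed `quantize_output_dir` field becomes `--quantize_output_dir`, leaving `--output_dir` free for the inherited HF `TrainingArguments`.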
File renamed without changes.
File renamed without changes.
File renamed without changes.
