diff --git a/docs.json b/docs.json
index 8031a0653e..2401a8df8b 100644
--- a/docs.json
+++ b/docs.json
@@ -1107,14 +1107,27 @@
"icon": "/icons/cropped-training.svg",
"pages": [
"training",
- "training/prerequisites",
{
- "group": "Serverless RL",
+ "group": "Getting Started",
+ "pages": [
+ "training/what-is-training",
+ "training/getting-started/prerequisites"
+ ]
+ },
+ {
+ "group": "Guides",
+ "pages": [
+ "training/guides/serverless-rl",
+ "training/guides/sft-training",
+ "training/guides/use-trained-models"
+ ]
+ },
+ {
+ "group": "Details",
"pages": [
- "training/serverless-rl",
- "training/serverless-rl/available-models",
- "training/serverless-rl/usage-limits",
- "training/serverless-rl/use-trained-models"
+ "training/details/pricing",
+ "training/details/available-models",
+ "training/details/usage-limits"
]
},
{
diff --git a/training.mdx b/training.mdx
index 4904352971..036cf43d60 100644
--- a/training.mdx
+++ b/training.mdx
@@ -1,15 +1,18 @@
---
title: W&B Training
-description: Post-train your models using reinforcement learning
+description: Post-train your models using reinforcement learning and supervised fine-tuning
mode: wide
---
-Now in public preview, W&B Training offers serverless reinforcement learning (RL) for post-training large language models (LLMs) to improve their reliability performing multi-turn, agentic tasks while also increasing speed and reducing costs. RL is a training technique where models learn to improve their behavior through feedback on their outputs.
+Now in public preview, W&B Training offers serverless post-training for large language models (LLMs), including both reinforcement learning (RL) and supervised fine-tuning (SFT).
+
+* **[Serverless RL](/training/guides/serverless-rl)**: Improve model reliability on multi-turn, agentic tasks while increasing speed and reducing costs. RL is a training technique where models learn to improve their behavior through feedback on their outputs.
+* **[Serverless SFT](/training/guides/sft-training)**: Fine-tune models on curated datasets for distillation, teaching output style and format, or warming up before RL.
W&B Training includes integration with:
-* [ART](https://art.openpipe.ai/getting-started/about), a flexible RL fine-tuning framework.
-* [RULER](https://openpipe.ai/blog/ruler), a universal verifier.
+* [ART](https://art.openpipe.ai/getting-started/about), a flexible fine-tuning framework.
+* [RULER](https://openpipe.ai/blog/ruler), a universal verifier.
* A fully-managed backend on [CoreWeave Cloud](https://docs.coreweave.com/docs/platform).
-To get started, satisfy the [prerequisites](/training/prerequisites) to start using the service and then see [OpenPipe's Serverless RL quickstart](https://art.openpipe.ai/getting-started/quick-start) to learn how to post-train your models.
+To get started, complete the [prerequisites](/training/getting-started/prerequisites), then see the [Serverless RL quickstart](https://art.openpipe.ai/getting-started/quick-start) or the [Serverless SFT docs](https://art.openpipe.ai/fundamentals/sft-training) to learn how to post-train your models.
diff --git a/training/api-reference.mdx b/training/api-reference.mdx
index 0c03083bbe..83ba064418 100644
--- a/training/api-reference.mdx
+++ b/training/api-reference.mdx
@@ -4,7 +4,7 @@ description: Complete API documentation for W&B Training
---
-The W&B Training API provides endpoints for managing and interacting with serverless reinforcement learning training jobs. The API is OpenAI-compatible for chat completions.
+The W&B Training API provides endpoints for managing and interacting with training jobs, including serverless reinforcement learning (RL) and supervised fine-tuning (SFT). The API is OpenAI-compatible for chat completions.
## Authentication
@@ -41,7 +41,8 @@ https://api.training.wandb.ai/v1
### training-jobs
-- **[POST /v1/preview/training-jobs](https://docs.wandb.ai/training/api-reference/training-jobs/create-training-job)** - Create Training Job
+- **[POST /v1/preview/sft-training-jobs](https://docs.wandb.ai/training/api-reference/training-jobs/create-sft-training-job)** - Create SFT Training Job
+- **[POST /v1/preview/training-jobs](https://docs.wandb.ai/training/api-reference/training-jobs/create-rl-training-job)** - Create RL Training Job
- **[GET /v1/preview/training-jobs/{training_job_id}](https://docs.wandb.ai/training/api-reference/training-jobs/get-training-job)** - Get Training Job
- **[GET /v1/preview/training-jobs/{training_job_id}/events](https://docs.wandb.ai/training/api-reference/training-jobs/get-training-job-events)** - Get Training Job Events
@@ -54,6 +55,7 @@ https://api.training.wandb.ai/v1
- [W&B Training overview](/training)
-- [Prerequisites](/training/prerequisites)
-- [Use your trained models](/training/serverless-rl/use-trained-models)
-- [Available models](/training/serverless-rl/available-models)
-- [Usage limits](/training/serverless-rl/usage-limits)
+- [Set up your environment](/training/getting-started/prerequisites)
+- [Serverless SFT](/training/guides/sft-training)
+- [Use your trained models](/training/guides/use-trained-models)
+- [Available models](/training/details/available-models)
+- [Limits](/training/details/usage-limits)
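
For reviewers of the new `sft-training-jobs` endpoint, here is a minimal request sketch based on the `CreateSFTTrainingJob` schema in `openapi.json`. The UUID and artifact path are placeholders, not real identifiers:

```python
import json

# Sketch of a "Create SFT Training Job" request body, mirroring the
# CreateSFTTrainingJob schema: model_id and training_data_url are required,
# config is optional. The values below are placeholders.
payload = {
    "model_id": "00000000-0000-0000-0000-000000000000",  # UUID of the model to fine-tune
    "training_data_url": "wandb-artifact:///my-entity/my-project/sft-data:v0",
    "config": {"batch_size": "auto", "learning_rate": 2e-5},
}

# The endpoint uses bearer-token auth, e.g. with the requests library:
#   requests.post(
#       "https://api.training.wandb.ai/v1/preview/sft-training-jobs",
#       headers={"Authorization": f"Bearer {api_key}"},
#       json=payload,
#   )
body = json.dumps(payload)
print(body)
```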
diff --git a/training/api-reference/openapi.json b/training/api-reference/openapi.json
index 5f18e92fe2..b1175206ef 100644
--- a/training/api-reference/openapi.json
+++ b/training/api-reference/openapi.json
@@ -399,7 +399,7 @@
"tags": [
"training-jobs"
],
- "summary": "Create Training Job",
+ "summary": "Create RL Training Job",
"operationId": "create_training_job_v1_preview_training_jobs_post",
"requestBody": {
"content": {
@@ -440,6 +440,53 @@
]
}
},
+ "/v1/preview/sft-training-jobs": {
+ "post": {
+ "tags": [
+ "training-jobs"
+ ],
+ "summary": "Create SFT Training Job",
+ "description": "Create a new SFT (Supervised Fine-Tuning) training job.",
+ "operationId": "create_sft_training_job_v1_preview_sft_training_jobs_post",
+ "requestBody": {
+ "content": {
+ "application/json": {
+ "schema": {
+ "$ref": "#/components/schemas/CreateSFTTrainingJob"
+ }
+ }
+ },
+ "required": true
+ },
+ "responses": {
+ "200": {
+ "description": "Successful Response",
+ "content": {
+ "application/json": {
+ "schema": {
+ "$ref": "#/components/schemas/TrainingJobResponse"
+ }
+ }
+ }
+ },
+ "422": {
+ "description": "Validation Error",
+ "content": {
+ "application/json": {
+ "schema": {
+ "$ref": "#/components/schemas/HTTPValidationError"
+ }
+ }
+ }
+ }
+ },
+ "security": [
+ {
+ "HTTPBearer": []
+ }
+ ]
+ }
+ },
"/v1/preview/training-jobs/{training_job_id}": {
"get": {
"tags": [
@@ -2671,6 +2718,49 @@
"type": "object",
"title": "Content"
},
+ "CreateSFTTrainingJob": {
+ "properties": {
+ "model_id": {
+ "type": "string",
+ "format": "uuid",
+ "title": "Model Id"
+ },
+ "training_data_url": {
+ "type": "string",
+ "title": "Training Data Url",
+ "description": "W&B artifact path for training data (e.g., 'wandb-artifact:///entity/project/artifact-name:version')"
+ },
+ "config": {
+ "anyOf": [
+ {
+ "$ref": "#/components/schemas/SFTTrainingConfig"
+ },
+ {
+ "type": "null"
+ }
+ ]
+ },
+ "experimental_config": {
+ "anyOf": [
+ {
+ "additionalProperties": true,
+ "type": "object"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "title": "Experimental Config"
+ }
+ },
+ "type": "object",
+ "required": [
+ "model_id",
+ "training_data_url"
+ ],
+ "title": "CreateSFTTrainingJob",
+ "description": "Schema for creating a new SFT (Supervised Fine-Tuning) TrainingJob.\n\nThe client should upload the training data (trajectories.jsonl and metadata.json)\nto W&B Artifacts and provide the artifact URL."
+ },
"CreateTrainingJob": {
"properties": {
"model_id": {
@@ -3734,6 +3824,17 @@
"base_model": {
"type": "string",
"title": "Base Model"
+ },
+ "run_id": {
+ "anyOf": [
+ {
+ "type": "string"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "title": "Run Id"
}
},
"type": "object",
@@ -3888,6 +3989,45 @@
"title": "Role",
"description": "The role of a message author (mirrors ``chat::Role``)."
},
+ "SFTTrainingConfig": {
+ "properties": {
+ "batch_size": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "string",
+ "const": "auto"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "title": "Batch Size"
+ },
+ "learning_rate": {
+ "anyOf": [
+ {
+ "type": "number"
+ },
+ {
+ "items": {
+ "type": "number"
+ },
+ "type": "array"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "title": "Learning Rate"
+ }
+ },
+ "type": "object",
+ "title": "SFTTrainingConfig",
+ "description": "Schema for SFT training config."
+ },
"StreamOptions": {
"properties": {
"include_usage": {
@@ -4223,6 +4363,28 @@
"type": "number",
"title": "Reward"
},
+ "initial_policy_version": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "title": "Initial Policy Version"
+ },
+ "final_policy_version": {
+ "anyOf": [
+ {
+ "type": "integer"
+ },
+ {
+ "type": "null"
+ }
+ ],
+ "title": "Final Policy Version"
+ },
"metrics": {
"additionalProperties": {
"anyOf": [
@@ -4448,4 +4610,4 @@
}
}
}
-}
+}
\ No newline at end of file
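
The new `SFTTrainingConfig` schema allows `batch_size` to be an integer, the literal `"auto"`, or omitted, and `learning_rate` to be a single number, a list of numbers, or omitted. An illustrative client-side check (not part of any W&B SDK) makes the accepted shapes concrete:

```python
from collections.abc import Sequence

# Illustrative validator mirroring the SFTTrainingConfig schema above.
# batch_size: int | "auto" | omitted; learning_rate: number | list of
# numbers (e.g. a schedule) | omitted.
def validate_sft_config(config: dict) -> dict:
    batch_size = config.get("batch_size")
    if batch_size is not None:
        if not (isinstance(batch_size, int) or batch_size == "auto"):
            raise ValueError("batch_size must be an integer or 'auto'")
    lr = config.get("learning_rate")
    if lr is not None:
        if isinstance(lr, (int, float)):
            pass  # single learning rate
        elif isinstance(lr, Sequence) and not isinstance(lr, str) and all(
            isinstance(x, (int, float)) for x in lr
        ):
            pass  # list of learning rates
        else:
            raise ValueError("learning_rate must be a number or a list of numbers")
    return config

print(validate_sft_config({"batch_size": "auto", "learning_rate": [1e-4, 5e-5]}))
```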
diff --git a/training/serverless-rl/available-models.mdx b/training/details/available-models.mdx
similarity index 100%
rename from training/serverless-rl/available-models.mdx
rename to training/details/available-models.mdx
diff --git a/training/details/pricing.mdx b/training/details/pricing.mdx
new file mode 100644
index 0000000000..b1a55adf9e
--- /dev/null
+++ b/training/details/pricing.mdx
@@ -0,0 +1,4 @@
+---
+title: "W&B Training pricing"
+url: "https://wandb.ai/site/pricing/reinforcement-learning"
+---
diff --git a/training/details/usage-limits.mdx b/training/details/usage-limits.mdx
new file mode 100644
index 0000000000..b5e4e0d2eb
--- /dev/null
+++ b/training/details/usage-limits.mdx
@@ -0,0 +1,11 @@
+---
+title: Limits
+description: Understand usage limits and account restrictions for W&B Training
+---
+
+## Limits
+
+* **Inference concurrency limits**: By default, Serverless RL currently supports up to 2000 concurrent requests per user and 6000 per project. If you exceed your rate limit, the Inference API returns a `429 Concurrency limit reached for requests` response. To avoid this error, reduce the number of concurrent requests your training job or production workload makes at once. If you need a higher rate limit, you can request one at support@wandb.com.
+
+* **Geographic restrictions**: Serverless RL is only available in supported geographic locations. For more information, see the [Terms of Service](https://site.wandb.ai/terms/).
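
Since the concurrency limit surfaces as an HTTP 429 response, the page could also show a client-side backoff sketch. Here, `send` is a stand-in for whatever callable issues the actual request and returns an object with a `status_code` attribute:

```python
import random
import time

# Minimal retry-with-backoff sketch for the 429 concurrency-limit
# response described above. Delays grow exponentially with jitter.
def post_with_backoff(send, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        response = send()
        if response.status_code != 429:
            return response
        # Exponential backoff with jitter before retrying.
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise RuntimeError(f"still rate-limited after {max_retries} attempts")
```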
diff --git a/training/prerequisites.mdx b/training/getting-started/prerequisites.mdx
similarity index 95%
rename from training/prerequisites.mdx
rename to training/getting-started/prerequisites.mdx
index f365424705..440daacdb3 100644
--- a/training/prerequisites.mdx
+++ b/training/getting-started/prerequisites.mdx
@@ -1,5 +1,5 @@
---
-title: Prerequisites
+title: Set up your environment
description: Set up your environment to use W&B Training
---
@@ -26,4 +26,4 @@ Create a project in your W&B account to track usage, record training metrics, an
After completing the prerequisites:
* Check the [API reference](/training/api-reference) to learn about available endpoints
-* Try the [ART quickstart](https://art.openpipe.ai/getting-started/quick-start)
+* Try the [ART quickstart](https://art.openpipe.ai/getting-started/quick-start)
\ No newline at end of file
diff --git a/training/serverless-rl/serverless-rl.mdx b/training/guides/serverless-rl.mdx
similarity index 95%
rename from training/serverless-rl/serverless-rl.mdx
rename to training/guides/serverless-rl.mdx
index d4550de92e..13775463a9 100644
--- a/training/serverless-rl/serverless-rl.mdx
+++ b/training/guides/serverless-rl.mdx
@@ -1,5 +1,5 @@
---
-title: How to use Serverless RL
+title: Use Serverless RL
---
Serverless RL is supported through [OpenPipe's ART framework](https://art.openpipe.ai/getting-started/about) and the [W&B Training API](/training/api-reference).
diff --git a/training/guides/sft-training.mdx b/training/guides/sft-training.mdx
new file mode 100644
index 0000000000..e8206fd5de
--- /dev/null
+++ b/training/guides/sft-training.mdx
@@ -0,0 +1,8 @@
+---
+title: Use Serverless SFT
+---
+
+Serverless SFT is supported through [OpenPipe's ART framework](https://art.openpipe.ai/getting-started/about) and the [W&B Training API](/training/api-reference).
+
+To start using Serverless SFT, complete the [prerequisites](/training/getting-started/prerequisites), then go through the ART [Serverless SFT docs](https://art.openpipe.ai/fundamentals/sft-training). To learn about Serverless SFT's API endpoints, see the [W&B Training API reference](/training/api-reference).
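
The `CreateSFTTrainingJob` schema in this PR notes that clients upload `trajectories.jsonl` and `metadata.json` to W&B Artifacts and pass the artifact URL to the API. A sketch of preparing the JSONL file locally — the chat-message record shape here is an assumption for illustration; the authoritative format is in the ART docs:

```python
import json
from pathlib import Path

# Illustrative preparation of a trajectories.jsonl file of chat examples.
# The exact record schema expected by Serverless SFT is defined by ART;
# the "messages" shape below is an assumption for illustration only.
examples = [
    {"messages": [
        {"role": "user", "content": "Summarize: W&B Training is in public preview."},
        {"role": "assistant", "content": "W&B Training is currently in public preview."},
    ]},
]

out = Path("trajectories.jsonl")
with out.open("w") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")

# After uploading the file as a W&B artifact, reference it in the
# create-job request as:
#   wandb-artifact:///<entity>/<project>/<artifact-name>:<version>
print(out.read_text())
```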
diff --git a/training/serverless-rl/use-trained-models.mdx b/training/guides/use-trained-models.mdx
similarity index 98%
rename from training/serverless-rl/use-trained-models.mdx
rename to training/guides/use-trained-models.mdx
index 5482136dd6..74e088a411 100644
--- a/training/serverless-rl/use-trained-models.mdx
+++ b/training/guides/use-trained-models.mdx
@@ -1,6 +1,6 @@
---
-title: Use your trained models
-description: Make inference requests to the models you've trained
+title: Run inference on trained models
+description: Make inference requests to the models you've trained.
---
After training a model with Serverless RL, it is automatically available for inference.
diff --git a/training/serverless-rl.mdx b/training/serverless-rl.mdx
deleted file mode 100644
index 41775df24c..0000000000
--- a/training/serverless-rl.mdx
+++ /dev/null
@@ -1,35 +0,0 @@
----
-title: Serverless RL
-description: Learn about how to more efficiently post-train your models using reinforcement learning
----
-
-Now in public preview, Serverless RL helps developers post-train LLMs to learn new behaviors and improve reliability, speed, and costs when performing multi-turn agentic tasks. W&B provision the training infrastructure ([on CoreWeave](https://docs.coreweave.com/docs/platform)) for you while allowing full flexibility in your environment's setup. Serverless RL gives you instant access to a managed training cluster that elastically auto-scales to dozens of GPUs. By splitting RL workflows into inference and training phases and multiplexing them across jobs, Serverless RL increases GPU utilization and reduces your training time and costs.
-
-Serverless RL is ideal for tasks like:
-* Voice agents
-* Deep research assistants
-* On-prem models
-* Content marketing analysis agents
-
-Serverless RL trains low-rank adapters (LoRAs) to specialize a model for your agent's specific task. This extends the original model's capabilities with on-the-job experience. The LoRAs you train are automatically stored as artifacts in your W&B account, and can be saved locally or to a third party for backup. Models that you train through Serverless RL are also automatically hosted on W&B Inference.
-
-See the ART [quickstart](https://art.openpipe.ai/getting-started/quick-start) or [Google Colab notebook](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb) to get started.
-
-## Why Serverless RL?
-
-Reinforcement learning (RL) is a set of powerful training techniques that you can use in many kinds of training setups, including on GPUs that you own or rent directly. Serverless RL can provide the following advantages in your RL post-training:
-
-* **Lower training costs**: By multiplexing shared infrastructure across many users, skipping the setup process for each job, and scaling your GPU costs down to 0 when you're not actively training, Serverless RL reduces training costs significantly.
-* **Faster training time**: By splitting inference requests across many GPUs and immediately provisioning training infrastructure when you need it, Serverless RL speeds up your training jobs and lets you iterate faster.
-* **Automatic deployment**: Serverless RL automatically deploys every checkpoint you train, eliminating the need to manually set up hosting infrastructure. Trained models can be accessed and tested immediately in local, staging, or production environments.
-
-## How Serverless RL uses W&B services
-
-Serverless RL uses a combination of the following W&B components to operate:
-
-* [Inference](/inference): To run your models
-* [Models](/models): To track performance metrics during the LoRA adapter's training
-* [Artifacts](/models/artifacts): To store and version the LoRA adapters
-* [Weave (optional)](/weave): To gain observability into how the model responds at each step of the training loop
-
-Serverless RL is in public preview. During the preview, you are charged only for the use of inference and the storage of artifacts. W&B does not charge for adapter training during the preview period.
diff --git a/training/serverless-rl/usage-limits.mdx b/training/serverless-rl/usage-limits.mdx
deleted file mode 100644
index 10cf6cf051..0000000000
--- a/training/serverless-rl/usage-limits.mdx
+++ /dev/null
@@ -1,28 +0,0 @@
----
-title: Usage information and limits
-description: Understand pricing, usage limits, and account restrictions for W&B Serverless RL
----
-
-## Pricing
-
-Pricing has three components: inference, training, and storage. For specific billing rates, visit our [pricing page](https://wandb.ai/site/pricing/reinforcement-learning).
-
-### Inference
-
-Pricing for Serverless RL inference requests matches W&B Inference pricing. See [model-specific costs](https://site.wandb.ai/pricing/reinforcement-learning) for more details. Learn more about purchasing credits, account tiers, and usage caps in the [W&B Inference docs](/inference/usage-limits#purchase-more-credits).
-
-### Training
-
-At each training step, Serverless RL collects batches of trajectories that include your agent's outputs and associated rewards (calculated by your reward function). The batched trajectories are then used to update the weights of a LoRA adapter that specializes a base model for your task. The training jobs to update these LoRAs run on dedicated GPU clusters managed by Serverless RL.
-
-Training is free during the public preview period.
-
-### Model storage
-
-Serverless RL stores checkpoints of your trained LoRAs so you can evaluate, serve, or continue training them at any time. Storage is billed monthly based on total checkpoint size and your [pricing plan](https://wandb.ai/site/pricing). Every plan includes at least 5GB of free storage, which is enough for roughly 30 LoRAs. We recommend deleting low-performing LoRAs to save space. See the [ART SDK](https://art.openpipe.ai/features/checkpoint-deletion) for instructions on how to do this.
-
-## Limits
-
-* **Inference concurrency limits**: By default, Serverless RL currently supports up to 2000 concurrent requests per user and 6000 per project. If you exceed your rate limit, the Inference API returns a `429 Concurrency limit reached for requests` response. To avoid this error, reduce the number of concurrent requests your training job or production workload makes at once. If you need a higher rate limit, you can request one at support@wandb.com.
-
-* **Geographic restrictions**: Serverless RL is only available in supported geographic locations. For more information, see the [Terms of Service](https://site.wandb.ai/terms/).
diff --git a/training/what-is-training.mdx b/training/what-is-training.mdx
new file mode 100644
index 0000000000..6ad10e2b46
--- /dev/null
+++ b/training/what-is-training.mdx
@@ -0,0 +1,59 @@
+---
+title: What is W&B Training?
+description: Serverless post-training for LLMs using reinforcement learning and supervised fine-tuning.
+---
+
+W&B Training is a serverless post-training service for large language models (LLMs). W&B provisions the training infrastructure ([on CoreWeave](https://docs.coreweave.com/docs/platform)) for you while allowing full flexibility in your environment's setup, giving you instant access to a managed training cluster that elastically auto-scales to dozens of GPUs.
+
+W&B Training trains low-rank adapters (LoRAs) to specialize a foundation model for your specific task. The LoRAs you train are automatically stored as [artifacts](/models/artifacts) in your W&B account, and can be saved locally or to a third party for backup. Trained models are also automatically hosted on [W&B Inference](/inference).
+
+W&B Training offers two post-training methods:
+
+* **[Serverless RL](/training/guides/serverless-rl)**: Post-train models with reinforcement learning to learn new behaviors, improve reliability and speed, and reduce costs on multi-turn agentic tasks.
+* **[Serverless SFT](/training/guides/sft-training)**: Fine-tune models with supervised learning on curated datasets for distillation, teaching output style and format, or warming up before RL.
+
+
+W&B Training is in public preview. During the preview, you are charged only for the use of inference and the storage of artifacts. W&B does not charge for adapter training during the preview period. See [Limits](/training/details/usage-limits) and [pricing](/training/details/pricing) for details.
+
+
+## Why W&B Training?
+
+Setting up your own training infrastructure requires provisioning GPUs, configuring clusters, and managing deployment pipelines. W&B Training eliminates this overhead by providing a fully managed backend. Both Serverless RL and Serverless SFT share the following advantages:
+
+* **Lower training costs**: By multiplexing shared infrastructure across many users, skipping the setup process for each job, and scaling your GPU costs down to zero when you're not actively training, W&B Training reduces training costs significantly.
+* **Faster training time**: By immediately provisioning training infrastructure when you need it, W&B Training speeds up your training jobs and lets you iterate faster. Serverless RL further optimizes throughput by splitting inference requests across many GPUs.
+* **Automatic deployment**: W&B Training automatically deploys every checkpoint you train, eliminating the need to manually set up hosting infrastructure. Trained models can be accessed and tested immediately in local, staging, or production environments.
+
+## Serverless RL
+
+Reinforcement learning (RL) is a training technique where models learn to improve their behavior through feedback on their outputs. Serverless RL splits RL workflows into inference and training phases and multiplexes them across jobs, increasing GPU utilization and reducing your training time and costs.
+
+Serverless RL is ideal for tasks like:
+
+* Voice agents
+* Deep research assistants
+* On-prem models
+* Content marketing analysis agents
+
+To get started with Serverless RL, see the [Use Serverless RL](/training/guides/serverless-rl) guide, the ART [quickstart](https://art.openpipe.ai/getting-started/quick-start), or the [Google Colab notebook](https://colab.research.google.com/github/openpipe/art-notebooks/blob/main/examples/2048/2048.ipynb).
+
+## Serverless SFT
+
+Supervised fine-tuning (SFT) is a training technique where a model learns from curated input-output examples. Serverless SFT runs on the same managed, elastically auto-scaling cluster as Serverless RL, so you can fine-tune without provisioning GPUs.
+
+Serverless SFT is ideal for tasks like:
+
+* **Distillation**: Transferring knowledge from a larger, more capable model into a smaller, faster one.
+* **Teaching output style and format**: Training a model to follow specific response formats, tone, or structure.
+* **Warmup before RL**: Training a model on supervised examples before applying reinforcement learning for further refinement.
+
+To get started with Serverless SFT, see the [Use Serverless SFT](/training/guides/sft-training) guide or the ART [Serverless SFT docs](https://art.openpipe.ai/fundamentals/sft-training).
+
+## How W&B Training uses W&B services
+
+W&B Training uses a combination of the following W&B components to operate:
+
+* [Inference](/inference): To run your models.
+* [Models](/models): To track performance metrics during the LoRA adapter's training.
+* [Artifacts](/models/artifacts): To store and version the LoRA adapters.
+* [Weave (optional)](/weave): To gain observability into how the model responds at each step of the training loop.