25 changes: 19 additions & 6 deletions docs.json
Original file line number Diff line number Diff line change
@@ -1107,14 +1107,27 @@
"icon": "/icons/cropped-training.svg",
"pages": [
"training",
"training/prerequisites",
{
"group": "Serverless RL",
"group": "Getting Started",
"pages": [
"training/what-is-training",
"training/getting-started/prerequisites"
]
},
{
"group": "Guides",
"pages": [
"training/guides/serverless-rl",
"training/guides/sft-training",
"training/guides/use-trained-models"
]
},
{
"group": "Details",
"pages": [
"training/serverless-rl",
"training/serverless-rl/available-models",
"training/serverless-rl/usage-limits",
"training/serverless-rl/use-trained-models"
"training/details/pricing",
"training/details/available-models",
"training/details/usage-limits"
]
},
{
13 changes: 8 additions & 5 deletions training.mdx
@@ -1,15 +1,18 @@
---
title: W&B Training
description: Post-train your models using reinforcement learning
description: Post-train your models using reinforcement learning and supervised fine-tuning
mode: wide
---

Now in public preview, W&B Training offers serverless reinforcement learning (RL) for post-training large language models (LLMs) to improve their reliability performing multi-turn, agentic tasks while also increasing speed and reducing costs. RL is a training technique where models learn to improve their behavior through feedback on their outputs.
Now in public preview, W&B Training offers serverless post-training for large language models (LLMs), including both reinforcement learning (RL) and supervised fine-tuning (SFT).

* **[Serverless RL](/training/serverless-rl)**: Improve model reliability on multi-turn, agentic tasks while increasing speed and reducing costs. RL is a training technique where models learn to improve their behavior through feedback on their outputs.
* **[Serverless SFT](/training/sft-training)**: Fine-tune models using curated datasets for distillation, teaching output style and format, or warming up before RL.

W&B Training includes integration with:

* [ART](https://art.openpipe.ai/getting-started/about), a flexible RL fine-tuning framework.
* [RULER](https://openpipe.ai/blog/ruler), a universal verifier.
* [ART](https://art.openpipe.ai/getting-started/about), a flexible fine-tuning framework.
* [RULER](https://openpipe.ai/blog/ruler), a universal verifier.
* A fully-managed backend on [CoreWeave Cloud](https://docs.coreweave.com/docs/platform).

To get started, satisfy the [prerequisites](/training/prerequisites) to start using the service and then see [OpenPipe's Serverless RL quickstart](https://art.openpipe.ai/getting-started/quick-start) to learn how to post-train your models.
To get started, satisfy the [prerequisites](/training/prerequisites), then see the [Serverless RL quickstart](https://art.openpipe.ai/getting-started/quick-start) or the [Serverless SFT docs](https://art.openpipe.ai/fundamentals/sft-training) to learn how to post-train your models.
6 changes: 4 additions & 2 deletions training/api-reference.mdx
Expand Up @@ -4,7 +4,7 @@ description: Complete API documentation for W&B Training
---

<Note>
The W&B Training API provides endpoints for managing and interacting with serverless reinforcement learning training jobs. The API is OpenAI-compatible for chat completions.
The W&B Training API provides endpoints for managing and interacting with training jobs, including serverless reinforcement learning (RL) and supervised fine-tuning (SFT). The API is OpenAI-compatible for chat completions.
</Note>

## Authentication
@@ -41,7 +41,8 @@ https://api.training.wandb.ai/v1

### training-jobs

- **[POST /v1/preview/training-jobs](https://docs.wandb.ai/training/api-reference/training-jobs/create-training-job)** - Create Training Job
- **[POST /v1/preview/sft-training-jobs](https://docs.wandb.ai/training/api-reference/training-jobs/create-sft-training-job)** - Create SFT Training Job
- **[POST /v1/preview/training-jobs](https://docs.wandb.ai/training/api-reference/training-jobs/create-rl-training-job)** - Create RL Training Job
- **[GET /v1/preview/training-jobs/{training_job_id}](https://docs.wandb.ai/training/api-reference/training-jobs/get-training-job)** - Get Training Job
- **[GET /v1/preview/training-jobs/{training_job_id}/events](https://docs.wandb.ai/training/api-reference/training-jobs/get-training-job-events)** - Get Training Job Events
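A request against the SFT job-creation endpoint listed above might be sketched as follows. This is a minimal illustration, not the official client: the base URL and endpoint path come from this API reference, the payload fields mirror the `CreateSFTTrainingJob` schema in this PR's `openapi.json`, and the model ID, artifact path, and API-key handling are placeholders.

```python
import json
import os
import urllib.request

API_BASE = "https://api.training.wandb.ai/v1"

def build_sft_job_request(model_id, training_data_url, api_key, config=None):
    """Build (but do not send) a POST request for the SFT job endpoint."""
    payload = {"model_id": model_id, "training_data_url": training_data_url}
    if config is not None:
        payload["config"] = config  # e.g. batch_size, learning_rate
    return urllib.request.Request(
        f"{API_BASE}/preview/sft-training-jobs",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_sft_job_request(
    model_id="00000000-0000-0000-0000-000000000000",  # placeholder UUID
    training_data_url="wandb-artifact:///my-entity/my-project/sft-data:v0",  # hypothetical artifact
    api_key=os.environ.get("WANDB_API_KEY", ""),
    config={"learning_rate": 1e-5, "batch_size": "auto"},
)
# urllib.request.urlopen(req) would submit the job.
```

Sending the request (with a real model ID and artifact path) should return a `TrainingJobResponse`; a `422` indicates the payload failed schema validation.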

@@ -54,6 +55,7 @@ https://api.training.wandb.ai/v1

- [W&B Training overview](/training)
- [Prerequisites](/training/prerequisites)
- [Serverless SFT](/training/sft-training)
- [Use your trained models](/training/serverless-rl/use-trained-models)
- [Available models](/training/serverless-rl/available-models)
- [Usage limits](/training/serverless-rl/usage-limits)
166 changes: 164 additions & 2 deletions training/api-reference/openapi.json
@@ -399,7 +399,7 @@
"tags": [
"training-jobs"
],
"summary": "Create Training Job",
"summary": "Create RL Training Job",
"operationId": "create_training_job_v1_preview_training_jobs_post",
"requestBody": {
"content": {
@@ -440,6 +440,53 @@
]
}
},
"/v1/preview/sft-training-jobs": {
"post": {
"tags": [
"training-jobs"
],
"summary": "Create SFT Training Job",
"description": "Create a new SFT (Supervised Fine-Tuning) training job.",
"operationId": "create_sft_training_job_v1_preview_sft_training_jobs_post",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/CreateSFTTrainingJob"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/TrainingJobResponse"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
},
"security": [
{
"HTTPBearer": []
}
]
}
},
"/v1/preview/training-jobs/{training_job_id}": {
"get": {
"tags": [
@@ -2671,6 +2718,49 @@
"type": "object",
"title": "Content"
},
"CreateSFTTrainingJob": {
"properties": {
"model_id": {
"type": "string",
"format": "uuid",
"title": "Model Id"
},
"training_data_url": {
"type": "string",
"title": "Training Data Url",
"description": "W&B artifact path for training data (e.g., 'wandb-artifact:///entity/project/artifact-name:version')"
},
"config": {
"anyOf": [
{
"$ref": "#/components/schemas/SFTTrainingConfig"
},
{
"type": "null"
}
]
},
"experimental_config": {
"anyOf": [
{
"additionalProperties": true,
"type": "object"
},
{
"type": "null"
}
],
"title": "Experimental Config"
}
},
"type": "object",
"required": [
"model_id",
"training_data_url"
],
"title": "CreateSFTTrainingJob",
"description": "Schema for creating a new SFT (Supervised Fine-Tuning) TrainingJob.\n\nThe client should upload the training data (trajectories.jsonl and metadata.json)\nto W&B Artifacts and provide the artifact URL."
},
"CreateTrainingJob": {
"properties": {
"model_id": {
@@ -3734,6 +3824,17 @@
"base_model": {
"type": "string",
"title": "Base Model"
},
"run_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Run Id"
}
},
"type": "object",
@@ -3888,6 +3989,45 @@
"title": "Role",
"description": "The role of a message author (mirrors ``chat::Role``)."
},
"SFTTrainingConfig": {
"properties": {
"batch_size": {
"anyOf": [
{
"type": "integer"
},
{
"type": "string",
"const": "auto"
},
{
"type": "null"
}
],
"title": "Batch Size"
},
"learning_rate": {
"anyOf": [
{
"type": "number"
},
{
"items": {
"type": "number"
},
"type": "array"
},
{
"type": "null"
}
],
"title": "Learning Rate"
}
},
"type": "object",
"title": "SFTTrainingConfig",
"description": "Schema for SFT training config."
},
"StreamOptions": {
"properties": {
"include_usage": {
@@ -4223,6 +4363,28 @@
"type": "number",
"title": "Reward"
},
"initial_policy_version": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"title": "Initial Policy Version"
},
"final_policy_version": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"title": "Final Policy Version"
},
"metrics": {
"additionalProperties": {
"anyOf": [
@@ -4448,4 +4610,4 @@
}
}
}
}
}
4 changes: 4 additions & 0 deletions training/details/pricing.mdx
@@ -0,0 +1,4 @@
---
title: "W&B Training pricing"
url: "https://wandb.ai/site/pricing/reinforcement-learning"
---
11 changes: 11 additions & 0 deletions training/details/usage-limits.mdx
@@ -0,0 +1,11 @@
---
title: Limits
description: Understand usage limits and account restrictions for W&B Serverless RL
---


## Limits

* **Inference concurrency limits**: By default, Serverless RL currently supports up to 2000 concurrent requests per user and 6000 per project. If you exceed your rate limit, the Inference API returns a `429 Concurrency limit reached for requests` response. To avoid this error, reduce the number of concurrent requests your training job or production workload makes at once. If you need a higher rate limit, request one by emailing support@wandb.com.

* **Geographic restrictions**: Serverless RL is only available in supported geographic locations. For more information, see the [Terms of Service](https://site.wandb.ai/terms/).
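One way to recover gracefully from the `429` response described above is to retry with exponential backoff. The sketch below is illustrative, not part of the W&B SDK; the retry counts and delays are arbitrary, and `send` stands in for any callable that issues a request to the Inference API and returns an object with a `status_code` attribute (such as a `requests` call).

```python
import time

def call_with_backoff(send, max_retries=5, base_delay=1.0):
    """Retry send() while it signals a 429, sleeping exponentially longer each time."""
    for attempt in range(max_retries):
        resp = send()
        if resp.status_code != 429:
            return resp
        time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise RuntimeError("Concurrency limit still reached after retries")
```

Pairing backoff with a cap on in-flight requests (for example, a semaphore sized below the per-user limit) avoids triggering the limit in the first place.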
@@ -1,5 +1,5 @@
---
title: Prerequisites
title: Set up your environment
description: Set up your environment to use W&B Training
---

@@ -26,4 +26,4 @@ Create a project in your W&B account to track usage, record training metrics, an
After completing the prerequisites:

* Check the [API reference](/training/api-reference) to learn about available endpoints
* Try the [ART quickstart](https://art.openpipe.ai/getting-started/quick-start)
* Try the [ART quickstart](https://art.openpipe.ai/getting-started/quick-start)
@@ -1,5 +1,5 @@
---
title: How to use Serverless RL
title: Use Serverless RL
---

Serverless RL is supported through [OpenPipe's ART framework](https://art.openpipe.ai/getting-started/about) and the [W&B Training API](/training/api-reference).
8 changes: 8 additions & 0 deletions training/guides/sft-training.mdx
@@ -0,0 +1,8 @@
---
title: Use Serverless SFT
---

Serverless SFT is supported through [OpenPipe's ART framework](https://art.openpipe.ai/getting-started/about) and the [W&B Training API](/training/api-reference).

To start using Serverless SFT, satisfy the [prerequisites](/training/prerequisites) to use W&B tools, and then go through the ART [Serverless SFT docs](https://art.openpipe.ai/fundamentals/sft-training).
To learn about Serverless SFT's API endpoints, see the [W&B Training API reference](/training/api-reference).
@@ -1,6 +1,6 @@
---
title: Use your trained models
description: Make inference requests to the models you've trained
title: Run inference on trained models
description: Make inference requests to the models you've trained.
---

After training a model with Serverless RL, it is automatically available for inference.
35 changes: 0 additions & 35 deletions training/serverless-rl.mdx

This file was deleted.
