25 changes: 19 additions & 6 deletions docs.json
Original file line number Diff line number Diff line change
@@ -1107,14 +1107,27 @@
"icon": "/icons/cropped-training.svg",
"pages": [
"training",
"training/prerequisites",
{
"group": "Serverless RL",
"group": "Getting Started",
"pages": [
"training/what-is-training",
"training/getting-started/prerequisites"
]
},
{
"group": "Guides",
"pages": [
"training/guides/serverless-rl",
"training/guides/sft-training",
"training/guides/use-trained-models"
]
},
{
"group": "Details",
"pages": [
"training/serverless-rl",
"training/serverless-rl/available-models",
"training/serverless-rl/usage-limits",
"training/serverless-rl/use-trained-models"
"training/details/pricing",
"training/details/available-models",
"training/details/usage-limits"
]
},
{
13 changes: 8 additions & 5 deletions training.mdx
@@ -1,15 +1,18 @@
---
title: W&B Training
description: Post-train your models using reinforcement learning
description: Post-train your models using reinforcement learning and supervised fine-tuning
mode: wide
---

Now in public preview, W&B Training offers serverless reinforcement learning (RL) for post-training large language models (LLMs) to improve their reliability performing multi-turn, agentic tasks while also increasing speed and reducing costs. RL is a training technique where models learn to improve their behavior through feedback on their outputs.
Now in public preview, W&B Training offers serverless post-training for large language models (LLMs), including both reinforcement learning (RL) and supervised fine-tuning (SFT).

* **[Serverless RL](/training/serverless-rl)**: Improve model reliability on multi-turn, agentic tasks while increasing speed and reducing costs. RL is a training technique where models learn to improve their behavior through feedback on their outputs.
* **[Serverless SFT](/training/sft-training)**: Fine-tune models using curated datasets for distillation, teaching output style and format, or warming up before RL.

W&B Training includes integration with:

* [ART](https://art.openpipe.ai/getting-started/about), a flexible RL fine-tuning framework.
* [RULER](https://openpipe.ai/blog/ruler), a universal verifier.
* [ART](https://art.openpipe.ai/getting-started/about), a flexible fine-tuning framework.
* [RULER](https://openpipe.ai/blog/ruler), a universal verifier.
* A fully-managed backend on [CoreWeave Cloud](https://docs.coreweave.com/docs/platform).

To get started, satisfy the [prerequisites](/training/prerequisites) to start using the service and then see [OpenPipe's Serverless RL quickstart](https://art.openpipe.ai/getting-started/quick-start) to learn how to post-train your models.
To get started, satisfy the [prerequisites](/training/prerequisites), then see the [Serverless RL quickstart](https://art.openpipe.ai/getting-started/quick-start) or the [Serverless SFT docs](https://art.openpipe.ai/fundamentals/sft-training) to learn how to post-train your models.
6 changes: 4 additions & 2 deletions training/api-reference.mdx
Expand Up @@ -4,7 +4,7 @@ description: Complete API documentation for W&B Training
---

<Note>
The W&B Training API provides endpoints for managing and interacting with serverless reinforcement learning training jobs. The API is OpenAI-compatible for chat completions.
The W&B Training API provides endpoints for managing and interacting with training jobs, including serverless reinforcement learning (RL) and supervised fine-tuning (SFT). The API is OpenAI-compatible for chat completions.
</Note>

## Authentication
@@ -41,7 +41,8 @@ https://api.training.wandb.ai/v1

### training-jobs

- **[POST /v1/preview/training-jobs](https://docs.wandb.ai/training/api-reference/training-jobs/create-training-job)** - Create Training Job
- **[POST /v1/preview/sft-training-jobs](https://docs.wandb.ai/training/api-reference/training-jobs/create-sft-training-job)** - Create SFT Training Job
- **[POST /v1/preview/training-jobs](https://docs.wandb.ai/training/api-reference/training-jobs/create-rl-training-job)** - Create RL Training Job
- **[GET /v1/preview/training-jobs/{training_job_id}](https://docs.wandb.ai/training/api-reference/training-jobs/get-training-job)** - Get Training Job
- **[GET /v1/preview/training-jobs/{training_job_id}/events](https://docs.wandb.ai/training/api-reference/training-jobs/get-training-job-events)** - Get Training Job Events
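A request against the SFT job-creation endpoint listed above might be sketched as follows. This is a minimal illustration, not the official client: the base URL and endpoint path come from this API reference, the payload fields mirror the `CreateSFTTrainingJob` schema in this PR's `openapi.json`, and the model ID, artifact path, and API-key handling are placeholders.

```python
import json
import os
import urllib.request

API_BASE = "https://api.training.wandb.ai/v1"

def build_sft_job_request(model_id, training_data_url, api_key, config=None):
    """Build (but do not send) a POST request for the SFT job endpoint."""
    payload = {"model_id": model_id, "training_data_url": training_data_url}
    if config is not None:
        payload["config"] = config  # e.g. batch_size, learning_rate
    return urllib.request.Request(
        f"{API_BASE}/preview/sft-training-jobs",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_sft_job_request(
    model_id="00000000-0000-0000-0000-000000000000",  # placeholder UUID
    training_data_url="wandb-artifact:///my-entity/my-project/sft-data:v0",  # hypothetical artifact
    api_key=os.environ.get("WANDB_API_KEY", ""),
    config={"learning_rate": 1e-5, "batch_size": "auto"},
)
# urllib.request.urlopen(req) would submit the job.
```

Sending the request (with a real model ID and artifact path) should return a `TrainingJobResponse`; a `422` indicates the payload failed schema validation.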

@@ -54,6 +55,7 @@ https://api.training.wandb.ai/v1

- [W&B Training overview](/training)
- [Prerequisites](/training/prerequisites)
- [Serverless SFT](/training/sft-training)
- [Use your trained models](/training/serverless-rl/use-trained-models)
- [Available models](/training/serverless-rl/available-models)
- [Usage limits](/training/serverless-rl/usage-limits)
166 changes: 164 additions & 2 deletions training/api-reference/openapi.json
@@ -399,7 +399,7 @@
"tags": [
"training-jobs"
],
"summary": "Create Training Job",
"summary": "Create RL Training Job",
"operationId": "create_training_job_v1_preview_training_jobs_post",
"requestBody": {
"content": {
@@ -440,6 +440,53 @@
]
}
},
"/v1/preview/sft-training-jobs": {
"post": {
"tags": [
"training-jobs"
],
"summary": "Create SFT Training Job",
"description": "Create a new SFT (Supervised Fine-Tuning) training job.",
"operationId": "create_sft_training_job_v1_preview_sft_training_jobs_post",
"requestBody": {
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/CreateSFTTrainingJob"
}
}
},
"required": true
},
"responses": {
"200": {
"description": "Successful Response",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/TrainingJobResponse"
}
}
}
},
"422": {
"description": "Validation Error",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/HTTPValidationError"
}
}
}
}
},
"security": [
{
"HTTPBearer": []
}
]
}
},
"/v1/preview/training-jobs/{training_job_id}": {
"get": {
"tags": [
@@ -2671,6 +2718,49 @@
"type": "object",
"title": "Content"
},
"CreateSFTTrainingJob": {
"properties": {
"model_id": {
"type": "string",
"format": "uuid",
"title": "Model Id"
},
"training_data_url": {
"type": "string",
"title": "Training Data Url",
"description": "W&B artifact path for training data (e.g., 'wandb-artifact:///entity/project/artifact-name:version')"
},
"config": {
"anyOf": [
{
"$ref": "#/components/schemas/SFTTrainingConfig"
},
{
"type": "null"
}
]
},
"experimental_config": {
"anyOf": [
{
"additionalProperties": true,
"type": "object"
},
{
"type": "null"
}
],
"title": "Experimental Config"
}
},
"type": "object",
"required": [
"model_id",
"training_data_url"
],
"title": "CreateSFTTrainingJob",
"description": "Schema for creating a new SFT (Supervised Fine-Tuning) TrainingJob.\n\nThe client should upload the training data (trajectories.jsonl and metadata.json)\nto W&B Artifacts and provide the artifact URL."
},
"CreateTrainingJob": {
"properties": {
"model_id": {
@@ -3734,6 +3824,17 @@
"base_model": {
"type": "string",
"title": "Base Model"
},
"run_id": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"title": "Run Id"
}
},
"type": "object",
@@ -3888,6 +3989,45 @@
"title": "Role",
"description": "The role of a message author (mirrors ``chat::Role``)."
},
"SFTTrainingConfig": {
"properties": {
"batch_size": {
"anyOf": [
{
"type": "integer"
},
{
"type": "string",
"const": "auto"
},
{
"type": "null"
}
],
"title": "Batch Size"
},
"learning_rate": {
"anyOf": [
{
"type": "number"
},
{
"items": {
"type": "number"
},
"type": "array"
},
{
"type": "null"
}
],
"title": "Learning Rate"
}
},
"type": "object",
"title": "SFTTrainingConfig",
"description": "Schema for SFT training config."
},
"StreamOptions": {
"properties": {
"include_usage": {
@@ -4223,6 +4363,28 @@
"type": "number",
"title": "Reward"
},
"initial_policy_version": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"title": "Initial Policy Version"
},
"final_policy_version": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"title": "Final Policy Version"
},
"metrics": {
"additionalProperties": {
"anyOf": [
@@ -4448,4 +4610,4 @@
}
}
}
}
}
4 changes: 4 additions & 0 deletions training/details/pricing.mdx
@@ -0,0 +1,4 @@
---
title: "W&B Training pricing"
url: "https://wandb.ai/site/pricing/reinforcement-learning"
---
11 changes: 11 additions & 0 deletions training/details/usage-limits.mdx
@@ -0,0 +1,11 @@
---
title: Limits
description: Understand usage limits and account restrictions for W&B Serverless RL
---


## Limits

* **Inference concurrency limits**: By default, Serverless RL currently supports up to 2000 concurrent requests per user and 6000 per project. If you exceed your rate limit, the Inference API returns a `429 Concurrency limit reached for requests` response. To avoid this error, reduce the number of concurrent requests your training job or production workload makes at once. If you need a higher rate limit, request one by emailing support@wandb.com.

* **Geographic restrictions**: Serverless RL is only available in supported geographic locations. For more information, see the [Terms of Service](https://site.wandb.ai/terms/).
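One way to recover gracefully from the `429` response described above is to retry with exponential backoff. The sketch below is illustrative, not part of the W&B SDK; the retry counts and delays are arbitrary, and `send` stands in for any callable that issues a request to the Inference API and returns an object with a `status_code` attribute (such as a `requests` call).

```python
import time

def call_with_backoff(send, max_retries=5, base_delay=1.0):
    """Retry send() while it signals a 429, sleeping exponentially longer each time."""
    for attempt in range(max_retries):
        resp = send()
        if resp.status_code != 429:
            return resp
        time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise RuntimeError("Concurrency limit still reached after retries")
```

Pairing backoff with a cap on in-flight requests (for example, a semaphore sized below the per-user limit) avoids triggering the limit in the first place.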
@@ -1,5 +1,5 @@
---
title: Prerequisites
title: Set up your environment
description: Set up your environment to use W&B Training
---

@@ -26,4 +26,4 @@ Create a project in your W&B account to track usage, record training metrics, an
After completing the prerequisites:

* Check the [API reference](/training/api-reference) to learn about available endpoints
* Try the [ART quickstart](https://art.openpipe.ai/getting-started/quick-start)
* Try the [ART quickstart](https://art.openpipe.ai/getting-started/quick-start)
@@ -1,5 +1,5 @@
---
title: How to use Serverless RL
title: Use Serverless RL
---

Serverless RL is supported through [OpenPipe's ART framework](https://art.openpipe.ai/getting-started/about) and the [W&B Training API](/training/api-reference).
8 changes: 8 additions & 0 deletions training/guides/sft-training.mdx
@@ -0,0 +1,8 @@
---
title: Use Serverless SFT
---

Serverless SFT is supported through [OpenPipe's ART framework](https://art.openpipe.ai/getting-started/about) and the [W&B Training API](/training/api-reference).

To start using Serverless SFT, satisfy the [prerequisites](/training/prerequisites) to use W&B tools, and then go through the ART [Serverless SFT docs](https://art.openpipe.ai/fundamentals/sft-training).
To learn about Serverless SFT's API endpoints, see the [W&B Training API reference](/training/api-reference).
@@ -1,6 +1,6 @@
---
title: Use your trained models
description: Make inference requests to the models you've trained
title: Run inference on trained models
description: Make inference requests to the models you've trained.
---

After training a model with Serverless RL, it is automatically available for inference.
35 changes: 0 additions & 35 deletions training/serverless-rl.mdx

This file was deleted.
