docs(pricing-may-2026): customer-supplied inference (BYOK + CIE + BYOLLM) #115
Merged: hongyi-chen merged 11 commits into hyc/plan-updates from hyc/plan-updates-byok-cie-byollm on May 21, 2026.

Commits (all by hongyi-chen):

- 4fcbfed docs(pricing-may-2026): customer-supplied inference (BYOK + CIE + BYO…
- 86818df docs(pricing-may-2026): note July 1 self-serve preview period in BYOK…
- d3c5b54 docs(pricing-may-2026): correct launch date May 14 → May 21, 2026
- 5bf93db docs(pricing-may-2026): revert BYOLLM page to main, keep changes minimal
- 7e7d0d0 docs(pricing-may-2026): drop CIE abbreviation, narrow credit claims, …
- 504efc4 docs(pricing-may-2026): de-emphasize billing in BYOK + custom inferen…
- 4bff48e docs(pricing-may-2026): restore original BYOK opening, narrow only th…
- 97590c9 docs(pricing-may-2026): soften absolute 'no AI credits / 0 credits' c…
- 14ab6e2 docs(pricing-may-2026): reframe custom inference endpoint intro to le…
- 7ee9219 Cleanup pass: BYOK acronym + BYOLLM table scope
- 3501639 PR #115 review: address Tyler's comments
...tent/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx: new file, 112 additions, 0 deletions (@@ -0,0 +1,112 @@)

---
title: Custom inference endpoint
description: >-
  Connect Warp's agents to any OpenAI-compatible inference endpoint, such as
  OpenRouter, LiteLLM, z.ai, or an internal gateway you already run.
---

Warp supports **custom inference endpoints** for users who want to power Warp's agents with any OpenAI-compatible inference endpoint: a model router, a hosted gateway, or internal infrastructure they already run.

This lets you route AI requests through your preferred provider, run inference behind your own gateway, or use a router like OpenRouter or LiteLLM, while keeping the agent experience inside Warp.

:::note
Custom inference endpoints are available on Free and all eligible paid plans for individual users and organizations with 10 or fewer employees, subject to Warp's [Terms of Service](https://www.warp.dev/terms-of-service). Larger organizations need a Business or Enterprise plan. See [warp.dev/pricing](https://www.warp.dev/pricing) for current availability.
:::
## Key features

* **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
* **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
* **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise plans, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
* **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud.
## How it works

Warp expects your endpoint to implement the **OpenAI Chat Completions API** (`POST /v1/chat/completions`). Any service that exposes a compatible surface can be used as a target:

* **OpenRouter** - Aggregates many model providers behind a single OpenAI-compatible API with consolidated billing.
* **LiteLLM** - A self-hosted proxy that exposes a unified, OpenAI-compatible API across providers.
* **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
* **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).

When you configure a custom inference endpoint, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers.

:::caution
Custom inference endpoints don't apply to [Oz Cloud Agents](/agent-platform/cloud-agents/overview/). Because the configuration is stored locally, it isn't available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
:::

When you select a model routed through your endpoint:

* Warp **doesn't consume** your [AI credits](/support-and-community/plans-and-billing/credits/) for that request.
* Costs are billed directly by your endpoint provider.
* Warp doesn't retain or store your endpoint credentials on any of its servers.
## Enabling a custom inference endpoint

To enable and configure a custom inference endpoint:

1. In Warp, open **Settings** and search for `inference endpoint` to jump to the configuration.
2. Add your endpoint URL (the base URL that exposes `/v1/chat/completions`) and any required credentials (typically an API key).
3. Specify the model identifier(s) you want to route through this endpoint.
4. Save the configuration. Your custom models then appear in the model picker.

When you explicitly select an endpoint-routed model from the model picker, Warp routes the request through your endpoint instead of consuming Warp's AI credits.

The configuration flow mirrors the [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) setup, so the steps will feel familiar if you've already configured BYOK.
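The endpoint contract described in these steps (a base URL that exposes `POST /v1/chat/completions`, plus an API key) can be exercised outside Warp before you configure it. The TypeScript below is a minimal sketch, not Warp's implementation: it assumes the endpoint accepts OpenAI-style `Authorization: Bearer` authentication, and `buildChatCompletionRequest`, `extractReply`, the `http://localhost:4000` base URL, and the model ID in the usage comment are hypothetical placeholders.

```typescript
// Minimal sketch of a Chat Completions round trip against an OpenAI-compatible
// endpoint. Hypothetical helper names; not Warp's implementation.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Assumes `baseUrl` is the URL that sits in front of `/v1/chat/completions`
// (for example, a local proxy at http://localhost:4000) and that the endpoint
// accepts OpenAI-style bearer-token auth.
function buildChatCompletionRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): PreparedRequest {
  const trimmed = baseUrl.replace(/\/+$/, ""); // drop trailing slashes before joining
  return {
    url: `${trimmed}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Pulls the assistant text out of a Chat Completions response body
// (`choices[0].message.content`), failing loudly if the shape doesn't match.
function extractReply(response: unknown): string {
  const content = (response as { choices?: { message?: { content?: unknown } }[] } | null)
    ?.choices?.[0]?.message?.content;
  if (typeof content !== "string") {
    throw new Error("Response has no string at choices[0].message.content");
  }
  return content;
}

// Usage (requires a runtime with global fetch, such as Node 18+):
//   const { url, init } = buildChatCompletionRequest(
//     "http://localhost:4000", process.env.ENDPOINT_API_KEY ?? "", "your-model-id",
//     [{ role: "user", content: "Reply with the word ok." }],
//   );
//   const reply = extractReply(await (await fetch(url, init)).json());
```

A successful round trip confirms only the basic request and response shape described on this page.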
## Billing behavior

### Warp AI credits

When you select an endpoint-routed model from the model picker, inference is billed directly by your endpoint provider according to its pricing, rather than drawing from your Warp AI credits.

:::note
On Business and Enterprise plans, local agent runs that route through a custom inference endpoint still consume platform credits for Warp's platform infrastructure. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown.
:::
### Auto routing still uses Warp credits

Warp's **Auto** models dynamically route across providers using Warp's infrastructure. Because Auto routing depends on Warp, **Auto always consumes Warp credits**, even if you've configured a custom inference endpoint.

To use your endpoint, select the specific endpoint-routed model from the model picker rather than an Auto option.
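Taken together with the cloud-agent caution earlier on this page, the billing rules reduce to a small decision. The TypeScript below is an illustrative sketch only, not Warp's billing code: the type and function names are hypothetical, BYOK models and the other AI features described next are out of scope, and the platform-credit check covers only the one case this page states explicitly.

```typescript
// Illustrative model of the credit rules on this page. Hypothetical names;
// not Warp's billing implementation. BYOK models are out of scope.

type ModelSelection =
  | { kind: "auto" } // one of Warp's Auto options
  | { kind: "endpoint"; modelId: string } // a model routed through your custom endpoint
  | { kind: "warp-hosted"; modelId: string }; // any other model billed to Warp AI credits

interface AgentRun {
  location: "local" | "cloud"; // "cloud" = Oz Cloud Agents
  selection: ModelSelection;
  businessOrEnterprise: boolean;
}

type InferencePayer = "warp-credits" | "endpoint-provider";

function inferencePayer(run: AgentRun): InferencePayer {
  // Endpoint config is stored locally, so cloud runs can't use it.
  if (run.location === "cloud") return "warp-credits";
  // Only an explicitly selected endpoint-routed model bypasses Warp credits;
  // Auto always routes through Warp's infrastructure.
  return run.selection.kind === "endpoint" ? "endpoint-provider" : "warp-credits";
}

// True only for the case this page states: a local run through a custom
// endpoint on Business or Enterprise still uses platform credits. Other cases
// are covered on the platform credits page, not here.
function endpointRunUsesPlatformCredits(run: AgentRun): boolean {
  return run.location === "local" && run.selection.kind === "endpoint" && run.businessOrEnterprise;
}
```

Modeling the selection as a tagged union keeps Auto distinct from an endpoint-routed model, which mirrors the advice above to pick the endpoint model explicitly.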
### Other AI features in Warp

Some AI-powered features rely on Warp's infrastructure and are unaffected by a custom inference endpoint. These continue to consume credits according to your plan; see [Credits](/support-and-community/plans-and-billing/credits/) for details.
## Zero Data Retention (ZDR)

Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** agreements with all of its contracted LLM providers.

When you use a custom inference endpoint:

* Data retention is determined by **your endpoint provider** and any upstream model providers they route to.
* Warp **can't enforce ZDR** for requests sent through a custom inference endpoint.
* If your endpoint provider doesn't have a ZDR agreement with the underlying model provider, your requests may be retained according to their terms.

Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a custom inference endpoint.
## Centrally managed configuration

Custom inference endpoints are configured at the **user level** on every plan. Each user adds their own endpoint locally; centrally configured, admin-managed endpoints for teams are not yet available.

Enterprise teams that need centrally managed model routing today should see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/).
## How custom inference endpoints differ from BYOK and BYOLLM

Warp offers three ways to bring your own AI infrastructure. Use this table to pick the right one, and follow the links for full details.

| Option | Description | Plans |
| --- | --- | --- |
| **[Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/)** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans |
| **Custom inference endpoint** | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans |
| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock today; Azure Foundry and Google Vertex coming soon), with Warp handling routing, orchestration, governance, and observability. | Enterprise only |

Platform credits may apply for local agent runs on Business and Enterprise plans when using BYOK, a custom inference endpoint, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/).
## Related resources

* [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) - Use your own OpenAI, Anthropic, or Google API keys.
* [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) - Enterprise-managed inference through your cloud provider or approved infrastructure.
* [Model Choice](/agent-platform/capabilities/model-choice/) - Full list of supported models and `model_id` values.
* [Credits](/support-and-community/plans-and-billing/credits/) - How Warp credits work and when they're consumed.

@@ -142,11 +142,11 @@ export const sidebarTopics: StarlightSidebarTopicsUserConfig = [
{
label: 'Settings file',
collapsed: true,
items: [
{ slug: 'terminal/settings', label: 'Overview' },
{ slug: 'terminal/settings/all-settings', label: 'All settings reference' },
{ slug: 'terminal/settings/file-locations', label: 'File locations' },
],
items: [
{ slug: 'terminal/settings', label: 'Overview' },
{ slug: 'terminal/settings/all-settings', label: 'All settings reference' },
{ slug: 'terminal/settings/file-locations', label: 'File locations' },
],
},
{
label: 'Warpify overview',
@@ -540,6 +540,7 @@ export const sidebarTopics: StarlightSidebarTopicsUserConfig = [
'support-and-community/plans-and-billing/add-on-credits',
{ slug: 'support-and-community/plans-and-billing/platform-credits', label: 'Platform credits' },
'support-and-community/plans-and-billing/bring-your-own-api-key',
'support-and-community/plans-and-billing/custom-inference-endpoint',
'support-and-community/plans-and-billing/overages-legacy',
'support-and-community/plans-and-billing/pricing-faqs',
],
I kinda prefer "Larger organizations need a Business or Enterprise plan" to "Larger organizations require a Business or Enterprise plan" but that's just me