warpdotdev · hongyi-chen · May 21, 2026 · May 19, 2026 · May 20, 2026 · May 20, 2026
diff --git a/...content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx b/...content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx
@@ -1,29 +1,39 @@
 ---
 title: Bring Your Own API Key
 description: >-
-  Warp's paid plans include the ability to bring your own API keys (BYOK) for
-  OpenAI, Anthropic, and Google AI models.
+  Warp lets you bring your own API keys (BYOK) for OpenAI, Anthropic, and
+  Google AI models.
 ---
 
-Warp supports **Bring Your Own Key (BYOK)** for users who want to connect Warp’s agent to their own Anthropic, OpenAI, or Google API accounts.
+Warp supports **Bring Your Own API Key (BYOK)** for users who want to connect Warp's agents to their own Anthropic, OpenAI, or Google API accounts.
 
 This lets you use your own API keys to access models directly, giving you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for a list of supported models.
 
-BYOK provides greater flexibility in model access and ensures Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for requests routed through your own keys.
+BYOK provides greater flexibility in model access and ensures Warp **never consumes your** [AI credits](/support-and-community/plans-and-billing/credits/) for requests routed through your own keys.
 
 :::note
-BYOK is currently only available on Warp's paid plans, starting with Build. Learn more about plans and pricing [warp.dev/pricing](https://www.warp.dev/pricing).
+BYOK is available on Free and all eligible paid plans for individual users and organizations with 10 or fewer employees, subject to Warp's [Terms of Service](https://www.warp.dev/terms-of-service). Larger organizations need a Business or Enterprise plan. See [warp.dev/pricing](https://www.warp.dev/pricing) for current availability.
 :::
 
-:::caution
-BYOK and customer-supplied inference (BYOLLM via Amazon Bedrock or Google Vertex, plus custom endpoints) are available to individual users and organizations with 10 or fewer employees or users on any plan. Organizations with more than 10 employees or users must be on a Warp Business or Enterprise plan to use BYOK or customer-supplied inference. See Warp's [Terms of Service](https://www.warp.dev/terms-of-service) for details.
-:::
+## How BYOK differs from Custom inference endpoint and BYOLLM
+
+Warp offers three ways to bring your own AI infrastructure. Use this table to pick the right one, and follow the links for full details.
+
+| Name | Meaning | Plans |
+| --- | --- | --- |
+| **Bring your own API key** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans |
+| **[Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/)** | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans |
+| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock today; Azure Foundry and Google Vertex coming soon), with Warp handling routing, orchestration, governance, and observability. | Enterprise only |
+
+See [warp.dev/pricing](https://www.warp.dev/pricing) for current plan availability.
 
-## How does BYOK work?
+Platform credits apply to every cloud agent run on any plan, and to local agent runs on Business and Enterprise when using BYOK, a custom inference endpoint, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown.
+
+## How BYOK works
 
 When you add your own model API keys in Warp, those keys are stored **locally on your device** and are **never synced to the cloud**.
 
-Warp uses these API keys to directly route your agent requests to the model provider you've configured.
+Warp uses these API keys when routing your agent requests to the model provider you've configured.
 
 :::caution
 BYOK does not apply to [Cloud Agents](/agent-platform/cloud-agents/overview/). Because your API keys are stored locally on your device, they are not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
@@ -57,21 +67,20 @@ When you explicitly select a model with a key icon, Warp routes requests through
 
 ### Auto Model
 
-Warp's **Auto** models dynamically route requests across different models based on context and performance. Because this routing logic depends on Warp’s infrastructure, **Auto always consumes Warp's credits**, even if you’ve configured your own API keys.
+Warp's **Auto** models dynamically route requests across different models based on context and performance. Because this routing logic depends on Warp's infrastructure, **Auto always consumes Warp's credits**, even if you've configured your own API keys.
 
-To use your own key, select a specific provider model (for example, Claude Sonnet 4.5, GPT-5, or Gemini 2.5 Pro) directly from the model picker with a key icon.
+To use your own key, select a specific provider model (for example, Claude Opus 4.7, Claude Sonnet 4.6, GPT-5.5, or Gemini 3.1 Pro) directly from the model picker with a key icon.
 
 ### Credit usage
 
-When you select a model with the key icon in your model picker, Warp routes the request through your API key.
-
-In this case:
+When you select a model with the key icon in your model picker, Warp routes the request through your API key. In that case:
 
-* No Warp credits are consumed.
-* The cost of the request is billed directly through your provider account.
-* Core Agent Mode always **prioritizes BYOK usage** over any available credits.
+* Inference is billed directly through your provider account rather than drawing from your Warp AI credits.
+* Agent Mode prioritizes BYOK over any available Warp credits.
 
-The credit transparency footer will show “0 credits used”, and the `Billing & Usage` page will reflect no deductions from your monthly credit total.
+:::note
+On Business and Enterprise plans, local agent runs that use BYOK still consume platform credits for Warp's platform infrastructure (run lifecycle, integrations, observability). See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for what's covered.
+:::
 
 **Other AI features in Warp**
 
@@ -97,7 +106,7 @@ If your key:
 
 **Failover and fallback:**
 
-By default, Warp does not fall back to your credits when a BYOK (Bring Your Own Key) request fails.
+By default, Warp does not fall back to your credits when a BYOK request fails.
 
 You can choose to enable **Warp credit fallback**. When enabled, if an agent request fails with your BYOK model (for example, due to an API error or quota limit), Warp will automatically route the request to one of Warp’s provided models. Warp always prioritizes your API keys first and only uses Warp credits when necessary.
 
@@ -117,12 +126,17 @@ Warp itself never stores your LLM API keys.
 
 ### BYOK on Enterprise and Business plans
 
-Organizations with more than 10 employees or users must be on a Warp Business or Enterprise plan to use BYOK or customer-supplied inference. See Warp's [Terms of Service](https://www.warp.dev/terms-of-service) for the full eligibility rule.
+BYOK is configured at the **user level** on every plan, including Enterprise and Business:
+
+* Each team member adds and manages their own API keys locally on their device.
+* Centrally configured, admin-managed BYOK is not yet available — admins cannot enforce or share API keys across team members from a single place.
+* There is no organization-level Admin Panel for BYOK management today.
 
-Currently, BYOK is configured at the **user level**, not the team or admin level:
+If your organization needs centrally managed model routing today, see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) for the Enterprise-managed option, or [contact sales](https://www.warp.dev/contact-sales).
 
-* Each team member can add and manage their own API keys locally.
-* Team admins cannot yet enforce or share API keys across members.
-* There is currently no organization-level Admin Panel for BYOK management.
+## Related resources
 
-If your organization has specific needs for managed keys or enterprise-level control, please contact us at [warp.dev/contact-sales](https://www.warp.dev/contact-sales).
+* [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) — Route Warp through any OpenAI-compatible endpoint, such as OpenRouter, LiteLLM, z.ai, or an internal gateway.
+* [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) — Enterprise-managed inference through your cloud provider or approved infrastructure.
+* [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models and `model_id` values.
+* [Credits](/support-and-community/plans-and-billing/credits/) — How Warp credits work and when they're consumed.
diff --git a/...tent/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx b/...tent/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx
@@ -0,0 +1,112 @@
+---
+title: Custom inference endpoint
+description: >-
+  Connect Warp's agents to any OpenAI-compatible inference endpoint —
+  OpenRouter, LiteLLM, z.ai, or an internal gateway you already run.
+---
+
+Warp supports **custom inference endpoints** for users who want to power Warp's agents with any OpenAI-compatible inference endpoint — a model router, hosted gateway, or internal infrastructure they already run.
+
+This lets you route AI requests through your preferred provider, run inference behind your own gateway, or use a router like OpenRouter or LiteLLM, while keeping the agent experience inside Warp.
+
+:::note
+Custom inference endpoints are available on Free and all eligible paid plans for individual users and organizations with 10 or fewer employees, subject to Warp's [Terms of Service](https://www.warp.dev/terms-of-service). Larger organizations need a Business or Enterprise plan. See [warp.dev/pricing](https://www.warp.dev/pricing) for current availability.
+:::
+
+## Key features
+
+* **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
+* **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
+* **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure.
+* **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud.
+
+## How it works
+
+A custom inference endpoint expects your endpoint to implement the **OpenAI Chat Completions API** (`POST /v1/chat/completions`). Any service that exposes a compatible surface can be used as a target:
+
+* **OpenRouter** - Aggregates many model providers behind a single OpenAI-compatible API and consolidated billing.
+* **LiteLLM** - A self-hosted proxy that exposes a unified, OpenAI-compatible API across providers.
+* **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
+* **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).
+
+When you configure a custom inference endpoint, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers.
+
+:::caution
+Custom inference endpoints don't apply to [Oz Cloud Agents](/agent-platform/cloud-agents/overview/). Because the configuration is stored locally, it isn't available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
+:::
+
+When a model routed through your endpoint is selected:
+
+* Warp **doesn't consume** your [AI credits](/support-and-community/plans-and-billing/credits/) for that request.
+* Costs are billed directly by your endpoint provider.
+* Warp doesn't retain or store your endpoint credentials on any of its servers.
+
+## Enabling a custom inference endpoint
+
+To enable and configure a custom inference endpoint:
+
+1. In Warp, open **Settings** and search for `inference endpoint` to jump to the configuration.
+2. Add your endpoint URL (the base URL that exposes `/v1/chat/completions`) and any required credentials (typically an API key).
+3. Specify the model identifier(s) you want to route through this endpoint.
+4. Save the configuration. Once added, you'll see your custom models appear in the model picker.
+
+When you explicitly select an endpoint-routed model from the model picker, Warp routes the request through your endpoint instead of consuming Warp's AI credits.
+
+The configuration flow mirrors the [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) setup, so the steps will feel familiar if you've already configured BYOK.
+
+## Billing behavior
+
+### Warp AI credits
+
+When you select an endpoint-routed model from the model picker, inference is billed directly by your endpoint provider, according to their pricing, rather than drawing from your Warp AI credits.
+
+:::note
+On Business and Enterprise plans, local agent runs that route through a custom inference endpoint still consume platform credits for Warp's platform infrastructure. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown.
+:::
+
+### Auto routing still uses Warp credits
+
+Warp's **Auto** models dynamically route across providers using Warp's infrastructure. Because Auto routing depends on Warp, **Auto always consumes Warp's credits**, even if you've configured a custom inference endpoint.
+
+To use your endpoint, select the specific endpoint-routed model from the model picker rather than an Auto option.
+
+### Other AI features in Warp
+
+Some AI-powered features rely on Warp's infrastructure and are unaffected by a custom inference endpoint. These continue to consume credits according to your plan; see [Credits](/support-and-community/plans-and-billing/credits/) for details.
+
+## Zero Data Retention (ZDR)
+
+Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** agreements with all of its contracted LLM providers.
+
+When you use a custom inference endpoint:
+
+* Data retention is determined by **your endpoint provider** and any upstream model providers they route to.
+* Warp **cannot enforce ZDR** for requests sent through a custom inference endpoint.
+* If your endpoint provider does not have ZDR with the underlying model provider, your requests may be retained according to their terms.
+
+Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a custom inference endpoint.
+
+## Centrally managed configuration
+
+Custom inference endpoints are configured at the **user level** on every plan. Each user adds their own endpoint locally; centrally configured, admin-managed endpoints for teams are not yet available.
+
+Enterprise teams that need centrally managed model routing today should see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/).
+
+## How custom inference endpoints differ from BYOK and BYOLLM
+
+Warp offers three ways to bring your own AI infrastructure. Use this table to pick the right one, and follow the links for full details.
+
+| Name | Meaning | Plans |
+| --- | --- | --- |
+| **[Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/)** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans |
+| **Custom inference endpoint** | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans |
+| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock today; Azure Foundry and Google Vertex coming soon), with Warp handling routing, orchestration, governance, and observability. | Enterprise only |
+
+Platform credits may apply for local agent runs on Business and Enterprise when using BYOK, a custom inference endpoint, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/).
+
+## Related resources
+
+* [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — Use your own OpenAI, Anthropic, or Google API keys.
+* [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) — Enterprise-managed inference through your cloud provider or approved infrastructure.
+* [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models and `model_id` values.
+* [Credits](/support-and-community/plans-and-billing/credits/) — How Warp credits work and when they're consumed.
diff --git a/src/content/docs/support-and-community/plans-and-billing/index.mdx b/src/content/docs/support-and-community/plans-and-billing/index.mdx
@@ -11,5 +11,6 @@ Warp offers flexible plans for individual developers, teams, and enterprises, wi
 * [**Credits**](/support-and-community/plans-and-billing/credits/) - How credits are used and calculated across AI features
 * [**Add-on Credits**](/support-and-community/plans-and-billing/add-on-credits/) - Purchase additional credits or enable automatic reloads
 * [**Bring Your Own API Key**](/support-and-community/plans-and-billing/bring-your-own-api-key/) - Connect your own model provider API keys
+* [**Custom inference endpoint**](/support-and-community/plans-and-billing/custom-inference-endpoint/) - Connect an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway
 * [**Overages (Legacy)**](/support-and-community/plans-and-billing/overages-legacy/) - Information for users on legacy plans with overages
 * [**Pricing FAQs**](/support-and-community/plans-and-billing/pricing-faqs/) - Answers to common questions about plans and billing
diff --git a/src/sidebar.ts b/src/sidebar.ts
@@ -142,11 +142,11 @@ export const sidebarTopics: StarlightSidebarTopicsUserConfig = [
 						{
 							label: 'Settings file',
 							collapsed: true,
-							items: [
-								{ slug: 'terminal/settings', label: 'Overview' },
-								{ slug: 'terminal/settings/all-settings', label: 'All settings reference' },
-								{ slug: 'terminal/settings/file-locations', label: 'File locations' },
-							],
+					items: [
+					{ slug: 'terminal/settings', label: 'Overview' },
+					{ slug: 'terminal/settings/all-settings', label: 'All settings reference' },
+					{ slug: 'terminal/settings/file-locations', label: 'File locations' },
+					],
 						},
 						{
 							label: 'Warpify overview',
@@ -540,6 +540,7 @@ export const sidebarTopics: StarlightSidebarTopicsUserConfig = [
 						'support-and-community/plans-and-billing/add-on-credits',
 						{ slug: 'support-and-community/plans-and-billing/platform-credits', label: 'Platform credits' },
 						'support-and-community/plans-and-billing/bring-your-own-api-key',
+						'support-and-community/plans-and-billing/custom-inference-endpoint',
 						'support-and-community/plans-and-billing/overages-legacy',
 						'support-and-community/plans-and-billing/pricing-faqs',
 					],