From 4fcbfedd734cabc0417d261e00ac65988536879d Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Tue, 19 May 2026 16:56:18 -0700 Subject: [PATCH 01/11] docs(pricing-may-2026): customer-supplied inference (BYOK + CIE + BYOLLM) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Part of the May 14, 2026 pricing-and-packaging docs launch. - BYOK is now available on the Free plan; page rewritten to open eligibility, refresh model examples, and add the BYOK/CIE/BYOLLM comparison table. - New Custom Inference Endpoint (CIE) page for OpenAI Chat Completions– compatible endpoints (OpenRouter, LiteLLM, z.ai, internal gateways). Sidebar entry added under Plans and billing. - BYOLLM reframed as Enterprise-only managed inference. AWS Bedrock GA; Google Vertex AI and Azure AI Foundry on the roadmap. Cloud-native credentials now span IAM/OIDC across all three cloud providers. - 10-employee org rule applies to BYOK and CIE; larger orgs need Business or Enterprise. - Platform-credits caveats: on Business/Enterprise local agent runs, customer-supplied inference still consumes platform credits even though no AI credits are charged. - plans-and-billing/index.mdx updated to surface the new CIE page. 
Co-Authored-By: Oz --- .../bring-your-own-llm.mdx | 53 +++++--- .../bring-your-own-api-key.mdx | 64 ++++++--- .../custom-inference-endpoint.mdx | 123 ++++++++++++++++++ .../plans-and-billing/index.mdx | 1 + src/sidebar.ts | 2 +- 5 files changed, 204 insertions(+), 39 deletions(-) create mode 100644 src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx diff --git a/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx b/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx index 6b4fc6b9..e62be9b8 100644 --- a/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx +++ b/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx @@ -1,16 +1,16 @@ --- title: Bring your own LLM description: >- - Route Warp's agents through your AWS Bedrock models for billing control and - infrastructure flexibility. + Route Warp's agents through your organization's managed inference + infrastructure for governance, billing control, and model flexibility. --- -Warp supports **Bring Your Own LLM (BYOLLM)** for enterprise teams that need to run inference on their own cloud infrastructure. With BYOLLM, your team can use Warp's agents while routing inference through models hosted in your AWS Bedrock environment. +Warp supports **Bring your own LLM (BYOLLM)** for Enterprise teams that want to run inference on their own managed infrastructure. BYOLLM covers two patterns: cloud-provider Model-as-a-Service (AWS Bedrock, Google Vertex AI, Azure AI Foundry) and approved internal inference gateways. -This gives you control over cloud spend and model hosting, without changing how your team works in Warp. +With BYOLLM, your team uses Warp's agents while Warp manages routing, orchestration, governance, and observability across the providers you've approved. Inference runs in your environment; admins control which models are available to whom. :::caution -BYOLLM currently supports **AWS Bedrock** only. 
Coming soon: Azure Foundry and Google Vertex support. +**AWS Bedrock** is the GA implementation today. **Google Vertex AI** and **Azure AI Foundry** support is on the roadmap. Approved internal gateways are evaluated on a case-by-case basis with your Warp account team. BYOLLM applies to interactive agents in the terminal. Cloud agents do not yet support BYOLLM routing. ::: @@ -19,9 +19,29 @@ BYOLLM applies to interactive agents in the terminal. Cloud agents do not yet su BYOLLM is only available on Warp's Enterprise plan. Contact [warp.dev/contact-sales](https://www.warp.dev/contact-sales) to learn more. ::: +## How BYOLLM differs from BYOK and Custom inference endpoint + +Warp offers three ways to bring your own inference into the product. BYOLLM is one of them, and it serves a different use case than the others. + +| Name | Meaning | Plans | +| --- | --- | --- | +| Bring your own API key (BYOK) | User-level API keys for OpenAI, Anthropic, or Google. Each user configures their own key locally; Warp uses it to call the provider directly. | Free, Build, Max (orgs with 10 or fewer employees); Business or Enterprise required for larger orgs | +| Custom inference endpoint (CIE) | User-level OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. Each user configures the endpoint locally. | Free, Build, Max (orgs with 10 or fewer employees); Business or Enterprise required for larger orgs | +| Bring your own LLM (BYOLLM) | Enterprise-only managed inference infrastructure: cloud-provider Model-as-a-Service (Bedrock, Vertex, Foundry) or approved internal gateways. Warp manages routing, orchestration, governance, and observability for the whole team. | Enterprise | + +:::note +BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. 
Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use these features. +::: + +Use [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) or [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) when an individual developer wants to authenticate to a provider with their own key or endpoint. Use BYOLLM when an organization wants Warp to manage inference routing across approved providers for the whole team. + +:::note +Centrally configured BYOK and Custom inference endpoint for Enterprise — where admins approve providers or endpoints for the entire organization through the Admin Panel — are a fast-follow after launch, not at launch. Until then, BYOK and CIE remain user-level configurations, and BYOLLM remains the path for admin-managed inference infrastructure. +::: + ## Key features -* **Cloud-native credentials** - Authenticate using each user’s AWS IAM identity. Warp does not store API keys. +* **Cloud-native credentials** - Authenticate using each user's cloud-native identity (AWS IAM today; Google Cloud and Azure identities on the roadmap). Warp does not store API keys. * **Admin-enforced routing** - Team admins configure which models are available to users in AWS Bedrock, with the ability to disable non-Bedrock model access entirely. * **Consolidated billing** - Inference costs are billed directly to your AWS account, leveraging existing cloud commitments. @@ -134,6 +154,10 @@ When a request routes through BYOLLM: * **Warp does not consume credits** for that request. * Your cloud provider account receives the inference costs directly. +:::note +BYOLLM-routed local agent runs on Enterprise still consume platform credits for Warp's platform infrastructure (run orchestration, observability, integrations). Inference costs are billed directly to your cloud provider account. 
See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown. +::: + ### Routing behavior Warp's agents automatically select the best model for your task while respecting your admin's routing policies. If you configure a model for BYOLLM, requests for that model route to AWS Bedrock. @@ -187,18 +211,9 @@ However, when using BYOLLM: ## FAQ -### How is BYOLLM different from BYOK? - -**BYOK (Bring Your Own Key)** lets individual users add their own API keys for direct model provider access (e.g., Anthropic, OpenAI, Google). Warp stores keys locally on the user's device. +### How is BYOLLM different from BYOK and Custom inference endpoint? -**BYOLLM (Bring Your Own LLM)** routes inference through your organization's cloud infrastructure (AWS Bedrock) using cloud-native IAM. Admins configure it at the admin level and it applies to the entire team. - -| Feature | BYOK | BYOLLM | -| --- | --- | --- | -| Configuration level | User | Admin/Team | -| Authentication | API keys (local) | Cloud IAM (per-user) | -| Billing | Direct to provider | Your cloud account | -| Data locality | Provider infrastructure | Your cloud infrastructure | +See [How BYOLLM differs from BYOK and Custom inference endpoint](#how-byollm-differs-from-byok-and-custom-inference-endpoint) at the top of this page for a comparison and plan-availability details. In short: BYOK and CIE are user-level configurations available to individual users and orgs with 10 or fewer employees on Free, Build, and Max, and to all users on Business and Enterprise. BYOLLM is Enterprise-only managed inference infrastructure where Warp routes the whole team's traffic through providers your admins have approved. ### Does BYOLLM work with Auto? @@ -222,7 +237,9 @@ Yes. 
Admins can configure routing policies to require specific models to use BYO ## Related resources -* [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) +* [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — User-level keys for OpenAI, Anthropic, and Google +* [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) — Connect an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway +* [platform credits](/support-and-community/plans-and-billing/platform-credits/) — Warp's platform-infrastructure credit bucket * [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models * [Admin Panel](/enterprise/team-management/admin-panel/) — Configure team settings * [Contact Sales](https://www.warp.dev/contact-sales) — Get help with enterprise setup diff --git a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx index 69aa1dfe..9239c925 100644 --- a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx @@ -1,29 +1,46 @@ --- -title: Bring Your Own API Key +title: Bring your own API key description: >- - Warp's paid plans include the ability to bring your own API keys (BYOK) for - OpenAI, Anthropic, and Google AI models. + Use your own OpenAI, Anthropic, or Google API keys. Never consumes AI + credits — on Business and Enterprise, platform credits may apply for + local agent runs. --- -Warp supports **Bring Your Own Key (BYOK)** for users who want to connect Warp’s agent to their own Anthropic, OpenAI, or Google API accounts. 
+Warp supports **Bring your own API key (BYOK)** for users who want to connect Warp's agents to their own Anthropic, OpenAI, or Google API accounts. -This lets you use your own API keys to access models directly, giving you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for a list of supported models. +BYOK gives you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for the full list of supported models. When you route a request through your own key, Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for that request. -BYOK provides greater flexibility in model access and ensures Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for requests routed through your own keys. +:::note +On the Business and Enterprise plans, local agent runs that use BYOK still consume platform credits for Warp's platform infrastructure (run lifecycle, integrations, observability). See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for what's covered. +::: :::note -BYOK is currently only available on Warp's paid plans, starting with Build. Learn more about plans and pricing [warp.dev/pricing](https://www.warp.dev/pricing). +BYOK is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans. ::: -:::caution -BYOK and customer-supplied inference (BYOLLM via Amazon Bedrock or Google Vertex, plus custom endpoints) are available to individual users and organizations with 10 or fewer employees or users on any plan. Organizations with more than 10 employees or users must be on a Warp Business or Enterprise plan to use BYOK or customer-supplied inference. See Warp's [Terms of Service](https://www.warp.dev/terms-of-service) for details. 
+:::note +BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's [Terms of Service](https://www.warp.dev/terms-of-service). Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use BYOK or customer-supplied inference. ::: -## How does BYOK work? +## How BYOK differs from Custom inference endpoint and BYOLLM + +Warp offers three ways to bring your own AI infrastructure. Use this table to pick the right one, and follow the links for full details. + +| Name | Meaning | Plans | +| --- | --- | --- | +| **Bring your own API key** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans | +| **[Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/)** (CIE) | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans | +| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock, Azure Foundry, Google Vertex) or approved internal infrastructure, with Warp handling routing, orchestration, governance, and observability. | Enterprise only | + +See [warp.dev/pricing](https://www.warp.dev/pricing) for current plan availability. + +Platform credits may apply for local agent runs on Business and Enterprise when using BYOK, CIE, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/). + +## How BYOK works When you add your own model API keys in Warp, those keys are stored **locally on your device** and are **never synced to the cloud**. -Warp uses these API keys to directly route your agent requests to the model provider you've configured. 
+Warp uses these API keys to route your agent requests directly to the model provider you've configured. :::caution BYOK does not apply to [Cloud Agents](/agent-platform/cloud-agents/overview/). Because your API keys are stored locally on your device, they are not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/). @@ -57,9 +74,9 @@ When you explicitly select a model with a key icon, Warp routes requests through ### Auto Model -Warp's **Auto** models dynamically route requests across different models based on context and performance. Because this routing logic depends on Warp’s infrastructure, **Auto always consumes Warp's credits**, even if you’ve configured your own API keys. +Warp's **Auto** models dynamically route requests across different models based on context and performance. Because this routing logic depends on Warp's infrastructure, **Auto always consumes Warp's credits**, even if you've configured your own API keys. -To use your own key, select a specific provider model (for example, Claude Sonnet 4.5, GPT-5, or Gemini 2.5 Pro) directly from the model picker with a key icon. +To use your own key, select a specific provider model (for example, Claude Opus 4.7, Claude Sonnet 4.6, GPT-5.5, or Gemini 3.1 Pro) directly from the model picker with a key icon. ### Credit usage @@ -97,7 +114,7 @@ If your key: **Failover and fallback:** -By default, Warp does not fall back to your credits when a BYOK (Bring Your Own Key) request fails. +By default, Warp does not fall back to your credits when a BYOK request fails. You can choose to enable **Warp credit fallback**. When enabled, if an agent request fails with your BYOK model (for example, due to an API error or quota limit), Warp will automatically route the request to one of Warp’s provided models. Warp always prioritizes your API keys first and only uses Warp credits when necessary. 
@@ -117,12 +134,19 @@ Warp itself never stores your LLM API keys. ### BYOK on Enterprise and Business plans -Organizations with more than 10 employees or users must be on a Warp Business or Enterprise plan to use BYOK or customer-supplied inference. See Warp's [Terms of Service](https://www.warp.dev/terms-of-service) for the full eligibility rule. +BYOK is available to individual users and to organizations with 10 or fewer employees, subject to Warp's [Terms of Service](https://www.warp.dev/terms-of-service). Companies or organizations with more than 10 employees need a Warp Business or Enterprise plan to use BYOK or customer-supplied inference. + +Today, BYOK is configured at the **user level** on every plan, including Enterprise and Business: + +* Each team member can add and manage their own API keys locally on their device. +* Centrally configured, admin-managed BYOK is not yet available — admins cannot enforce or share API keys across team members from a single place. +* There is no organization-level Admin Panel for BYOK management today. -Currently, BYOK is configured at the **user level**, not the team or admin level: +If your organization needs centrally managed model routing now, see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) for the Enterprise-managed option. To discuss a fit, contact us at [warp.dev/contact-sales](https://www.warp.dev/contact-sales). -* Each team member can add and manage their own API keys locally. -* Team admins cannot yet enforce or share API keys across members. -* There is currently no organization-level Admin Panel for BYOK management. +## Related resources -If your organization has specific needs for managed keys or enterprise-level control, please contact us at [warp.dev/contact-sales](https://www.warp.dev/contact-sales). 
+* [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) — Route Warp through any OpenAI-compatible endpoint, such as OpenRouter, LiteLLM, z.ai, or an internal gateway. +* [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) — Enterprise-managed inference through your cloud provider or approved infrastructure. +* [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models and `model_id` values. +* [Credits](/support-and-community/plans-and-billing/credits/) — How Warp credits work and when they're consumed. diff --git a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx new file mode 100644 index 00000000..88fa647e --- /dev/null +++ b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx @@ -0,0 +1,123 @@ +--- +title: Custom inference endpoint +description: >- + Connect Warp to OpenAI-compatible endpoints (OpenRouter, LiteLLM, z.ai, + internal gateways). On Business and Enterprise, platform credits may + apply for local runs. +--- + +A **Custom inference endpoint (CIE)** lets you connect Warp's agents to any OpenAI-compatible inference endpoint, so you can route AI requests through your preferred model router, hosted gateway, or internal infrastructure. + +CIE is the right fit when you want to choose your provider, consolidate billing through a third-party router, or run inference behind your own gateway — without giving up the agent experience inside Warp. When a CIE is configured and selected, Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for the request. + +:::note +CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans. 
+::: + +:::note +BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use custom inference endpoints. +::: + +## Key features + +* **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API. +* **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway. +* **No Warp credits consumed** - Inference is billed directly by your endpoint provider; Warp's metered features remain unaffected. +* **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud. + +## How it works + +CIE expects your endpoint to implement the **OpenAI Chat Completions API** (`POST /v1/chat/completions`). Any service that exposes a compatible surface can be used as a CIE target: + +* **OpenRouter** - Aggregates many model providers behind a single OpenAI-compatible API and consolidated billing. +* **LiteLLM** - A self-hosted proxy that exposes a unified, OpenAI-compatible API across providers. +* **z.ai** - A model provider with an OpenAI-compatible API surface for its models. +* **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control). + +When you configure a CIE, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers. + +:::caution +CIE does not apply to [Oz Cloud Agents](/agent-platform/cloud-agents/overview/). Because CIE configuration is stored locally, it is not available to cloud-hosted agent runs. 
Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/). +::: + +When a CIE-routed model is selected: + +* Warp **does not consume** any of your [credits](/support-and-community/plans-and-billing/credits/). +* Costs are billed directly by your endpoint provider. +* Warp does not retain or store your endpoint credentials on any of its servers. + +## Enabling a custom inference endpoint + +To enable and configure a custom inference endpoint: + +1. In Warp, open **Settings** and search for `inference endpoint` to jump to the configuration. +2. Add your endpoint URL (the base URL that exposes `/v1/chat/completions`) and any required credentials (typically an API key). +3. Specify the model identifier(s) you want to route through this endpoint. +4. Save the configuration. Once added, you'll see your custom models appear in the model picker. + +When you explicitly select a CIE-routed model from the model picker, Warp routes the request through your endpoint instead of consuming Warp's credits. + +The CIE configuration flow mirrors the [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) setup, so the steps will feel familiar if you've already configured BYOK. + +## Billing behavior + +### Warp credits + +When you select a CIE-routed model from the model picker: + +* No Warp credits are consumed for that request. +* Inference is billed directly by your endpoint provider, according to their pricing. +* The credit transparency footer will show "0 credits used" for CIE-routed requests. + +:::note +On Business and Enterprise plans, local agent runs that use a custom inference endpoint still consume platform credits for Warp's platform infrastructure. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown. +::: + +### Auto routing still uses Warp credits + +Warp's **Auto** models dynamically route across providers using Warp's infrastructure. 
Because Auto routing depends on Warp, **Auto always consumes Warp's credits**, even if you've configured a custom inference endpoint. + +To use your endpoint, select the specific CIE-backed model from the model picker rather than an Auto option. + +### Other AI features in Warp + +Some AI-powered features rely on Warp's infrastructure and are unaffected by CIE configuration. These continue to consume credits according to your plan; see [Credits](/support-and-community/plans-and-billing/credits/) for details. + +## Zero Data Retention (ZDR) + +Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** agreements with all of its contracted LLM providers. + +When you use a custom inference endpoint: + +* Data retention is determined by **your endpoint provider** and any upstream model providers they route to. +* Warp **cannot enforce ZDR** for requests sent through a custom inference endpoint. +* If your endpoint provider does not have ZDR with the underlying model provider, your requests may be retained according to their terms. + +Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a CIE. + +## Plan availability + +CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans and any plan-specific limits. + +CIE is available to individual users and to organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees need a Warp Business or Enterprise plan to use custom inference endpoints. + +Centrally configured, admin-managed CIE for teams is not yet available. Each user configures their own endpoint locally. Enterprise teams that need centrally managed model routing today should see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/). 
+ +## How CIE differs from BYOK and BYOLLM + +Warp offers three ways to bring your own AI infrastructure. Use this table to pick the right one, and follow the links for full details. + +| Name | Meaning | Plans | +| --- | --- | --- | +| **[Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/)** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans | +| **Custom inference endpoint** (CIE) | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans | +| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock, Azure Foundry, Google Vertex) or approved internal infrastructure, with Warp handling routing, orchestration, governance, and observability. | Enterprise only | + +Platform credits may apply for local agent runs on Business and Enterprise when using BYOK, CIE, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/). + +## Related resources + +* [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — Use your own OpenAI, Anthropic, or Google API keys. +* [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) — Enterprise-managed inference through your cloud provider or approved infrastructure. +* [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models and `model_id` values. +* [Credits](/support-and-community/plans-and-billing/credits/) — How Warp credits work and when they're consumed. 
diff --git a/src/content/docs/support-and-community/plans-and-billing/index.mdx b/src/content/docs/support-and-community/plans-and-billing/index.mdx index 692b4fc9..bc9845bd 100644 --- a/src/content/docs/support-and-community/plans-and-billing/index.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/index.mdx @@ -11,5 +11,6 @@ Warp offers flexible plans for individual developers, teams, and enterprises, wi * [**Credits**](/support-and-community/plans-and-billing/credits/) - How credits are used and calculated across AI features * [**Add-on Credits**](/support-and-community/plans-and-billing/add-on-credits/) - Purchase additional credits or enable automatic reloads * [**Bring Your Own API Key**](/support-and-community/plans-and-billing/bring-your-own-api-key/) - Connect your own model provider API keys +* [**Custom inference endpoint**](/support-and-community/plans-and-billing/custom-inference-endpoint/) - Connect an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway * [**Overages (Legacy)**](/support-and-community/plans-and-billing/overages-legacy/) - Information for users on legacy plans with overages * [**Pricing FAQs**](/support-and-community/plans-and-billing/pricing-faqs/) - Answers to common questions about plans and billing diff --git a/src/sidebar.ts b/src/sidebar.ts index f64c9e80..bb83c4b0 100644 --- a/src/sidebar.ts +++ b/src/sidebar.ts @@ -145,7 +145,6 @@ export const sidebarTopics: StarlightSidebarTopicsUserConfig = [ items: [ { slug: 'terminal/settings', label: 'Overview' }, { slug: 'terminal/settings/all-settings', label: 'All settings reference' }, - { slug: 'terminal/settings/file-locations', label: 'File locations' }, ], }, { @@ -540,6 +539,7 @@ export const sidebarTopics: StarlightSidebarTopicsUserConfig = [ 'support-and-community/plans-and-billing/add-on-credits', { slug: 'support-and-community/plans-and-billing/platform-credits', label: 'Platform credits' }, 
'support-and-community/plans-and-billing/bring-your-own-api-key', + 'support-and-community/plans-and-billing/custom-inference-endpoint', 'support-and-community/plans-and-billing/overages-legacy', 'support-and-community/plans-and-billing/pricing-faqs', ], From 86818df68182d8771bbbddc2c61086af1c230620 Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Tue, 19 May 2026 17:08:23 -0700 Subject: [PATCH 02/11] docs(pricing-may-2026): note July 1 self-serve preview period in BYOK + CIE platform-credits callouts Both BYOK and CIE pages now spell out that self-serve billing for platform credits (including Business BYOK / CIE) doesn't start until July 1, 2026. Between May 14 and June 30, 2026, platform-credit consumption is visible in the Warp app's usage breakdown for transparency on Build, Max, and Business, but no platform credits are deducted from your Reload pool or counted against your spend cap. Enterprise plans are billed per contract from May 14 and aren't affected by this preview period. Co-Authored-By: Oz --- .../plans-and-billing/bring-your-own-api-key.mdx | 2 ++ .../plans-and-billing/custom-inference-endpoint.mdx | 2 ++ 2 files changed, 4 insertions(+) diff --git a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx index 9239c925..1c0b3e9e 100644 --- a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx @@ -12,6 +12,8 @@ BYOK gives you full control over model selection, billing, and data routing. See :::note On the Business and Enterprise plans, local agent runs that use BYOK still consume platform credits for Warp's platform infrastructure (run lifecycle, integrations, observability). See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for what's covered. 
+ +**Self-serve preview period.** Self-serve billing for platform credits (including Business BYOK) doesn't start until **July 1, 2026**. Between May 14 and June 30, 2026, platform-credit consumption is visible in the Warp app's usage breakdown for transparency, but no platform credits are deducted from your Reload pool or counted against your spend cap. Enterprise is billed per contract from May 14. ::: :::note diff --git a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx index 88fa647e..8f5cb113 100644 --- a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx @@ -71,6 +71,8 @@ When you select a CIE-routed model from the model picker: :::note On Business and Enterprise plans, local agent runs that use a custom inference endpoint still consume platform credits for Warp's platform infrastructure. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown. + +**Self-serve preview period.** Self-serve billing for platform credits (including Business CIE) doesn't start until **July 1, 2026**. Between May 14 and June 30, 2026, platform-credit consumption is visible in the Warp app's usage breakdown for transparency, but no platform credits are deducted from your Reload pool or counted against your spend cap. Enterprise is billed per contract from May 14. 
::: ### Auto routing still uses Warp credits From d3c5b54b316d9289e8cb27e85fdc7c96fa04c819 Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Wed, 20 May 2026 11:00:24 -0700 Subject: [PATCH 03/11] docs(pricing-may-2026): correct launch date May 14 → May 21, 2026 Co-Authored-By: Oz --- .../plans-and-billing/bring-your-own-api-key.mdx | 2 +- .../plans-and-billing/custom-inference-endpoint.mdx | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx index 1c0b3e9e..76bb636f 100644 --- a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx @@ -13,7 +13,7 @@ BYOK gives you full control over model selection, billing, and data routing. See :::note On the Business and Enterprise plans, local agent runs that use BYOK still consume platform credits for Warp's platform infrastructure (run lifecycle, integrations, observability). See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for what's covered. -**Self-serve preview period.** Self-serve billing for platform credits (including Business BYOK) doesn't start until **July 1, 2026**. Between May 14 and June 30, 2026, platform-credit consumption is visible in the Warp app's usage breakdown for transparency, but no platform credits are deducted from your Reload pool or counted against your spend cap. Enterprise is billed per contract from May 14. +**Self-serve preview period.** Self-serve billing for platform credits (including Business BYOK) doesn't start until **July 1, 2026**. Between May 21 and June 30, 2026, platform-credit consumption is visible in the Warp app's usage breakdown for transparency, but no platform credits are deducted from your Reload pool or counted against your spend cap. 
Enterprise is billed per contract from May 21. ::: :::note diff --git a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx index 8f5cb113..49fc1477 100644 --- a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx @@ -72,7 +72,7 @@ When you select a CIE-routed model from the model picker: :::note On Business and Enterprise plans, local agent runs that use a custom inference endpoint still consume platform credits for Warp's platform infrastructure. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown. -**Self-serve preview period.** Self-serve billing for platform credits (including Business CIE) doesn't start until **July 1, 2026**. Between May 14 and June 30, 2026, platform-credit consumption is visible in the Warp app's usage breakdown for transparency, but no platform credits are deducted from your Reload pool or counted against your spend cap. Enterprise is billed per contract from May 14. +**Self-serve preview period.** Self-serve billing for platform credits (including Business CIE) doesn't start until **July 1, 2026**. Between May 21 and June 30, 2026, platform-credit consumption is visible in the Warp app's usage breakdown for transparency, but no platform credits are deducted from your Reload pool or counted against your spend cap. Enterprise is billed per contract from May 21. ::: ### Auto routing still uses Warp credits From 5bf93dbfa5fa5b38a4a3f26607752819d60cf63d Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Wed, 20 May 2026 17:09:46 -0700 Subject: [PATCH 04/11] docs(pricing-may-2026): revert BYOLLM page to main, keep changes minimal Per launch direction: keep the Enterprise BYOLLM page largely unchanged for this launch. 
The BYOK/CIE/BYOLLM comparison still lives on the BYOK and CIE pages, so readers landing on either of those will see the three-way framing. This restores: - The original AWS-Bedrock-focused frontmatter description and opening paragraph (instead of the cross-provider reframing). - The original 'BYOLLM currently supports AWS Bedrock only. Coming soon: Azure Foundry and Google Vertex support.' caveat. - The original 'Cloud-native credentials - Authenticate using each user's AWS IAM identity' key feature. - The original 'How is BYOLLM different from BYOK?' FAQ with its 4-row comparison table. - The original Related resources list. Drops the launch-era additions: - The 'How BYOLLM differs from BYOK and Custom inference endpoint' section with the three-way comparison table. - The :::note about centrally configured BYOK / CIE for Enterprise being a fast-follow. - The :::note about platform credits for BYOLLM-routed local runs. Co-Authored-By: Oz --- .../bring-your-own-llm.mdx | 53 +++++++------------ 1 file changed, 18 insertions(+), 35 deletions(-) diff --git a/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx b/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx index e62be9b8..6b4fc6b9 100644 --- a/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx +++ b/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx @@ -1,16 +1,16 @@ --- title: Bring your own LLM description: >- - Route Warp's agents through your organization's managed inference - infrastructure for governance, billing control, and model flexibility. + Route Warp's agents through your AWS Bedrock models for billing control and + infrastructure flexibility. --- -Warp supports **Bring your own LLM (BYOLLM)** for Enterprise teams that want to run inference on their own managed infrastructure. 
BYOLLM covers two patterns: cloud-provider Model-as-a-Service (AWS Bedrock, Google Vertex AI, Azure AI Foundry) and approved internal inference gateways. +Warp supports **Bring Your Own LLM (BYOLLM)** for enterprise teams that need to run inference on their own cloud infrastructure. With BYOLLM, your team can use Warp's agents while routing inference through models hosted in your AWS Bedrock environment. -With BYOLLM, your team uses Warp's agents while Warp manages routing, orchestration, governance, and observability across the providers you've approved. Inference runs in your environment; admins control which models are available to whom. +This gives you control over cloud spend and model hosting, without changing how your team works in Warp. :::caution -**AWS Bedrock** is the GA implementation today. **Google Vertex AI** and **Azure AI Foundry** support is on the roadmap. Approved internal gateways are evaluated on a case-by-case basis with your Warp account team. +BYOLLM currently supports **AWS Bedrock** only. Coming soon: Azure Foundry and Google Vertex support. BYOLLM applies to interactive agents in the terminal. Cloud agents do not yet support BYOLLM routing. ::: @@ -19,29 +19,9 @@ BYOLLM applies to interactive agents in the terminal. Cloud agents do not yet su BYOLLM is only available on Warp's Enterprise plan. Contact [warp.dev/contact-sales](https://www.warp.dev/contact-sales) to learn more. ::: -## How BYOLLM differs from BYOK and Custom inference endpoint - -Warp offers three ways to bring your own inference into the product. BYOLLM is one of them, and it serves a different use case than the others. - -| Name | Meaning | Plans | -| --- | --- | --- | -| Bring your own API key (BYOK) | User-level API keys for OpenAI, Anthropic, or Google. Each user configures their own key locally; Warp uses it to call the provider directly. 
| Free, Build, Max (orgs with 10 or fewer employees); Business or Enterprise required for larger orgs | -| Custom inference endpoint (CIE) | User-level OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. Each user configures the endpoint locally. | Free, Build, Max (orgs with 10 or fewer employees); Business or Enterprise required for larger orgs | -| Bring your own LLM (BYOLLM) | Enterprise-only managed inference infrastructure: cloud-provider Model-as-a-Service (Bedrock, Vertex, Foundry) or approved internal gateways. Warp manages routing, orchestration, governance, and observability for the whole team. | Enterprise | - -:::note -BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use these features. -::: - -Use [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) or [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) when an individual developer wants to authenticate to a provider with their own key or endpoint. Use BYOLLM when an organization wants Warp to manage inference routing across approved providers for the whole team. - -:::note -Centrally configured BYOK and Custom inference endpoint for Enterprise — where admins approve providers or endpoints for the entire organization through the Admin Panel — are a fast-follow after launch, not at launch. Until then, BYOK and CIE remain user-level configurations, and BYOLLM remains the path for admin-managed inference infrastructure. -::: - ## Key features -* **Cloud-native credentials** - Authenticate using each user's cloud-native identity (AWS IAM today; Google Cloud and Azure identities on the roadmap). Warp does not store API keys. 
+* **Cloud-native credentials** - Authenticate using each user’s AWS IAM identity. Warp does not store API keys. * **Admin-enforced routing** - Team admins configure which models are available to users in AWS Bedrock, with the ability to disable non-Bedrock model access entirely. * **Consolidated billing** - Inference costs are billed directly to your AWS account, leveraging existing cloud commitments. @@ -154,10 +134,6 @@ When a request routes through BYOLLM: * **Warp does not consume credits** for that request. * Your cloud provider account receives the inference costs directly. -:::note -BYOLLM-routed local agent runs on Enterprise still consume platform credits for Warp's platform infrastructure (run orchestration, observability, integrations). Inference costs are billed directly to your cloud provider account. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown. -::: - ### Routing behavior Warp's agents automatically select the best model for your task while respecting your admin's routing policies. If you configure a model for BYOLLM, requests for that model route to AWS Bedrock. @@ -211,9 +187,18 @@ However, when using BYOLLM: ## FAQ -### How is BYOLLM different from BYOK and Custom inference endpoint? +### How is BYOLLM different from BYOK? + +**BYOK (Bring Your Own Key)** lets individual users add their own API keys for direct model provider access (e.g., Anthropic, OpenAI, Google). Warp stores keys locally on the user's device. -See [How BYOLLM differs from BYOK and Custom inference endpoint](#how-byollm-differs-from-byok-and-custom-inference-endpoint) at the top of this page for a comparison and plan-availability details. In short: BYOK and CIE are user-level configurations available to individual users and orgs with 10 or fewer employees on Free, Build, and Max, and to all users on Business and Enterprise. 
BYOLLM is Enterprise-only managed inference infrastructure where Warp routes the whole team's traffic through providers your admins have approved. +**BYOLLM (Bring Your Own LLM)** routes inference through your organization's cloud infrastructure (AWS Bedrock) using cloud-native IAM. Admins configure it at the admin level and it applies to the entire team. + +| Feature | BYOK | BYOLLM | +| --- | --- | --- | +| Configuration level | User | Admin/Team | +| Authentication | API keys (local) | Cloud IAM (per-user) | +| Billing | Direct to provider | Your cloud account | +| Data locality | Provider infrastructure | Your cloud infrastructure | ### Does BYOLLM work with Auto? @@ -237,9 +222,7 @@ Yes. Admins can configure routing policies to require specific models to use BYO ## Related resources -* [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — User-level keys for OpenAI, Anthropic, and Google -* [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) — Connect an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway -* [platform credits](/support-and-community/plans-and-billing/platform-credits/) — Warp's platform-infrastructure credit bucket +* [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) * [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models * [Admin Panel](/enterprise/team-management/admin-panel/) — Configure team settings * [Contact Sales](https://www.warp.dev/contact-sales) — Get help with enterprise setup From 7e7d0d06791bc856c9eb1accad48f87422ea1f0a Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Wed, 20 May 2026 17:15:15 -0700 Subject: [PATCH 05/11] docs(pricing-may-2026): drop CIE abbreviation, narrow credit claims, remove preview-period notes, restore File locations sidebar - Restore the 'File locations' sidebar entry under Settings file (added on main by PR 
#110, accidentally dropped during the rebase). - Drop the 'CIE' abbreviation throughout the customer-supplied inference pages. Use the full name 'custom inference endpoint' (or 'your endpoint' / 'endpoint-routed model' in context) instead. - Narrow the 'never consumes Warp credits' claim to 'doesn't consume AI credits' on the BYOK and custom inference endpoint pages, since Business / Enterprise local agent runs still consume platform credits. - Rewrite the 'No Warp credits consumed' Key features bullet on the custom inference endpoint page so it accurately calls out the platform-credits caveat on Business / Enterprise. - Drop the 'Self-serve preview period' paragraph from the platform-credits :::note callouts on the BYOK and custom inference endpoint pages. The July 1, 2026 cutover lives only in pricing-faqs.mdx now; canonical feature pages don't carry the launch-period detail. Co-Authored-By: Oz --- .../bring-your-own-api-key.mdx | 8 ++- .../custom-inference-endpoint.mdx | 54 +++++++++---------- src/sidebar.ts | 9 ++-- 3 files changed, 34 insertions(+), 37 deletions(-) diff --git a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx index 76bb636f..f942d59d 100644 --- a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx @@ -8,12 +8,10 @@ description: >- Warp supports **Bring your own API key (BYOK)** for users who want to connect Warp's agents to their own Anthropic, OpenAI, or Google API accounts. -BYOK gives you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for the full list of supported models. 
When you route a request through your own key, Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for that request. +BYOK gives you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for the full list of supported models. When you route a request through your own key, Warp **doesn't consume your** [AI credits](/support-and-community/plans-and-billing/credits/) for that request — you're billed directly by your model provider. :::note On the Business and Enterprise plans, local agent runs that use BYOK still consume platform credits for Warp's platform infrastructure (run lifecycle, integrations, observability). See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for what's covered. - -**Self-serve preview period.** Self-serve billing for platform credits (including Business BYOK) doesn't start until **July 1, 2026**. Between May 21 and June 30, 2026, platform-credit consumption is visible in the Warp app's usage breakdown for transparency, but no platform credits are deducted from your Reload pool or counted against your spend cap. Enterprise is billed per contract from May 21. ::: :::note @@ -31,12 +29,12 @@ Warp offers three ways to bring your own AI infrastructure. Use this table to pi | Name | Meaning | Plans | | --- | --- | --- | | **Bring your own API key** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans | -| **[Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/)** (CIE) | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. 
| Free and all eligible paid plans | +| **[Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/)** | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans | | **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock, Azure Foundry, Google Vertex) or approved internal infrastructure, with Warp handling routing, orchestration, governance, and observability. | Enterprise only | See [warp.dev/pricing](https://www.warp.dev/pricing) for current plan availability. -Platform credits may apply for local agent runs on Business and Enterprise when using BYOK, CIE, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/). +Platform credits may apply for local agent runs on Business and Enterprise when using BYOK, a custom inference endpoint, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/). ## How BYOK works diff --git a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx index 49fc1477..acc0ed4e 100644 --- a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx @@ -6,12 +6,12 @@ description: >- apply for local runs. --- -A **Custom inference endpoint (CIE)** lets you connect Warp's agents to any OpenAI-compatible inference endpoint, so you can route AI requests through your preferred model router, hosted gateway, or internal infrastructure. 
+A **custom inference endpoint** lets you connect Warp's agents to any OpenAI-compatible inference endpoint, so you can route AI requests through your preferred model router, hosted gateway, or internal infrastructure. -CIE is the right fit when you want to choose your provider, consolidate billing through a third-party router, or run inference behind your own gateway — without giving up the agent experience inside Warp. When a CIE is configured and selected, Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for the request. +This is the right fit when you want to choose your provider, consolidate billing through a third-party router, or run inference behind your own gateway — without giving up the agent experience inside Warp. When a custom inference endpoint is configured and selected, Warp **doesn't consume your** [AI credits](/support-and-community/plans-and-billing/credits/) for that request — you're billed directly by your endpoint provider. :::note -CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans. +Custom inference endpoints are available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans. ::: :::note @@ -22,29 +22,29 @@ BYOK and custom inference endpoint support are available for individual users an * **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API. * **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway. -* **No Warp credits consumed** - Inference is billed directly by your endpoint provider; Warp's metered features remain unaffected. +* **No AI credits consumed for inference** - Inference is billed directly by your endpoint provider. 
On Business and Enterprise, local agent runs that route through a custom inference endpoint still consume [platform credits](/support-and-community/plans-and-billing/platform-credits/) for Warp's platform infrastructure. * **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud. ## How it works -CIE expects your endpoint to implement the **OpenAI Chat Completions API** (`POST /v1/chat/completions`). Any service that exposes a compatible surface can be used as a CIE target: +A custom inference endpoint expects your endpoint to implement the **OpenAI Chat Completions API** (`POST /v1/chat/completions`). Any service that exposes a compatible surface can be used as a target: * **OpenRouter** - Aggregates many model providers behind a single OpenAI-compatible API and consolidated billing. * **LiteLLM** - A self-hosted proxy that exposes a unified, OpenAI-compatible API across providers. * **z.ai** - A model provider with an OpenAI-compatible API surface for its models. * **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control). -When you configure a CIE, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers. +When you configure a custom inference endpoint, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers. :::caution -CIE does not apply to [Oz Cloud Agents](/agent-platform/cloud-agents/overview/). Because CIE configuration is stored locally, it is not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/). +Custom inference endpoints don't apply to [Oz Cloud Agents](/agent-platform/cloud-agents/overview/). 
Because the configuration is stored locally, it isn't available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/). ::: -When a CIE-routed model is selected: +When a model routed through your endpoint is selected: -* Warp **does not consume** any of your [credits](/support-and-community/plans-and-billing/credits/). +* Warp **doesn't consume** your [AI credits](/support-and-community/plans-and-billing/credits/) for that request. * Costs are billed directly by your endpoint provider. -* Warp does not retain or store your endpoint credentials on any of its servers. +* Warp doesn't retain or store your endpoint credentials on any of its servers. ## Enabling a custom inference endpoint @@ -55,35 +55,33 @@ To enable and configure a custom inference endpoint: 3. Specify the model identifier(s) you want to route through this endpoint. 4. Save the configuration. Once added, you'll see your custom models appear in the model picker. -When you explicitly select a CIE-routed model from the model picker, Warp routes the request through your endpoint instead of consuming Warp's credits. +When you explicitly select an endpoint-routed model from the model picker, Warp routes the request through your endpoint instead of consuming Warp's AI credits. -The CIE configuration flow mirrors the [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) setup, so the steps will feel familiar if you've already configured BYOK. +The configuration flow mirrors the [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) setup, so the steps will feel familiar if you've already configured BYOK. ## Billing behavior -### Warp credits +### Warp AI credits -When you select a CIE-routed model from the model picker: +When you select an endpoint-routed model from the model picker: -* No Warp credits are consumed for that request. 
+* No Warp AI credits are consumed for that request. * Inference is billed directly by your endpoint provider, according to their pricing. -* The credit transparency footer will show "0 credits used" for CIE-routed requests. +* The credit transparency footer will show "0 credits used" for those requests. :::note -On Business and Enterprise plans, local agent runs that use a custom inference endpoint still consume platform credits for Warp's platform infrastructure. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown. - -**Self-serve preview period.** Self-serve billing for platform credits (including Business CIE) doesn't start until **July 1, 2026**. Between May 21 and June 30, 2026, platform-credit consumption is visible in the Warp app's usage breakdown for transparency, but no platform credits are deducted from your Reload pool or counted against your spend cap. Enterprise is billed per contract from May 21. +On Business and Enterprise plans, local agent runs that route through a custom inference endpoint still consume platform credits for Warp's platform infrastructure. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown. ::: ### Auto routing still uses Warp credits Warp's **Auto** models dynamically route across providers using Warp's infrastructure. Because Auto routing depends on Warp, **Auto always consumes Warp's credits**, even if you've configured a custom inference endpoint. -To use your endpoint, select the specific CIE-backed model from the model picker rather than an Auto option. +To use your endpoint, select the specific endpoint-routed model from the model picker rather than an Auto option. ### Other AI features in Warp -Some AI-powered features rely on Warp's infrastructure and are unaffected by CIE configuration. 
These continue to consume credits according to your plan; see [Credits](/support-and-community/plans-and-billing/credits/) for details. +Some AI-powered features rely on Warp's infrastructure and are unaffected by a custom inference endpoint. These continue to consume credits according to your plan; see [Credits](/support-and-community/plans-and-billing/credits/) for details. ## Zero Data Retention (ZDR) @@ -95,27 +93,27 @@ When you use a custom inference endpoint: * Warp **cannot enforce ZDR** for requests sent through a custom inference endpoint. * If your endpoint provider does not have ZDR with the underlying model provider, your requests may be retained according to their terms. -Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a CIE. +Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a custom inference endpoint. ## Plan availability -CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans and any plan-specific limits. +Custom inference endpoints are available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans and any plan-specific limits. -CIE is available to individual users and to organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees need a Warp Business or Enterprise plan to use custom inference endpoints. +Custom inference endpoints are available to individual users and to organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees need a Warp Business or Enterprise plan to use custom inference endpoints. -Centrally configured, admin-managed CIE for teams is not yet available. 
Each user configures their own endpoint locally. Enterprise teams that need centrally managed model routing today should see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/). +Centrally configured, admin-managed custom inference endpoints for teams are not yet available. Each user configures their own endpoint locally. Enterprise teams that need centrally managed model routing today should see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/). -## How CIE differs from BYOK and BYOLLM +## How custom inference endpoints differ from BYOK and BYOLLM Warp offers three ways to bring your own AI infrastructure. Use this table to pick the right one, and follow the links for full details. | Name | Meaning | Plans | | --- | --- | --- | | **[Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/)** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans | -| **Custom inference endpoint** (CIE) | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans | +| **Custom inference endpoint** | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans | | **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock, Azure Foundry, Google Vertex) or approved internal infrastructure, with Warp handling routing, orchestration, governance, and observability. | Enterprise only | -Platform credits may apply for local agent runs on Business and Enterprise when using BYOK, CIE, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/). 
+Platform credits may apply for local agent runs on Business and Enterprise when using BYOK, a custom inference endpoint, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/). ## Related resources diff --git a/src/sidebar.ts b/src/sidebar.ts index bb83c4b0..adb1e279 100644 --- a/src/sidebar.ts +++ b/src/sidebar.ts @@ -142,10 +142,11 @@ export const sidebarTopics: StarlightSidebarTopicsUserConfig = [ { label: 'Settings file', collapsed: true, - items: [ - { slug: 'terminal/settings', label: 'Overview' }, - { slug: 'terminal/settings/all-settings', label: 'All settings reference' }, - ], + items: [ + { slug: 'terminal/settings', label: 'Overview' }, + { slug: 'terminal/settings/all-settings', label: 'All settings reference' }, + { slug: 'terminal/settings/file-locations', label: 'File locations' }, + ], }, { label: 'Warpify overview', From 504efc4cbfbcab793928bbe4796a294ebabc59fa Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Wed, 20 May 2026 17:23:04 -0700 Subject: [PATCH 06/11] docs(pricing-may-2026): de-emphasize billing in BYOK + custom inference endpoint openings, consolidate plan notes - Reframe the BYOK and custom inference endpoint opening copy around model selection and data routing instead of billing. Move the AI-credits-consumption details out of the intro and down into the dedicated billing sections where they belong. - Collapse the two stacked :::note callouts about plan availability and the 10-or-fewer-employees rule into a single, briefer note on each page. - Move the Business / Enterprise platform-credits caveat off the top of the BYOK page and into the 'Credit usage' subsection alongside the related credit details. - Trim the 'BYOK on Enterprise and Business plans' section on the BYOK page so it doesn't restate the org-size rule already covered up top. 
- Replace the redundant 'Plan availability' section on the custom inference endpoint page with a focused 'Centrally managed configuration' section that only covers what's still unique to that page (user-level config today, admin-managed coming later). - Light copy polish on phrasing in both files. Co-Authored-By: Oz --- .../bring-your-own-api-key.mdx | 41 ++++++++----------- .../custom-inference-endpoint.mdx | 23 ++++------- 2 files changed, 24 insertions(+), 40 deletions(-) diff --git a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx index f942d59d..ca554116 100644 --- a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx @@ -1,25 +1,16 @@ --- title: Bring your own API key description: >- - Use your own OpenAI, Anthropic, or Google API keys. Never consumes AI - credits — on Business and Enterprise, platform credits may apply for - local agent runs. + Connect Warp's agents to your own OpenAI, Anthropic, or Google API + accounts for direct control over model selection and data routing. --- -Warp supports **Bring your own API key (BYOK)** for users who want to connect Warp's agents to their own Anthropic, OpenAI, or Google API accounts. +**Bring your own API key (BYOK)** lets you connect Warp's agents to your own Anthropic, OpenAI, or Google API accounts. You stay in control of model selection and data routing, and your keys are stored locally on your device. -BYOK gives you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for the full list of supported models. 
When you route a request through your own key, Warp **doesn't consume your** [AI credits](/support-and-community/plans-and-billing/credits/) for that request — you're billed directly by your model provider. +See [Model Choice](/agent-platform/capabilities/model-choice/) for the full list of supported models. :::note -On the Business and Enterprise plans, local agent runs that use BYOK still consume platform credits for Warp's platform infrastructure (run lifecycle, integrations, observability). See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for what's covered. -::: - -:::note -BYOK is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans. -::: - -:::note -BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's [Terms of Service](https://www.warp.dev/terms-of-service). Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use BYOK or customer-supplied inference. +BYOK is available on Free and all eligible paid plans for individual users and organizations with 10 or fewer employees, subject to Warp's [Terms of Service](https://www.warp.dev/terms-of-service). Larger organizations need a Business or Enterprise plan. See [warp.dev/pricing](https://www.warp.dev/pricing) for current availability. ::: ## How BYOK differs from Custom inference endpoint and BYOLLM @@ -80,15 +71,17 @@ To use your own key, select a specific provider model (for example, Claude Opus ### Credit usage -When you select a model with the key icon in your model picker, Warp routes the request through your API key. +When you select a model with the key icon in your model picker, Warp routes the request through your API key. In that case: -In this case: +* No AI credits are consumed. 
+* The request is billed directly through your provider account. +* Agent Mode always **prioritizes BYOK** over any available Warp credits. -* No Warp credits are consumed. -* The cost of the request is billed directly through your provider account. -* Core Agent Mode always **prioritizes BYOK usage** over any available credits. +The credit transparency footer shows “0 credits used”, and the `Billing & Usage` page reflects no deductions from your monthly credit total. -The credit transparency footer will show “0 credits used”, and the `Billing & Usage` page will reflect no deductions from your monthly credit total. +:::note +On Business and Enterprise plans, local agent runs that use BYOK still consume platform credits for Warp's platform infrastructure (run lifecycle, integrations, observability). See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for what's covered. +::: **Other AI features in Warp** @@ -134,15 +127,13 @@ Warp itself never stores your LLM API keys. ### BYOK on Enterprise and Business plans -BYOK is available to individual users and to organizations with 10 or fewer employees, subject to Warp's [Terms of Service](https://www.warp.dev/terms-of-service). Companies or organizations with more than 10 employees need a Warp Business or Enterprise plan to use BYOK or customer-supplied inference. - -Today, BYOK is configured at the **user level** on every plan, including Enterprise and Business: +BYOK is configured at the **user level** on every plan, including Enterprise and Business: -* Each team member can add and manage their own API keys locally on their device. +* Each team member adds and manages their own API keys locally on their device. * Centrally configured, admin-managed BYOK is not yet available — admins cannot enforce or share API keys across team members from a single place. * There is no organization-level Admin Panel for BYOK management today. 
-If your organization needs centrally managed model routing now, see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) for the Enterprise-managed option. To discuss a fit, contact us at [warp.dev/contact-sales](https://www.warp.dev/contact-sales). +If your organization needs centrally managed model routing today, see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) for the Enterprise-managed option, or [contact sales](https://www.warp.dev/contact-sales). ## Related resources diff --git a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx index acc0ed4e..6c2cd9b7 100644 --- a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx @@ -1,21 +1,16 @@ --- title: Custom inference endpoint description: >- - Connect Warp to OpenAI-compatible endpoints (OpenRouter, LiteLLM, z.ai, - internal gateways). On Business and Enterprise, platform credits may - apply for local runs. + Connect Warp's agents to any OpenAI-compatible inference endpoint — + OpenRouter, LiteLLM, z.ai, or an internal gateway you already run. --- -A **custom inference endpoint** lets you connect Warp's agents to any OpenAI-compatible inference endpoint, so you can route AI requests through your preferred model router, hosted gateway, or internal infrastructure. +A **custom inference endpoint** lets you connect Warp's agents to any OpenAI-compatible inference endpoint, so you can route AI requests through your preferred model router, hosted gateway, or internal infrastructure — without giving up the agent experience inside Warp. 
-This is the right fit when you want to choose your provider, consolidate billing through a third-party router, or run inference behind your own gateway — without giving up the agent experience inside Warp. When a custom inference endpoint is configured and selected, Warp **doesn't consume your** [AI credits](/support-and-community/plans-and-billing/credits/) for that request — you're billed directly by your endpoint provider. +This is the right fit when you want to choose your provider, run inference behind your own gateway, or use a router like OpenRouter or LiteLLM. :::note -Custom inference endpoints are available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans. -::: - -:::note -BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use custom inference endpoints. +Custom inference endpoints are available on Free and all eligible paid plans for individual users and organizations with 10 or fewer employees, subject to Warp's [Terms of Service](https://www.warp.dev/terms-of-service). Larger organizations need a Business or Enterprise plan. See [warp.dev/pricing](https://www.warp.dev/pricing) for current availability. ::: ## Key features @@ -95,13 +90,11 @@ When you use a custom inference endpoint: Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a custom inference endpoint. -## Plan availability - -Custom inference endpoints are available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans and any plan-specific limits. 
+## Centrally managed configuration -Custom inference endpoints are available to individual users and to organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees need a Warp Business or Enterprise plan to use custom inference endpoints. +Custom inference endpoints are configured at the **user level** on every plan. Each user adds their own endpoint locally; centrally configured, admin-managed endpoints for teams are not yet available. -Centrally configured, admin-managed custom inference endpoints for teams are not yet available. Each user configures their own endpoint locally. Enterprise teams that need centrally managed model routing today should see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/). +Enterprise teams that need centrally managed model routing today should see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/). ## How custom inference endpoints differ from BYOK and BYOLLM From 4bff48e0b10322c08c311aced8413e31ef1ae48f Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Wed, 20 May 2026 17:26:00 -0700 Subject: [PATCH 07/11] docs(pricing-may-2026): restore original BYOK opening, narrow only the credits claim to AI credits Per follow-up review, undo the polish on the BYOK intro and restore the original three-paragraph opening verbatim: - Title back to 'Bring Your Own API Key' (Title Case) - 'Warp supports Bring Your Own Key (BYOK) for users who want to connect Warp's agents to their own Anthropic, OpenAI, or Google API accounts.' - 'This lets you use your own API keys to access models directly, giving you full control over model selection, billing, and data routing. See Model Choice for a list of supported models.' - 'BYOK provides greater flexibility in model access and ensures Warp never consumes your AI credits for requests routed through your own keys.' 
The only substantive change vs the original is narrowing 'credits' to 'AI credits' in that last sentence, per earlier feedback that the unqualified 'never consumes Warp credits' claim is too broad now that Business / Enterprise local runs can consume platform credits. The combined plan-availability + 10-employee :::note below the intro stays as-is. Everything below the intro (BYOK works, Enabling BYOK, billing behavior, Credit usage with the platform-credits note, ZDR, Enterprise/Business config, Related resources) is unchanged. Co-Authored-By: Oz --- .../plans-and-billing/bring-your-own-api-key.mdx | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx index ca554116..a909bd39 100644 --- a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx @@ -1,13 +1,15 @@ --- -title: Bring your own API key +title: Bring Your Own API Key description: >- - Connect Warp's agents to your own OpenAI, Anthropic, or Google API - accounts for direct control over model selection and data routing. + Warp lets you bring your own API keys (BYOK) for OpenAI, Anthropic, and + Google AI models. --- -**Bring your own API key (BYOK)** lets you connect Warp's agents to your own Anthropic, OpenAI, or Google API accounts. You stay in control of model selection and data routing, and your keys are stored locally on your device. +Warp supports **Bring Your Own Key (BYOK)** for users who want to connect Warp's agents to their own Anthropic, OpenAI, or Google API accounts. -See [Model Choice](/agent-platform/capabilities/model-choice/) for the full list of supported models. 
+This lets you use your own API keys to access models directly, giving you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for a list of supported models. + +BYOK provides greater flexibility in model access and ensures Warp **never consumes your** [AI credits](/support-and-community/plans-and-billing/credits/) for requests routed through your own keys. :::note BYOK is available on Free and all eligible paid plans for individual users and organizations with 10 or fewer employees, subject to Warp's [Terms of Service](https://www.warp.dev/terms-of-service). Larger organizations need a Business or Enterprise plan. See [warp.dev/pricing](https://www.warp.dev/pricing) for current availability. From 97590c983b409cc0c4b36eddbbdb171513ab1edb Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Wed, 20 May 2026 17:28:55 -0700 Subject: [PATCH 08/11] docs(pricing-may-2026): soften absolute 'no AI credits / 0 credits' claims in BYOK + CIE credit sections - Drop the 'No AI credits are consumed' bullet and the 'credit transparency footer shows 0 credits used' sentence from BYOK's Credit usage subsection. Replaced with a more general framing that says inference is billed through your provider account rather than drawing from your Warp AI credits, alongside the existing platform credits caveat for Business / Enterprise. - Same softening on the custom inference endpoint page's Warp AI credits subsection — collapse the three firm bullets into one general sentence and keep the platform-credits note. This avoids the misleadingly absolute '0 credits' claim, which is inaccurate for Business / Enterprise local runs where platform credits can still apply. 
Co-Authored-By: Oz --- .../plans-and-billing/bring-your-own-api-key.mdx | 7 ++----- .../plans-and-billing/custom-inference-endpoint.mdx | 6 +----- 2 files changed, 3 insertions(+), 10 deletions(-) diff --git a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx index a909bd39..5aaf2ea0 100644 --- a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx @@ -75,11 +75,8 @@ To use your own key, select a specific provider model (for example, Claude Opus When you select a model with the key icon in your model picker, Warp routes the request through your API key. In that case: -* No AI credits are consumed. -* The request is billed directly through your provider account. -* Agent Mode always **prioritizes BYOK** over any available Warp credits. - -The credit transparency footer shows “0 credits used”, and the `Billing & Usage` page reflects no deductions from your monthly credit total. +* Inference is billed directly through your provider account rather than drawing from your Warp AI credits. +* Agent Mode prioritizes BYOK over any available Warp credits. :::note On Business and Enterprise plans, local agent runs that use BYOK still consume platform credits for Warp's platform infrastructure (run lifecycle, integrations, observability). See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for what's covered. 
diff --git a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx index 6c2cd9b7..d0ef7098 100644 --- a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx @@ -58,11 +58,7 @@ The configuration flow mirrors the [Bring your own API key](/support-and-communi ### Warp AI credits -When you select an endpoint-routed model from the model picker: - -* No Warp AI credits are consumed for that request. -* Inference is billed directly by your endpoint provider, according to their pricing. -* The credit transparency footer will show "0 credits used" for those requests. +When you select an endpoint-routed model from the model picker, inference is billed directly by your endpoint provider, according to their pricing, rather than drawing from your Warp AI credits. :::note On Business and Enterprise plans, local agent runs that route through a custom inference endpoint still consume platform credits for Warp's platform infrastructure. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown. From 14ab6e2d94989a70d85d1ba5437a2f2f916673ae Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Wed, 20 May 2026 17:31:23 -0700 Subject: [PATCH 09/11] docs(pricing-may-2026): reframe custom inference endpoint intro to lead with powering Warp's agents Mirror the BYOK page's intro pattern so it's explicit upfront that a custom inference endpoint is used to power Warp's agents. New opening: Warp supports custom inference endpoints for users who want to power Warp's agents with any OpenAI-compatible inference endpoint — a model router, hosted gateway, or internal infrastructure they already run. 
This lets you route AI requests through your preferred provider, run inference behind your own gateway, or use a router like OpenRouter or LiteLLM, while keeping the agent experience inside Warp. No other changes. Co-Authored-By: Oz --- .../plans-and-billing/custom-inference-endpoint.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx index d0ef7098..201f2b11 100644 --- a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx @@ -5,9 +5,9 @@ description: >- OpenRouter, LiteLLM, z.ai, or an internal gateway you already run. --- -A **custom inference endpoint** lets you connect Warp's agents to any OpenAI-compatible inference endpoint, so you can route AI requests through your preferred model router, hosted gateway, or internal infrastructure — without giving up the agent experience inside Warp. +Warp supports **custom inference endpoints** for users who want to power Warp's agents with any OpenAI-compatible inference endpoint — a model router, hosted gateway, or internal infrastructure they already run. -This is the right fit when you want to choose your provider, run inference behind your own gateway, or use a router like OpenRouter or LiteLLM. +This lets you route AI requests through your preferred provider, run inference behind your own gateway, or use a router like OpenRouter or LiteLLM, while keeping the agent experience inside Warp. :::note Custom inference endpoints are available on Free and all eligible paid plans for individual users and organizations with 10 or fewer employees, subject to Warp's [Terms of Service](https://www.warp.dev/terms-of-service). Larger organizations need a Business or Enterprise plan. 
See [warp.dev/pricing](https://www.warp.dev/pricing) for current availability. From 7ee92199189aaa1197abe6a93a7d8c2a744eb874 Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Wed, 20 May 2026 18:47:58 -0700 Subject: [PATCH 10/11] Cleanup pass: BYOK acronym + BYOLLM table scope - bring-your-own-api-key.mdx intro: fix the wrong BYOK expansion ('Bring Your Own Key (BYOK)') to match the page title and standard usage ('Bring Your Own API Key (BYOK)'). - bring-your-own-api-key.mdx + custom-inference-endpoint.mdx comparison tables: tighten the BYOLLM row so it reflects current launch scope ('AWS Bedrock today; Azure Foundry and Google Vertex coming soon') instead of implying all three ship at launch. Co-Authored-By: Oz --- .../plans-and-billing/bring-your-own-api-key.mdx | 4 ++-- .../plans-and-billing/custom-inference-endpoint.mdx | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx index 5aaf2ea0..ff9a0699 100644 --- a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx @@ -5,7 +5,7 @@ description: >- Google AI models. --- -Warp supports **Bring Your Own Key (BYOK)** for users who want to connect Warp's agents to their own Anthropic, OpenAI, or Google API accounts. +Warp supports **Bring Your Own API Key (BYOK)** for users who want to connect Warp's agents to their own Anthropic, OpenAI, or Google API accounts. This lets you use your own API keys to access models directly, giving you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for a list of supported models. @@ -23,7 +23,7 @@ Warp offers three ways to bring your own AI infrastructure. 
Use this table to pi | --- | --- | --- | | **Bring your own API key** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans | | **[Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/)** | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans | -| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock, Azure Foundry, Google Vertex) or approved internal infrastructure, with Warp handling routing, orchestration, governance, and observability. | Enterprise only | +| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock today; Azure Foundry and Google Vertex coming soon), with Warp handling routing, orchestration, governance, and observability. | Enterprise only | See [warp.dev/pricing](https://www.warp.dev/pricing) for current plan availability. diff --git a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx index 201f2b11..522df646 100644 --- a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx @@ -100,7 +100,7 @@ Warp offers three ways to bring your own AI infrastructure. Use this table to pi | --- | --- | --- | | **[Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/)** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. 
| Free and all eligible paid plans | | **Custom inference endpoint** | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans | -| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock, Azure Foundry, Google Vertex) or approved internal infrastructure, with Warp handling routing, orchestration, governance, and observability. | Enterprise only | +| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock today; Azure Foundry and Google Vertex coming soon), with Warp handling routing, orchestration, governance, and observability. | Enterprise only | Platform credits may apply for local agent runs on Business and Enterprise when using BYOK, a custom inference endpoint, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/). From 350163908248ee958db83bbb33ce0ef8d2a829d5 Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Wed, 20 May 2026 21:07:49 -0700 Subject: [PATCH 11/11] PR #115 review: address Tyler's comments - bring-your-own-api-key.mdx 'Platform credits' note: Tyler correctly pointed out that platform credits also apply for cloud agent runs. Rewrite the line to lead with the cloud-agent case ('apply to every cloud agent run on any plan') and then cover the local-runs case ('and to local agent runs on Business and Enterprise when using BYOK, a custom inference endpoint, or BYOLLM'). - bring-your-own-api-key.mdx 'How BYOK works' opening: drop the misleading 'directly to the model provider' phrasing since requests still flow through Warp's infrastructure. Now reads 'Warp uses these API keys when routing your agent requests to the model provider you've configured.' 
Tyler's third comment was a stylistic preference for 'need' over 'require' on the page note, which already uses 'need' here. The parallel 'require' phrasing in pricing-faqs.mdx will be normalized on PR #116. Co-Authored-By: Oz --- .../plans-and-billing/bring-your-own-api-key.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx index ff9a0699..78a3c7de 100644 --- a/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx @@ -27,13 +27,13 @@ Warp offers three ways to bring your own AI infrastructure. Use this table to pi See [warp.dev/pricing](https://www.warp.dev/pricing) for current plan availability. -Platform credits may apply for local agent runs on Business and Enterprise when using BYOK, a custom inference endpoint, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/). +Platform credits apply to every cloud agent run on any plan, and to local agent runs on Business and Enterprise when using BYOK, a custom inference endpoint, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown. ## How BYOK works When you add your own model API keys in Warp, those keys are stored **locally on your device** and are **never synced to the cloud**. -Warp uses these API keys to route your agent requests directly to the model provider you've configured. +Warp uses these API keys when routing your agent requests to the model provider you've configured. :::caution BYOK does not apply to [Cloud Agents](/agent-platform/cloud-agents/overview/). Because your API keys are stored locally on your device, they are not available to cloud-hosted agent runs. 
Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).