From c9ac351acd3058d4e30d47596315f0cfd43ed349 Mon Sep 17 00:00:00 2001
From: Hong Yi Chen
Date: Tue, 12 May 2026 18:31:26 -0700
Subject: [PATCH 1/6] docs: update BYOLLM, plan summary, Enterprise billing, and plans-and-billing index for May 2026

- bring-your-own-llm.mdx: Reframe as Enterprise-only managed inference (Bedrock GA, Vertex/Foundry roadmap, internal gateways). Add a comparison section that contrasts BYOLLM with BYOK and Custom inference endpoint, with a note that centrally configured BYOK/CIE for Enterprise is a fast-follow after launch.
- plans-pricing-refunds.mdx: Add May 2026 plan summary (Free, Build, Max, Business, Enterprise) with seat limits and qualitative descriptions of each plan's value. Link out to warp.dev/pricing for current monthly credit allowances instead of hard-coding numbers. Add a Custom inference endpoint bullet to the existing sub-page list.
- enterprise/support-and-resources/billing.mdx: Clarify that team-wide spend limits are also available on self-serve paid plans while per-user spend limits are Enterprise-only. Add a related-resources link to the Enterprise Analytics API.
- plans-and-billing/index.mdx: Add a Custom inference endpoint bullet so the new page is discoverable from the plans-and-billing landing page.

Part of the May 2026 Warp pricing docs overhaul (hyc/plan-updates).

Co-Authored-By: Oz --- .../bring-your-own-llm.mdx | 42 +++++++++++-------- .../support-and-resources/billing.mdx | 5 +++ .../plans-and-billing/index.mdx | 1 + .../plans-pricing-refunds.mdx | 13 ++++++ 4 files changed, 44 insertions(+), 17 deletions(-) diff --git a/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx b/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx index 326b7314..5d0fdadb 100644 --- a/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx +++ b/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx @@ -1,16 +1,16 @@ --- title: Bring your own LLM description: >- - Route Warp's agents through your AWS Bedrock models for billing control and - infrastructure flexibility. + Route Warp's agents through your organization's managed inference + infrastructure for governance, billing control, and model flexibility. --- -Warp supports **Bring Your Own LLM (BYOLLM)** for enterprise teams that need to run inference on their own cloud infrastructure. With BYOLLM, your team can use Warp's agents while routing inference through models hosted in your AWS Bedrock environment. +Warp supports **Bring your own LLM (BYOLLM)** for Enterprise teams that want to run inference on their own managed infrastructure. BYOLLM covers two patterns: cloud-provider Model-as-a-Service (AWS Bedrock, Google Vertex AI, Azure AI Foundry) and approved internal inference gateways. -This gives you control over cloud spend and model hosting, without changing how your team works in Warp. +With BYOLLM, your team uses Warp's agents while Warp manages routing, orchestration, governance, and observability across the providers you've approved. Inference runs in your environment; admins control which models are available to whom. :::caution -BYOLLM currently supports **AWS Bedrock** only. Coming soon: Azure Foundry and Google Vertex support. +**AWS Bedrock** is the GA implementation today. 
**Google Vertex AI** and **Azure AI Foundry** support is on the roadmap. Approved internal gateways are evaluated on a case-by-case basis with your Warp account team. BYOLLM applies to interactive Oz agents in the terminal. Oz cloud agents do not yet support BYOLLM routing. ::: @@ -19,6 +19,22 @@ BYOLLM applies to interactive Oz agents in the terminal. Oz cloud agents do not BYOLLM is only available on Warp's Enterprise plan. Contact [warp.dev/contact-sales](https://warp.dev/contact-sales) to learn more. ::: +## How BYOLLM differs from BYOK and Custom inference endpoint + +Warp offers three ways to bring your own inference into the product. BYOLLM is one of them, and it serves a different use case than the others. + +| Name | Meaning | Plans | +| --- | --- | --- | +| Bring your own API key (BYOK) | User-level API keys for OpenAI, Anthropic, or Google. Each user configures their own key locally; Warp uses it to call the provider directly. | Free and all paid plans | +| Custom inference endpoint (CIE) | User-level OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. Each user configures the endpoint locally. | Free and all paid plans | +| Bring your own LLM (BYOLLM) | Enterprise-only managed inference infrastructure: cloud-provider Model-as-a-Service (Bedrock, Vertex, Foundry) or approved internal gateways. Warp manages routing, orchestration, governance, and observability for the whole team. | Enterprise | + +Use [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) or [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) when an individual developer wants to authenticate to a provider with their own key or endpoint. Use BYOLLM when an organization wants Warp to manage inference routing across approved providers for the whole team. 
+ +:::note +Centrally configured BYOK and Custom inference endpoint for Enterprise — where admins approve providers or endpoints for the entire organization through the Admin Panel — are a fast-follow after launch, not at launch. Until then, BYOK and CIE remain user-level configurations, and BYOLLM remains the path for admin-managed inference infrastructure. +::: + ## Key features * **Cloud-native credentials** - Authenticate using each user’s AWS IAM identity. Warp does not store API keys. @@ -187,18 +203,9 @@ However, when using BYOLLM: ## FAQ -### How is BYOLLM different from BYOK? - -**BYOK (Bring Your Own Key)** lets individual users add their own API keys for direct model provider access (e.g., Anthropic, OpenAI, Google). Warp stores keys locally on the user's device. +### How is BYOLLM different from BYOK and Custom inference endpoint? -**BYOLLM (Bring Your Own LLM)** routes inference through your organization's cloud infrastructure (AWS Bedrock) using cloud-native IAM. Admins configure it at the admin level and it applies to the entire team. - -| Feature | BYOK | BYOLLM | -| --- | --- | --- | -| Configuration level | User | Admin/Team | -| Authentication | API keys (local) | Cloud IAM (per-user) | -| Billing | Direct to provider | Your cloud account | -| Data locality | Provider infrastructure | Your cloud infrastructure | +See [How BYOLLM differs from BYOK and Custom inference endpoint](#how-byollm-differs-from-byok-and-custom-inference-endpoint) at the top of this page for a comparison. In short: BYOK and CIE are user-level configurations available on Free and all paid plans, while BYOLLM is Enterprise-only managed inference infrastructure where Warp routes the whole team's traffic through providers your admins have approved. ### Does BYOLLM work with Auto? @@ -222,7 +229,8 @@ Yes. 
Admins can configure routing policies to require specific models to use BYO ## Related resources -* [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) +* [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — User-level keys for OpenAI, Anthropic, and Google +* [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) — Connect an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway * [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models * [Admin Panel](/enterprise/team-management/admin-panel/) — Configure team settings * [Contact Sales](https://warp.dev/contact-sales) — Get help with enterprise setup diff --git a/src/content/docs/enterprise/support-and-resources/billing.mdx b/src/content/docs/enterprise/support-and-resources/billing.mdx index 8110c6ab..94e14ee5 100644 --- a/src/content/docs/enterprise/support-and-resources/billing.mdx +++ b/src/content/docs/enterprise/support-and-resources/billing.mdx @@ -72,6 +72,10 @@ Enterprise administrators can set monthly spending limits across the following f Spending is tracked across all payment types (Add-on Credits, pay-as-you-go usage) so limits apply consistently regardless of how usage is funded. +:::note +Team-wide spending limits (cloud, local, and total) are also available on Warp's self-serve paid plans through admin-managed Reload settings. **Per-user spending limits are Enterprise-only.** For deeper visibility into how individual users consume credits, see the [Enterprise Analytics API](/enterprise/enterprise-features/analytics-api/). +::: + #### Monthly spend alerts Warp sends alerts to administrators as team usage approaches each configured spending limit, so you can adjust caps, purchase more credits, or communicate with your team before agent usage is blocked at the cap. 
@@ -86,4 +90,5 @@ For enterprises with credit pools, administrators receive alerts as the team cre * [Add-on Credits](/support-and-community/plans-and-billing/add-on-credits/) - Purchase additional credits and configure auto-reload * [Pricing FAQs](/support-and-community/plans-and-billing/pricing-faqs/) - Common billing questions * [Bring Your Own LLM](/enterprise/enterprise-features/bring-your-own-llm/) - BYOLLM billing and configuration +* [Enterprise Analytics API](/enterprise/enterprise-features/analytics-api/) - Programmatic access to team usage and spend data * [Admin Panel](/enterprise/team-management/admin-panel/) - Configure spending limits and billing settings diff --git a/src/content/docs/support-and-community/plans-and-billing/index.mdx b/src/content/docs/support-and-community/plans-and-billing/index.mdx index ae1490f1..bba2216c 100644 --- a/src/content/docs/support-and-community/plans-and-billing/index.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/index.mdx @@ -11,5 +11,6 @@ Warp offers flexible plans designed for individual developers, teams, and enterp * [**Credits**](/support-and-community/plans-and-billing/credits/) - How credits are used and calculated across AI features * [**Add-on Credits**](/support-and-community/plans-and-billing/add-on-credits/) - Purchase additional credits or enable automatic reloads * [**Bring Your Own API Key**](/support-and-community/plans-and-billing/bring-your-own-api-key/) - Connect your own model provider API keys +* [**Custom inference endpoint**](/support-and-community/plans-and-billing/custom-inference-endpoint/) - Connect an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway * [**Overages (Legacy)**](/support-and-community/plans-and-billing/overages-legacy/) - Information for users on legacy plans with overages * [**Pricing FAQs**](/support-and-community/plans-and-billing/pricing-faqs/) - Answers to common questions about plans and billing diff --git 
a/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx b/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx index 4423f9d6..0c02deb1 100644 --- a/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx @@ -21,9 +21,22 @@ Visit [warp.dev/pricing](https://warp.dev/pricing) to see the latest plans and w * [Credits](/support-and-community/plans-and-billing/credits/) — learn how credits are used and calculated across AI features. * [Add-on Credits](/support-and-community/plans-and-billing/add-on-credits/) — purchase additional credits or enable automatic reloads at discounted rates. * [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — connect your own model provider API keys for custom usage and billing. +* [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) — connect an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. * [Overages (Legacy)](/support-and-community/plans-and-billing/overages-legacy/) — information for users on legacy plans with overages enabled. * [Pricing FAQs](/support-and-community/plans-and-billing/pricing-faqs/) — answers to common questions about plans, billing, and usage. Don’t have Warp yet? [Download Warp](https://warp.dev/download) and get started for free today. +### May 2026 plan summary + +Below is a high-level summary of Warp's plans as of May 14, 2026. Visit [warp.dev/pricing](https://www.warp.dev/pricing) for current monthly credit allowances and seat pricing. + +* **Free** — Single seat. For developers exploring Warp. Includes core terminal features and a monthly credit allowance for trying Warp's agents. 
Supports [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) and [custom inference endpoints](/support-and-community/plans-and-billing/custom-inference-endpoint/) so you can keep working with your own provider after the included allowance is used. +* **Build** — Single seat. For individual developers using Warp's agents as a daily driver. Includes a higher monthly credit allowance than Free, [Add-on Credits](/support-and-community/plans-and-billing/add-on-credits/) with auto-reload, and the same BYOK and custom inference endpoint support as Free. +* **Max** — Single seat. For heavy individual users. Includes everything in Build with a higher monthly credit allowance and a better effective Reload rate. +* **Business** — Up to 50 seats. For small-to-midsize teams. Includes everything in Max plus team-wide collaboration features, Reload credits with admin-managed auto-reload and team-wide spend caps, SAML-based SSO, and admin-configurable data controls. +* **Enterprise** — Custom seat counts. For organizations with advanced security, compliance, or scale needs. Includes everything in Business plus [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/), the [Enterprise Analytics API](/enterprise/enterprise-features/analytics-api/), per-user spend limits, advanced admin controls, and Implementation Engineer Support (a structured multi-week implementation program with hands-on guidance from Warp engineers to help your team deploy production Oz Cloud Agent use cases). + +Model provider Zero Data Retention (ZDR) applies across all plans through Warp's contracted LLM providers. See [Pricing FAQs](/support-and-community/plans-and-billing/pricing-faqs/) for details on data controls. + ### Warp’s refund policies Please review the details of our refund policies below. 
To request a refund, email [**billing@warp.dev**](mailto:billing@warp.dev) with information about your situation — the more context you provide, the faster we can resolve your request. From f975e2d77a84810bd10c7f6f22b7e603dfc895a6 Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Tue, 12 May 2026 18:33:44 -0700 Subject: [PATCH 2/6] docs(pricing-may-2026): correct plan seat limits per orchestrator Update May 2026 plan summary in plans-pricing-refunds.mdx with the actual seat limits from the pricing-context dump: - Free / Build / Max: Up to 10 team members - Business: Up to 25 team members - Enterprise: Unlimited team members (custom contract) Also reframe descriptions so 'individual' language is dropped from plans that support up to 10 team members. Co-Authored-By: Oz --- .../plans-and-billing/plans-pricing-refunds.mdx | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx b/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx index 0c02deb1..1e048358 100644 --- a/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx @@ -29,11 +29,11 @@ Visit [warp.dev/pricing](https://warp.dev/pricing) to see the latest plans and w Below is a high-level summary of Warp's plans as of May 14, 2026. Visit [warp.dev/pricing](https://www.warp.dev/pricing) for current monthly credit allowances and seat pricing. -* **Free** — Single seat. For developers exploring Warp. Includes core terminal features and a monthly credit allowance for trying Warp's agents. Supports [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) and [custom inference endpoints](/support-and-community/plans-and-billing/custom-inference-endpoint/) so you can keep working with your own provider after the included allowance is used. 
-* **Build** — Single seat. For individual developers using Warp's agents as a daily driver. Includes a higher monthly credit allowance than Free, [Add-on Credits](/support-and-community/plans-and-billing/add-on-credits/) with auto-reload, and the same BYOK and custom inference endpoint support as Free. -* **Max** — Single seat. For heavy individual users. Includes everything in Build with a higher monthly credit allowance and a better effective Reload rate. -* **Business** — Up to 50 seats. For small-to-midsize teams. Includes everything in Max plus team-wide collaboration features, Reload credits with admin-managed auto-reload and team-wide spend caps, SAML-based SSO, and admin-configurable data controls. -* **Enterprise** — Custom seat counts. For organizations with advanced security, compliance, or scale needs. Includes everything in Business plus [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/), the [Enterprise Analytics API](/enterprise/enterprise-features/analytics-api/), per-user spend limits, advanced admin controls, and Implementation Engineer Support (a structured multi-week implementation program with hands-on guidance from Warp engineers to help your team deploy production Oz Cloud Agent use cases). +* **Free** — Up to 10 team members. For developers exploring Warp. Includes core terminal features and a monthly credit allowance for trying Warp's agents. Supports [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) and [custom inference endpoints](/support-and-community/plans-and-billing/custom-inference-endpoint/) so you can keep working with your own provider after the included allowance is used. +* **Build** — Up to 10 team members. For developers using Warp's agents as a daily driver. 
Includes a higher monthly credit allowance than Free, [Add-on Credits](/support-and-community/plans-and-billing/add-on-credits/) with auto-reload, and the same BYOK and custom inference endpoint support as Free. +* **Max** — Up to 10 team members. For heavy users. Includes everything in Build with a higher monthly credit allowance and a better effective Reload rate. +* **Business** — Up to 25 team members. For small-to-midsize teams. Includes everything in Max plus team-wide collaboration features, Reload credits with admin-managed auto-reload and team-wide spend caps, SAML-based SSO, and admin-configurable data controls. +* **Enterprise** — Unlimited team members (custom contract). For organizations with advanced security, compliance, or scale needs. Includes everything in Business plus [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/), the [Enterprise Analytics API](/enterprise/enterprise-features/analytics-api/), per-user spend limits, advanced admin controls, and Implementation Engineer Support (a structured multi-week implementation program with hands-on guidance from Warp engineers to help your team deploy production Oz Cloud Agent use cases). Model provider Zero Data Retention (ZDR) applies across all plans through Warp's contracted LLM providers. See [Pricing FAQs](/support-and-community/plans-and-billing/pricing-faqs/) for details on data controls. From 980e9d0ea5a43eb4d010664d03452c8a15bcde6f Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Tue, 12 May 2026 18:41:15 -0700 Subject: [PATCH 3/6] docs(pricing-may-2026): add BYOK / CIE org-size disclosure (10 or fewer employees) Per user requirement: BYOK and custom inference endpoint are limited to individual users and orgs with 10 or fewer employees on Free / Build / Max; larger orgs require a Business or Enterprise plan. 
- bring-your-own-llm.mdx: Update the BYOK and CIE rows in the comparison matrix to reflect the org-size constraint, add a :::note disclosure callout immediately below the matrix, and rephrase the corresponding FAQ for consistency. - plans-pricing-refunds.mdx: Add the same :::note disclosure callout after the May 2026 plan summary bullet list, before the ZDR sentence (one callout covering the Free / Build / Max bullets). Co-Authored-By: Oz --- .../enterprise-features/bring-your-own-llm.mdx | 10 +++++++--- .../plans-and-billing/plans-pricing-refunds.mdx | 4 ++++ 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx b/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx index 5d0fdadb..f112531b 100644 --- a/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx +++ b/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx @@ -25,10 +25,14 @@ Warp offers three ways to bring your own inference into the product. BYOLLM is o | Name | Meaning | Plans | | --- | --- | --- | -| Bring your own API key (BYOK) | User-level API keys for OpenAI, Anthropic, or Google. Each user configures their own key locally; Warp uses it to call the provider directly. | Free and all paid plans | -| Custom inference endpoint (CIE) | User-level OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. Each user configures the endpoint locally. | Free and all paid plans | +| Bring your own API key (BYOK) | User-level API keys for OpenAI, Anthropic, or Google. Each user configures their own key locally; Warp uses it to call the provider directly. | Free, Build, Max (orgs with 10 or fewer employees); Business or Enterprise required for larger orgs | +| Custom inference endpoint (CIE) | User-level OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. Each user configures the endpoint locally. 
| Free, Build, Max (orgs with 10 or fewer employees); Business or Enterprise required for larger orgs | | Bring your own LLM (BYOLLM) | Enterprise-only managed inference infrastructure: cloud-provider Model-as-a-Service (Bedrock, Vertex, Foundry) or approved internal gateways. Warp manages routing, orchestration, governance, and observability for the whole team. | Enterprise | +:::note +BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use these features. +::: + Use [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) or [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) when an individual developer wants to authenticate to a provider with their own key or endpoint. Use BYOLLM when an organization wants Warp to manage inference routing across approved providers for the whole team. :::note @@ -205,7 +209,7 @@ However, when using BYOLLM: ### How is BYOLLM different from BYOK and Custom inference endpoint? -See [How BYOLLM differs from BYOK and Custom inference endpoint](#how-byollm-differs-from-byok-and-custom-inference-endpoint) at the top of this page for a comparison. In short: BYOK and CIE are user-level configurations available on Free and all paid plans, while BYOLLM is Enterprise-only managed inference infrastructure where Warp routes the whole team's traffic through providers your admins have approved. +See [How BYOLLM differs from BYOK and Custom inference endpoint](#how-byollm-differs-from-byok-and-custom-inference-endpoint) at the top of this page for a comparison and plan-availability details. 
In short: BYOK and CIE are user-level configurations available to individual users and orgs with 10 or fewer employees on Free, Build, and Max, and to all users on Business and Enterprise. BYOLLM is Enterprise-only managed inference infrastructure where Warp routes the whole team's traffic through providers your admins have approved. ### Does BYOLLM work with Auto? diff --git a/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx b/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx index 1e048358..9c4fb383 100644 --- a/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx @@ -35,6 +35,10 @@ Below is a high-level summary of Warp's plans as of May 14, 2026. Visit [warp.de * **Business** — Up to 25 team members. For small-to-midsize teams. Includes everything in Max plus team-wide collaboration features, Reload credits with admin-managed auto-reload and team-wide spend caps, SAML-based SSO, and admin-configurable data controls. * **Enterprise** — Unlimited team members (custom contract). For organizations with advanced security, compliance, or scale needs. Includes everything in Business plus [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/), the [Enterprise Analytics API](/enterprise/enterprise-features/analytics-api/), per-user spend limits, advanced admin controls, and Implementation Engineer Support (a structured multi-week implementation program with hands-on guidance from Warp engineers to help your team deploy production Oz Cloud Agent use cases). +:::note +BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use these features. 
+::: + Model provider Zero Data Retention (ZDR) applies across all plans through Warp's contracted LLM providers. See [Pricing FAQs](/support-and-community/plans-and-billing/pricing-faqs/) for details on data controls. ### Warp’s refund policies From 157891832ffdf0a7abceba1e0e22a5dd72eb78af Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Tue, 12 May 2026 18:49:55 -0700 Subject: [PATCH 4/6] Add Custom inference endpoint page stub for CI link check This file is the canonical version created on PR #71 (hyc/plan-updates-byok-cie). It is duplicated here so that the link checker on this branch can resolve the relative references to /support-and-community/plans-and-billing/custom-inference-endpoint/ that this PR introduces. When PR #71 merges into hyc/plan-updates, git will reconcile the identical file contents automatically. Co-Authored-By: Oz --- .../custom-inference-endpoint.mdx | 117 ++++++++++++++++++ 1 file changed, 117 insertions(+) create mode 100644 src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx diff --git a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx new file mode 100644 index 00000000..77465ba4 --- /dev/null +++ b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx @@ -0,0 +1,117 @@ +--- +title: Custom inference endpoint +description: >- + Connect Warp to any OpenAI-compatible custom inference endpoint, such as + OpenRouter, LiteLLM, z.ai, or an internal gateway. Available on the Free + plan and all eligible paid plans. +--- + +A **Custom inference endpoint (CIE)** lets you connect Warp's agents to any OpenAI-compatible inference endpoint, so you can route AI requests through your preferred model router, hosted gateway, or internal infrastructure. 
+ +CIE is the right fit when you want to choose your provider, consolidate billing through a third-party router, or run inference behind your own gateway — without giving up the agent experience inside Warp. When a CIE is configured and selected, Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for the request. + +:::note +CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans. +::: + +:::note +BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use custom inference endpoints. +::: + +## Key features + +* **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API. +* **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway. +* **No Warp credits consumed** - Inference is billed directly by your endpoint provider; Warp's metered features remain unaffected. +* **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud. + +## How it works + +CIE expects your endpoint to implement the **OpenAI Chat Completions API** (`POST /v1/chat/completions`). Any service that exposes a compatible surface can be used as a CIE target: + +* **OpenRouter** - Aggregates many model providers behind a single OpenAI-compatible API and consolidated billing. +* **LiteLLM** - A self-hosted proxy that exposes a unified, OpenAI-compatible API across providers. +* **z.ai** - A model provider with an OpenAI-compatible API surface for its models. 
+* **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control). + +When you configure a CIE, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers. + +:::caution +CIE does not apply to [Oz Cloud Agents](/agent-platform/cloud-agents/overview/). Because CIE configuration is stored locally, it is not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/). +::: + +When a CIE-routed model is selected: + +* Warp **does not consume** any of your [credits](/support-and-community/plans-and-billing/credits/). +* Costs are billed directly by your endpoint provider. +* Warp does not retain or store your endpoint credentials on any of its servers. + +## Enabling a custom inference endpoint + +To enable and configure a custom inference endpoint: + +1. In Warp, open **Settings** and search for `inference endpoint` to jump to the configuration. +2. Add your endpoint URL (the base URL that exposes `/v1/chat/completions`) and any required credentials (typically an API key). +3. Specify the model identifier(s) you want to route through this endpoint. +4. Save the configuration. Once added, you'll see your custom models appear in the model picker. + +When you explicitly select a CIE-routed model from the model picker, Warp routes the request through your endpoint instead of consuming Warp's credits. + +The CIE configuration flow mirrors the [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) setup, so the steps will feel familiar if you've already configured BYOK. + +## Billing behavior + +### Warp credits + +When you select a CIE-routed model from the model picker: + +* No Warp credits are consumed for that request. 
+* Inference is billed directly by your endpoint provider, according to their pricing. +* The credit transparency footer will show "0 credits used" for CIE-routed requests. + +### Auto routing still uses Warp credits + +Warp's **Auto** models dynamically route across providers using Warp's infrastructure. Because Auto routing depends on Warp, **Auto always consumes Warp's credits**, even if you've configured a custom inference endpoint. + +To use your endpoint, select the specific CIE-backed model from the model picker rather than an Auto option. + +### Other AI features in Warp + +Some AI-powered features rely on Warp's infrastructure and are unaffected by CIE configuration. These continue to consume credits according to your plan; see [Credits](/support-and-community/plans-and-billing/credits/) for details. + +## Zero Data Retention (ZDR) + +Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** agreements with all of its contracted LLM providers. + +When you use a custom inference endpoint: + +* Data retention is determined by **your endpoint provider** and any upstream model providers they route to. +* Warp **cannot enforce ZDR** for requests sent through a custom inference endpoint. +* If your endpoint provider does not have ZDR with the underlying model provider, your requests may be retained according to their terms. + +Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a CIE. + +## Plan availability + +CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans and any plan-specific limits. + +CIE is available to individual users and to organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees need a Warp Business or Enterprise plan to use custom inference endpoints. 
+ +Centrally configured, admin-managed CIE for teams is not yet available. Each user configures their own endpoint locally. Enterprise teams that need centrally managed model routing today should see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/). + +## How CIE differs from BYOK and BYOLLM + +Warp offers three ways to bring your own AI infrastructure. Use this table to pick the right one, and follow the links for full details. + +| Name | Meaning | Plans | +| --- | --- | --- | +| **[Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/)** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans | +| **Custom inference endpoint** (CIE) | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans | +| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock, Azure Foundry, Google Vertex) or approved internal infrastructure, with Warp handling routing, orchestration, governance, and observability. | Enterprise only | + +## Related resources + +* [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — Use your own OpenAI, Anthropic, or Google API keys. +* [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) — Enterprise-managed inference through your cloud provider or approved infrastructure. +* [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models and `model_id` values. +* [Credits](/support-and-community/plans-and-billing/credits/) — How Warp credits work and when they're consumed. 
From 32b280dc3686e263b401d81fb4bd5624476a5345 Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Tue, 19 May 2026 16:29:53 -0700 Subject: [PATCH 5/6] docs(pricing-may-2026): BYOLLM cross-provider IAM + platform credits in plan summary, Enterprise billing - bring-your-own-llm.mdx: rewrite Cloud-native credentials bullet so it covers AWS IAM (GA) plus Google Cloud and Azure identities (roadmap); add a platform-credits note next to the BYOLLM 'no Warp credits consumed' framing so Enterprise readers know local agent runs still consume platform credits; add the platform-credits page to Related resources. - plans-pricing-refunds.mdx: append a one-liner about platform credits to the Business and Enterprise bullets of the May 2026 plan summary so readers understand when platform credits apply across customer-supplied inference. - enterprise/support-and-resources/billing.mdx: add the platform-credits page to Related resources alongside Add-on Credits and Credits. Pre-existing build/link-check failures (CIE sidebar registration; platform-credits.mdx not yet on the umbrella) are out of scope here and will be resolved by PR #71's sidebar.ts update and the umbrella rebase onto main. Co-Authored-By: Oz --- .../enterprise/enterprise-features/bring-your-own-llm.mdx | 7 ++++++- .../docs/enterprise/support-and-resources/billing.mdx | 1 + .../plans-and-billing/plans-pricing-refunds.mdx | 4 ++-- 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx b/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx index f112531b..75e62816 100644 --- a/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx +++ b/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx @@ -41,7 +41,7 @@ Centrally configured BYOK and Custom inference endpoint for Enterprise — where ## Key features -* **Cloud-native credentials** - Authenticate using each user’s AWS IAM identity. 
Warp does not store API keys. +* **Cloud-native credentials** - Authenticate using each user's cloud-native identity (AWS IAM today; Google Cloud and Azure identities on the roadmap). Warp does not store API keys. * **Admin-enforced routing** - Team admins configure which models are available to users in AWS Bedrock, with the ability to disable non-Bedrock model access entirely. * **Consolidated billing** - Inference costs are billed directly to your AWS account, leveraging existing cloud commitments. @@ -154,6 +154,10 @@ When a request routes through BYOLLM: * **Warp does not consume credits** for that request. * Your cloud provider account receives the inference costs directly. +:::note +BYOLLM-routed local agent runs on Enterprise still consume platform credits for Warp's platform infrastructure (run orchestration, observability, integrations). Inference costs are billed directly to your cloud provider account. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown. +::: + ### Routing behavior Warp's agents automatically select the best model for your task while respecting your admin's routing policies. If you configure a model for BYOLLM, requests for that model route to AWS Bedrock. @@ -235,6 +239,7 @@ Yes. 
Admins can configure routing policies to require specific models to use BYO * [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — User-level keys for OpenAI, Anthropic, and Google * [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) — Connect an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway +* [platform credits](/support-and-community/plans-and-billing/platform-credits/) — Warp's platform-infrastructure credit bucket * [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models * [Admin Panel](/enterprise/team-management/admin-panel/) — Configure team settings * [Contact Sales](https://warp.dev/contact-sales) — Get help with enterprise setup diff --git a/src/content/docs/enterprise/support-and-resources/billing.mdx b/src/content/docs/enterprise/support-and-resources/billing.mdx index 94e14ee5..c2e2e755 100644 --- a/src/content/docs/enterprise/support-and-resources/billing.mdx +++ b/src/content/docs/enterprise/support-and-resources/billing.mdx @@ -88,6 +88,7 @@ For enterprises with credit pools, administrators receive alerts as the team cre * [Credits](/support-and-community/plans-and-billing/credits/) - How credits are calculated and consumed * [Add-on Credits](/support-and-community/plans-and-billing/add-on-credits/) - Purchase additional credits and configure auto-reload +* [platform credits](/support-and-community/plans-and-billing/platform-credits/) - The third credit bucket alongside AI credits and compute credits, covering Warp's platform infrastructure * [Pricing FAQs](/support-and-community/plans-and-billing/pricing-faqs/) - Common billing questions * [Bring Your Own LLM](/enterprise/enterprise-features/bring-your-own-llm/) - BYOLLM billing and configuration * [Enterprise Analytics API](/enterprise/enterprise-features/analytics-api/) - Programmatic access to team usage and spend data diff --git 
a/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx b/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx index 9c4fb383..4fee09e1 100644 --- a/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx +++ b/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx @@ -32,8 +32,8 @@ Below is a high-level summary of Warp's plans as of May 14, 2026. Visit [warp.de * **Free** — Up to 10 team members. For developers exploring Warp. Includes core terminal features and a monthly credit allowance for trying Warp's agents. Supports [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) and [custom inference endpoints](/support-and-community/plans-and-billing/custom-inference-endpoint/) so you can keep working with your own provider after the included allowance is used. * **Build** — Up to 10 team members. For developers using Warp's agents as a daily driver. Includes a higher monthly credit allowance than Free, [Add-on Credits](/support-and-community/plans-and-billing/add-on-credits/) with auto-reload, and the same BYOK and custom inference endpoint support as Free. * **Max** — Up to 10 team members. For heavy users. Includes everything in Build with a higher monthly credit allowance and a better effective Reload rate. -* **Business** — Up to 25 team members. For small-to-midsize teams. Includes everything in Max plus team-wide collaboration features, Reload credits with admin-managed auto-reload and team-wide spend caps, SAML-based SSO, and admin-configurable data controls. -* **Enterprise** — Unlimited team members (custom contract). For organizations with advanced security, compliance, or scale needs. 
Includes everything in Business plus [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/), the [Enterprise Analytics API](/enterprise/enterprise-features/analytics-api/), per-user spend limits, advanced admin controls, and Implementation Engineer Support (a structured multi-week implementation program with hands-on guidance from Warp engineers to help your team deploy production Oz Cloud Agent use cases). +* **Business** — Up to 25 team members. For small-to-midsize teams. Includes everything in Max plus team-wide collaboration features, Reload credits with admin-managed auto-reload and team-wide spend caps, SAML-based SSO, and admin-configurable data controls. Platform credits apply to cloud agent runs and to local runs that use customer-supplied inference (BYOK or CIE). +* **Enterprise** — Unlimited team members (custom contract). For organizations with advanced security, compliance, or scale needs. Includes everything in Business plus [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/), the [Enterprise Analytics API](/enterprise/enterprise-features/analytics-api/), per-user spend limits, advanced admin controls, and Implementation Engineer Support (a structured multi-week implementation program with hands-on guidance from Warp engineers to help your team deploy production Oz Cloud Agent use cases). Platform credits apply to all cloud agent runs and to local runs using BYOLLM, BYOK, or CIE. :::note BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use these features. 
From 448594394245d3055f4c43b4e8ca0ab204f83a92 Mon Sep 17 00:00:00 2001 From: Hong Yi Chen Date: Tue, 19 May 2026 16:32:23 -0700 Subject: [PATCH 6/6] Remove duplicate CIE stub; PR #71 owns the canonical custom-inference-endpoint.mdx The CIE page is owned by PR #71 (hyc/plan-updates-byok-cie), which also adds the matching src/sidebar.ts entry. Removing the stub here so the PR #72 → umbrella merge auto-resolves cleanly after #71 lands. Co-Authored-By: Oz --- .../custom-inference-endpoint.mdx | 117 ------------------ 1 file changed, 117 deletions(-) delete mode 100644 src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx diff --git a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx deleted file mode 100644 index 77465ba4..00000000 --- a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx +++ /dev/null @@ -1,117 +0,0 @@ ---- -title: Custom inference endpoint -description: >- - Connect Warp to any OpenAI-compatible custom inference endpoint, such as - OpenRouter, LiteLLM, z.ai, or an internal gateway. Available on the Free - plan and all eligible paid plans. ---- - -A **Custom inference endpoint (CIE)** lets you connect Warp's agents to any OpenAI-compatible inference endpoint, so you can route AI requests through your preferred model router, hosted gateway, or internal infrastructure. - -CIE is the right fit when you want to choose your provider, consolidate billing through a third-party router, or run inference behind your own gateway — without giving up the agent experience inside Warp. When a CIE is configured and selected, Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for the request. - -:::note -CIE is available on the Free plan and on all eligible paid plans.
See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans. -::: - -:::note -BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use custom inference endpoints. -::: - -## Key features - -* **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API. -* **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway. -* **No Warp credits consumed** - Inference is billed directly by your endpoint provider; Warp's metered features remain unaffected. -* **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud. - -## How it works - -CIE expects your endpoint to implement the **OpenAI Chat Completions API** (`POST /v1/chat/completions`). Any service that exposes a compatible surface can be used as a CIE target: - -* **OpenRouter** - Aggregates many model providers behind a single OpenAI-compatible API and consolidated billing. -* **LiteLLM** - A self-hosted proxy that exposes a unified, OpenAI-compatible API across providers. -* **z.ai** - A model provider with an OpenAI-compatible API surface for its models. -* **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control). - -When you configure a CIE, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers. - -:::caution -CIE does not apply to [Oz Cloud Agents](/agent-platform/cloud-agents/overview/). 
Because CIE configuration is stored locally, it is not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/). -::: - -When a CIE-routed model is selected: - -* Warp **does not consume** any of your [credits](/support-and-community/plans-and-billing/credits/). -* Costs are billed directly by your endpoint provider. -* Warp does not retain or store your endpoint credentials on any of its servers. - -## Enabling a custom inference endpoint - -To enable and configure a custom inference endpoint: - -1. In Warp, open **Settings** and search for `inference endpoint` to jump to the configuration. -2. Add your endpoint URL (the base URL that exposes `/v1/chat/completions`) and any required credentials (typically an API key). -3. Specify the model identifier(s) you want to route through this endpoint. -4. Save the configuration. Once added, you'll see your custom models appear in the model picker. - -When you explicitly select a CIE-routed model from the model picker, Warp routes the request through your endpoint instead of consuming Warp's credits. - -The CIE configuration flow mirrors the [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) setup, so the steps will feel familiar if you've already configured BYOK. - -## Billing behavior - -### Warp credits - -When you select a CIE-routed model from the model picker: - -* No Warp credits are consumed for that request. -* Inference is billed directly by your endpoint provider, according to their pricing. -* The credit transparency footer will show "0 credits used" for CIE-routed requests. - -### Auto routing still uses Warp credits - -Warp's **Auto** models dynamically route across providers using Warp's infrastructure. Because Auto routing depends on Warp, **Auto always consumes Warp's credits**, even if you've configured a custom inference endpoint. 
- -To use your endpoint, select the specific CIE-backed model from the model picker rather than an Auto option. - -### Other AI features in Warp - -Some AI-powered features rely on Warp's infrastructure and are unaffected by CIE configuration. These continue to consume credits according to your plan; see [Credits](/support-and-community/plans-and-billing/credits/) for details. - -## Zero Data Retention (ZDR) - -Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** agreements with all of its contracted LLM providers. - -When you use a custom inference endpoint: - -* Data retention is determined by **your endpoint provider** and any upstream model providers they route to. -* Warp **cannot enforce ZDR** for requests sent through a custom inference endpoint. -* If your endpoint provider does not have ZDR with the underlying model provider, your requests may be retained according to their terms. - -Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a CIE. - -## Plan availability - -CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans and any plan-specific limits. - -CIE is available to individual users and to organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees need a Warp Business or Enterprise plan to use custom inference endpoints. - -Centrally configured, admin-managed CIE for teams is not yet available. Each user configures their own endpoint locally. Enterprise teams that need centrally managed model routing today should see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/). - -## How CIE differs from BYOK and BYOLLM - -Warp offers three ways to bring your own AI infrastructure. Use this table to pick the right one, and follow the links for full details. 
- -| Name | Meaning | Plans | -| --- | --- | --- | -| **[Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/)** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans | -| **Custom inference endpoint** (CIE) | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans | -| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock, Azure Foundry, Google Vertex) or approved internal infrastructure, with Warp handling routing, orchestration, governance, and observability. | Enterprise only | - -## Related resources - -* [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — Use your own OpenAI, Anthropic, or Google API keys. -* [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) — Enterprise-managed inference through your cloud provider or approved infrastructure. -* [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models and `model_id` values. -* [Credits](/support-and-community/plans-and-billing/credits/) — How Warp credits work and when they're consumed.