From 9f30309dd9d0d14d2ff237ea41114353199088e8 Mon Sep 17 00:00:00 2001
From: hongyi-chen <hongyigma@gmail.com>
Date: Tue, 26 May 2026 18:43:30 +0000
Subject: [PATCH 1/2] Document local model tunneling for custom inference
 endpoints

Adds a new 'Using local models' section explaining that endpoint URLs
must be publicly accessible (localhost and private network URLs are
rejected) and how to expose a local model server through a tunneling
service like ngrok.

Co-Authored-By: Oz <oz-agent@warp.dev>
---
 .../inference/custom-inference-endpoint.mdx               | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
index 604a004c..7dcd58ba 100644
--- a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
+++ b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
@@ -54,6 +54,14 @@ When you explicitly select an endpoint-routed model from the model picker, Warp
 
 The configuration flow mirrors the [Bring Your Own API Key](/agent-platform/inference/bring-your-own-api-key/) setup, so the steps will feel familiar if you've already configured BYOK.
 
+## Using local models
+
+Warp routes inference requests through its servers, so endpoint URLs must be publicly accessible. `localhost`, `127.0.0.1`, and other private or local network URLs are rejected when configuring a custom inference endpoint.
+
+To route through a model running on your own machine (for example, Ollama, LM Studio, vLLM, or llama.cpp), expose it through a tunneling service like [ngrok](https://ngrok.com/) and use the public tunnel URL as the base URL in your endpoint configuration.
+
+For example, with a default Ollama install listening on port `11434`, run `ngrok http 11434` and use the resulting `https://*.ngrok-free.app` URL as your endpoint. Other tunneling services that produce a publicly reachable HTTPS URL (Cloudflare Tunnel, Tailscale Funnel, and similar) work the same way.
+
 ## Billing behavior
 
 ### Warp AI credits

From 0874dac97c564a4c38b71d12c3d28c609642c621 Mon Sep 17 00:00:00 2001
From: Hong Yi Chen <hongyi@warp.dev>
Date: Tue, 26 May 2026 11:53:51 -0700
Subject: [PATCH 2/2] Update
 src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx

Co-authored-by: oz-for-oss[bot] <277970191+oz-for-oss[bot]@users.noreply.github.com>
---
 .../docs/agent-platform/inference/custom-inference-endpoint.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
index 7dcd58ba..23a4d4b9 100644
--- a/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
+++ b/src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx
@@ -60,7 +60,7 @@ Warp routes inference requests through its servers, so endpoint URLs must be pub
 
 To route through a model running on your own machine (for example, Ollama, LM Studio, vLLM, or llama.cpp), expose it through a tunneling service like [ngrok](https://ngrok.com/) and use the public tunnel URL as the base URL in your endpoint configuration.
 
-For example, with a default Ollama install listening on port `11434`, run `ngrok http 11434` and use the resulting `https://*.ngrok-free.app` URL as your endpoint. Other tunneling services that produce a publicly reachable HTTPS URL (Cloudflare Tunnel, Tailscale Funnel, and similar) work the same way.
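+For example, with a default Ollama install listening on port `11434`, run `ngrok http 11434 --host-header="localhost:11434"` and use the resulting `https://*.ngrok-free.app/v1` URL as your endpoint. The `--host-header` flag rewrites the forwarded `Host` header to `localhost:11434`, because Ollama rejects requests addressed to a non-local host by default. Other tunneling services that produce a publicly reachable HTTPS URL (Cloudflare Tunnel, Tailscale Funnel, and similar) work the same way.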
 
 ## Billing behavior