docs(custom-inference): document local model tunneling via ngrok by hongyi-chen · Pull Request #139 · warpdotdev/docs

hongyi-chen · 2026-05-26T18:43:55Z

Adds a "Using local models" section to the custom inference endpoint docs to address a recurring question (and the cluster of recent GitHub issues, e.g. #11589) about why localhost and private network URLs are rejected.

What changed

Explains that endpoint URLs must be publicly accessible because Warp routes inference requests through its servers.
Calls out that localhost and other private/local URLs are rejected at configuration time.
Documents the supported workaround: expose a local model server (Ollama, LM Studio, vLLM, llama.cpp) through a tunneling service like ngrok.
Includes a concrete example using Ollama's default port 11434 with `ngrok http 11434`, and notes that other tunneling services (Cloudflare Tunnel, Tailscale Funnel) work the same way.
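
As an illustration of the "rejected at configuration time" behavior described above, here is a minimal sketch of a check that flags localhost and private-network URLs. Warp's actual validation logic is not shown in this PR, so the function name `is_local_or_private` and its rules are assumptions made for this example. It only inspects the literal host and does not resolve DNS.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_or_private(url: str) -> bool:
    """Return True if the URL host is localhost or a literal loopback/private IP.

    Hypothetical helper for illustration; not Warp's real validation.
    """
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no host found in URL: {url!r}")
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A regular hostname (for example a tunnel domain); this sketch
        # treats it as public without resolving it.
        return False
    return ip.is_loopback or ip.is_private or ip.is_link_local
```

Under these assumptions, `http://localhost:11434/v1` and `http://192.168.1.20:11434/v1` would be flagged, while a public tunnel hostname would pass.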

Why

Multiple users in the past week filed GitHub issues about this. They assumed they could point Warp at `http://localhost:11434/v1` for a local Ollama or LM Studio setup. Per Daniel: "the URLs must be publicly accessible, users can set up an ngrok tunnel to use local models. might be worth adding something about that to our docs."

Note for reviewers

The "private/local network URLs" sentence preserves a ********* placeholder for a redacted host pattern carried over from the source Slack/issue text. Happy to swap in a concrete example (e.g. 127.0.0.1, 192.168.x.x) before merge — flag in review if preferred.

Conversation: https://staging.warp.dev/conversation/8adb8bd1-c7f5-4d69-8087-7eae2e699040
Run: https://oz.staging.warp.dev/runs/019e6595-8e80-786b-9a98-21ae29198ab8

This PR was generated with Oz.

Adds a new 'Using local models' section explaining that endpoint URLs must be publicly accessible (localhost and private network URLs are rejected) and how to expose a local model server through a tunneling service like ngrok.

Co-Authored-By: Oz <oz-agent@warp.dev>

vercel · 2026-05-26T18:44:00Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
docs	Ready	Preview, Comment	May 26, 2026 6:56pm

oz-for-oss · 2026-05-26T18:46:52Z

@hongyi-chen

I'm starting a first review of this pull request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

oz-for-oss

Overview

This PR adds local-model tunneling guidance to the custom inference endpoint docs. The new section addresses the right limitation, but the example currently risks leading users to a non-working Ollama endpoint and should add a security caveat before recommending public tunnels.

Concerns

The Ollama/ngrok example tells users to configure the tunnel root URL instead of the OpenAI-compatible /v1 base path used by Ollama and the existing setup instructions.
The docs recommend exposing a local model server through a public tunnel without warning users to add access controls or close the tunnel when finished.
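
The first concern is the difference between the tunnel root URL and the OpenAI-compatible `/v1` base path. A minimal sketch of deriving the base URL from a tunnel URL follows. The helper name `openai_base_url` and the hostname `example.ngrok-free.app` are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlparse, urlunparse


def openai_base_url(tunnel_url: str) -> str:
    """Append the /v1 base path to a tunnel URL unless it is already present.

    Hypothetical helper; assumes the local server exposes its
    OpenAI-compatible API under /v1, as the review notes for Ollama.
    """
    parts = urlparse(tunnel_url)
    path = parts.path.rstrip("/")
    if not path.endswith("/v1"):
        path += "/v1"
    return urlunparse(parts._replace(path=path))


# Placeholder tunnel hostname, not a real endpoint.
print(openai_base_url("https://example.ngrok-free.app"))
```

Normalizing the URL this way means the same value works whether the tunnel root or the full base path was copied.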

Security

Public tunnels can expose a local model server to anyone with the tunnel URL unless the user configures authentication or access controls.

Verdict

Found: 0 critical, 2 important, 0 suggestions

Request changes


oz-for-oss · 2026-05-26T18:50:27Z

+
+Warp routes inference requests through its servers, so endpoint URLs must be publicly accessible. `localhost`, `127.0.0.1`, and other private or local network URLs are rejected when configuring a custom inference endpoint.
+
+To route through a model running on your own machine (for example, Ollama, LM Studio, vLLM, or llama.cpp), expose it through a tunneling service like [ngrok](https://ngrok.com/) and use the public tunnel URL as the base URL in your endpoint configuration.


⚠️ [IMPORTANT] [SECURITY] This recommends putting a local model server on a public URL without warning users to protect it; add a caution to require tunnel access controls and close the tunnel when finished.

Suggested change

To route through a model running on your own machine (for example, Ollama, LM Studio, vLLM, or llama.cpp), expose it through a tunneling service like [ngrok](https://ngrok.com/) and use the public tunnel URL as the base URL in your endpoint configuration.

:::caution

A public tunnel can expose your local model server to anyone with the URL. Require authentication or access controls in the tunneling service, and close the tunnel when you are done.

:::

cla-bot Bot added the cla-signed label May 26, 2026

vercel Bot deployed to Preview May 26, 2026 18:45 View deployment

hongyi-chen marked this pull request as ready for review May 26, 2026 18:46

Merge branch 'main' into docs/custom-inference-local-models-ngrok

3f8b3ad

vercel Bot deployed to Preview May 26, 2026 18:49 View deployment

oz-for-oss Bot reviewed May 26, 2026

Update src/content/docs/agent-platform/inference/custom-inference-end…

0874dac

…point.mdx Co-authored-by: oz-for-oss[bot] <277970191+oz-for-oss[bot]@users.noreply.github.com>

vercel Bot deployed to Preview May 26, 2026 18:56 View deployment

hongyi-chen requested a review from danielpeng2 May 26, 2026 22:54

danielpeng2 approved these changes May 27, 2026

hongyi-chen merged commit 9db8623 into main May 27, 2026
8 checks passed

hongyi-chen deleted the docs/custom-inference-local-models-ngrok branch May 27, 2026 02:24

hongyi-chen merged 3 commits into main from docs/custom-inference-local-models-ngrok
