
docs(custom-inference): document local model tunneling via ngrok #139

Merged
hongyi-chen merged 3 commits into main from docs/custom-inference-local-models-ngrok on May 27, 2026

@hongyi-chen

Collaborator

Adds a "Using local models" section to the custom inference endpoint docs to address a recurring question (and the cluster of recent GitHub issues, e.g. #11589) about why localhost and private network URLs are rejected.

What changed

  • Explains that endpoint URLs must be publicly accessible because Warp routes inference requests through its servers.
  • Calls out that localhost and other private/local URLs are rejected at configuration time.
  • Documents the supported workaround: expose a local model server (Ollama, LM Studio, vLLM, llama.cpp) through a tunneling service like ngrok.
  • Includes a concrete example using Ollama's default port 11434 with ngrok http 11434, and notes other tunneling services (Cloudflare Tunnel, Tailscale Funnel) work the same way.
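
To make the rejected categories concrete, here is a minimal Python sketch of the kind of check that separates local or private endpoint URLs from public ones. It is an illustration only: the helper name `is_local_or_private` and the hostname `example.ngrok-free.app` are hypothetical, and Warp's actual validation logic is not shown in this PR and may differ.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_or_private(url: str) -> bool:
    """Return True if the URL's host is localhost or a non-public IP literal.

    Hypothetical helper for illustration; this is not Warp's validation code.
    Hostnames are not resolved via DNS, so a public name that points at a
    private address is not detected here.
    """
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; other hostnames are treated as potentially public.
        return False
    return addr.is_loopback or addr.is_private or addr.is_link_local


# A local Ollama URL, a LAN address, and a placeholder tunnel URL.
print(is_local_or_private("http://localhost:11434/v1"))          # True
print(is_local_or_private("http://192.168.1.20:11434/v1"))       # True
print(is_local_or_private("https://example.ngrok-free.app/v1"))  # False
```

Under these assumptions, the first two URLs are the kind users reported as rejected, while a tunnel hostname passes because it is reachable from outside the local machine.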

Why

Multiple users filed GitHub issues in the past week after hitting this limitation: they assumed they could point Warp at http://localhost:11434/v1 for a local Ollama/LM Studio setup. Per Daniel: "the URLs must be publicly accessible, users can set up an ngrok tunnel to use local models. might be worth adding something about that to our docs."

Note for reviewers

The "private/local network URLs" sentence preserves a ********* placeholder for a redacted host pattern carried over from the source Slack/issue text. Happy to swap in a concrete example (e.g. 127.0.0.1, 192.168.x.x) before merge — flag in review if preferred.

Conversation: https://staging.warp.dev/conversation/8adb8bd1-c7f5-4d69-8087-7eae2e699040
Run: https://oz.staging.warp.dev/runs/019e6595-8e80-786b-9a98-21ae29198ab8

This PR was generated with Oz.

Adds a new 'Using local models' section explaining that endpoint URLs
must be publicly accessible (localhost and private network URLs are
rejected) and how to expose a local model server through a tunneling
service like ngrok.

Co-Authored-By: Oz <oz-agent@warp.dev>
@cla-bot cla-bot Bot added the cla-signed label May 26, 2026
@vercel

vercel Bot commented May 26, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

Project: docs · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): May 26, 2026 6:56pm


@hongyi-chen hongyi-chen marked this pull request as ready for review May 26, 2026 18:46
@oz-for-oss

oz-for-oss Bot commented May 26, 2026

Contributor

@hongyi-chen

I'm starting a first review of this pull request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

@oz-for-oss oz-for-oss Bot left a comment

Contributor


Overview

This PR adds local-model tunneling guidance to the custom inference endpoint docs. The new section addresses the right limitation, but the example currently risks leading users to a non-working Ollama endpoint and should add a security caveat before recommending public tunnels.

Concerns

  • The Ollama/ngrok example tells users to configure the tunnel root URL instead of the OpenAI-compatible /v1 base path used by Ollama and the existing setup instructions.
  • The docs recommend exposing a local model server through a public tunnel without warning users to add access controls or close the tunnel when finished.
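
The first concern can be sketched in Python: given the public URL that the tunnel prints, append the `/v1` path to get the OpenAI-compatible base URL. The helper name `openai_base_url` and the hostname `example.ngrok-free.app` are hypothetical placeholders, and the `/v1` suffix follows the `http://localhost:11434/v1` example in the PR description; other model servers may use a different path.

```python
from urllib.parse import urlparse, urlunparse


def openai_base_url(tunnel_url: str, api_path: str = "/v1") -> str:
    """Build the base URL to configure from a tunnel's root URL.

    Hypothetical helper for illustration. It appends `api_path` unless the
    URL already ends with it, and drops any trailing slash.
    """
    parts = urlparse(tunnel_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Expected an http(s) URL, got: {tunnel_url!r}")
    path = parts.path.rstrip("/")
    if not path.endswith(api_path):
        path += api_path
    # Query strings and fragments are discarded; a base URL should not carry them.
    return urlunparse((parts.scheme, parts.netloc, path, "", "", ""))


# The tunnel root URL becomes the /v1 base URL.
print(openai_base_url("https://example.ngrok-free.app"))
# https://example.ngrok-free.app/v1
```

Passing a URL that already ends in `/v1` returns it unchanged apart from trailing-slash cleanup, so the helper is safe to apply twice.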

Security

  • Public tunnels can expose a local model server to anyone with the tunnel URL unless the user configures authentication or access controls.

Verdict

Found: 0 critical, 2 important, 0 suggestions

Request changes



Warp routes inference requests through its servers, so endpoint URLs must be publicly accessible. `localhost`, `127.0.0.1`, and other private or local network URLs are rejected when configuring a custom inference endpoint.

To route through a model running on your own machine (for example, Ollama, LM Studio, vLLM, or llama.cpp), expose it through a tunneling service like [ngrok](https://ngrok.com/) and use the public tunnel URL as the base URL in your endpoint configuration.

Contributor


⚠️ [IMPORTANT] [SECURITY] This recommends putting a local model server on a public URL without warning users to protect it; add a caution to require tunnel access controls and close the tunnel when finished.

Suggested change
To route through a model running on your own machine (for example, Ollama, LM Studio, vLLM, or llama.cpp), expose it through a tunneling service like [ngrok](https://ngrok.com/) and use the public tunnel URL as the base URL in your endpoint configuration.
To route through a model running on your own machine (for example, Ollama, LM Studio, vLLM, or llama.cpp), expose it through a tunneling service like [ngrok](https://ngrok.com/) and use the public tunnel URL as the base URL in your endpoint configuration.
:::caution
A public tunnel can expose your local model server to anyone with the URL. Require authentication or access controls in the tunneling service, and close the tunnel when you are done.
:::

Comment thread src/content/docs/agent-platform/inference/custom-inference-endpoint.mdx Outdated
…point.mdx

Co-authored-by: oz-for-oss[bot] <277970191+oz-for-oss[bot]@users.noreply.github.com>
@hongyi-chen hongyi-chen merged commit 9db8623 into main May 27, 2026
8 checks passed
@hongyi-chen hongyi-chen deleted the docs/custom-inference-local-models-ngrok branch May 27, 2026 02:24