docs(custom-inference): document local model tunneling via ngrok #139
Adds a new 'Using local models' section explaining that endpoint URLs must be publicly accessible (localhost and private network URLs are rejected) and how to expose a local model server through a tunneling service like ngrok. Co-Authored-By: Oz <oz-agent@warp.dev>
I'm starting a first review of this pull request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request. Powered by Oz
Overview
This PR adds local-model tunneling guidance to the custom inference endpoint docs. The new section addresses the right limitation, but the example currently risks leading users to a non-working Ollama endpoint and should add a security caveat before recommending public tunnels.
Concerns
- The Ollama/ngrok example tells users to configure the tunnel root URL instead of the OpenAI-compatible `/v1` base path used by Ollama and the existing setup instructions.
- The docs recommend exposing a local model server through a public tunnel without warning users to add access controls or close the tunnel when finished.
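To make the first concern concrete, here is a minimal sketch of deriving the base URL from a tunnel root URL by appending the `/v1` prefix exactly once. The helper name `openai_base_url` and the tunnel hostname are hypothetical and used only for illustration; nothing here is taken from the PR or from Warp's code.

```python
from urllib.parse import urlsplit, urlunsplit


def openai_base_url(tunnel_url: str, api_prefix: str = "/v1") -> str:
    """Return tunnel_url with the OpenAI-compatible path prefix appended once.

    Hypothetical helper for illustration; not part of Warp, Ollama, or ngrok.
    """
    parts = urlsplit(tunnel_url)
    # Drop any trailing slash so the prefix is never doubled up.
    path = parts.path.rstrip("/")
    if not path.endswith(api_prefix):
        path += api_prefix
    # Query string and fragment are intentionally dropped from a base URL.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    # Placeholder hostname; a real tunnel URL is assigned by the tunneling service.
    print(openai_base_url("https://example-tunnel.ngrok-free.app"))
```

Under these assumptions, the value to enter as the endpoint's base URL is the tunnel URL followed by `/v1`, not the tunnel root.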
Security
- Public tunnels can expose a local model server to anyone with the tunnel URL unless the user configures authentication or access controls.
Verdict
Found: 0 critical, 2 important, 0 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Warp routes inference requests through its servers, so endpoint URLs must be publicly accessible. `localhost`, `127.0.0.1`, and other private or local network URLs are rejected when configuring a custom inference endpoint.
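As a rough illustration of which URLs fall into the rejected category, the sketch below classifies a URL by its host using only the Python standard library. It is an assumption about what "private or local" means, not Warp's actual validation logic, and the function name `looks_publicly_reachable` is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def looks_publicly_reachable(url: str) -> bool:
    """Guess whether a URL points at a public host (illustrative only)."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    # Names that conventionally resolve to the local machine or local network.
    if host == "localhost" or host.endswith(".localhost") or host.endswith(".local"):
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: this sketch does not resolve it, so it is assumed public.
        return True
    # is_global is False for loopback, private, and link-local ranges.
    return addr.is_global


if __name__ == "__main__":
    for candidate in (
        "http://localhost:11434/v1",
        "http://127.0.0.1:11434/v1",
        "http://192.168.1.20:11434/v1",
        "https://example-tunnel.ngrok-free.app/v1",  # placeholder hostname
    ):
        print(candidate, looks_publicly_reachable(candidate))
```

A real check would also need to resolve DNS names, since a public-looking hostname can still point at a private address.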
To route through a model running on your own machine (for example, Ollama, LM Studio, vLLM, or llama.cpp), expose it through a tunneling service like [ngrok](https://ngrok.com/) and use the public tunnel URL as the base URL in your endpoint configuration.
Suggested change:
To route through a model running on your own machine (for example, Ollama, LM Studio, vLLM, or llama.cpp), expose it through a tunneling service like [ngrok](https://ngrok.com/) and use the public tunnel URL as the base URL in your endpoint configuration.
:::caution
A public tunnel can expose your local model server to anyone with the URL. Require authentication or access controls in the tunneling service, and close the tunnel when you are done.
:::
…point.mdx Co-authored-by: oz-for-oss[bot] <277970191+oz-for-oss[bot]@users.noreply.github.com>
Adds a "Using local models" section to the custom inference endpoint docs to address a recurring question (and the cluster of recent GitHub issues, e.g. #11589) about why `localhost` and private network URLs are rejected.
What changed
- `localhost` and other private/local URLs are rejected at configuration time.
- … `11434` with `ngrok http 11434`, and notes other tunneling services (Cloudflare Tunnel, Tailscale Funnel) work the same way.
Why
Multiple users in the past week filed GitHub issues hitting this: they assumed they could point Warp at `http://localhost:11434/v1` for a local Ollama/LM Studio setup. Per Daniel: "the URLs must be publicly accessible, users can set up an ngrok tunnel to use local models. might be worth adding something about that to our docs."
Note for reviewers
The "private/local network URLs" sentence preserves a `*********` placeholder for a redacted host pattern carried over from the source Slack/issue text. Happy to swap in a concrete example (e.g. `127.0.0.1`, `192.168.x.x`) before merge; flag in review if preferred.
Conversation: https://staging.warp.dev/conversation/8adb8bd1-c7f5-4d69-8087-7eae2e699040
Run: https://oz.staging.warp.dev/runs/019e6595-8e80-786b-9a98-21ae29198ab8
This PR was generated with Oz.