From f504e33c725290ca6bae3c9ac960b9a04f379e67 Mon Sep 17 00:00:00 2001
From: Brandon Zarnitz
Date: Mon, 4 May 2026 12:05:59 -0400
Subject: [PATCH] docs: clarify Ollama and OpenAI-compatible setup

---
 README.md                           | 10 ++--
 docs/providers/ollama.md            | 74 +++++++++++++++++++++++------
 docs/providers/openai_compatible.md | 59 +++++++++++++++++++++++
 mkdocs.yml                          |  1 +
 4 files changed, 126 insertions(+), 18 deletions(-)
 create mode 100644 docs/providers/openai_compatible.md

diff --git a/README.md b/README.md
index 5ca1d1ce5..560700d5c 100644
--- a/README.md
+++ b/README.md
@@ -530,18 +530,22 @@ CAI_MODEL="alias1"
 ### 🔹 Custom OpenAI Base URL Support
 
-CAI supports configuring a custom OpenAI API base URL via the `OPENAI_BASE_URL` environment variable. This allows users to redirect API calls to a custom endpoint, such as a proxy or self-hosted OpenAI-compatible service.
+CAI supports configuring a custom OpenAI-compatible API base URL via the `OPENAI_BASE_URL` environment variable. Use an `openai/` model prefix so LiteLLM routes the request through the OpenAI-compatible provider path.
 
 Example `.env` entry configuration:
 
 ```
-OLLAMA_API_BASE="https://custom-openai-proxy.com/v1"
+OPENAI_API_KEY=""
+OPENAI_BASE_URL="https://custom-openai-proxy.com/v1"
+CAI_MODEL="openai/gpt-4.1"
 ```
 
 Or directly from the command line:
 
 ```bash
-OLLAMA_API_BASE="https://custom-openai-proxy.com/v1" cai
+OPENAI_API_KEY="dummy-key" OPENAI_BASE_URL="http://127.0.0.1:8000/v1" CAI_MODEL="openai/" cai
 ```
 
+For local Ollama, prefer `OLLAMA_API_BASE="http://localhost:11434/v1"` with `CAI_MODEL="ollama/"`.
+
 ## :triangular_ruler: Architecture:
 
diff --git a/docs/providers/ollama.md b/docs/providers/ollama.md
index 34f32aec1..54dd79bb2 100644
--- a/docs/providers/ollama.md
+++ b/docs/providers/ollama.md
@@ -3,36 +3,80 @@
 ## Ollama Local (Self-hosted)
 #### [Ollama Integration](https://ollama.com/)
-For local models using Ollama, add the following to your .env:
+
+For local Ollama models, point CAI at Ollama's OpenAI-compatible endpoint and use the `ollama/` model prefix:
+
+```bash
+# Start Ollama separately, then configure CAI
+export OLLAMA_API_BASE="http://localhost:11434/v1"
+export CAI_MODEL="ollama/qwen2.5:7b"
+export CAI_PRICE_LIMIT="0"
+export CAI_STREAM="false"
+
+cai
+```
+
+Notes:
+
+- Use the `ollama/` prefix in `CAI_MODEL`. Without it, LiteLLM may not route the request through the Ollama provider.
+- Include `/v1` in `OLLAMA_API_BASE`. CAI appends the chat completions path through the OpenAI-compatible client.
+- Ollama's default local port is `11434`. If you expose Ollama through Docker or another host, keep the same `/v1` suffix on that base URL.
+- A local Ollama setup does not require `OLLAMA_API_KEY`.
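+
+As a quick sanity check, you can call the same OpenAI-compatible route CAI uses before starting it (a minimal sketch; swap `qwen2.5:7b` for any model shown by `ollama list`):
+
+```bash
+# Direct request to Ollama's OpenAI-compatible chat-completions route
+curl http://localhost:11434/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "qwen2.5:7b", "messages": [{"role": "user", "content": "ping"}]}'
+```
+
+A JSON completion here confirms the base URL and model name; an error points at the Ollama side rather than at CAI.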
+
+Quick checks:
+
+```bash
+ollama --version
+ollama list
+curl http://localhost:11434/api/version
+```
+
+If CAI reports a `404 page not found` from Ollama, check both values:
+
+```bash
+echo "$CAI_MODEL"       # should look like ollama/
+echo "$OLLAMA_API_BASE" # should look like http://localhost:11434/v1
+```
+
+## Ollama through an OpenAI-compatible endpoint
+
+If you intentionally want to treat Ollama as a generic OpenAI-compatible endpoint instead of using LiteLLM's Ollama provider, use `OPENAI_BASE_URL` and an `openai/` model prefix:
 
 ```bash
-CAI_MODEL=qwen2.5:72b
-OLLAMA_API_BASE=http://localhost:8000/v1 # note, maybe you have a different endpoint
+export OPENAI_API_KEY="dummy-key"
+export OPENAI_BASE_URL="http://127.0.0.1:11434/v1"
+export CAI_MODEL="openai/qwen2.5:7b"
+export CAI_PRICE_LIMIT="0"
+export CAI_STREAM="false"
+
+cai
 ```
-Make sure that the Ollama server is running and accessible at the specified base URL. You can swap the model with any other supported by your local Ollama instance.
+Use this mode for endpoints that speak the OpenAI chat-completions API directly. For normal local Ollama usage, prefer the `OLLAMA_API_BASE` + `ollama/` configuration above.
 
 ## Ollama Cloud
 
-For cloud models using Ollama Cloud (no GPU required), add the following to your .env:
+For cloud models using Ollama Cloud (no local GPU required), add the following to your `.env`:
 
 ```bash
-# API Key from ollama.com
+# API key from ollama.com
 OLLAMA_API_KEY=your_api_key_here
 OLLAMA_API_BASE=https://ollama.com
-# Cloud model (note the ollama_cloud/ prefix)
+# Cloud model: note the ollama_cloud/ prefix
 CAI_MODEL=ollama_cloud/gpt-oss:120b
 ```
 
-**Requirements:**
-1. Create an account at [ollama.com](https://ollama.com)
-2. Generate an API key from your profile
-3. Use models with `ollama_cloud/` prefix (e.g., `ollama_cloud/gpt-oss:120b`)
+Requirements:
+
+1. Create an account at [ollama.com](https://ollama.com).
+2. Generate an API key from your profile.
+3. Use models with the `ollama_cloud/` prefix, for example `ollama_cloud/gpt-oss:120b`.
+
+Key differences:
 
-**Key differences:**
-- Prefix: `ollama_cloud/` (cloud) vs `ollama/` (local)
-- API Key: Required for cloud, not needed for local
-- Endpoint: `https://ollama.com/v1` (cloud) vs `http://localhost:8000/v1` (local)
+- Prefix: `ollama/` for local Ollama, `ollama_cloud/` for Ollama Cloud.
+- API key: not required for local Ollama, required for Ollama Cloud.
+- Endpoint: `http://localhost:11434/v1` for local Ollama, `https://ollama.com` for Ollama Cloud.
 
 See [Ollama Cloud documentation](ollama_cloud.md) for detailed setup instructions.
diff --git a/docs/providers/openai_compatible.md b/docs/providers/openai_compatible.md
new file mode 100644
index 000000000..cdfc7f78d
--- /dev/null
+++ b/docs/providers/openai_compatible.md
@@ -0,0 +1,59 @@
+# OpenAI-Compatible Providers
+
+CAI can talk to OpenAI-compatible endpoints through LiteLLM. This includes self-hosted proxies and hosted APIs that expose OpenAI-style chat completions, such as OCI Generative AI endpoints fronted by LiteLLM.
+
+## Configuration
+
+Use `OPENAI_BASE_URL` for the endpoint and make the model explicit with the `openai/` prefix:
+
+```bash
+export OPENAI_API_KEY=""
+export OPENAI_BASE_URL="https://example.com/litellm/v1"
+export CAI_MODEL="openai/gpt-4.1"
+export CAI_STREAM="false"
+
+cai
+```
+
+Why the prefix matters:
+
+- `OPENAI_BASE_URL` applies to OpenAI-compatible routing.
+- The `openai/` prefix tells LiteLLM to route the model through the OpenAI provider path.
+- Without the prefix, CAI may treat the model as another provider or as a default alias model, so the custom base URL can appear to be ignored.
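+
+To check the endpoint before involving CAI, you can send the same style of request LiteLLM issues (a minimal sketch; the model name is a placeholder for whatever your endpoint actually serves):
+
+```bash
+# Direct chat-completions request against the configured base URL
+curl "$OPENAI_BASE_URL/chat/completions" \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model": "gpt-4.1", "messages": [{"role": "user", "content": "ping"}]}'
+```
+
+The `openai/` prefix itself never reaches the endpoint: LiteLLM strips it and sends only the remainder as the `model` field.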
+
+## Local OpenAI-compatible server example
+
+For a local server that exposes `/v1/chat/completions`:
+
+```bash
+export OPENAI_API_KEY="dummy-key"
+export OPENAI_BASE_URL="http://127.0.0.1:8000/v1"
+export CAI_MODEL="openai/"
+export CAI_PRICE_LIMIT="0"
+
+cai
+```
+
+## Ollama note
+
+For normal local Ollama usage, prefer the dedicated Ollama provider configuration:
+
+```bash
+export OLLAMA_API_BASE="http://localhost:11434/v1"
+export CAI_MODEL="ollama/qwen2.5:7b"
+```
+
+If you intentionally want to treat Ollama as a generic OpenAI-compatible endpoint, use:
+
+```bash
+export OPENAI_API_KEY="dummy-key"
+export OPENAI_BASE_URL="http://127.0.0.1:11434/v1"
+export CAI_MODEL="openai/qwen2.5:7b"
+```
+
+## Troubleshooting
+
+- `OPENAI_BASE_URL` appears ignored: check that `CAI_MODEL` starts with `openai/`.
+- `404 page not found`: confirm the base URL includes `/v1` and that the endpoint supports OpenAI chat completions.
+- Authentication errors: confirm the API key required by your proxy/provider is in `OPENAI_API_KEY`.
+- Streaming errors: retry with `CAI_STREAM=false` to isolate endpoint compatibility from streaming behavior.
diff --git a/mkdocs.yml b/mkdocs.yml
index 9a1bd9daf..22cc3c7d1 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -105,6 +105,7 @@ nav:
   - Model Providers:
     - OpenRouter: providers/openrouter.md
     - Ollama: providers/ollama.md
+    - OpenAI-Compatible: providers/openai_compatible.md
     - Azure OpenAI: providers/azure.md
 
 # ========================================