From f504e33c725290ca6bae3c9ac960b9a04f379e67 Mon Sep 17 00:00:00 2001
From: Brandon Zarnitz
Date: Mon, 4 May 2026 12:05:59 -0400
Subject: [PATCH] docs: clarify Ollama and OpenAI-compatible setup

---
 README.md                           | 10 ++--
 docs/providers/ollama.md            | 74 +++++++++++++++++++++++------
 docs/providers/openai_compatible.md | 59 +++++++++++++++++++++++
 mkdocs.yml                          |  1 +
 4 files changed, 126 insertions(+), 18 deletions(-)
 create mode 100644 docs/providers/openai_compatible.md

diff --git a/README.md b/README.md
index 5ca1d1ce5..560700d5c 100644
--- a/README.md
+++ b/README.md
@@ -530,18 +530,22 @@ CAI_MODEL="alias1"
 ### 🔹 Custom OpenAI Base URL Support
 
-CAI supports configuring a custom OpenAI API base URL via the `OPENAI_BASE_URL` environment variable. This allows users to redirect API calls to a custom endpoint, such as a proxy or self-hosted OpenAI-compatible service.
+CAI supports configuring a custom OpenAI-compatible API base URL via the `OPENAI_BASE_URL` environment variable. Use an `openai/` model prefix so LiteLLM routes the request through the OpenAI-compatible provider path.
 
 Example `.env` entry configuration:
 
 ```
-OLLAMA_API_BASE="https://custom-openai-proxy.com/v1"
+OPENAI_API_KEY=""
+OPENAI_BASE_URL="https://custom-openai-proxy.com/v1"
+CAI_MODEL="openai/gpt-4.1"
 ```
 
 Or directly from the command line:
 
 ```bash
-OLLAMA_API_BASE="https://custom-openai-proxy.com/v1" cai
+OPENAI_API_KEY="dummy-key" OPENAI_BASE_URL="http://127.0.0.1:8000/v1" CAI_MODEL="openai/" cai
 ```
 
+For local Ollama, prefer `OLLAMA_API_BASE="http://localhost:11434/v1"` with `CAI_MODEL="ollama/"`.
+
 ## :triangular_ruler: Architecture:
 
diff --git a/docs/providers/ollama.md b/docs/providers/ollama.md
index 34f32aec1..54dd79bb2 100644
--- a/docs/providers/ollama.md
+++ b/docs/providers/ollama.md
@@ -3,36 +3,80 @@
 ## Ollama Local (Self-hosted)
 #### [Ollama Integration](https://ollama.com/)
-For local models using Ollama, add the following to your .env:
+
+For local Ollama models, point CAI at Ollama's OpenAI-compatible endpoint and use the `ollama/` model prefix:
+
+```bash
+# Start Ollama separately, then configure CAI
+export OLLAMA_API_BASE="http://localhost:11434/v1"
+export CAI_MODEL="ollama/qwen2.5:7b"
+export CAI_PRICE_LIMIT="0"
+export CAI_STREAM="false"
+
+cai
+```
+
+Notes:
+
+- Use the `ollama/` prefix in `CAI_MODEL`. Without it, LiteLLM may not route the request through the Ollama provider.
+- Include `/v1` in `OLLAMA_API_BASE`. CAI appends the chat completions path through the OpenAI-compatible client.
+- Ollama's default local port is `11434`. If you expose Ollama through Docker or another host, keep the same `/v1` suffix on that base URL.
+- A local Ollama setup does not require `OLLAMA_API_KEY`.
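+
+As a quick sanity check, you can call the same OpenAI-compatible route CAI uses before starting it (a minimal sketch; swap `qwen2.5:7b` for any model shown by `ollama list`):
+
+```bash
+# Direct request to Ollama's OpenAI-compatible chat-completions route
+curl http://localhost:11434/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "qwen2.5:7b", "messages": [{"role": "user", "content": "ping"}]}'
+```
+
+A JSON completion here confirms the base URL and model name; an error points at the Ollama side rather than at CAI.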
+
+Quick checks:
+
+```bash
+ollama --version
+ollama list
+curl http://localhost:11434/api/version
+```
+
+If CAI reports a `404 page not found` from Ollama, check both values:
+
+```bash
+echo "$CAI_MODEL"       # should look like ollama/
+echo "$OLLAMA_API_BASE" # should look like http://localhost:11434/v1
+```
+
+## Ollama through an OpenAI-compatible endpoint
+
+If you intentionally want to treat Ollama as a generic OpenAI-compatible endpoint instead of using LiteLLM's Ollama provider, use `OPENAI_BASE_URL` and an `openai/` model prefix:
 
 ```bash
-CAI_MODEL=qwen2.5:72b
-OLLAMA_API_BASE=http://localhost:8000/v1 # note, maybe you have a different endpoint
+export OPENAI_API_KEY="dummy-key"
+export OPENAI_BASE_URL="http://127.0.0.1:11434/v1"
+export CAI_MODEL="openai/qwen2.5:7b"
+export CAI_PRICE_LIMIT="0"
+export CAI_STREAM="false"
+
+cai
 ```
-Make sure that the Ollama server is running and accessible at the specified base URL. You can swap the model with any other supported by your local Ollama instance.
+Use this mode for endpoints that speak the OpenAI chat-completions API directly. For normal local Ollama usage, prefer the `OLLAMA_API_BASE` + `ollama/` configuration above.
 
 ## Ollama Cloud
 
-For cloud models using Ollama Cloud (no GPU required), add the following to your .env:
+For cloud models using Ollama Cloud (no local GPU required), add the following to your `.env`:
 
 ```bash
-# API Key from ollama.com
+# API key from ollama.com
 OLLAMA_API_KEY=your_api_key_here
 OLLAMA_API_BASE=https://ollama.com
-# Cloud model (note the ollama_cloud/ prefix)
+# Cloud model: note the ollama_cloud/ prefix
 CAI_MODEL=ollama_cloud/gpt-oss:120b
 ```
 
-**Requirements:**
-1. Create an account at [ollama.com](https://ollama.com)
-2. Generate an API key from your profile
-3. Use models with `ollama_cloud/` prefix (e.g., `ollama_cloud/gpt-oss:120b`)
+Requirements:
+
+1. Create an account at [ollama.com](https://ollama.com).
+2. Generate an API key from your profile.
+3. Use models with the `ollama_cloud/` prefix, for example `ollama_cloud/gpt-oss:120b`.
+
+Key differences:
 
-**Key differences:**
-- Prefix: `ollama_cloud/` (cloud) vs `ollama/` (local)
-- API Key: Required for cloud, not needed for local
-- Endpoint: `https://ollama.com/v1` (cloud) vs `http://localhost:8000/v1` (local)
+- Prefix: `ollama/` for local Ollama, `ollama_cloud/` for Ollama Cloud.
+- API key: not required for local Ollama, required for Ollama Cloud.
+- Endpoint: `http://localhost:11434/v1` for local Ollama, `https://ollama.com` for Ollama Cloud.
 
 See [Ollama Cloud documentation](ollama_cloud.md) for detailed setup instructions.
diff --git a/docs/providers/openai_compatible.md b/docs/providers/openai_compatible.md
new file mode 100644
index 000000000..cdfc7f78d
--- /dev/null
+++ b/docs/providers/openai_compatible.md
@@ -0,0 +1,59 @@
+# OpenAI-Compatible Providers
+
+CAI can talk to OpenAI-compatible endpoints through LiteLLM. This includes self-hosted proxies and hosted APIs that expose OpenAI-style chat completions, such as OCI Generative AI endpoints fronted by LiteLLM.
+
+## Configuration
+
+Use `OPENAI_BASE_URL` for the endpoint and make the model explicit with the `openai/` prefix:
+
+```bash
+export OPENAI_API_KEY=""
+export OPENAI_BASE_URL="https://example.com/litellm/v1"
+export CAI_MODEL="openai/gpt-4.1"
+export CAI_STREAM="false"
+
+cai
+```
+
+Why the prefix matters:
+
+- `OPENAI_BASE_URL` applies to OpenAI-compatible routing.
+- The `openai/` prefix tells LiteLLM to route the model through the OpenAI provider path.
+- Without the prefix, CAI may treat the model as another provider or as a default alias model, so the custom base URL can appear to be ignored.
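+
+To check the endpoint before involving CAI, you can send the same style of request LiteLLM issues (a minimal sketch; the model name is a placeholder for whatever your endpoint actually serves):
+
+```bash
+# Direct chat-completions request against the configured base URL
+curl "$OPENAI_BASE_URL/chat/completions" \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model": "gpt-4.1", "messages": [{"role": "user", "content": "ping"}]}'
+```
+
+The `openai/` prefix itself never reaches the endpoint: LiteLLM strips it and sends only the remainder as the `model` field.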
+
+## Local OpenAI-compatible server example
+
+For a local server that exposes `/v1/chat/completions`:
+
+```bash
+export OPENAI_API_KEY="dummy-key"
+export OPENAI_BASE_URL="http://127.0.0.1:8000/v1"
+export CAI_MODEL="openai/"
+export CAI_PRICE_LIMIT="0"
+
+cai
+```
+
+## Ollama note
+
+For normal local Ollama usage, prefer the dedicated Ollama provider configuration:
+
+```bash
+export OLLAMA_API_BASE="http://localhost:11434/v1"
+export CAI_MODEL="ollama/qwen2.5:7b"
+```
+
+If you intentionally want to treat Ollama as a generic OpenAI-compatible endpoint, use:
+
+```bash
+export OPENAI_API_KEY="dummy-key"
+export OPENAI_BASE_URL="http://127.0.0.1:11434/v1"
+export CAI_MODEL="openai/qwen2.5:7b"
+```
+
+## Troubleshooting
+
+- `OPENAI_BASE_URL` appears ignored: check that `CAI_MODEL` starts with `openai/`.
+- `404 page not found`: confirm the base URL includes `/v1` and that the endpoint supports OpenAI chat completions.
+- Authentication errors: confirm the API key required by your proxy/provider is in `OPENAI_API_KEY`.
+- Streaming errors: retry with `CAI_STREAM=false` to isolate endpoint compatibility from streaming behavior.
diff --git a/mkdocs.yml b/mkdocs.yml
index 9a1bd9daf..22cc3c7d1 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -105,6 +105,7 @@ nav:
   - Model Providers:
     - OpenRouter: providers/openrouter.md
     - Ollama: providers/ollama.md
+    - OpenAI-Compatible: providers/openai_compatible.md
     - Azure OpenAI: providers/azure.md
 
 # ========================================