10 changes: 7 additions & 3 deletions README.md
@@ -530,18 +530,22 @@ CAI_MODEL="alias1"

### 🔹 Custom OpenAI Base URL Support

CAI supports configuring a custom OpenAI-compatible API base URL via the `OPENAI_BASE_URL` environment variable. This lets you redirect API calls to a custom endpoint, such as a proxy or a self-hosted OpenAI-compatible service. Use an `openai/` model prefix so LiteLLM routes the request through the OpenAI-compatible provider path.

Example `.env` configuration:
```
OLLAMA_API_BASE="https://custom-openai-proxy.com/v1"
OPENAI_API_KEY="<your-api-key-or-placeholder>"
OPENAI_BASE_URL="https://custom-openai-proxy.com/v1"
CAI_MODEL="openai/gpt-4.1"
```

Or directly from the command line:
```bash
OLLAMA_API_BASE="https://custom-openai-proxy.com/v1" cai
OPENAI_API_KEY="dummy-key" OPENAI_BASE_URL="http://127.0.0.1:8000/v1" CAI_MODEL="openai/<model-name>" cai
```

For local Ollama, prefer `OLLAMA_API_BASE="http://localhost:11434/v1"` with `CAI_MODEL="ollama/<model-name>"`.
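
A minimal one-off invocation for that local setup (a sketch, assuming Ollama is already running on its default port; the model name is a placeholder):

```bash
# Local Ollama via the dedicated provider path
OLLAMA_API_BASE="http://localhost:11434/v1" CAI_MODEL="ollama/<model-name>" cai
```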


## :triangular_ruler: Architecture:

74 changes: 59 additions & 15 deletions docs/providers/ollama.md
@@ -3,36 +3,80 @@
## Ollama Local (Self-hosted)

#### [Ollama Integration](https://ollama.com/)

For local Ollama models, point CAI at Ollama's OpenAI-compatible endpoint and use the `ollama/` model prefix:

```bash
# Start Ollama separately, then configure CAI
export OLLAMA_API_BASE="http://localhost:11434/v1"
export CAI_MODEL="ollama/qwen2.5:7b"
export CAI_PRICE_LIMIT="0"
export CAI_STREAM="false"

cai
```
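
If the model has not been pulled yet, fetch it first. A small sketch, assuming the example model above and a standard Ollama install:

```bash
# Pull the example model so the ollama/qwen2.5:7b reference above resolves locally
ollama pull qwen2.5:7b

# Optional smoke test directly against Ollama before involving CAI
ollama run qwen2.5:7b "Say hello"
```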

Notes:

- Use the `ollama/` prefix in `CAI_MODEL`. Without it, LiteLLM may not route the request through the Ollama provider.
- Include `/v1` in `OLLAMA_API_BASE`. CAI appends the chat completions path through the OpenAI-compatible client.
- Ollama's default local port is `11434`. If you expose Ollama through Docker or another host, keep the same `/v1` suffix on that base URL (see the Docker sketch after these notes).
- A local Ollama setup does not require `OLLAMA_API_KEY`.
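
Following the Docker note above, a minimal sketch of exposing Ollama through a container (assuming the official `ollama/ollama` image and the default port mapping; the model name is an example):

```bash
# Run Ollama in a container and map the default port to the host
docker run -d --name ollama -v ollama:/root/.ollama -p 11434:11434 ollama/ollama

# Pull a model inside the container
docker exec -it ollama ollama pull qwen2.5:7b

# CAI configuration stays the same, with the /v1 suffix on the mapped port
export OLLAMA_API_BASE="http://localhost:11434/v1"
export CAI_MODEL="ollama/qwen2.5:7b"
```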

Quick checks:

```bash
ollama --version
ollama list
curl http://localhost:11434/api/version
```

If CAI reports a `404 page not found` from Ollama, check both values:

```bash
echo "$CAI_MODEL" # should look like ollama/<model-name>
echo "$OLLAMA_API_BASE" # should look like http://localhost:11434/v1
```

## Ollama through an OpenAI-compatible endpoint

If you intentionally want to treat Ollama as a generic OpenAI-compatible endpoint instead of using LiteLLM's Ollama provider, use `OPENAI_BASE_URL` and an `openai/` model prefix:

```bash
export OPENAI_API_KEY="dummy-key"
export OPENAI_BASE_URL="http://127.0.0.1:11434/v1"
export CAI_MODEL="openai/qwen2.5:7b"
export CAI_PRICE_LIMIT="0"
export CAI_STREAM="false"

cai
```

Make sure the Ollama server is running and accessible at the specified base URL; you can swap in any other model supported by your local Ollama instance. Use this mode only for endpoints that speak the OpenAI chat-completions API directly; for normal local Ollama usage, prefer the `OLLAMA_API_BASE` + `ollama/` configuration above.
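
Before launching CAI in this mode, a quick manual check that the endpoint speaks the OpenAI API (a sketch, assuming your Ollama version exposes the OpenAI-compatible `/v1/models` route):

```bash
# Should return a JSON list of the models available on the local Ollama instance
curl http://127.0.0.1:11434/v1/models
```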

## Ollama Cloud

For cloud models using Ollama Cloud (no local GPU required), add the following to your `.env`:

```bash
# API key from ollama.com
OLLAMA_API_KEY=your_api_key_here
OLLAMA_API_BASE=https://ollama.com

# Cloud model: note the ollama_cloud/ prefix
CAI_MODEL=ollama_cloud/gpt-oss:120b
```

Requirements:

1. Create an account at [ollama.com](https://ollama.com).
2. Generate an API key from your profile.
3. Use models with the `ollama_cloud/` prefix, for example `ollama_cloud/gpt-oss:120b`.

Key differences:

- Prefix: `ollama/` for local Ollama, `ollama_cloud/` for Ollama Cloud.
- API key: not required for local Ollama, required for Ollama Cloud.
- Endpoint: `http://localhost:11434/v1` for local Ollama, `https://ollama.com` for Ollama Cloud.
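
For quick reference, the two configurations side by side (values taken from the examples above; model names are examples):

```bash
# Local Ollama: no API key, ollama/ prefix, local /v1 endpoint
export OLLAMA_API_BASE="http://localhost:11434/v1"
export CAI_MODEL="ollama/qwen2.5:7b"

# Ollama Cloud: API key from ollama.com, ollama_cloud/ prefix, cloud endpoint
export OLLAMA_API_KEY="your_api_key_here"
export OLLAMA_API_BASE="https://ollama.com"
export CAI_MODEL="ollama_cloud/gpt-oss:120b"
```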

See [Ollama Cloud documentation](ollama_cloud.md) for detailed setup instructions.
59 changes: 59 additions & 0 deletions docs/providers/openai_compatible.md
@@ -0,0 +1,59 @@
# OpenAI-Compatible Providers

CAI can talk to OpenAI-compatible endpoints through LiteLLM. This includes self-hosted proxies and hosted APIs that expose OpenAI-style chat completions, such as OCI Generative AI endpoints fronted by LiteLLM.

## Configuration

Use `OPENAI_BASE_URL` for the endpoint and make the model explicit with the `openai/` prefix:

```bash
export OPENAI_API_KEY="<your-api-key-or-placeholder>"
export OPENAI_BASE_URL="https://example.com/litellm/v1"
export CAI_MODEL="openai/gpt-4.1"
export CAI_STREAM="false"

cai
```

Why the prefix matters:

- `OPENAI_BASE_URL` applies to OpenAI-compatible routing.
- The `openai/` prefix tells LiteLLM to route the model through the OpenAI provider path.
- Without the prefix, CAI may treat the model as another provider or as a default alias model, so the custom base URL can appear to be ignored (see the sketch below).
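
A small sketch of the difference (the endpoint and model name are placeholders):

```bash
# With the prefix: routed through the OpenAI-compatible path, OPENAI_BASE_URL is honored
export OPENAI_BASE_URL="https://example.com/litellm/v1"
export CAI_MODEL="openai/<model-name>"

# Without the prefix, the model may resolve to another provider or a default
# alias, and the custom base URL can appear to be ignored:
# export CAI_MODEL="<model-name>"
```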

## Local OpenAI-compatible server example

For a local server that exposes `/v1/chat/completions`:

```bash
export OPENAI_API_KEY="dummy-key"
export OPENAI_BASE_URL="http://127.0.0.1:8000/v1"
export CAI_MODEL="openai/<model-name>"
export CAI_PRICE_LIMIT="0"

cai
```

## Ollama note

For normal local Ollama usage, prefer the dedicated Ollama provider configuration:

```bash
export OLLAMA_API_BASE="http://localhost:11434/v1"
export CAI_MODEL="ollama/qwen2.5:7b"
```

If you intentionally want to treat Ollama as a generic OpenAI-compatible endpoint, use:

```bash
export OPENAI_API_KEY="dummy-key"
export OPENAI_BASE_URL="http://127.0.0.1:11434/v1"
export CAI_MODEL="openai/qwen2.5:7b"
```

## Troubleshooting

- `OPENAI_BASE_URL` appears ignored: check that `CAI_MODEL` starts with `openai/`.
- `404 page not found`: confirm the base URL includes `/v1` and that the endpoint supports OpenAI chat completions (see the manual request sketch below).
- Authentication errors: confirm the API key required by your proxy/provider is in `OPENAI_API_KEY`.
- Streaming errors: retry with `CAI_STREAM=false` to isolate endpoint compatibility from streaming behavior.
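
To separate CAI from the endpoint itself, a manual request sketch (the model name is a placeholder; the `openai/` prefix is only for LiteLLM routing, so the name sent to the endpoint is the bare model name):

```bash
# Exercise the chat-completions route directly with the same credentials CAI would use
curl -s "$OPENAI_BASE_URL/chat/completions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "<model-name>", "messages": [{"role": "user", "content": "ping"}]}'
```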
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -105,6 +105,7 @@ nav:
- Model Providers:
- OpenRouter: providers/openrouter.md
- Ollama: providers/ollama.md
- OpenAI-Compatible: providers/openai_compatible.md
- Azure OpenAI: providers/azure.md

# ========================================