## Summary

Add support for any OpenAI-compatible API provider as a selectable option in the Connect Your AI Agents setup screen. Users would click Login, enter their API key and endpoint, select a model, and Claude Code sessions route through that provider — no external proxy or manual configuration needed.

This includes self-hosted models — any local or on-premises inference server that exposes the standard `/v1/chat/completions` endpoint works out of the box.
## Problem

Claude Code speaks only the Anthropic API format (`/v1/messages`). This limits HolyClaude users to Anthropic (direct) or Ollama (which added Anthropic compatibility). The vast majority of third-party inference providers use the OpenAI API format (`/v1/chat/completions`), which is the de facto industry standard.

Users who want to use these providers today must manually set up an external translation proxy (e.g. LiteLLM), which runs counter to HolyClaude's "just works" philosophy.
## Proposal
HolyClaude handles the API translation internally. When a user selects an OpenAI-compatible provider, a built-in lightweight proxy converts Anthropic ↔ OpenAI format behind the scenes. The user never sees or manages this.
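To make the translation concrete, here is a minimal sketch of the two mappings the built-in proxy would perform. The function names are hypothetical; the field names follow the public Anthropic Messages and OpenAI Chat Completions schemas, and the sketch handles text content only (tool use and streaming deltas would need additional mapping).

```python
from typing import Any


def anthropic_to_openai(req: dict[str, Any], model: str) -> dict[str, Any]:
    """Map an Anthropic /v1/messages request body to an OpenAI
    /v1/chat/completions request body (text content only)."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first chat message.
    if req.get("system"):
        messages.append({"role": "system", "content": req["system"]})
    for m in req["messages"]:
        content = m["content"]
        if isinstance(content, list):  # Anthropic content blocks
            content = "".join(b["text"] for b in content if b.get("type") == "text")
        messages.append({"role": m["role"], "content": content})
    return {
        "model": model,  # the provider-side model name, not Anthropic's
        "messages": messages,
        "max_tokens": req.get("max_tokens", 1024),
        "stream": req.get("stream", False),
    }


def openai_to_anthropic(resp: dict[str, Any]) -> dict[str, Any]:
    """Map an OpenAI chat completion response back to the Anthropic
    /v1/messages response shape Claude Code expects."""
    choice = resp["choices"][0]
    stop_map = {"stop": "end_turn", "length": "max_tokens"}
    return {
        "id": resp["id"],
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": choice["message"]["content"]}],
        "model": resp["model"],
        "stop_reason": stop_map.get(choice.get("finish_reason"), "end_turn"),
        "usage": {
            "input_tokens": resp["usage"]["prompt_tokens"],
            "output_tokens": resp["usage"]["completion_tokens"],
        },
    }
```

The real proxy would also need to translate streaming (SSE event shapes differ between the two APIs), but the request/response mapping above is the core of the feature.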
### Setup screen

```text
☐ Claude Code        — Not authenticated           [Login]
☐ Cursor             — Command timeout             [Login]
☐ OpenAI Codex       — Codex not configured        [Login]
☐ Gemini             — Gemini CLI not configured   [Login]
☐ OpenAI-Compatible  — Not configured              [Login]   ← NEW
```
### Login flow

1. Click **Login** next to OpenAI-Compatible
2. Enter the API base URL (e.g. `https://api.together.xyz/v1`)
3. Enter the API key
4. Select a model from the provider's catalog
5. Done — Claude Code sessions now use that provider
### Environment variables (for docker-compose users)

```yaml
environment:
  - OPENAI_COMPAT_API_KEY=your_key
  - OPENAI_COMPAT_BASE_URL=https://api.provider.com/v1
  - OPENAI_COMPAT_MODEL=meta-llama/Llama-3.3-70B-Instruct
```
## Providers this would unlock

Any provider exposing `/v1/chat/completions`, including but not limited to:
### Cloud providers

| Provider | Endpoint |
| --- | --- |
| Hyperstack AI Studio | `https://console.hyperstack.cloud/ai/api/v1` |
| Together AI | `https://api.together.xyz/v1` |
| Groq | `https://api.groq.com/openai/v1` |
| Fireworks AI | `https://api.fireworks.ai/inference/v1` |
| Mistral AI | `https://api.mistral.ai/v1` |
| DeepSeek | `https://api.deepseek.com/v1` |
### Self-hosted / local inference

| Server | Example endpoint |
| --- | --- |
| vLLM | `http://localhost:8000/v1` |
| TGI (Text Generation Inference) | `http://localhost:8080/v1` |
| Ollama (OpenAI-compat mode) | `http://localhost:11434/v1` |
| LocalAI | `http://localhost:8080/v1` |
| LM Studio | `http://localhost:1234/v1` |
| llama.cpp server | `http://localhost:8080/v1` |
| Any custom deployment | User-defined |
This means users can run open-weight models (Llama, Mistral, Qwen, DeepSeek, etc.) on their own GPU hardware — whether a local workstation, a home server, or on-prem infrastructure — and use them directly in Claude Code without any cloud dependency.
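During Login, the setup screen could validate a user-supplied endpoint before saving it. One way — a sketch, not an existing HolyClaude function — is to probe `GET {base_url}/models`, the standard OpenAI-style model-catalog endpoint that vLLM, LM Studio, Ollama, and llama.cpp all implement:

```python
import json
import urllib.request


def is_openai_compatible(base_url: str, api_key: str = "") -> bool:
    """Probe GET {base_url}/models; a 200 with a JSON body is taken as
    evidence of an OpenAI-compatible server."""
    req = urllib.request.Request(base_url.rstrip("/") + "/models")
    if api_key:  # local servers often need no key
        req.add_header("Authorization", f"Bearer {api_key}")
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            json.load(resp)  # must parse as JSON
            return resp.status == 200
    except Exception:
        return False  # unreachable, non-JSON, or auth failure
```

The same probe doubles as the source for the "Select a model" dropdown, since the response lists the provider's available model IDs.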
## Why
- "Stop configuring. Start building." — HolyClaude's own tagline. An external proxy contradicts this.
- Cost flexibility — users without an Anthropic subscription get access to open-source models on pay-per-token pricing
- Self-hosted models — teams running models on their own hardware get first-class support with zero cloud dependency
- Fine-tuned models — teams running custom models on their own infrastructure can use them in Claude Code
- One feature, many providers — a single built-in translation layer supports the entire OpenAI-compatible ecosystem
## References
- `ANTHROPIC_BASE_URL` — Claude Code environment variable for overriding the API endpoint