diff --git a/README.md b/README.md index f85a06f..2077d16 100644 --- a/README.md +++ b/README.md @@ -1,21 +1,53 @@ # opencode-llm-proxy -An [OpenCode](https://opencode.ai) plugin that starts a local HTTP server backed by your OpenCode providers, with support for multiple LLM API formats: +[![npm](https://img.shields.io/npm/v/opencode-llm-proxy)](https://www.npmjs.com/package/opencode-llm-proxy) +[![npm downloads](https://img.shields.io/npm/dm/opencode-llm-proxy)](https://www.npmjs.com/package/opencode-llm-proxy) +[![CI](https://github.com/KochC/opencode-llm-proxy/actions/workflows/ci.yml/badge.svg)](https://github.com/KochC/opencode-llm-proxy/actions/workflows/ci.yml) +[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) -- **OpenAI** Chat Completions (`POST /v1/chat/completions`) and Responses (`POST /v1/responses`) -- **Anthropic** Messages API (`POST /v1/messages`) -- **Google Gemini** API (`POST /v1beta/models/:model:generateContent`) +**One local endpoint. Every model you have access to. Any API format.** -Any tool or SDK that targets one of these APIs can point at the proxy without code changes. +opencode-llm-proxy is an [OpenCode](https://opencode.ai) plugin that starts a local HTTP server on `http://127.0.0.1:4010`. It translates between the API format your tool speaks and whichever LLM provider OpenCode has configured — so you never reconfigure the same models twice. 
+ +``` +Your tool (OpenAI / Anthropic / Gemini SDK) + │ + ▼ http://127.0.0.1:4010 + opencode-llm-proxy + │ + ▼ OpenCode SDK + GitHub Copilot · Anthropic · Gemini · Ollama · OpenRouter · Bedrock · … +``` + +**Supported API formats — all with streaming:** + +| Format | Endpoint | +|---|---| +| OpenAI Chat Completions | `POST /v1/chat/completions` | +| OpenAI Responses API | `POST /v1/responses` | +| Anthropic Messages API | `POST /v1/messages` | +| Google Gemini | `POST /v1beta/models/:model:generateContent` | + +--- + +## Why + +Most LLM tools speak exactly one API dialect. OpenCode already manages connections to every provider you use. This proxy bridges the two — your tools keep working as-is, and you change which model they use in one place. + +**Common situations it solves:** + +- You have a **GitHub Copilot** subscription. Open WebUI, Chatbox, or a VS Code extension only accepts an OpenAI-compatible URL. Point them at the proxy — done. +- You run **Ollama** locally. Your Python scripts use the OpenAI SDK. Set `base_url` to the proxy and use your Ollama model IDs directly. +- You want to **swap models without code changes**. Your app talks to the proxy; you change the model in OpenCode config. +- You want to **share your models on a LAN**. Expose the proxy on `0.0.0.0` and give teammates the URL. +- You use the **Anthropic SDK** but want to route through GitHub Copilot or Bedrock. No code change in the SDK — just point it at the proxy. + +--- ## Quickstart ```bash -# 1. Install the npm package npm install opencode-llm-proxy - -# 2. 
Register the plugin in your opencode.json -# (or use one of the manual install methods below) ``` Add to `opencode.json`: @@ -26,11 +58,10 @@ Add to `opencode.json`: } ``` -Then start OpenCode — the proxy starts automatically: +Start OpenCode — the proxy starts automatically: ```bash opencode -# Proxy is now listening on http://127.0.0.1:4010 ``` Send a request: @@ -44,15 +75,17 @@ curl http://127.0.0.1:4010/v1/chat/completions \ }' ``` +--- + ## Install -### As an npm plugin (recommended) +### npm plugin (recommended) ```bash npm install opencode-llm-proxy ``` -Add to `opencode.json`: +Add to your global `~/.config/opencode/opencode.json` (works everywhere) or a project-level `opencode.json`: ```json { @@ -60,170 +93,263 @@ Add to `opencode.json`: } ``` -### As a global OpenCode plugin +### Copy the file -Copy `index.js` to your global plugin directory: +**Global** — loaded for every OpenCode session: ```bash -cp index.js ~/.config/opencode/plugins/openai-proxy.js +curl -o ~/.config/opencode/plugins/llm-proxy.js \ + https://raw.githubusercontent.com/KochC/opencode-llm-proxy/main/index.js ``` -The plugin is loaded automatically every time OpenCode starts. - -### As a project plugin - -Copy `index.js` to your project's plugin directory: +**Per-project** — loaded only in this directory: ```bash -cp index.js .opencode/plugins/openai-proxy.js +mkdir -p .opencode/plugins +curl -o .opencode/plugins/llm-proxy.js \ + https://raw.githubusercontent.com/KochC/opencode-llm-proxy/main/index.js ``` -## Usage +--- -Start OpenCode normally. The proxy server starts automatically in the background: +## Configuration -``` +| Variable | Default | Description | +|---|---|---| +| `OPENCODE_LLM_PROXY_HOST` | `127.0.0.1` | Bind address. `0.0.0.0` to expose on LAN or Docker. | +| `OPENCODE_LLM_PROXY_PORT` | `4010` | TCP port. | +| `OPENCODE_LLM_PROXY_TOKEN` | _(unset)_ | Bearer token required on every request. Unset = no auth. 
| +| `OPENCODE_LLM_PROXY_CORS_ORIGIN` | `*` | `Access-Control-Allow-Origin` value for browser clients. | + +```bash +OPENCODE_LLM_PROXY_HOST=0.0.0.0 \ +OPENCODE_LLM_PROXY_TOKEN=my-secret \ opencode ``` -The server listens on `http://127.0.0.1:4010` by default. +--- -### List available models +## Using with SDKs and tools -```bash -curl http://127.0.0.1:4010/v1/models -``` +### OpenAI SDK (JS/TS) -Returns all models from all providers configured in your OpenCode setup (e.g. `github-copilot/claude-sonnet-4.6`, `ollama/qwen3.5:9b`, etc.). +```javascript +import OpenAI from "openai" -### OpenAI Chat Completions +const client = new OpenAI({ + baseURL: "http://127.0.0.1:4010/v1", + apiKey: "unused", +}) -```bash -curl http://127.0.0.1:4010/v1/chat/completions \ - -H "Content-Type: application/json" \ - -d '{ - "model": "github-copilot/claude-sonnet-4.6", - "messages": [ - {"role": "user", "content": "Write a haiku about OpenCode."} - ] - }' +const response = await client.chat.completions.create({ + model: "github-copilot/claude-sonnet-4.6", + messages: [{ role: "user", content: "Explain recursion." }], +}) ``` -Use the fully-qualified `provider/model` ID from `GET /v1/models`. Supports `"stream": true` for SSE streaming. +### OpenAI SDK (Python) -### OpenAI Responses API +```python +from openai import OpenAI -```bash -curl http://127.0.0.1:4010/v1/responses \ - -H "Content-Type: application/json" \ - -d '{ - "model": "github-copilot/claude-sonnet-4.6", - "input": [{"role": "user", "content": "Hello"}] - }' +client = OpenAI(base_url="http://127.0.0.1:4010/v1", api_key="unused") + +response = client.chat.completions.create( + model="ollama/qwen2.5-coder", + messages=[{"role": "user", "content": "Write a Python function to reverse a string."}], +) +print(response.choices[0].message.content) ``` -Supports `"stream": true` for SSE streaming. 
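All endpoints stream over SSE when `"stream": true` is set. For tools without an SDK, each Chat Completions chunk arrives as a `data:` line in the standard OpenAI streaming format, ending with a `[DONE]` sentinel. A minimal stdlib sketch of decoding one such line (illustrative helper, not part of the plugin):

```python
import json

def delta_from_sse_line(line: str):
    """Extract the text delta from one `data:` line of a Chat Completions stream."""
    if not line.startswith("data: "):
        return None  # blank keep-alive or comment lines carry no delta
    payload = line[len("data: "):]
    if payload.strip() == "[DONE]":
        return None  # end-of-stream sentinel
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

print(delta_from_sse_line('data: {"choices":[{"delta":{"content":"Hi"}}]}'))  # -> Hi
```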
+### Anthropic SDK (Python) -### Anthropic Messages API +```python +import anthropic -Point the Anthropic SDK (or any client) at this proxy: +client = anthropic.Anthropic( + base_url="http://127.0.0.1:4010", + api_key="unused", +) -```bash -curl http://127.0.0.1:4010/v1/messages \ - -H "Content-Type: application/json" \ - -d '{ - "model": "anthropic/claude-3-5-sonnet", - "max_tokens": 1024, - "system": "You are a helpful assistant.", - "messages": [{"role": "user", "content": "Hello!"}] - }' +message = client.messages.create( + model="anthropic/claude-3-5-sonnet", + max_tokens=1024, + messages=[{"role": "user", "content": "What is the Pythagorean theorem?"}], +) +print(message.content[0].text) ``` -Supports `"stream": true` for SSE streaming with standard Anthropic streaming events (`message_start`, `content_block_delta`, `message_stop`, etc.). +### Anthropic SDK (JS/TS) -To point the official Anthropic SDK at this proxy: - -```js +```javascript import Anthropic from "@anthropic-ai/sdk" const client = new Anthropic({ baseURL: "http://127.0.0.1:4010", - apiKey: "unused", // or your OPENCODE_LLM_PROXY_TOKEN + apiKey: "unused", +}) + +const message = await client.messages.create({ + model: "anthropic/claude-opus-4", + max_tokens: 1024, + messages: [{ role: "user", content: "Explain async/await." 
}],
 })
 ```

-### Google Gemini API
+### Google Generative AI SDK (JS/TS)

-```bash
-# Non-streaming
-curl http://127.0.0.1:4010/v1beta/models/google/gemini-2.0-flash:generateContent \
-  -H "Content-Type: application/json" \
-  -d '{
-    "contents": [{"role": "user", "parts": [{"text": "Hello!"}]}]
-  }'
+```javascript
+import { GoogleGenerativeAI } from "@google/generative-ai"

-# Streaming (newline-delimited JSON)
-curl http://127.0.0.1:4010/v1beta/models/google/gemini-2.0-flash:streamGenerateContent \
-  -H "Content-Type: application/json" \
-  -d '{
-    "contents": [{"role": "user", "parts": [{"text": "Hello!"}]}]
-  }'
+// The constructor takes only an API key; `baseUrl` is a per-model request option.
+const genAI = new GoogleGenerativeAI("unused")
+
+const model = genAI.getGenerativeModel(
+  { model: "google/gemini-2.0-flash" },
+  { baseUrl: "http://127.0.0.1:4010" },
+)
+const result = await model.generateContent("What is machine learning?")
+console.log(result.response.text())
 ```

-The model name in the URL path is resolved the same way as other endpoints (use `provider/model` or a bare model ID if unambiguous).
+### LangChain (Python)

-To point the Google Generative AI SDK at this proxy, set the `baseUrl` option to `http://127.0.0.1:4010`.
+```python
+from langchain_openai import ChatOpenAI

-## Selecting a provider
+llm = ChatOpenAI(
+    model="anthropic/claude-3-5-sonnet",
+    base_url="http://127.0.0.1:4010/v1",
+    api_key="unused",
+)

-All endpoints accept an optional `x-opencode-provider` header to force a specific provider when the model ID is ambiguous:
+response = llm.invoke("What are the SOLID principles?")
+print(response.content)
+```

-```bash
-curl http://127.0.0.1:4010/v1/chat/completions \
-  -H "x-opencode-provider: anthropic" \
-  -H "Content-Type: application/json" \
-  -d '{"model": "claude-3-5-sonnet", "messages": [...]}'
-```
+### Open WebUI
+
+1. Settings → Connections → OpenAI API
+2. Set **API Base URL** to `http://127.0.0.1:4010/v1`
+3. Leave API Key blank (or set to your `OPENCODE_LLM_PROXY_TOKEN`)
+4. 
Save — all your OpenCode models appear in the model picker + +> Running Open WebUI in Docker? Use `http://host.docker.internal:4010/v1` and set `OPENCODE_LLM_PROXY_HOST=0.0.0.0`. + +### Chatbox + +Settings → AI Provider → OpenAI API → set **API Host** to `http://127.0.0.1:4010`. + +### Continue (VS Code / JetBrains) + +In `~/.continue/config.json`: + +```json +{ + "models": [ + { + "title": "Claude via OpenCode", + "provider": "openai", + "model": "anthropic/claude-3-5-sonnet", + "apiBase": "http://127.0.0.1:4010/v1", + "apiKey": "unused" + } + ] +} ``` -## Configuration +### Zed -All configuration is done through environment variables. No configuration file is needed. +In `~/.config/zed/settings.json`: -| Variable | Type | Default | Description | -|---|---|---|---| -| `OPENCODE_LLM_PROXY_HOST` | string | `127.0.0.1` | Bind address. Set to `0.0.0.0` to expose on LAN. | -| `OPENCODE_LLM_PROXY_PORT` | integer | `4010` | TCP port the proxy listens on. | -| `OPENCODE_LLM_PROXY_TOKEN` | string | _(unset)_ | Optional bearer token. When set, every request must include `Authorization: Bearer `. Unset means no authentication required. | -| `OPENCODE_LLM_PROXY_CORS_ORIGIN` | string | `*` | Value of the `Access-Control-Allow-Origin` response header. Use a specific origin (e.g. `https://app.example.com`) when browser clients send credentials. | +```json +{ + "language_models": { + "openai": { + "api_url": "http://127.0.0.1:4010/v1", + "available_models": [ + { + "name": "github-copilot/claude-sonnet-4.6", + "display_name": "Claude (OpenCode)", + "max_tokens": 8096 + } + ] + } + } +} +``` -The proxy adds CORS headers to all responses and handles `OPTIONS` preflight requests automatically. 
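When `OPENCODE_LLM_PROXY_TOKEN` is set, every client above must send the matching bearer header (most SDKs do this automatically when you put the token in their `apiKey` field). A stdlib sketch of building an authenticated request by hand; `proxy_request` is a hypothetical helper, not part of the plugin:

```python
import json
import urllib.request

def proxy_request(path, payload, token=None, base="http://127.0.0.1:4010"):
    """Build a POST request for the proxy; Authorization is added only when a token is set."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )

req = proxy_request(
    "/v1/chat/completions",
    {"model": "github-copilot/claude-sonnet-4.6", "messages": [{"role": "user", "content": "Hi"}]},
    token="my-secret",
)
print(req.get_header("Authorization"))  # -> Bearer my-secret
```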
+--- -### LAN example +## Finding model IDs ```bash -export OPENCODE_LLM_PROXY_HOST=0.0.0.0 -export OPENCODE_LLM_PROXY_PORT=4010 -export OPENCODE_LLM_PROXY_TOKEN=my-secret-token -opencode +curl http://127.0.0.1:4010/v1/models | jq '.data[].id' +# "github-copilot/claude-sonnet-4.6" +# "anthropic/claude-3-5-sonnet" +# "ollama/qwen2.5-coder" +# ... ``` -Then from another machine: +Use `provider/model` for clarity. Bare model IDs (e.g. `gpt-4o`) work if unambiguous across your providers. -```bash -curl http://:4010/v1/models \ - -H "Authorization: Bearer my-secret-token" +To force a specific provider without changing the model string, add: + +``` +x-opencode-provider: anthropic +``` + +--- + +## API reference + +### GET /health +```json +{ "healthy": true, "service": "opencode-openai-proxy" } ``` +### GET /v1/models +Returns all models from all configured providers in OpenAI list format. + +### POST /v1/chat/completions +OpenAI Chat Completions. Required fields: `model`, `messages`. Optional: `stream`, `temperature`, `max_tokens`. + +### POST /v1/responses +OpenAI Responses API. Required fields: `model`, `input`. Optional: `instructions`, `stream`, `max_output_tokens`. + +### POST /v1/messages +Anthropic Messages API. Required fields: `model`, `messages`. Optional: `system`, `max_tokens`, `stream`. + +Errors are returned in Anthropic format: `{ "type": "error", "error": { "type": "...", "message": "..." } }`. + +### POST /v1beta/models/:model:generateContent +Google Gemini non-streaming. Model name in URL path. Required field: `contents`. Optional: `systemInstruction`, `generationConfig`. + +### POST /v1beta/models/:model:streamGenerateContent +Same as above, returns newline-delimited JSON stream. + +--- + ## How it works -The plugin hooks into OpenCode at startup and spawns a Bun HTTP server. 
Incoming requests (in OpenAI, Anthropic, or Gemini format) are translated into OpenCode SDK calls (`client.session.create` + `client.session.prompt`), routed through whichever provider/model is requested, and the response is returned in the matching API format. +Each request: + +1. Is authenticated if `OPENCODE_LLM_PROXY_TOKEN` is set +2. Has its model resolved — `provider/model`, bare model ID, or Gemini URL path +3. Creates a temporary OpenCode session (visible in the session list) +4. Sends the prompt via `client.session.prompt` / `client.session.promptAsync` +5. Returns the response in the same format as the request -Each request creates a temporary OpenCode session, so prompts and responses appear in the OpenCode session list. +Streaming uses OpenCode's `client.event.subscribe()` SSE stream. Text deltas are forwarded in real time. + +--- ## Limitations -- Tool/function calling is not forwarded; all built-in OpenCode tools are disabled for proxy sessions. -- Only text content is handled; image and file inputs are ignored. 
+- Text only — image, audio, and file inputs are ignored +- No tool/function calling — all OpenCode tools are disabled for proxy sessions +- No cross-request session state — send full conversation history on every request +- Temperature and max tokens are advisory (passed as system prompt hints) + +--- ## License diff --git a/index.js b/index.js index 227f01e..d6ac27c 100644 --- a/index.js +++ b/index.js @@ -30,7 +30,7 @@ function corsHeaders(request) { } function json(data, status = 200, headers = {}, request) { - return new Response(JSON.stringify(data, null, 2), { + return new Response(JSON.stringify(data), { status, headers: { "content-type": "application/json; charset=utf-8", diff --git a/package.json b/package.json index e872da7..30bf2d9 100644 --- a/package.json +++ b/package.json @@ -1,7 +1,7 @@ { "name": "opencode-llm-proxy", "version": "1.6.0", - "description": "OpenCode plugin that exposes an OpenAI-compatible HTTP proxy backed by any LLM provider configured in OpenCode", + "description": "Local AI gateway for OpenCode — use any model via OpenAI, Anthropic, or Gemini API format", "main": "index.js", "type": "module", "engines": { @@ -16,8 +16,23 @@ "opencode", "opencode-plugin", "openai", + "openai-compatible", + "anthropic", + "gemini", + "ollama", "proxy", - "llm" + "llm", + "ai", + "gateway", + "local-llm", + "github-copilot", + "langchain", + "open-webui", + "llm-proxy", + "ai-gateway", + "model-router", + "openrouter", + "bedrock" ], "author": "KochC", "license": "MIT",