Development and Demo UI:
This repository ships with a FastAPI-powered Interactive Playground for validating text generation, embeddings, and registry configuration end-to-end. See Development And Demo UI section below for details and setup instructions.
Provider-agnostic LLM adapter for text generation + embeddings with a registry-driven routing layer (capabilities, param policies, pricing metadata, access control), plus normalized outputs (text, tool calls, reasoning, usage).
Currently supports OpenAI and Gemini (extensible architecture for additional providers).
- PyPI: https://pypi.org/project/vrraj-llm-adapter
- GitHub: https://github.com/vrraj/llm-adapter
- Documentation: https://vrraj.github.io/llm-adapter/
pip install vrraj-llm-adapter
- One interface for generation + embeddings across providers
- Registry-driven routing (default + extensible) — ships with built-in model keys and supports custom registry extensions
- Parameter policies (allowed/disabled filtering per model)
- Normalized responses (text, tool calls, reasoning, usage)
- Model Allowlist (access control)
- Pricing metadata in registry for cost visibility
- Embedding controls (optional normalization + configurable output dimensionality)
Requires API keys: OPENAI_API_KEY and/or GEMINI_API_KEY
Setup: Copy .env.example to .env and configure your API keys
cp .env.example .env
# Edit .env with your API keys
The examples below use a registry model key for model= (for example: openai:gpt-4o-mini, gemini:openai-3-flash-preview). For a complete list of default model keys, see model-registry.md or print keys programmatically (snippet below).
Download and run a ready-to-use example script for text generation and embeddings for openai and gemini:
curl -L -O https://raw.githubusercontent.com/vrraj/llm-adapter/main/examples/llm_adapter_basic_usage.py
python llm_adapter_basic_usage.py

from llm_adapter import llm_adapter
resp = llm_adapter.create(
    model="openai:gpt-4o-mini",  # for gemini, use "gemini:openai-3-flash-preview"
    input="Write a one-sentence bedtime story about a unicorn.",
    max_output_tokens=100,
)
# Normalize to stable app-facing schema
result = llm_adapter.normalize_adapter_response(resp)
print(result["text"])
print(result["usage"])

The package ships with a default registry. To list available keys:
from llm_adapter import LLMAdapter
adapter = LLMAdapter()
for key in sorted(adapter.model_registry.keys()):
    print(key)

The repo includes a small FastAPI demo + UI to try models, inspect registry metadata, and view normalized responses.
The source includes developer tooling to test custom model registries (overrides/extensions) end-to-end in the UI. See Development And Demo UI section below.
- llm_adapter.create(...) -> AdapterResponse: text generation (supports tools + optional streaming)
- llm_adapter.normalize_adapter_response(...) -> LLMResult: normalize AdapterResponse into a consistent dict schema
- llm_adapter.create_embedding(...) -> EmbeddingResponse: create embeddings
- llm_adapter.get_pricing_for_model(...) -> Pricing | None: pricing metadata lookup
📋 For complete method signatures, parameter details, and full response structures, see: api-reference.md
Top-level fields (stable surface; note: output_text may include provider thought markup for some Gemini paths):
AdapterResponse(
    output_text: str,
    model: str,
    usage: dict,
    status: str,
    finish_reason: str | None,
    tool_calls: list | None,
    metadata: dict | None,
    adapter_response: Any | None,  # debug/opaque
    model_response: Any | None,  # debug/opaque
)
Top-level fields:
EmbeddingResponse(
    data: List[List[float]],
    usage: EmbeddingUsage,
    normalized: bool | None,
    vector_dim: int | None,
    metadata: dict | None,
    raw: Any | None,
)
Top-level fields:
{
    "text": str,
    "reasoning": str | None,
    "role": str,
    "status": str,
    "finish_reason": str | None,
    "usage": dict,
    "tool_calls": list,
    "metadata": dict | None,
    "raw": Any,
}
The adapter intentionally separates the provider boundary from your app-facing schema:
User Input
│
▼
llm_adapter.create(...) ─────────────► AdapterResponse
│ (provider-aware: raw responses, metadata)
│
▼
llm_adapter.normalize_adapter_response(resp) ─► LLMResult
(stable dict schema for apps)
Notes:
- `create()` performs the network call.
- `normalize_adapter_response()` is a local transform (no additional provider request).
Normalize to LLMResult for stable, application-facing output.
Use result["text"] from normalize_adapter_response() for display-safe text; resp.output_text may include provider thought markup depending on model configuration.
- Complete API Reference: api-reference.md
- Model Registry docs: model-registry.md
- Ready to use Examples: examples
- Dev notes: development.md
Install the adapter from PyPI, then download and run the standalone example scripts to explore common usage patterns such as chat, embeddings, streaming, and custom registry overrides.
Some applications prefer a one-step helper that standardizes on LLMResult internally:
from llm_adapter import llm_adapter

def create_result(**kwargs):
    resp = llm_adapter.create(**kwargs)
    return llm_adapter.normalize_adapter_response(resp)

result = create_result(
    model="openai:gpt-4o-mini",
    input="Hello"
)
print(result["text"])

This pattern keeps the library surface minimal while allowing your application to standardize on the normalized contract.
Core Examples:
- llm_adapter_basic_usage.py - Basic usage and normalization
- create_and_normalize_example.py - Recommended create → normalize flow (Gemini-safe)
- llm_adapter_model_spec_example.py - ModelSpec configuration
Provider-Specific Examples:
- openai_embedding_example.py - OpenAI embeddings
- openai_adapter_example.py - OpenAI chat
- streaming_call_example.py - Streaming responses
Advanced Examples:
- set_adapter_allowed_models.py - Allowlist demo (See "Model Allowlist (Access Control)" section for environment variable details)
- custom_registry.py - Custom registry
For application-facing output, use the create → normalize flow (see Text Generation - Application Wrapper Pattern above). If you need the raw provider boundary object for debugging, llm_adapter.create(...) returns an AdapterResponse.
Some models (like Gemini) return reasoning content separately.
from llm_adapter import llm_adapter, LLMError
try:
    response = llm_adapter.create(
        model="gemini:native-sdk-reasoning-2.5-flash",
        input="Explain why the sky is blue",
        reasoning_effort="high",  # adapter-level reasoning knob
        max_output_tokens=1000
    )
    normalized_response = llm_adapter.normalize_adapter_response(response)
    if normalized_response.get('reasoning'):
        print(f"Reasoning: {normalized_response['reasoning']}")
    print(normalized_response['text'])
except LLMError as e:
    print(f"Error: {e.code} - {e}")

from llm_adapter import llm_adapter
for event in llm_adapter.create(model="openai:gpt-4o-mini", input="Hello", stream=True):
    if event.type == "output_text.delta":
        print(event.delta, end="")

The adapter supports provider-agnostic tool calling using the OpenAI-style function schema.
Pass tool definitions to llm_adapter.create(...). The model may return one or more tool calls with structured arguments. The host application is responsible for executing those tools and sending the tool results back to the adapter as follow-up context for the final response.
Tool definitions use an OpenAI-style JSON schema:
tools = [
{
"type": "function",
"name": "get_weather",
"description": "Get current weather information for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name or location (for example: 'New York, NY')"
},
"units": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature units"
}
},
"required": ["location"]
}
}
]
When the model decides to call a tool, normalized tool calls are returned in AdapterResponse.tool_calls and LLMResult.tool_calls:
tool_calls = [
{
"id": "call_12345",
"name": "get_weather",
"args": {
"location": "New York, NY",
"units": "celsius"
}
}
]

from llm_adapter import llm_adapter
tools = [
{
"type": "function",
"name": "get_weather",
"description": "Get weather information",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name"
}
},
"required": ["location"]
}
}
]
response = llm_adapter.create(
model="openai:gpt-4o-mini",
input="What's the weather like in New York?",
tools=tools
)
if response.tool_calls:
    for call in response.tool_calls:
        tool_name = call["name"]
        tool_args = call["args"]
        tool_id = call["id"]
        # Execute the tool in your application (execute_tool is your own function)
        result = execute_tool(tool_name, tool_args)
        # Send tool results back to the adapter in your follow-up call

- The adapter normalizes tool definitions and emitted tool calls across providers.
- Tool execution is intentionally handled by the host application, not by the adapter.
- For application-facing output, use the create -> normalize_adapter_response flow.
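The example above calls an `execute_tool` function that the host application has to supply. A minimal sketch of such a dispatcher follows. It assumes the normalized tool-call shape shown earlier (`id`, `name`, `args`); `TOOL_HANDLERS`, `run_tool_calls`, the stub `get_weather`, and the shape of the returned result dicts are hypothetical names chosen for illustration, not part of the library.

```python
import json
from typing import Any, Callable, Dict, List, Optional


def get_weather(location: str, units: str = "celsius") -> Dict[str, Any]:
    # Hypothetical stub; a real handler would query a weather service.
    return {"location": location, "units": units, "forecast": "unknown (stub)"}


# Hypothetical mapping from tool name to a callable in the host application.
TOOL_HANDLERS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def execute_tool(name: str, args: Dict[str, Any]) -> Any:
    """Run one tool by name; report problems as data instead of raising."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return handler(**args)
    except TypeError as exc:  # missing or unexpected arguments
        return {"error": f"bad arguments for {name}: {exc}"}


def run_tool_calls(tool_calls: Optional[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Execute every normalized tool call and pair each output with its call id."""
    results = []
    for call in tool_calls or []:
        output = execute_tool(call["name"], call.get("args") or {})
        results.append(
            {"id": call["id"], "name": call["name"], "output": json.dumps(output)}
        )
    return results
```

Returning errors as data lets the follow-up request tell the model that a tool failed, rather than aborting the whole turn. How the results are passed back in the follow-up call depends on the provider format and is not covered by this sketch.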
The LLM adapter uses a registry of model definitions (ModelInfo) that control:
- Provider routing
- Endpoint selection
- Parameter policies (allowed/disabled)
- Pricing and limits
- Capabilities (reasoning, tools, dimensions, etc.)
You can override or extend the registry by passing your own mapping to LLMAdapter(...).
from llm_adapter.model_registry import ModelInfo, validate_registry
from llm_adapter import ModelSpec

from llm_adapter import LLMAdapter
from llm_adapter.model_registry import ModelInfo, Pricing
custom_registry = {
    "my-openai-model": ModelInfo(
        provider="openai",
        model="gpt-4o-mini",
        endpoint="chat_completions",
        pricing=Pricing(input_per_mm=0.05, output_per_mm=0.15),
        param_policy={"allowed": {"temperature", "max_tokens"}},
        limits={"max_output_tokens": 1000}
    )
}
adapter = LLMAdapter(model_registry=custom_registry)

export LLM_ADAPTER_ALLOWED_MODELS="openai:gpt-4o-mini,openai:embed_small"

For comprehensive registry documentation, see:
- https://github.com/vrraj/llm-adapter/blob/main/docs/model-registry.md
- https://github.com/vrraj/llm-adapter/blob/main/examples/custom_registry.py
- https://github.com/vrraj/llm-adapter/blob/main/src/llm_adapter/model_registry.py
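The `Pricing(input_per_mm=..., output_per_mm=...)` metadata in the custom registry example can feed a rough cost estimate. The sketch below assumes that `input_per_mm` and `output_per_mm` are prices per one million tokens (the unit is not stated here, so treat this as an assumption), and that `prompt_tokens` and `output_tokens` from the usage schema are the billable counts. Cached tokens are ignored. `estimate_cost` is a hypothetical helper, not a library function.

```python
from typing import Mapping


def estimate_cost(
    usage: Mapping[str, int], input_per_mm: float, output_per_mm: float
) -> float:
    """Estimate request cost, assuming prices are quoted per one million tokens."""
    input_cost = usage.get("prompt_tokens", 0) / 1_000_000 * input_per_mm
    output_cost = usage.get("output_tokens", 0) / 1_000_000 * output_per_mm
    return input_cost + output_cost


# Example with the prices from the custom registry entry above.
usage = {"prompt_tokens": 1_000_000, "output_tokens": 2_000_000}
cost = estimate_cost(usage, input_per_mm=0.05, output_per_mm=0.15)
print(f"{cost:.2f}")  # prints 0.35
```

For real billing, check the provider's current price sheet; registry pricing is metadata for visibility and may lag behind it.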
from llm_adapter.model_registry import validate_registry
validate_registry(custom_registry, strict=False)

from llm_adapter import llm_adapter, LLMError
try:
    response = llm_adapter.create_embedding(
        model="openai:embed_small",
        input="The quick brown fox jumps over the lazy dog"
    )
    print(f"Generated {len(response.data)} embeddings")
    print(f"First embedding dimension: {len(response.data[0])}")
except LLMError as e:
    print(f"Error: {e.code} - {e}")

Do this to run the demo UI (runs on port 8100) or customize the code.
- Clone the repository and run the setup script.
git clone https://github.com/vrraj/llm-adapter.git
cd llm-adapter
bash scripts/llm_adapter_setup.sh
This script (scripts/llm_adapter_setup.sh) checks prerequisites (python3, make), creates .env if missing, sets up a local .venv, installs the package (pip install -e ".[server]"), and shows next steps. The demo UI and FastAPI server run in this .venv virtual environment. The script is safe to run multiple times.
- Set required API keys (see Environment variables section below).
- Start the application.
make start
Note: Run make start to run in the foreground or make start-bg to run in the background. Use make stop to stop the server.
- Open the demo UI: http://localhost:8100/ui/
If you prefer not to use the Makefile helpers, you can start the FastAPI server directly:
uvicorn llm_adapter_demo.api:app --reload --port 8100
The Interactive Playground will be available at:
http://localhost:8100/ui/
pip install -e ".[dev]"
pytest
pytest -m integration
pytest -m "integration or unit"
For internal design and architecture notes, see development.md.
ModelSpec provides a type-safe, reusable way to configure model parameters as an alternative to passing individual parameters.
Note: See examples/llm_adapter_model_spec_example.py for a comprehensive example demonstrating ModelSpec usage with different providers and parameter configurations.
from llm_adapter import llm_adapter
from llm_adapter import ModelSpec
chat_spec = ModelSpec(
    provider="openai",
    model="gpt-4o-mini",
    temperature=0.7,
    max_output_tokens=1000,
    extra={"custom_param": "value"}
)
resp1 = llm_adapter.create(spec=chat_spec, input=[{"role": "user", "content": "Hello"}])
resp2 = llm_adapter.create(spec=chat_spec, input=[{"role": "user", "content": "How are you?"}])
embed_spec = ModelSpec(
    provider="openai",
    model="embed_small"
)
resp = llm_adapter.create_embedding(spec=embed_spec, input="Text to embed")

| Approach | Provider | Model Name | Auto-detection | Type Safety |
|---|---|---|---|---|
| Individual params | Optional (auto-detected from registry) | Registry key (openai:gpt-4o-mini) | ✅ Yes | ❌ Runtime |
| ModelSpec | Required (explicit) | Provider-native (gpt-4o-mini) | ❌ No | ✅ Static type-checkers |
LLMAdapter returns a consistent usage schema across all providers:
{
    "prompt_tokens": 0,
    "cached_tokens": 0,
    "output_tokens": 0,
    "reasoning_tokens": 0,
    "answer_tokens": 0,
    "total_tokens": 0
}
Key relationships:
- output_tokens = answer_tokens + reasoning_tokens
- total_tokens = prompt_tokens + cached_tokens + output_tokens
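The two relationships can be checked mechanically, for example in a test that guards against provider-specific accounting drift. The following is a small sketch; `check_usage` is a hypothetical helper written for this illustration and the token counts are made-up example values.

```python
from typing import Dict, List


def check_usage(usage: Dict[str, int]) -> List[str]:
    """Return a list of violated relationships (empty when the usage dict is consistent)."""
    problems = []
    if usage["output_tokens"] != usage["answer_tokens"] + usage["reasoning_tokens"]:
        problems.append("output_tokens != answer_tokens + reasoning_tokens")
    expected_total = (
        usage["prompt_tokens"] + usage["cached_tokens"] + usage["output_tokens"]
    )
    if usage["total_tokens"] != expected_total:
        problems.append("total_tokens != prompt_tokens + cached_tokens + output_tokens")
    return problems


example = {
    "prompt_tokens": 120,
    "cached_tokens": 0,
    "output_tokens": 80,
    "reasoning_tokens": 30,
    "answer_tokens": 50,
    "total_tokens": 200,
}
print(check_usage(example))  # prints []
```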
Copy .env.example to .env and set up your API keys (or use your existing environment variables):
cp .env.example .env
Supported env vars:
Minimal working sets:
- OpenAI-only: OPENAI_API_KEY
- Gemini native SDK: GEMINI_API_KEY
- Gemini OpenAI-compatible: GEMINI_API_KEY + GEMINI_OPENAI_BASE_URL
All supported variables:
- OPENAI_API_KEY
- GEMINI_API_KEY
- GEMINI_OPENAI_BASE_URL
- LLM_ADAPTER_ALLOWED_MODELS (comma-separated list): restricts which models can be used in each environment.
The LLM_ADAPTER_ALLOWED_MODELS environment variable allows you to restrict which models can be used. By default, all models are allowed.
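The semantics described here (a comma-separated list of registry keys, with everything allowed when the variable is unset) can be sketched as follows. This is an illustration of the idea only; the adapter's actual parsing may differ, and `parse_allowed_models` and `is_model_allowed` are hypothetical names.

```python
import os
from typing import Mapping, Optional, Set


def parse_allowed_models(env: Mapping[str, str] = os.environ) -> Optional[Set[str]]:
    """Parse the allowlist; None means no restriction (variable unset or empty)."""
    raw = env.get("LLM_ADAPTER_ALLOWED_MODELS", "")
    keys = {part.strip() for part in raw.split(",") if part.strip()}
    return keys or None


def is_model_allowed(model_key: str, allowed: Optional[Set[str]]) -> bool:
    """A model is allowed when there is no allowlist or the key is listed in it."""
    return allowed is None or model_key in allowed


allowed = parse_allowed_models(
    {"LLM_ADAPTER_ALLOWED_MODELS": "openai:gpt-4o-mini, openai:embed_small"}
)
print(is_model_allowed("openai:gpt-4o-mini", allowed))  # prints True
```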
export LLM_ADAPTER_ALLOWED_MODELS="openai:gpt-4o-mini,gemini:native-sdk-reasoning-2.5-flash"
Supports:
- OpenAI (Responses API, Chat Completions API, Embeddings API)
- Gemini (native google-genai SDK and OpenAI-compatible endpoint)
Models and capabilities are defined in src/llm_adapter/model_registry.py.
To add support for new models or override existing configurations, use custom registries rather than modifying the core registry:
- Create a custom registry - See examples/custom_registry.py for a complete example
- Define ModelInfo entries - Configure endpoints, capabilities, pricing, and parameter policies
- Load your registry - Use environment variable or pass it to LLMAdapter(model_registry=your_registry)
- Test via Demo UI - The Interactive Playground supports custom registry testing
For easy configuration without code changes, set the CUSTOM_REGISTRY_PATH environment variable:
# Configure environment (optional):
export CUSTOM_REGISTRY_PATH=/path/to/your/custom_registry.py
The adapter will automatically load and merge your custom registry with the default registry. This is useful for:
- Development environments with custom models
- Production deployments with organization-specific configurations
- Testing different registry configurations without code changes
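How "load and merge" might work can be sketched with the standard library. The adapter's real loading logic is not shown in this README, so everything below is an assumption: that the file is a Python module, that it exposes a module-level mapping (here hypothetically named `custom_registry`), and that custom entries override default entries with the same key. `load_registry_module` and `merged_registry` are illustrative names.

```python
import importlib.util
import os
from typing import Any, Dict, Mapping


def load_registry_module(path: str, attr: str = "custom_registry") -> Dict[str, Any]:
    """Import a Python file by path and return its registry mapping.

    The attribute name is an assumption made for this sketch.
    """
    spec = importlib.util.spec_from_file_location("user_custom_registry", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load registry module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return dict(getattr(module, attr))


def merged_registry(
    default_registry: Mapping[str, Any], env: Mapping[str, str] = os.environ
) -> Dict[str, Any]:
    """Merge the custom registry over the defaults when CUSTOM_REGISTRY_PATH is set."""
    path = env.get("CUSTOM_REGISTRY_PATH")
    if not path:
        return dict(default_registry)
    # Later keys win, so custom entries override defaults with the same key.
    return {**default_registry, **load_registry_module(path)}
```

Since loading a registry this way executes the file, only point the variable at files you trust.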
📖 For complete custom registry documentation, see:
This is a standalone package. Development happens directly in this repo.
pip install -e .
make start
This project is licensed under the MIT License.
