A FastAPI-based metadata extraction gateway that sits in front of LiteLLM to inject evaluation metadata into LLM requests and track completions for distributed evaluation workflows.
The Metadata Gateway is a proxy service that enhances LiteLLM by:
- Extracting metadata from URL paths and injecting it as Langfuse tags
- Managing Langfuse credentials per-project without exposing them to clients
- Tracking completion insertion IDs in Redis for completeness verification
- Fetching and validating traces from Langfuse with built-in retry logic
This enables distributed evaluation systems to track which LLM completions belong to which evaluation runs, ensuring data completeness and proper attribution.
┌─────────────┐
│ Client │
│ (SDK/CLI) │
└──────┬──────┘
│ Authorization: Bearer <api_key>
│ POST /rollout_id/{id}/invocation_id/{id}/.../chat/completions
▼
┌─────────────────────────┐
│ Metadata Gateway │
│ (FastAPI Service) │
│ - Extract metadata │
│ - Inject Langfuse keys │
│ - Generate UUID7 IDs │
└──────┬──────────┬───────┘
│ │
▼ ▼
┌────────┐ ┌─────────────┐
│ Redis │ │ LiteLLM │
│ │ │ Backend │
│ Track │ │ │
│ IDs │ └──────┬──────┘
└────────┘ │
▼
┌─────────────┐
│ Langfuse │
│ (Tracing) │
└─────────────┘
- `app.py`: Main FastAPI application with route definitions
- `litellm.py`: LiteLLM client for forwarding requests
- `langfuse.py`: Langfuse trace fetching with retry logic
- `redis_utils.py`: Redis operations for insertion ID tracking
- `models.py`: Pydantic models for configuration and responses
- `auth.py`: Authentication provider interface (extensible)
- `main.py`: Entry point for running the service
- Stores insertion IDs per rollout for completeness checking
- Uses Redis Sets: `rollout_id -> {insertion_id_1, insertion_id_2, ...}`
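For illustration, a minimal redis-py sketch of this bookkeeping (the key layout follows the `SMEMBERS test123` example at the end of this README; the client configuration is an assumption):

```python
import redis

# Assumed client setup; the gateway reads REDIS_HOST/REDIS_PORT from the environment.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# On each completion: record the generated insertion_id under its rollout_id.
r.sadd("test123", "0190a1b2-insertion-id-example")

# On trace fetch: the expected IDs for a rollout are simply the set members.
expected_ids = r.smembers("test123")
```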
- Uses LiteLLM SDK directly for LLM calls (no separate proxy server needed)
- Integrated with Langfuse via the `langfuse_otel` OpenTelemetry callback
URL paths encode evaluation metadata that gets injected as Langfuse tags:
- `rollout_id`: Unique ID for a batch evaluation run
- `invocation_id`: ID for a single invocation within a rollout
- `experiment_id`: Experiment identifier
- `run_id`: Run identifier within an experiment
- `row_id`: Dataset row identifier
- `insertion_id`: Auto-generated UUID7 for this specific completion
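As a hypothetical illustration (the gateway's exact tag format is not shown here), each path segment pair could map to a `key:value` tag like the `rollout_id:abc123` tags that appear in the `/traces` examples below:

```python
# Path parameters extracted from the URL...
path_params = {
    "rollout_id": "abc123",
    "invocation_id": "inv1",
    "experiment_id": "exp1",
    "run_id": "run1",
    "row_id": "row1",
}

# ...become "key:value" Langfuse tags.
tags = [f"{key}:{value}" for key, value in path_params.items()]
# ['rollout_id:abc123', 'invocation_id:inv1', ...]
```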
- On chat completion: Generate UUID7 insertion_id and store in Redis
- On trace fetch: Verify all expected insertion_ids are present in Langfuse
- Retry logic: Automatic retries with exponential backoff for incomplete traces
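UUID7 values embed a millisecond timestamp in their high bits, which is what makes the time-ordering used by the pointwise endpoint possible. A stdlib-only sketch of the assumed layout (48-bit timestamp + version/variant bits + random tail); in practice a library such as the `uuid6` package would generate these:

```python
import random
import time
import uuid

def uuid7_sketch() -> uuid.UUID:
    ts_ms = time.time_ns() // 1_000_000   # 48-bit millisecond timestamp
    rand_a = random.getrandbits(12)       # 12 random bits
    rand_b = random.getrandbits(62)       # 62 random bits
    value = (ts_ms & 0xFFFFFFFFFFFF) << 80
    value |= 0x7 << 76                    # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                   # RFC 4122 variant
    value |= rand_b
    return uuid.UUID(int=value)
```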
- Store Langfuse credentials for multiple projects in `secrets.yaml`
- Route requests to the correct project via `project_id` in the URL, or fall back to the default project
- Credentials are never exposed to clients
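A sketch of how per-project resolution might look, assuming the `secrets.yaml` structure shown in the Configuration section (the function name and fallback behavior are illustrative):

```python
import yaml

def resolve_langfuse_keys(project_id: str | None,
                          secrets_path: str = "proxy_core/secrets.yaml") -> dict:
    """Return {"public_key": ..., "secret_key": ...} for a project."""
    with open(secrets_path) as f:
        secrets = yaml.safe_load(f)
    # Fall back to the configured default when no project_id is in the URL.
    project = project_id or secrets["default_project_id"]
    return secrets["langfuse_keys"][project]
```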
- Docker and Docker Compose (recommended)
- Python 3.11+ (for local development)
1. Create the secrets file:

   ```bash
   cp proxy_core/secrets.yaml.example proxy_core/secrets.yaml
   ```

2. Edit `proxy_core/secrets.yaml` with your Langfuse credentials. Important: use your real Langfuse project ID (e.g. `cmg00asdf0123...`).

   ```yaml
   langfuse_keys:
     my-project:
       public_key: pk-lf-...
       secret_key: sk-lf-...
   default_project_id: my-project
   ```

3. Start the services:

   ```bash
   docker-compose up -d
   ```

4. Verify the services are running:

   ```bash
   curl http://localhost:4000/health
   # Expected: {"status":"healthy","service":"metadata-proxy"}
   ```
The gateway will be available at http://localhost:4000.
POST /rollout_id/{rollout_id}/invocation_id/{invocation_id}/experiment_id/{experiment_id}/run_id/{run_id}/row_id/{row_id}/chat/completions
POST /project_id/{project_id}/rollout_id/{rollout_id}/.../chat/completions
Features:
- Extracts metadata from URL path
- Generates UUID7 insertion_id
- Injects Langfuse credentials
- Tracks insertion_id in Redis
- Forwards to LiteLLM
Request:
```bash
curl -X POST http://localhost:4000/rollout_id/abc123/invocation_id/inv1/experiment_id/exp1/run_id/run1/row_id/row1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-..." \
  -d '{
    "model": "fireworks_ai/accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Response: Standard OpenAI chat completion response
POST /project_id/{project_id}/chat/completions
For completions that don't need rollout tracking.
POST /rollout_id/{rollout_id}/.../encoded_base_url/{encoded_base_url}/chat/completions
The `encoded_base_url` segment is a base64-encoded URL string that is decoded and injected into the request body as `base_url`.
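For example, a client might build the segment like this (whether the gateway expects standard or URL-safe base64 is an assumption here; URL-safe encoding avoids `/` characters that would break path routing):

```python
import base64

# Encode the upstream base URL for use as a single path segment.
encoded = base64.urlsafe_b64encode(b"https://api.example.com/v1").decode()

# Insert it into the request path (other metadata segments omitted here).
path = f"/rollout_id/abc123/encoded_base_url/{encoded}/chat/completions"
```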
GET /traces?tags=rollout_id:abc123
GET /v1/traces?tags=rollout_id:abc123
GET /project_id/{project_id}/traces?tags=rollout_id:abc123
GET /v1/project_id/{project_id}/traces?tags=rollout_id:abc123
Waits for all expected insertion_ids to complete before returning all traces.
GET /traces/pointwise?tags=rollout_id:abc123
GET /v1/traces/pointwise?tags=rollout_id:abc123
GET /project_id/{project_id}/traces/pointwise?tags=rollout_id:abc123
GET /v1/project_id/{project_id}/traces/pointwise?tags=rollout_id:abc123
Returns only the latest trace (UUID v7 time-ordered). Much faster for pointwise evaluations where you only need the final accumulated result.
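Because UUID7 strings sort lexicographically in creation order, "latest" reduces to a simple max over the `insertion_id` tags. A sketch, assuming the trace shape from the response example below:

```python
def latest_trace(traces: list[dict]) -> dict:
    # UUID7 strings sort lexicographically in creation order, so the newest
    # trace is the one whose insertion_id tag compares largest.
    def insertion_id(trace: dict) -> str:
        return next(tag.split(":", 1)[1]
                    for tag in trace["tags"]
                    if tag.startswith("insertion_id:"))
    return max(traces, key=insertion_id)
```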
Required Query Parameters:
- `tags`: Array of tags (must include at least one `rollout_id:*` tag)
Optional Query Parameters:
- `limit`: Max traces to fetch (default: 100)
- `sample_size`: Random sample size if more traces are found
- `user_id`, `session_id`, `name`, `environment`, `version`, `release`: Langfuse filters
- `fields`: Comma-separated fields to include
- `hours_back`: Fetch traces from the last N hours
- `from_timestamp`, `to_timestamp`: ISO datetime strings for a time range
- `sleep_between_gets`: Delay between trace.get calls (default: 2.5s)
- `max_retries`: Retry attempts for incomplete traces (default: 3)
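An example call combining a few of these filters (parameter names as listed above; the `requests` library repeats the `tags` key once per list item):

```python
import requests

resp = requests.get(
    "http://localhost:4000/traces",
    params={
        "tags": ["rollout_id:abc123"],  # required
        "hours_back": 24,
        "limit": 50,
        "max_retries": 3,
    },
    timeout=120,
)
resp.raise_for_status()
traces = resp.json()["traces"]
```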
Completeness Logic:
- Fetches traces from Langfuse matching tags
- Extracts insertion_ids from trace tags
- Compares with expected insertion_ids in Redis
- Retries with exponential backoff if incomplete
- Returns 404 if still incomplete after max_retries
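The retry loop has roughly this shape (a sketch: `fetch_traces` and `expected_ids` stand in for the Langfuse and Redis lookups, and the backoff constants are assumptions):

```python
import time

def wait_for_complete(fetch_traces, expected_ids: set[str],
                      max_retries: int = 3, base_delay: float = 2.5):
    for attempt in range(max_retries + 1):
        traces = fetch_traces()
        seen = {tag.split(":", 1)[1]
                for trace in traces
                for tag in trace["tags"]
                if tag.startswith("insertion_id:")}
        if expected_ids <= seen:
            return traces
        time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    raise LookupError("incomplete after max_retries")  # surfaced to the client as 404
```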
Response:
```json
{
  "project_id": "my-project",
  "total_traces": 42,
  "traces": [
    {
      "id": "trace-123",
      "name": "chat-completion",
      "tags": ["rollout_id:abc123", "insertion_id:uuid7..."],
      "input": {...},
      "output": {...},
      "observations": [...]
    }
  ]
}
```

GET /health
Returns service health status.
ANY /{path}
Forwards any other request to LiteLLM backend with API key injection.
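A minimal sketch of such a passthrough route, assuming an HTTP backend for this case (the real gateway also injects credentials and metadata before forwarding):

```python
import httpx
from fastapi import FastAPI, Request, Response

app = FastAPI()
LITELLM_URL = "http://litellm:4000"  # assumed backend address

@app.api_route("/{path:path}", methods=["GET", "POST", "PUT", "DELETE", "PATCH"])
async def passthrough(path: str, request: Request) -> Response:
    # Forward the body verbatim, carrying over the client's API key header.
    async with httpx.AsyncClient(timeout=300.0) as client:
        upstream = await client.request(
            request.method,
            f"{LITELLM_URL}/{path}",
            content=await request.body(),
            headers={"Authorization": request.headers.get("Authorization", "")},
        )
    return Response(content=upstream.content, status_code=upstream.status_code)
```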
| Variable | Required | Default | Description |
|---|---|---|---|
| `REDIS_HOST` | Yes | - | Redis hostname |
| `REDIS_PORT` | No | 6379 | Redis port |
| `REDIS_PASSWORD` | No | - | Redis password |
| `SECRETS_PATH` | No | `proxy_core/secrets.yaml` | Path to secrets file (YAML) |
| `LANGFUSE_HOST` | No | `https://us.cloud.langfuse.com` | Langfuse OTEL host for tracing |
| `REQUEST_TIMEOUT` | No | 300.0 | Request timeout for LLM calls, in seconds |
| `LOG_LEVEL` | No | INFO | Logging level |
| `PORT` | No | 4000 | Gateway port |
Create `proxy_core/secrets.yaml`:

```yaml
langfuse_keys:
  project-1:
    public_key: pk-lf-...
    secret_key: sk-lf-...
  project-2:
    public_key: pk-lf-...
    secret_key: sk-lf-...
default_project_id: project-1
```

Security: `secrets.yaml` is ignored via `.gitignore`.
The `config_no_cache.yaml` file configures LiteLLM (only needed if running a standalone LiteLLM proxy):

```yaml
model_list:
  - model_name: "*"
    litellm_params:
      model: "*"
litellm_settings:
  callbacks: ["langfuse_otel"]
  drop_params: True
general_settings:
  allow_client_side_credentials: true
```

Key settings:
- Wildcard model support: Route any model to any provider
- Langfuse OTEL: OpenTelemetry-based tracing via the `langfuse_otel` callback
- Client-side credentials: Accept API keys from the request body
Note: The proxy now uses the LiteLLM SDK directly with langfuse_otel integration, so a separate LiteLLM proxy server is no longer required.
- Default: No authentication (`NoAuthProvider`)
- Extensible: Implement a custom `AuthProvider` for production
- API keys: Client API keys are forwarded to LiteLLM, never stored
- Required `rollout_id` tag: Prevents fetching all traces
- Project isolation: Projects can only access their own Langfuse data
- Optional auth: The `/traces` endpoint can require authentication
- Never commit `secrets.yaml`; use environment variables in production
- Use HTTPS in production deployments
- Implement proper authentication for production use
- Rotate Langfuse keys regularly
- Monitor Redis memory usage for large rollouts
```bash
docker-compose up -d
```

For a Kubernetes deployment, create:
- Secrets for `secrets.yaml` and Redis credentials
- A Service for internal/external access
- A ConfigMap for the LiteLLM config
- A Redis StatefulSet or a managed Redis service
eval_protocol/proxy/
├── proxy_core/ # Main application package
│ ├── __init__.py
│ ├── app.py # FastAPI routes
│ ├── litellm.py # LiteLLM client
│ ├── langfuse.py # Langfuse integration
│ ├── redis_utils.py # Redis operations
│ ├── models.py # Pydantic models
│ ├── auth.py # Authentication
│ ├── main.py # Entry point
│ └── secrets.yaml.example
├── docker-compose.yml # Local development stack
├── Dockerfile.gateway # Gateway container
├── config_no_cache.yaml # LiteLLM config
├── requirements.txt # Python dependencies
└── README.md # This file
Send a test completion:

```bash
curl -X POST http://localhost:4000/rollout_id/test123/invocation_id/inv1/experiment_id/exp1/run_id/run1/row_id/row1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "fireworks_ai/accounts/fireworks/models/llama-v3p3-70b-instruct",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```

Fetch the traces:

```bash
curl "http://localhost:4000/traces?tags=rollout_id:test123" \
  -H "Authorization: Bearer your-auth-token"
```

Inspect Redis:

```bash
redis-cli
> SMEMBERS test123  # View insertion_ids for rollout
```