Releases: aws-samples/sample-kolya-br-proxy

v0.6.3

14 Apr 08:47
431f181

What's New

Extended Thinking Fix

  • Fix: pass through thinking blocks instead of stripping them from assistant message history — resolves ValidationException on multi-turn conversations with extended thinking enabled
  • Forward signature_delta in streaming responses so clients receive valid signatures for subsequent turns
  • Fix non-streaming response to include actual thinking text and signature
  • Auto-adjust budget_tokens to Bedrock minimum (1024) when clients send smaller values
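
The budget adjustment in the last bullet can be sketched as follows. This is a minimal illustration, not the proxy's actual code: the function name `normalize_thinking` and the shape of the `thinking` dict are assumptions; only the 1024 minimum comes from the note above.

```python
from typing import Optional

# Minimum stated in the release note above.
BEDROCK_MIN_THINKING_BUDGET = 1024


def normalize_thinking(thinking: Optional[dict]) -> Optional[dict]:
    """Return the thinking config with budget_tokens raised to the minimum.

    Hypothetical helper: configs that are missing or not enabled pass
    through unchanged, and the input dict is never mutated.
    """
    if not thinking or thinking.get("type") != "enabled":
        return thinking
    budget = thinking.get("budget_tokens", BEDROCK_MIN_THINKING_BUDGET)
    return {**thinking, "budget_tokens": max(budget, BEDROCK_MIN_THINKING_BUDGET)}
```

Raising the value rather than rejecting the request keeps clients that send small budgets working without any change on their side.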

Prompt Cache TTL Fix

  • Fix: skip cache_control.ttl for unsupported models — only Claude 4.5 family (Opus 4.5, Sonnet 4.5, Haiku 4.5) supports the ttl field. Non-4.5 models (Sonnet 4, Opus 4, etc.) now get {"type": "ephemeral"} without ttl, resolving Extra inputs are not permitted errors
  • Strip ttl from pre-existing client breakpoints on unsupported models
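
A rough sketch of the ttl handling described above. The substring check used to recognise the Claude 4.5 family and both function names are assumptions made for illustration; the real model detection may work differently.

```python
# Hypothetical markers for the Claude 4.5 family named in the note above.
TTL_CAPABLE_MARKERS = ("opus-4-5", "sonnet-4-5", "haiku-4-5")


def supports_cache_ttl(model_id: str) -> bool:
    """True if the model ID looks like a Claude 4.5 family model."""
    return any(marker in model_id for marker in TTL_CAPABLE_MARKERS)


def sanitize_cache_control(block: dict, model_id: str) -> dict:
    """Drop cache_control.ttl from a content block on unsupported models.

    Returns the block unchanged when there is nothing to strip; otherwise
    returns a shallow copy so the caller's block is not mutated.
    """
    cache_control = block.get("cache_control")
    if cache_control is None or supports_cache_ttl(model_id):
        return block
    cleaned = dict(block)
    cleaned["cache_control"] = {k: v for k, v in cache_control.items() if k != "ttl"}
    return cleaned
```

The same function covers both bullets: breakpoints added by the proxy and breakpoints already present in the client request go through one code path.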

Batch Token Creation

  • New POST /admin/tokens/batch endpoint — create multiple API keys at once with shared config and optional model list
  • Comma-separated names input replaces old name_prefix + count pattern — supports ASCII/Chinese commas, semicolons, and newlines
  • Frontend batch create dialog with textarea, model multi-select, and one-click "Copy All"
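
One way the names input could be split is sketched below. The function name is hypothetical, and two behaviours are assumptions rather than documented facts: fullwidth semicolons are accepted alongside fullwidth commas, and duplicate names are dropped.

```python
import re
from typing import List

# ASCII comma, fullwidth (Chinese) comma, ASCII semicolon,
# fullwidth semicolon (assumed), and line breaks.
_SEPARATORS = re.compile(r"[,\uFF0C;\uFF1B\r\n]+")


def parse_names(raw: str) -> List[str]:
    """Split a free-form names string into trimmed, unique names.

    Order of first appearance is preserved; empty fragments are ignored.
    """
    seen = set()
    names = []
    for part in _SEPARATORS.split(raw):
        name = part.strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names
```

For example, a textarea holding `alice, bob` on one line and `carol` on the next would yield three names.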

API Keys Search

  • Added search filter on API Keys table (search by name)

Zero-Downtime Rolling Updates

  • Added preStop hook (sleep 15 backend / sleep 10 frontend) to keep pods alive during ALB target deregistration
  • maxUnavailable: 0 + maxSurge: 1 ensures new pod is ready before old one terminates
  • Proper terminationGracePeriodSeconds (65s backend / 30s frontend)

EMF Metrics Fix

  • Switched CloudWatch Embedded Metrics from TCP socket to stdout sink (config.environment = "local")
  • Eliminates Connection refused errors in pods without CloudWatch Agent sidecar
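
For readers unfamiliar with the stdout sink, the sketch below builds an Embedded Metric Format record by hand and writes it as one JSON line. It uses only the standard library and is not the metrics library the proxy uses; the namespace and dimension values in the usage note are placeholders.

```python
import json
import sys
import time
from typing import Dict, Optional


def build_emf_record(
    namespace: str,
    metric: str,
    value: float,
    unit: str,
    dimensions: Dict[str, str],
    timestamp_ms: Optional[int] = None,
) -> dict:
    """Build a single-metric EMF record as a plain dict."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set containing every dimension key.
                    "Dimensions": [list(dimensions)],
                    "Metrics": [{"Name": metric, "Unit": unit}],
                }
            ],
        },
        metric: value,
    }
    # Dimension values live at the top level next to the metric value.
    record.update(dimensions)
    return record


def emit(record: dict, stream=sys.stdout) -> None:
    """Write the record as one JSON line; no socket or agent is involved."""
    stream.write(json.dumps(record) + "\n")
    stream.flush()
```

Calling `emit(build_emf_record("ExampleProxy", "RequestDuration", 123.0, "Milliseconds", {"Endpoint": "/v1/messages"}))` prints one line that a log pipeline can forward to CloudWatch Logs, which is why no agent sidecar connection is needed.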

Infrastructure

  • Renamed Karpenter IAM controller role to karp-ctrl-{alias}-... to prevent cross-project naming collision
  • Increased BEDROCK_ACCOUNT_RPM from 500 to 10,000

Code Quality

  • Replaced hand-rolled clipboard logic with Quasar copyToClipboard
  • Batch DB refresh: single WHERE IN query instead of N sequential SELECTs
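
The batch refresh idea, one WHERE IN query instead of N lookups, can be illustrated with the standard-library sqlite3 module. The table name `api_tokens` and its columns are hypothetical, and the project itself targets PostgreSQL, so this is a sketch of the pattern only.

```python
import sqlite3
from typing import List, Sequence, Tuple


def fetch_tokens_by_ids(conn: sqlite3.Connection, ids: Sequence[int]) -> List[Tuple[int, str]]:
    """Fetch all rows for the given IDs in a single round trip."""
    if not ids:
        # An empty IN () list is a syntax error, so short-circuit.
        return []
    # One bound parameter per ID; values are never interpolated into SQL.
    placeholders = ",".join("?" for _ in ids)
    query = f"SELECT id, name FROM api_tokens WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(query, list(ids)).fetchall()
```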
  • Fixed fetchAvailableModels missing loading state (spinner bug)

PRs

  • #19 fix/filter-noise-traces
  • #20 fix/zero-downtime-rolling-update
  • #21 fix/emf-stdout-sink + batch token creation + search + karpenter rename
  • #23 fix/thinking-block-passthrough
  • #25 fix/cache-ttl-and-batch-names

v0.6.2

11 Apr 11:21
c7d39ca

What's New

Stream Failover (PR #15)

  • Two-level stream failover for capacity-starved regions
    • L1: Same model, different region (transparent to client)
    • L2: Different model (client notified via x-actual-model SSE comment)
  • STREAM_FIRST_CONTENT_TIMEOUT default set to 600s
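
The two failover levels can be sketched as nested retries. Everything here is illustrative: `CapacityError`, `invoke_with_failover`, and the `invoke` callback are hypothetical names, and the real implementation works on streams with a first-content timeout rather than on simple return values.

```python
from typing import Callable, Sequence, Tuple


class CapacityError(Exception):
    """Raised by the hypothetical invoke callback when a region has no capacity."""


def invoke_with_failover(
    invoke: Callable[[str, str], str],
    model: str,
    regions: Sequence[str],
    fallback_models: Sequence[str],
) -> Tuple[str, str, bool]:
    """Return (result, model_used, model_changed).

    L1: try the requested model in each region in turn.
    L2: only if every region fails, try each fallback model.
    """
    for region in regions:
        try:
            return invoke(model, region), model, False
        except CapacityError:
            continue
    for alternative in fallback_models:
        for region in regions:
            try:
                # model_changed=True is the cue to notify the client.
                return invoke(alternative, region), alternative, True
            except CapacityError:
                continue
    raise CapacityError("all regions and fallback models exhausted")
```

The returned flag separates the two cases from the notes above: an L1 switch stays invisible to the client, while an L2 switch changes the model and so has to be reported.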

AWS-Native Observability (PR #16)

  • Structured logging: JSON format with per-API-key context (token_name, token_id) via contextvars
  • CloudWatch EMF metrics: RequestDuration, TokensInput/Output, CacheTokens, TTFT, BedrockCallDuration, FailoverTriggered, HttpRequestDuration/Count
  • AWS X-Ray tracing: OpenTelemetry integration via CloudWatch Agent DaemonSet (OTLP HTTP port 4316)
  • Log-trace correlation: trace_id and span_id auto-injected into every log record
  • Runtime observability config API (PUT /admin/observability) and Settings UI
  • CloudWatch Observability EKS addon with Pod Identity IAM
  • CloudWatch log retention: prod=7d, non-prod=3d
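
The per-API-key logging context can be sketched with the standard library alone. The class and variable names below are assumptions; only the use of contextvars and the token_name and token_id fields come from the notes above.

```python
import contextvars
import json
import logging

# Set once per request, for example in authentication middleware.
token_name_var = contextvars.ContextVar("token_name", default=None)
token_id_var = contextvars.ContextVar("token_id", default=None)


class TokenContextFilter(logging.Filter):
    """Copy the current request's token context onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.token_name = token_name_var.get()
        record.token_id = token_id_var.get()
        return True  # never drop the record, only enrich it


class JsonFormatter(logging.Formatter):
    """Render a record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "message": record.getMessage(),
                "token_name": getattr(record, "token_name", None),
                "token_id": getattr(record, "token_id", None),
            }
        )
```

Because each asyncio task gets its own copy of the context, concurrent requests handled by the same process do not overwrite each other's values.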

Locust Benchmark Tool (PR #17)

  • Three API endpoint benchmarks: OpenAI, Anthropic, Gemini
  • Streaming SSE parsing with TTFT measurement
  • Extended thinking support (BENCHMARK_THINKING_BUDGET)
  • Configurable prompt sizes (small/medium/large)

Bug Fixes & Improvements

Observability Fixes (PR #18, #19)

  • Add missing bedrock.invoke_stream span to failover streaming path
  • Exclude GeneratorExit from span exception recording (normal generator cleanup)
  • Record bedrock.duration_s in finally block to cover both success and error paths
  • Filter OPTIONS (CORS preflight) and /admin/* from X-Ray traces and EMF metrics to reduce noise
  • Self-manage Karpenter node IAM with create_before_destroy lifecycle
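
The noise filter for traces and metrics reduces to a small predicate. The function name is hypothetical; the two rules are the ones listed above.

```python
def should_record(method: str, path: str) -> bool:
    """Decide whether a request should produce traces and metrics."""
    if method.upper() == "OPTIONS":
        return False  # CORS preflight carries no useful signal
    if path == "/admin" or path.startswith("/admin/"):
        return False  # dashboard traffic, not model traffic
    return True
```

Matching on `/admin/` with the trailing slash avoids excluding an unrelated path that merely starts with the same letters.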

Security

  • Increase PBKDF2 iterations for refresh token hashing from 1 to 100,000
  • Upgrade vite (7.3.2) and axios (1.15.0) to resolve critical/high CVEs
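
For reference, PBKDF2 with the new iteration count looks like this with Python's hashlib. The choice of SHA-256, the salt handling, and the function names are assumptions; the notes above state only the iteration counts.

```python
import hashlib
import hmac

PBKDF2_ITERATIONS = 100_000  # new value from the note above


def hash_refresh_token(token: str, salt: bytes) -> bytes:
    """Derive a digest from the token; SHA-256 is an assumed choice."""
    return hashlib.pbkdf2_hmac("sha256", token.encode("utf-8"), salt, PBKDF2_ITERATIONS)


def verify_refresh_token(token: str, salt: bytes, expected: bytes) -> bool:
    """Compare in constant time to avoid leaking where digests differ."""
    return hmac.compare_digest(hash_refresh_token(token, salt), expected)
```

With a single iteration the derivation costs about as much as one plain hash, so a leaked digest could be brute-forced far faster than with 100,000 iterations.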

Documentation

  • Comprehensive observability docs (EN + ZH) with ASGI span breakdown, request flow diagrams, and X-Ray trace examples
  • FAQ: why one user interaction produces multiple X-Ray traces
  • Updated stream failover, logging, security, and performance docs

v0.6.1

10 Apr 06:28

v0.6.1 Release Notes

Bug Fixes

1. KMS Decrypt Permissions for External Secrets Operator

  • Issue: ESO (External Secrets Operator) failed with AccessDeniedException: Access to KMS is not allowed when Secrets Manager secrets are encrypted with a customer-managed KMS key (CMK)
  • Fix: Added kms:Decrypt and kms:DescribeKey permissions to the ESO IAM policy in Terraform (iac/modules/eks-addons/main.tf)
  • Impact: Deployments using CMK-encrypted secrets in Secrets Manager now sync correctly

2. Refresh Token Race Condition on Multi-Pod Deployments

  • Issue: When multiple pods handle concurrent refresh token requests, the token reuse detection logic would falsely trigger "Token theft detected", revoking the entire token family and forcing users to re-login
  • Fix (Backend): Added a 10-second grace period in refresh_token.py — if a child token was created within the grace window, the reuse is recognized as a concurrent refresh rather than token theft
  • Fix (Frontend): Added refresh request deduplication in axios.ts using a shared promise pattern: only one refresh request is in flight at a time, and other 401 responses queue and wait for its result
  • Impact: Users on multi-pod deployments no longer get randomly logged out
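
The backend grace-period check can be sketched as below. The function name and return values are hypothetical; the 10-second window and the rule itself come from the backend fix described above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REUSE_GRACE_PERIOD = timedelta(seconds=10)  # window from the note above


def classify_reuse(child_created_at: Optional[datetime], now: datetime) -> str:
    """Classify a second use of an already-rotated refresh token.

    If the child token was issued within the grace window, another pod
    most likely handled a concurrent refresh; otherwise treat it as theft.
    """
    if child_created_at is not None and now - child_created_at <= REUSE_GRACE_PERIOD:
        return "concurrent_refresh"
    return "token_theft"
```

A caller would revoke the token family only on `"token_theft"`. How the proxy responds in the concurrent case is not stated in these notes.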

3. Usage Query Performance — Composite Indexes + 90-Day Limit

  • Issue: Usage statistics queries on the usage_records table slowed as data grew, and the query time range had no upper bound
  • Fix (Database): Added Alembic migration with composite indexes (user_id, created_at) and (token_id, created_at) on usage_records
  • Fix (Backend): Added _clamp_date_range() helper enforcing a 90-day maximum query range on all 6 usage API endpoints (/stats, /by-token, /by-model, /aggregated-stats, /token-summary, /tokens-timeseries). Requests whose range exceeds 90 days return HTTP 400
  • Fix (Frontend): Added minAllowedDate constraint on date pickers in both DashboardPage.vue and MonitorPage.vue, with user-facing validation warnings via Quasar Notify
  • Impact: Query performance significantly improved; prevents unbounded scans on large usage tables
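
A range check in the spirit of the backend helper might look like the sketch below. The signature, the defaults for missing bounds, and the exception type are assumptions; in the real endpoints the failure is reported as HTTP 400.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

MAX_QUERY_RANGE = timedelta(days=90)  # limit from the note above


class DateRangeError(ValueError):
    """Stand-in for the HTTP 400 response returned by the real endpoints."""


def check_date_range(
    start: Optional[datetime], end: Optional[datetime], now: datetime
) -> Tuple[datetime, datetime]:
    """Fill in missing bounds and reject ranges longer than 90 days."""
    end = end or now
    start = start or (end - MAX_QUERY_RANGE)
    if start > end:
        raise DateRangeError("start must not be after end")
    if end - start > MAX_QUERY_RANGE:
        raise DateRangeError("query range exceeds 90 days")
    return start, end
```

Bounding the range lets the (user_id, created_at) and (token_id, created_at) indexes serve each query with a limited index scan.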

4. CVE-2026-39892 — cryptography Package Upgrade

  • Issue: cryptography 46.0.6 had a known vulnerability (CVE-2026-39892) causing pip-audit CI failures
  • Fix: Upgraded cryptography dependency from >=46.0.5 to >=46.0.7 in pyproject.toml
  • Impact: Resolves security vulnerability and unblocks CI pipeline

5. Missing us.* Pricing for Claude Haiku 4.5 in Geo Cross-Region

  • Issue: The pricing table in us-east-1 had global.anthropic.claude-haiku-4-5-* but lacked us.anthropic.claude-haiku-4-5-*. As a result, cost calculation fell back to base model pricing (or failed) when users invoked us.anthropic.claude-haiku-4-5-* via Geo/In-region cross-region inference
  • Root Cause: The AWS Bedrock pricing page uses inconsistent naming — "Claude Haiku 4.5" in the Global Cross-region section vs "Claude 4.5 Haiku" in the Geo/In-region section. The static model name mapping only had the former
  • Fix: Added "Claude 4.5 Haiku" as an alias in pricing_updater.py static mapping
  • Impact: us.* cross-region pricing entries are now correctly created for Claude Haiku 4.5 during pricing refresh
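
The alias fix amounts to mapping both display names to the same model key. The dictionary name, the key format, and the lookup function below are hypothetical; only the two display names come from the root-cause note above.

```python
from typing import Optional

# Both spellings seen on the pricing page resolve to one key.
# The key string is a placeholder, not the project's real identifier.
PRICING_NAME_ALIASES = {
    "Claude Haiku 4.5": "claude-haiku-4-5",
    "Claude 4.5 Haiku": "claude-haiku-4-5",
}


def resolve_model_key(display_name: str) -> Optional[str]:
    """Map a pricing-page display name to a model key, or None if unknown."""
    return PRICING_NAME_ALIASES.get(display_name.strip())
```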

Files Changed

  • iac/modules/eks-addons/main.tf — KMS permissions for ESO
  • backend/app/services/refresh_token.py — Concurrent refresh grace period
  • frontend/src/boot/axios.ts — Refresh request deduplication
  • backend/alembic/versions/e6f7g8h9i0j1_add_usage_records_composite_indexes.py — New migration
  • backend/app/models/usage.py — Model-level index declarations
  • backend/app/api/admin/endpoints/usage.py — 90-day query limit
  • frontend/src/pages/MonitorPage.vue — Date picker constraints + validation
  • frontend/src/pages/DashboardPage.vue — Date picker constraints
  • pyproject.toml / uv.lock — cryptography upgrade
  • backend/app/services/pricing_updater.py — Claude 4.5 Haiku name alias

Upgrade Notes

  • Run alembic upgrade head to apply the new composite indexes on usage_records
  • Trigger a pricing refresh (or wait for the next scheduled run) to populate missing us.* pricing entries

v0.6.0

07 Apr 08:15

What's Changed

Full Changelog: v0.5.0...v0.6.0

v0.5.0 — Initial Release

01 Apr 00:32

Pre-release

Kolya Bedrock Proxy is an AI gateway that provides OpenAI-compatible and Anthropic-native API access to AWS Bedrock models, with a built-in admin dashboard for token management, usage tracking, and model configuration.

Highlights

  • Dual API Support — Compatible with both OpenAI SDK (/v1/chat/completions) and Anthropic SDK (/v1/messages), including Claude Code integration.
  • AWS Bedrock Models — Supports Claude, Nova, Llama, DeepSeek, GLM, Mistral, and more via AWS Bedrock Converse API.
  • Admin Dashboard — Web UI for managing API tokens, monitoring usage, configuring models, and viewing pricing.
  • Production-Ready Infrastructure — Terraform modules for VPC, EKS (with Karpenter), RDS Aurora PostgreSQL, WAF, Global Accelerator, and Cognito/Microsoft OAuth.
  • Security — Non-root containers, KMS CMK for Secrets Manager, RDS IAM authentication, rate limiting via Redis, and prompt injection protection.

What's Changed

  • chore(deps): bump cryptography from 46.0.5 to 46.0.6 by @dependabot[bot] in #1
  • chore(deps): bump pygments from 2.19.2 to 2.20.0 by @dependabot[bot] in #2

Full Changelog: https://github.com/aws-samples/sample-kolya-br-proxy/commits/v0.5.0