Releases · aws-samples/sample-kolya-br-proxy

14 Apr 08:47

koljahuang

v0.6.3

431f181

v0.6.3 Latest

What's New

Extended Thinking Fix

Fix: pass through thinking blocks instead of stripping them from assistant message history — resolves ValidationException on multi-turn conversations with extended thinking enabled
Forward signature_delta in streaming responses so clients receive valid signatures for subsequent turns
Fix non-streaming response to include actual thinking text and signature
Auto-adjust budget_tokens to Bedrock minimum (1024) when clients send smaller values
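
The budget adjustment in the last item can be sketched as follows. This is a minimal illustration, not the project's code: the function name is hypothetical, and it assumes the client sends the thinking config as a dict with `type` and `budget_tokens` keys.

```python
from typing import Optional

BEDROCK_MIN_THINKING_BUDGET = 1024  # minimum stated in the release notes


def normalize_thinking(thinking: Optional[dict]) -> Optional[dict]:
    """Raise budget_tokens to the minimum; leave everything else untouched."""
    if not thinking or thinking.get("type") != "enabled":
        return thinking
    budget = thinking.get("budget_tokens", BEDROCK_MIN_THINKING_BUDGET)
    # Return a new dict so the caller's request payload is not mutated.
    return {**thinking, "budget_tokens": max(budget, BEDROCK_MIN_THINKING_BUDGET)}
```

Values at or above the minimum pass through unchanged, so only undersized budgets are rewritten.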

Prompt Cache TTL Fix

Fix: skip cache_control.ttl for unsupported models. Only the Claude 4.5 family (Opus 4.5, Sonnet 4.5, Haiku 4.5) supports the ttl field; non-4.5 models (Sonnet 4, Opus 4, etc.) now get {"type": "ephemeral"} without ttl, which resolves "Extra inputs are not permitted" errors
Strip ttl from pre-existing client breakpoints on unsupported models
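
A rough sketch of the ttl stripping described above. The substring markers used to detect the 4.5 family and both function names are assumptions for illustration; the proxy's real model detection may work differently.

```python
import copy

# Hypothetical markers for the Claude 4.5 family; real model IDs may differ.
TTL_CAPABLE_MARKERS = ("opus-4-5", "sonnet-4-5", "haiku-4-5")


def supports_cache_ttl(model_id: str) -> bool:
    """True if the model ID looks like a member of the 4.5 family."""
    return any(marker in model_id for marker in TTL_CAPABLE_MARKERS)


def sanitize_cache_control(block: dict, model_id: str) -> dict:
    """Return a copy of a content block with ttl dropped on unsupported models."""
    cleaned = copy.deepcopy(block)
    cache_control = cleaned.get("cache_control")
    if isinstance(cache_control, dict) and not supports_cache_ttl(model_id):
        cache_control.pop("ttl", None)
    return cleaned
```

Working on a deep copy keeps the client's original breakpoint intact, which helps if the same request is later retried against a model that does accept ttl.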

Batch Token Creation

New POST /admin/tokens/batch endpoint — create multiple API keys at once with shared config and optional model list
Comma-separated names input replaces old name_prefix + count pattern — supports ASCII/Chinese commas, semicolons, and newlines
Frontend batch create dialog with textarea, model multi-select, and one-click "Copy All"
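
The name splitting could look roughly like this. The function name is hypothetical, and treating the fullwidth semicolon as a separator is an assumption (the notes only say "ASCII/Chinese commas, semicolons, and newlines").

```python
import re

# ASCII comma, fullwidth comma (U+FF0C), ASCII semicolon,
# fullwidth semicolon (U+FF1B, assumed), and newlines.
_NAME_SEPARATORS = re.compile(r"[,\uFF0C;\uFF1B\r\n]+")


def parse_token_names(raw: str) -> list:
    """Split a free-form names field into trimmed, non-empty names."""
    return [name.strip() for name in _NAME_SEPARATORS.split(raw) if name.strip()]
```

Whether duplicate names are rejected or deduplicated is not stated in the notes, so this sketch leaves them as entered.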

API Keys Search

Added search filter on API Keys table (search by name)

Zero-Downtime Rolling Updates

Added preStop hook (sleep 15 backend / sleep 10 frontend) to keep pods alive during ALB target deregistration
maxUnavailable: 0 with maxSurge: 1 ensures a new pod is ready before the old one terminates
Set terminationGracePeriodSeconds to 65s (backend) and 30s (frontend) so the preStop sleep fits inside the grace period
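
For orientation, the backend settings above map onto a Deployment spec roughly as follows, written here as a Python dict. The field names are the standard Kubernetes ones; the container name and the exact preStop command form are assumptions. Because the preStop hook runs inside the termination grace period, the helper computes how much time remains for the application's own shutdown.

```python
# Fragment of a Deployment spec as a dict; container name is hypothetical.
backend_deployment_spec = {
    "strategy": {
        "type": "RollingUpdate",
        "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
    },
    "template": {
        "spec": {
            "terminationGracePeriodSeconds": 65,
            "containers": [
                {
                    "name": "backend",
                    "lifecycle": {
                        "preStop": {"exec": {"command": ["sleep", "15"]}}
                    },
                }
            ],
        }
    },
}


def shutdown_budget_seconds(spec: dict) -> int:
    """Seconds left for graceful shutdown after the preStop sleep finishes."""
    pod = spec["template"]["spec"]
    command = pod["containers"][0]["lifecycle"]["preStop"]["exec"]["command"]
    prestop_sleep = int(command[1])
    return pod["terminationGracePeriodSeconds"] - prestop_sleep
```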

EMF Metrics Fix

Switched CloudWatch Embedded Metrics from TCP socket to stdout sink (config.environment = "local")
Eliminates "Connection refused" errors in pods without a CloudWatch Agent sidecar
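
To show what a stdout sink emits, here is a hand-rolled sketch of an Embedded Metric Format record built with the standard library only. It is not the project's code; the namespace and dimension values are hypothetical, and it assumes the pod's stdout is shipped to CloudWatch Logs.

```python
import json
import time


def build_emf_record(namespace, metric, value, unit="Milliseconds", dimensions=None):
    """Build one EMF log record: metadata under "_aws", values at the top level."""
    dimensions = dimensions or {}
    return {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [list(dimensions)],  # one dimension set
                    "Metrics": [{"Name": metric, "Unit": unit}],
                }
            ],
        },
        **dimensions,
        metric: value,
    }


if __name__ == "__main__":
    # One JSON object per line on stdout, with no socket connection involved.
    record = build_emf_record(
        "ExampleProxy", "RequestDuration", 12.5, dimensions={"token_name": "demo"}
    )
    print(json.dumps(record))
```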

Infrastructure

Renamed Karpenter IAM controller role to karp-ctrl-{alias}-... to prevent cross-project naming collision
Increased BEDROCK_ACCOUNT_RPM from 500 to 10,000

Code Quality

Replaced hand-rolled clipboard logic with Quasar copyToClipboard
Batch DB refresh: single WHERE IN query instead of N sequential SELECTs
Fixed fetchAvailableModels missing loading state (spinner bug)
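
The batch refresh change replaces N round trips with one. A minimal sketch using sqlite3 for self-containment; the table and column names are hypothetical, and the project itself runs on PostgreSQL.

```python
import sqlite3


def fetch_tokens_by_ids(conn: sqlite3.Connection, ids: list) -> list:
    """Load all requested rows with a single WHERE IN query."""
    if not ids:
        return []  # "IN ()" is invalid SQL, so short-circuit
    placeholders = ",".join("?" for _ in ids)
    sql = f"SELECT id, name FROM api_tokens WHERE id IN ({placeholders}) ORDER BY id"
    # Only the placeholder list is interpolated; values stay parameterized.
    return conn.execute(sql, list(ids)).fetchall()
```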

PRs

#19 fix/filter-noise-traces
#20 fix/zero-downtime-rolling-update
#21 fix/emf-stdout-sink + batch token creation + search + karpenter rename
#23 fix/thinking-block-passthrough
#25 fix/cache-ttl-and-batch-names

11 Apr 11:21

koljahuang

v0.6.2

c7d39ca

v0.6.2

What's New

Stream Failover (PR #15)

Two-level stream failover for capacity-starved regions
- L1: Same model, different region (transparent to client)
- L2: Different model (client notified via x-actual-model SSE comment)
STREAM_FIRST_CONTENT_TIMEOUT default set to 600s
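
The two levels can be pictured with the simplified sketch below. It is an illustration under stated assumptions: failures are modeled as a hypothetical CapacityError rather than a first-content timeout, and all names are invented for the example.

```python
class CapacityError(Exception):
    """Hypothetical stand-in for a capacity or timeout failure."""


def invoke_with_failover(invoke, model, regions, fallback_models):
    """Try the same model across regions (L1), then other models (L2).

    Returns (result, actual_model) so the caller can tell the client
    when the model that answered differs from the one requested.
    """
    # L1: same model, different region.
    for region in regions:
        try:
            return invoke(model, region), model
        except CapacityError:
            continue
    # L2: different model, again across regions.
    for alternative in fallback_models:
        for region in regions:
            try:
                return invoke(alternative, region), alternative
            except CapacityError:
                continue
    raise CapacityError("all regions and fallback models exhausted")
```

Returning the actual model alongside the result is what lets an L2 switch be surfaced to the client, while an L1 switch stays invisible.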

AWS-Native Observability (PR #16)

Structured logging: JSON format with per-API-key context (token_name, token_id) via contextvars
CloudWatch EMF metrics: RequestDuration, TokensInput/Output, CacheTokens, TTFT, BedrockCallDuration, FailoverTriggered, HttpRequestDuration/Count
AWS X-Ray tracing: OpenTelemetry integration via CloudWatch Agent DaemonSet (OTLP HTTP port 4316)
Log-trace correlation: trace_id and span_id auto-injected into every log record
Runtime observability config API (PUT /admin/observability) and Settings UI
CloudWatch Observability EKS addon with Pod Identity IAM
CloudWatch log retention: prod=7d, non-prod=3d
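
The per-API-key logging context can be sketched with the standard library alone. The class names and the JSON field set are assumptions; only token_name, token_id, and the use of contextvars come from the notes.

```python
import contextvars
import json
import logging

# Set once per request, for example in authentication middleware.
token_name_var = contextvars.ContextVar("token_name", default=None)
token_id_var = contextvars.ContextVar("token_id", default=None)


class TokenContextFilter(logging.Filter):
    """Copy the current request's token context onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.token_name = token_name_var.get()
        record.token_id = token_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "level": record.levelname,
                "message": record.getMessage(),
                "token_name": getattr(record, "token_name", None),
                "token_id": getattr(record, "token_id", None),
            }
        )
```

Because context variables are scoped per task, concurrent requests in an async server do not see each other's token values. Trace and span IDs could be injected by a similar filter.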

Locust Benchmark Tool (PR #17)

Three API endpoint benchmarks: OpenAI, Anthropic, Gemini
Streaming SSE parsing with TTFT measurement
Extended thinking support (BENCHMARK_THINKING_BUDGET)
Configurable prompt sizes (small/medium/large)
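
TTFT measurement over an SSE stream can be approximated as the delay until the first data event that carries a payload. This sketch is not the benchmark's code; the function name and the exact definition of "first token" are assumptions.

```python
import time


def measure_ttft(lines, clock=time.perf_counter):
    """Return seconds until the first non-empty SSE data line, or None."""
    start = clock()
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip comments (": ..."), event names, and blank lines
        payload = line[len("data:"):].strip()
        if payload and payload != "[DONE]":
            return clock() - start
    return None
```

Passing the clock as a parameter keeps the function testable with a fake timer; in a real run `lines` would be the streaming response's line iterator.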

Bug Fixes & Improvements

Observability Fixes (PR #18, #19)

Add missing bedrock.invoke_stream span to failover streaming path
Exclude GeneratorExit from span exception recording (normal generator cleanup)
Record bedrock.duration_s in finally block to cover both success and error paths
Filter OPTIONS (CORS preflight) and /admin/* from X-Ray traces and EMF metrics to reduce noise
Self-manage Karpenter node IAM with create_before_destroy lifecycle
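
The noise filter for traces and metrics reduces to a small predicate. A minimal sketch with a hypothetical function name; how the project wires it into its tracing and EMF middleware is not shown in the notes.

```python
def should_record(method: str, path: str) -> bool:
    """Decide whether a request should produce traces and metrics."""
    if method.upper() == "OPTIONS":
        return False  # CORS preflight
    if path == "/admin" or path.startswith("/admin/"):
        return False  # admin dashboard traffic
    return True
```

Matching on "/admin/" with the trailing slash avoids accidentally excluding unrelated paths that merely begin with the same letters.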

Security

Increase PBKDF2 iterations for refresh token hashing from 1 to 100,000
Upgrade vite (7.3.2) and axios (1.15.0) to resolve critical/high CVEs
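
For reference, PBKDF2 hashing with the new iteration count looks roughly like this with the standard library. The hash algorithm (SHA-256), the salt handling, and the function names are assumptions; the notes only state the iteration count.

```python
import hashlib
import hmac

PBKDF2_ITERATIONS = 100_000  # value from the release notes (previously 1)


def hash_refresh_token(token: str, salt: bytes) -> bytes:
    """Derive a digest from the token; SHA-256 is an assumed choice."""
    return hashlib.pbkdf2_hmac(
        "sha256", token.encode("utf-8"), salt, PBKDF2_ITERATIONS
    )


def verify_refresh_token(token: str, salt: bytes, expected: bytes) -> bool:
    """Compare in constant time to avoid leaking match position."""
    return hmac.compare_digest(hash_refresh_token(token, salt), expected)
```

Changing the iteration count changes the digest, so hashes stored under the old setting would not verify unless the count is stored per record or existing tokens are reissued.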

Documentation

Comprehensive observability docs (EN + ZH) with ASGI span breakdown, request flow diagrams, and X-Ray trace examples
FAQ: why one user interaction produces multiple X-Ray traces
Stream failover, logging, security, and performance docs update

10 Apr 06:28

koljahuang

v0.6.1

97fd4eb

v0.6.1

v0.6.1 Release Notes

Bug Fixes

1. KMS Decrypt Permissions for External Secrets Operator

Issue: ESO (External Secrets Operator) failed with AccessDeniedException: Access to KMS is not allowed when Secrets Manager secrets are encrypted with a customer-managed KMS key (CMK)
Fix: Added kms:Decrypt and kms:DescribeKey permissions to the ESO IAM policy in Terraform (iac/modules/eks-addons/main.tf)
Impact: Deployments using CMK-encrypted secrets in Secrets Manager now sync correctly

2. Refresh Token Race Condition on Multi-Pod Deployments

Issue: When multiple pods handle concurrent refresh token requests, the token reuse detection logic would falsely trigger "Token theft detected", revoking the entire token family and forcing users to re-login
Fix (Backend): Added a 10-second grace period in refresh_token.py — if a child token was created within the grace window, the reuse is recognized as a concurrent refresh rather than token theft
Fix (Frontend): Added refresh request deduplication in axios.ts using a shared promise pattern: only one refresh request is in flight at a time, and other requests that receive a 401 wait for its result
Impact: Users on multi-pod deployments no longer get randomly logged out
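
The backend grace-period decision can be sketched as a pure function. The function name and return values are hypothetical; only the 10-second window and the rule itself come from the notes.

```python
from datetime import datetime, timedelta
from typing import Optional

REFRESH_GRACE_PERIOD = timedelta(seconds=10)


def classify_reuse(child_created_at: Optional[datetime], now: datetime) -> str:
    """Classify reuse of an already-rotated refresh token."""
    if child_created_at is not None and now - child_created_at <= REFRESH_GRACE_PERIOD:
        # A sibling request rotated the token moments ago: treat as a race.
        return "concurrent_refresh"
    # Reuse long after rotation: treat as theft and revoke the family.
    return "token_theft"
```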

3. Usage Query Performance — Composite Indexes + 90-Day Limit

Issue: Usage statistics queries on the usage_records table became slow as data grew, and the query time range had no upper bound
Fix (Database): Added Alembic migration with composite indexes (user_id, created_at) and (token_id, created_at) on usage_records
Fix (Backend): Added _clamp_date_range() helper enforcing a 90-day maximum query range on all 6 usage API endpoints (/stats, /by-token, /by-model, /aggregated-stats, /token-summary, /tokens-timeseries). Returns HTTP 400 if the range exceeds 90 days
Fix (Frontend): Added minAllowedDate constraint on date pickers in both DashboardPage.vue and MonitorPage.vue, with user-facing validation warnings via Quasar Notify
Impact: Query performance significantly improved; prevents unbounded scans on large usage tables

4. CVE-2026-39892 — cryptography Package Upgrade

Issue: cryptography 46.0.6 had a known vulnerability (CVE-2026-39892) causing pip-audit CI failures
Fix: Upgraded cryptography dependency from >=46.0.5 to >=46.0.7 in pyproject.toml
Impact: Resolves security vulnerability and unblocks CI pipeline

5. Missing us.* Pricing for Claude Haiku 4.5 in Geo Cross-Region

Issue: The pricing table in us-east-1 only had global.anthropic.claude-haiku-4-5-* but was missing us.anthropic.claude-haiku-4-5-*. This caused cost calculation to fall back to base model pricing (or fail) when users invoke us.anthropic.claude-haiku-4-5-* via Geo/In-region cross-region inference
Root Cause: The AWS Bedrock pricing page uses inconsistent naming — "Claude Haiku 4.5" in the Global Cross-region section vs "Claude 4.5 Haiku" in the Geo/In-region section. The static model name mapping only had the former
Fix: Added "Claude 4.5 Haiku" as an alias in pricing_updater.py static mapping
Impact: us.* cross-region pricing entries are now correctly created for Claude Haiku 4.5 during pricing refresh
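
The alias fix amounts to mapping both display names to one entry. A minimal sketch; the canonical key and the function name are hypothetical, while the two display names are the ones quoted above.

```python
# Pricing-page display name -> canonical key (the key format is hypothetical).
PRICING_NAME_ALIASES = {
    "Claude Haiku 4.5": "claude-haiku-4-5",  # Global Cross-region section
    "Claude 4.5 Haiku": "claude-haiku-4-5",  # Geo/In-region section
}


def canonical_model_key(display_name: str):
    """Resolve a display name to its canonical key, or None if unknown."""
    return PRICING_NAME_ALIASES.get(display_name.strip())
```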

Files Changed

iac/modules/eks-addons/main.tf — KMS permissions for ESO
backend/app/services/refresh_token.py — Concurrent refresh grace period
frontend/src/boot/axios.ts — Refresh request deduplication
backend/alembic/versions/e6f7g8h9i0j1_add_usage_records_composite_indexes.py — New migration
backend/app/models/usage.py — Model-level index declarations
backend/app/api/admin/endpoints/usage.py — 90-day query limit
frontend/src/pages/MonitorPage.vue — Date picker constraints + validation
frontend/src/pages/DashboardPage.vue — Date picker constraints
pyproject.toml / uv.lock — cryptography upgrade
backend/app/services/pricing_updater.py — Claude 4.5 Haiku name alias

Upgrade Notes

Run alembic upgrade head to apply the new composite indexes on usage_records
Trigger a pricing refresh (or wait for the next scheduled run) to populate missing us.* pricing entries

07 Apr 08:15

koljahuang

v0.6.0

bdc4bbe

v0.6.0

What's Changed

Feat/gemini integration by @koljahuang in #3
feat: increase all timeout configurations to 1 hour by @koljahuang in #4
Fix/model access and token hash by @koljahuang in #10

Full Changelog: v0.5.0...v0.6.0

Contributors

koljahuang

01 Apr 00:32

koljahuang

v0.5.0

53ee6d6

v0.5.0 — Initial Release Pre-release

Kolya Bedrock Proxy is an AI gateway that provides OpenAI-compatible and Anthropic-native API access to AWS Bedrock models, with a built-in admin dashboard for token management, usage tracking, and model configuration.

Highlights

Dual API Support — Compatible with both OpenAI SDK (/v1/chat/completions) and Anthropic SDK (/v1/messages), including Claude Code integration.
AWS Bedrock Models — Supports Claude, Nova, Llama, DeepSeek, GLM, Mistral, and more via AWS Bedrock Converse API.
Admin Dashboard — Web UI for managing API tokens, monitoring usage, configuring models, and viewing pricing.
Production-Ready Infrastructure — Terraform modules for VPC, EKS (with Karpenter), RDS Aurora PostgreSQL, WAF, Global Accelerator, and Cognito/Microsoft OAuth.
Security — Non-root containers, KMS CMK for Secrets Manager, RDS IAM authentication, rate limiting via Redis, and prompt injection protection.

What's Changed

chore(deps): bump cryptography from 46.0.5 to 46.0.6 by @dependabot[bot] in #1
chore(deps): bump pygments from 2.19.2 to 2.20.0 by @dependabot[bot] in #2

New Contributors

@dependabot[bot] made their first contribution in #1

Full Changelog: https://github.com/aws-samples/sample-kolya-br-proxy/commits/v0.5.0

Contributors

dependabot
