Releases: aws-samples/sample-kolya-br-proxy
v0.6.3
What's New
Extended Thinking Fix
- Fix: pass through thinking blocks instead of stripping them from assistant message history, which resolves `ValidationException` on multi-turn conversations with extended thinking enabled
- Forward `signature_delta` in streaming responses so clients receive valid signatures for subsequent turns
- Fix non-streaming response to include actual thinking text and signature
- Auto-adjust `budget_tokens` to Bedrock minimum (1024) when clients send smaller values
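The `budget_tokens` adjustment can be pictured with a minimal sketch. The function name and the shape of the `thinking` dictionary are assumptions for illustration; only the 1024 minimum comes from the note above.

```python
from typing import Optional

# Minimum stated in the release note above.
BEDROCK_MIN_THINKING_BUDGET = 1024


def clamp_thinking_budget(thinking: Optional[dict]) -> Optional[dict]:
    """Return a copy of the thinking config with budget_tokens raised to the minimum.

    Hypothetical helper: the proxy's real function name and request shape may differ.
    """
    if not thinking or thinking.get("type") != "enabled":
        # Thinking is absent or disabled, so there is nothing to adjust.
        return thinking
    budget = thinking.get("budget_tokens", BEDROCK_MIN_THINKING_BUDGET)
    return {**thinking, "budget_tokens": max(budget, BEDROCK_MIN_THINKING_BUDGET)}
```

Values at or above the minimum pass through unchanged, so clients that already send a valid budget see no difference.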
Prompt Cache TTL Fix
- Fix: skip `cache_control.ttl` for unsupported models. Only the Claude 4.5 family (Opus 4.5, Sonnet 4.5, Haiku 4.5) supports the `ttl` field. Non-4.5 models (Sonnet 4, Opus 4, etc.) now get `{"type": "ephemeral"}` without `ttl`, resolving `Extra inputs are not permitted` errors
- Strip `ttl` from pre-existing client breakpoints on unsupported models
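A rough sketch of the `ttl` stripping described above. The substring-based model check and both function names are assumptions; the notes do not show how the proxy detects the 4.5 family.

```python
import copy

# Assumption: the 4.5 family is detected by a substring of the model ID.
TTL_CAPABLE_MARKERS = ("opus-4-5", "sonnet-4-5", "haiku-4-5")


def supports_cache_ttl(model_id: str) -> bool:
    """True if the model ID looks like a Claude 4.5 family model."""
    return any(marker in model_id for marker in TTL_CAPABLE_MARKERS)


def normalize_cache_control(block: dict, model_id: str) -> dict:
    """Return a copy of a content block with cache_control.ttl removed when unsupported."""
    block = copy.deepcopy(block)  # never mutate the client's original payload
    cache_control = block.get("cache_control")
    if isinstance(cache_control, dict) and not supports_cache_ttl(model_id):
        cache_control.pop("ttl", None)
    return block


client_block = {
    "type": "text",
    "text": "system prompt",
    "cache_control": {"type": "ephemeral", "ttl": "1h"},
}
on_sonnet_4 = normalize_cache_control(client_block, "us.anthropic.claude-sonnet-4-20250514-v1:0")
on_sonnet_45 = normalize_cache_control(client_block, "global.anthropic.claude-sonnet-4-5-20250929-v1:0")
```

Here `on_sonnet_4` keeps only `{"type": "ephemeral"}`, while `on_sonnet_45` retains the client's `ttl`.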
Batch Token Creation
- New `POST /admin/tokens/batch` endpoint: create multiple API keys at once with shared config and optional model list
- Comma-separated names input replaces old `name_prefix` + `count` pattern; supports ASCII/Chinese commas, semicolons, and newlines
- Frontend batch create dialog with textarea, model multi-select, and one-click "Copy All"
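One way the names input could be split is sketched below. The exact separator set and the function name are assumptions; the note only says that ASCII/Chinese commas, semicolons, and newlines are accepted.

```python
import re

# Assumed separator set: ASCII comma, full-width comma (U+FF0C),
# ASCII semicolon, full-width semicolon (U+FF1B), and line breaks.
_NAME_SEPARATORS = re.compile(r"[,\uFF0C;\uFF1B\r\n]+")


def parse_token_names(raw: str) -> list:
    """Split a free-form names field into trimmed, non-empty token names."""
    return [part.strip() for part in _NAME_SEPARATORS.split(raw) if part.strip()]


names = parse_token_names("alice, bob\uFF0Ccarol;dave\uFF1Berin\nfrank")
```

Empty fragments (for example from a trailing comma) are dropped rather than turned into unnamed keys.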
API Keys Search
- Added search filter on API Keys table (search by name)
Zero-Downtime Rolling Updates
- Added `preStop` hook (`sleep 15` backend / `sleep 10` frontend) to keep pods alive during ALB target deregistration
- `maxUnavailable: 0` + `maxSurge: 1` ensures new pod is ready before old one terminates
- Proper `terminationGracePeriodSeconds` (65s backend / 30s frontend)
EMF Metrics Fix
- Switched CloudWatch Embedded Metrics from TCP socket to stdout sink (`config.environment = "local"`)
- Eliminates `Connection refused` errors in pods without CloudWatch Agent sidecar
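To illustrate what a stdout sink produces, the sketch below writes one Embedded Metric Format record as a JSON line without using any metrics library. The namespace, metric, and dimension names are placeholders, and the proxy's actual emitter is not shown in these notes.

```python
import json
import sys
import time


def emit_emf(namespace, metric, value, unit="Milliseconds", dimensions=None, stream=None):
    """Write one EMF record as a single JSON line and return that line."""
    dimensions = dimensions or {}
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    "Dimensions": [list(dimensions)],  # one dimension set
                    "Metrics": [{"Name": metric, "Unit": unit}],
                }
            ],
        },
        **dimensions,   # dimension values live at the top level
        metric: value,  # so does the metric value itself
    }
    line = json.dumps(record)
    (stream or sys.stdout).write(line + "\n")
    return line


# Placeholder names, chosen only for the example.
line = emit_emf("ExampleProxy", "RequestDuration", 12.5, dimensions={"Model": "example-model"})
```

Because the record is plain log output, no agent socket is needed inside the pod; a log pipeline that understands EMF can extract the metric later.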
Infrastructure
- Renamed Karpenter IAM controller role to `karp-ctrl-{alias}-...` to prevent cross-project naming collision
- Increased `BEDROCK_ACCOUNT_RPM` from 500 to 10,000
Code Quality
- Replaced hand-rolled clipboard logic with Quasar `copyToClipboard`
- Batch DB refresh: single `WHERE IN` query instead of N sequential SELECTs
- Fixed `fetchAvailableModels` missing loading state (spinner bug)
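The batch refresh idea can be sketched with the standard-library `sqlite3` module as a stand-in for the project's database layer. The `api_tokens` table and its columns are hypothetical.

```python
import sqlite3


def fetch_tokens_by_ids(conn, ids):
    """Load all requested rows with one WHERE IN query instead of one SELECT per id."""
    if not ids:
        return []  # an empty IN () list is invalid SQL, so short-circuit
    placeholders = ",".join("?" for _ in ids)
    query = f"SELECT id, name FROM api_tokens WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(query, list(ids)).fetchall()


# In-memory database with a hypothetical schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_tokens (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO api_tokens (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
rows = fetch_tokens_by_ids(conn, [1, 3])
```

Only the placeholders are interpolated into the SQL string; the ids themselves are still passed as bound parameters.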
v0.6.2
What's New
Stream Failover (PR #15)
- Two-level stream failover for capacity-starved regions
- L1: Same model, different region (transparent to client)
- L2: Different model (client notified via `x-actual-model` SSE comment)
- `STREAM_FIRST_CONTENT_TIMEOUT` default set to 600s
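The two failover levels above might look roughly like the following sketch. `CapacityError`, the candidate ordering, the fake opener, and the exact text of the SSE comment are all assumptions; timeout handling is omitted.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


class CapacityError(Exception):
    """Hypothetical stand-in for an upstream capacity or throttling failure."""


def stream_with_failover(
    requested_model: str,
    candidates: List[Tuple[str, str]],  # ordered (model, region) pairs
    open_stream: Callable[[str, str], Iterable[str]],
) -> Iterator[str]:
    last_error = None
    for model, region in candidates:
        try:
            stream = iter(open_stream(model, region))
            # Switching is only safe before any content has reached the client.
            first_chunk = next(stream)
        except (CapacityError, StopIteration) as exc:
            last_error = exc
            continue
        if model != requested_model:
            # L2: a different model answered, so tell the client via an SSE comment.
            yield f": x-actual-model: {model}\n\n"
        yield first_chunk
        yield from stream
        return
    raise RuntimeError("all failover candidates exhausted") from last_error


def make_fake_opener(unavailable):
    """Build a fake upstream that fails for the given (model, region) pairs."""
    def open_stream(model, region):
        if (model, region) in unavailable:
            raise CapacityError(f"{model} unavailable in {region}")
        return [f"data: {model}@{region}\n\n"]
    return open_stream


candidates = [("model-a", "us-east-1"), ("model-a", "us-west-2"), ("model-b", "us-east-1")]
l1_chunks = list(stream_with_failover(
    "model-a", candidates, make_fake_opener({("model-a", "us-east-1")})))
l2_chunks = list(stream_with_failover(
    "model-a", candidates,
    make_fake_opener({("model-a", "us-east-1"), ("model-a", "us-west-2")})))
```

In the L1 case the client sees only ordinary data chunks; in the L2 case the comment line precedes the first chunk so the substitution is visible without breaking SSE parsing.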
AWS-Native Observability (PR #16)
- Structured logging: JSON format with per-API-key context (`token_name`, `token_id`) via `contextvars`
- CloudWatch EMF metrics: RequestDuration, TokensInput/Output, CacheTokens, TTFT, BedrockCallDuration, FailoverTriggered, HttpRequestDuration/Count
- AWS X-Ray tracing: OpenTelemetry integration via CloudWatch Agent DaemonSet (OTLP HTTP port 4316)
- Log-trace correlation: `trace_id` and `span_id` auto-injected into every log record
- Runtime observability config API (`PUT /admin/observability`) and Settings UI
- CloudWatch Observability EKS addon with Pod Identity IAM
- CloudWatch log retention: prod=7d, non-prod=3d
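The per-API-key logging context can be sketched with the standard `contextvars` and `logging` modules. The variable names, formatter class, and JSON field layout are illustrative, not the project's actual implementation.

```python
import contextvars
import json
import logging

# One context variable per field; each request or task sees its own values.
token_name_var = contextvars.ContextVar("token_name", default=None)
token_id_var = contextvars.ContextVar("token_id", default=None)


class JsonContextFormatter(logging.Formatter):
    """Render each record as JSON, pulling the API-key context from contextvars."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "token_name": token_name_var.get(),
            "token_id": token_id_var.get(),
        })


# Simulate what authentication middleware might do at the start of a request.
token_name_var.set("ci-bot")
token_id_var.set("tok_123")

record = logging.LogRecord("proxy", logging.INFO, "example.py", 0, "request done", None, None)
line = JsonContextFormatter().format(record)
```

Because the values live in context variables rather than function arguments, code deep in the call stack can log without being handed the token explicitly.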
Locust Benchmark Tool (PR #17)
- Three API endpoint benchmarks: OpenAI, Anthropic, Gemini
- Streaming SSE parsing with TTFT measurement
- Extended thinking support (`BENCHMARK_THINKING_BUDGET`)
- Configurable prompt sizes (small/medium/large)
Bug Fixes & Improvements
Observability Fixes (PR #18, #19)
- Add missing `bedrock.invoke_stream` span to failover streaming path
- Exclude `GeneratorExit` from span exception recording (normal generator cleanup)
- Record `bedrock.duration_s` in `finally` block to cover both success and error paths
- Filter OPTIONS (CORS preflight) and `/admin/*` from X-Ray traces and EMF metrics to reduce noise
- Self-manage Karpenter node IAM with `create_before_destroy` lifecycle
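The `GeneratorExit` and `finally` items above follow a common generator pattern, sketched here without any tracing library. The callback parameters stand in for span methods and are hypothetical.

```python
import time


def timed_stream(chunks, record_duration, record_exception):
    """Yield chunks while timing the stream; callbacks stand in for span methods."""
    start = time.monotonic()
    try:
        for chunk in chunks:
            yield chunk
    except GeneratorExit:
        # The consumer closed the generator early: normal cleanup, not an error.
        raise
    except Exception as exc:
        record_exception(exc)
        raise
    finally:
        # Runs on success, on error, and on early close alike.
        record_duration(time.monotonic() - start)


durations, errors = [], []

# Case 1: the consumer stops early; duration is recorded, no exception is.
gen = timed_stream(iter(["a", "b", "c"]), durations.append, errors.append)
next(gen)
gen.close()


# Case 2: the upstream fails mid-stream; both duration and exception are recorded.
def failing_upstream():
    yield "a"
    raise ValueError("boom")


try:
    list(timed_stream(failing_upstream(), durations.append, errors.append))
except ValueError:
    pass
```

After both cases run, `durations` holds two entries and `errors` holds only the `ValueError`.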
Security
- Increase PBKDF2 iterations for refresh token hashing from 1 to 100,000
- Upgrade vite (7.3.2) and axios (1.15.0) to resolve critical/high CVEs
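For reference, hashing with 100,000 PBKDF2 iterations looks like this with the standard library. The SHA-256 digest, the salt handling, and the function names are assumptions; the notes state only the iteration count.

```python
import hashlib
import hmac
import os

PBKDF2_ITERATIONS = 100_000  # iteration count from the release note


def hash_refresh_token(token: str, salt: bytes) -> bytes:
    """Derive a 32-byte digest from the token (SHA-256 is an assumed choice)."""
    return hashlib.pbkdf2_hmac("sha256", token.encode("utf-8"), salt, PBKDF2_ITERATIONS)


def verify_refresh_token(token: str, salt: bytes, expected: bytes) -> bool:
    """Compare in constant time to avoid leaking how many bytes matched."""
    return hmac.compare_digest(hash_refresh_token(token, salt), expected)


salt = os.urandom(16)
stored = hash_refresh_token("example-refresh-token", salt)
```

With a single iteration the derivation is little more than one HMAC, which is why the iteration count matters for resisting offline guessing.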
Documentation
- Comprehensive observability docs (EN + ZH) with ASGI span breakdown, request flow diagrams, and X-Ray trace examples
- FAQ: why one user interaction produces multiple X-Ray traces
- Stream failover, logging, security, and performance docs update
v0.6.1
v0.6.1 Release Notes
Bug Fixes
1. KMS Decrypt Permissions for External Secrets Operator
- Issue: ESO (External Secrets Operator) failed with `AccessDeniedException: Access to KMS is not allowed` when Secrets Manager secrets are encrypted with a customer-managed KMS key (CMK)
- Fix: Added `kms:Decrypt` and `kms:DescribeKey` permissions to the ESO IAM policy in Terraform (`iac/modules/eks-addons/main.tf`)
- Impact: Deployments using CMK-encrypted secrets in Secrets Manager now sync correctly
2. Refresh Token Race Condition on Multi-Pod Deployments
- Issue: When multiple pods handle concurrent refresh token requests, the token reuse detection logic would falsely trigger "Token theft detected", revoking the entire token family and forcing users to re-login
- Fix (Backend): Added a 10-second grace period in `refresh_token.py`: if a child token was created within the grace window, the reuse is recognized as a concurrent refresh rather than token theft
- Fix (Frontend): Added refresh request deduplication in `axios.ts` using a shared promise pattern: only one refresh request is sent at a time, and other 401 responses queue and wait for the result
- Impact: Users on multi-pod deployments no longer get randomly logged out
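The backend grace-period decision can be summarized in a small sketch. The function name, return values, and the way the child token's creation time is looked up are assumptions; only the 10-second window comes from the note above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

REUSE_GRACE_PERIOD = timedelta(seconds=10)  # window stated in the release note


def classify_reuse(child_created_at: Optional[datetime], now: datetime) -> str:
    """Decide how to treat a refresh token that has already been used once.

    child_created_at is when the replacement (child) token was issued,
    or None if no child exists.
    """
    if child_created_at is not None and now - child_created_at <= REUSE_GRACE_PERIOD:
        # Another pod most likely rotated this token moments ago.
        return "concurrent_refresh"
    return "token_theft"


now = datetime(2026, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
recent = classify_reuse(now - timedelta(seconds=3), now)
stale = classify_reuse(now - timedelta(minutes=5), now)
```

Here `recent` is treated as a concurrent refresh, while `stale` still triggers theft handling and family revocation.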
3. Usage Query Performance — Composite Indexes + 90-Day Limit
- Issue: Usage statistics queries on the `usage_records` table become slow as data grows, with no upper bound on query time range
- Fix (Database): Added Alembic migration with composite indexes `(user_id, created_at)` and `(token_id, created_at)` on `usage_records`
- Fix (Backend): Added `_clamp_date_range()` helper enforcing a 90-day maximum query range on all usage API endpoints (`/stats`, `/by-token`, `/by-model`, `/aggregated-stats`, `/token-summary`, `/tokens-timeseries`). Returns HTTP 400 if range exceeds 90 days
- Fix (Frontend): Added `minAllowedDate` constraint on date pickers in both `DashboardPage.vue` and `MonitorPage.vue`, with user-facing validation warnings via Quasar Notify
- Impact: Query performance significantly improved; prevents unbounded scans on large usage tables
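A framework-free sketch of the 90-day check is shown below. It is not the project's `_clamp_date_range()`; the function and exception names are hypothetical, and the exception stands in for whatever the API layer converts into an HTTP 400.

```python
from datetime import datetime, timedelta

MAX_QUERY_RANGE = timedelta(days=90)  # limit stated in the release note


class DateRangeTooLarge(ValueError):
    """Hypothetical error that an API layer could map to HTTP 400."""


def validate_date_range(start: datetime, end: datetime):
    """Return (start, end) unchanged, or raise if the range is invalid or too wide."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    if end - start > MAX_QUERY_RANGE:
        raise DateRangeTooLarge(f"range exceeds {MAX_QUERY_RANGE.days} days")
    return start, end


def is_allowed(start: datetime, end: datetime) -> bool:
    """Convenience wrapper used only in this example."""
    try:
        validate_date_range(start, end)
        return True
    except ValueError:
        return False


within_limit = is_allowed(datetime(2026, 1, 1), datetime(2026, 1, 31))
over_limit = is_allowed(datetime(2026, 1, 1), datetime(2026, 4, 2))  # 91 days
```

Bounding the range keeps every query inside a window that the `(user_id, created_at)` and `(token_id, created_at)` indexes can serve without scanning the whole table.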
4. CVE-2026-39892 — cryptography Package Upgrade
- Issue: `cryptography` 46.0.6 had a known vulnerability (CVE-2026-39892) causing `pip-audit` CI failures
- Fix: Upgraded `cryptography` dependency from `>=46.0.5` to `>=46.0.7` in `pyproject.toml`
- Impact: Resolves security vulnerability and unblocks CI pipeline
5. Missing us.* Pricing for Claude Haiku 4.5 in Geo Cross-Region
- Issue: The pricing table in us-east-1 only had `global.anthropic.claude-haiku-4-5-*` but was missing `us.anthropic.claude-haiku-4-5-*`. This caused cost calculation to fall back to base model pricing (or fail) when users invoke `us.anthropic.claude-haiku-4-5-*` via Geo/In-region cross-region inference
- Root Cause: The AWS Bedrock pricing page uses inconsistent naming: "Claude Haiku 4.5" in the Global Cross-region section vs. "Claude 4.5 Haiku" in the Geo/In-region section. The static model name mapping only had the former
- Fix: Added `"Claude 4.5 Haiku"` as an alias in the `pricing_updater.py` static mapping
- Impact: `us.*` cross-region pricing entries are now correctly created for Claude Haiku 4.5 during pricing refresh
Files Changed
- `iac/modules/eks-addons/main.tf`: KMS permissions for ESO
- `backend/app/services/refresh_token.py`: Concurrent refresh grace period
- `frontend/src/boot/axios.ts`: Refresh request deduplication
- `backend/alembic/versions/e6f7g8h9i0j1_add_usage_records_composite_indexes.py`: New migration
- `backend/app/models/usage.py`: Model-level index declarations
- `backend/app/api/admin/endpoints/usage.py`: 90-day query limit
- `frontend/src/pages/MonitorPage.vue`: Date picker constraints + validation
- `frontend/src/pages/DashboardPage.vue`: Date picker constraints
- `pyproject.toml` / `uv.lock`: cryptography upgrade
- `backend/app/services/pricing_updater.py`: Claude 4.5 Haiku name alias
Upgrade Notes
- Run `alembic upgrade head` to apply the new composite indexes on `usage_records`
- Trigger a pricing refresh (or wait for the next scheduled run) to populate missing `us.*` pricing entries
v0.6.0
What's Changed
- Feat/gemini integration by @koljahuang in #3
- feat: increase all timeout configurations to 1 hour by @koljahuang in #4
- Fix/model access and token hash by @koljahuang in #10
Full Changelog: v0.5.0...v0.6.0
v0.5.0 — Initial Release
Kolya Bedrock Proxy is an AI gateway that provides OpenAI-compatible and Anthropic-native API access to AWS Bedrock models, with a built-in admin dashboard for token management, usage tracking, and model configuration.
Highlights
- Dual API Support — Compatible with both OpenAI SDK (/v1/chat/completions) and Anthropic SDK (/v1/messages), including Claude Code integration.
- AWS Bedrock Models — Supports Claude, Nova, Llama, DeepSeek, GLM, Mistral, and more via AWS Bedrock Converse API.
- Admin Dashboard — Web UI for managing API tokens, monitoring usage, configuring models, and viewing pricing.
- Production-Ready Infrastructure — Terraform modules for VPC, EKS (with Karpenter), RDS Aurora PostgreSQL, WAF, Global Accelerator, and Cognito/Microsoft OAuth.
- Security — Non-root containers, KMS CMK for Secrets Manager, RDS IAM authentication, rate limiting via Redis, and prompt injection protection.
What's Changed
- chore(deps): bump cryptography from 46.0.5 to 46.0.6 by @dependabot[bot] in #1
- chore(deps): bump pygments from 2.19.2 to 2.20.0 by @dependabot[bot] in #2
New Contributors
- @dependabot[bot] made their first contribution in #1
Full Changelog: https://github.com/aws-samples/sample-kolya-br-proxy/commits/v0.5.0