An open-source compatibility and routing layer for developer workflows, with multi-account rotation, token auto-refresh, quota management, and protocol adaptation for OpenAI / Anthropic / Gemini
Features • Quick Start • CLI Configuration • API • Project Structure • License
中文 | English
Kiro API Proxy is an open-source compatibility and request routing layer for developer tooling workflows. It is designed to connect Kiro-related capabilities with common LLM client workflows, with a focus on protocol compatibility, authentication management, request routing, quota control, and operational stability in multi-account environments.
It can serve as a unified integration layer for tools such as Claude Code, Codex CLI, and Gemini CLI, making it easier to debug, switch, monitor, and maintain real-world developer workflows.
⚠️ Testing Note: This project currently supports Claude Code, Codex CLI, and Gemini CLI, with full tool-calling support.
- Multi-protocol support - Compatible with OpenAI / Anthropic / Gemini protocols
- Full tool-calling support - Complete tool-calling support across all three protocols
- Image understanding - Supports image input for Claude Code / Codex CLI
- Web search - Supports web search tools for Claude Code / Codex CLI
- Multi-account rotation - Add multiple Kiro accounts with automatic load balancing
- Session stickiness - Reuses the same account within 60 seconds for the same session to preserve context continuity
- Web UI - A clean admin interface with monitoring, logs, and settings
- Multilingual interface - Full Chinese / English switching in the Web UI
- Bilingual launcher - Port / language settings with clearer launch actions
- English documentation - All 5 built-in docs have been translated into English
- Improved Windows support - Registry browser detection + PATH fallback, including portable browser support
- Packaging resource fixes - Icons and built-in docs now load correctly after PyInstaller packaging
- More stable token scanning - Fixed Windows path encoding issues
- Command-line interface (CLI) - Easy management in headless or server environments
- `python run.py accounts list` - List accounts
- `python run.py accounts export/import` - Export / import accounts
- `python run.py accounts add` - Add token interactively
- `python run.py accounts scan` - Scan local tokens
- `python run.py login google/github` - Log in from the command line
- `python run.py login remote` - Generate a remote login link
- Remote login links - Complete authorization on a browser-enabled machine and sync tokens automatically
- Account import/export - Migrate account configurations across machines
- Manual token input - Paste accessToken / refreshToken directly
- Full Codex CLI support - Uses the OpenAI Responses API (`/v1/responses`)
  - Full support for tool calls (shell, file, and all other tools)
  - Image input support (`input_image` type)
  - Web search support (`web_search` tool)
  - Error code mapping (`rate_limit`, `context_length`, etc.)
- Enhanced Claude Code support - Full image understanding and web search support
  - Supports both Anthropic and OpenAI image formats
  - Supports `web_search` / `web_search_20250305` tools
- Request rate limiting - Reduces account risk by controlling request frequency
  - Minimum interval per account
  - Maximum requests per minute per account
  - Global maximum requests per minute
  - Configurable in the Web UI settings page
- Account anomaly detection - Automatically detects errors such as `TEMPORARILY_SUSPENDED`
  - Clear and user-friendly error logs
  - Automatically disables affected accounts
  - Automatically switches to another available account
- Unified error handling - Shared error classification and handling logic across all three protocols
- Conversation history management - Multiple strategies for handling context length limits, freely combinable
  - Auto truncation: preserve the most recent context and summarize earlier messages before sending; truncate by count / chars if necessary
  - Smart summarization: use AI to summarize earlier conversation while preserving key context
  - Summary cache: reuse recent summaries when history changes only slightly, reducing repeated LLM calls (enabled by default)
  - Retry on error: automatically truncate and retry on length errors (enabled by default)
  - Pre-check estimation: estimate token usage and truncate proactively before hitting the limit
- Gemini tool-calling support - Full support for `functionDeclarations` / `functionCall` / `functionResponse`
- Settings page - Added a settings tab in the Web UI for configuring conversation history management
- Usage tracking - Check quota usage, including used / remaining / utilization rate
- Multiple login methods - Supports Google / GitHub / AWS Builder ID
- Traffic monitoring - Full LLM request monitoring with search, filtering, and export
- Browser selection - Automatically detects installed browsers and supports incognito mode
- Documentation center - Built-in help docs with sidebar navigation and Markdown rendering
- Token pre-refresh - Background checks every 5 minutes and refreshes tokens 15 minutes before expiry
- Health checks - Verifies account availability every 10 minutes and updates status automatically
- Enhanced request statistics - Stats by account / model, plus 24-hour trends
- Retry mechanism - Automatic retry with exponential backoff for network errors / 5xx responses
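The backoff behavior described above can be sketched as follows. This is an illustrative sketch only: the function name, parameters, and thresholds are assumptions, not the actual API of the proxy's `core/retry.py`.

```python
import random
import time

def retry_with_backoff(request_fn, max_attempts=4, base_delay=1.0, max_delay=30.0):
    """Retry a request on transient failures with exponential backoff and jitter.

    Illustrative sketch; the proxy's real retry logic may classify errors
    differently (e.g. also retrying HTTP 5xx responses).
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff: base, 2x, 4x, ... capped at max_delay,
            # plus proportional random jitter to avoid thundering herds.
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, delay))
```

With the defaults above, four attempts spread over roughly 1 + 2 + 4 seconds of waiting before the error is re-raised.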
| Feature | Anthropic (Claude Code) | OpenAI (Codex CLI) | Gemini |
|---|---|---|---|
| Tool definitions | ✅ `tools` | ✅ `tools.function` | ✅ `functionDeclarations` |
| Tool call response | ✅ `tool_use` | ✅ `tool_calls` | ✅ `functionCall` |
| Tool result | ✅ `tool_result` | ✅ `tool` role message | ✅ `functionResponse` |
| Forced tool calling | ✅ `tool_choice` | ✅ `tool_choice` | ✅ `toolConfig.mode` |
| Tool count limit | ✅ 50 | ✅ 50 | ✅ 50 |
| History repair | ✅ | ✅ | ✅ |
| Image understanding | ✅ | ✅ | ❌ |
| Web search | ✅ | ✅ | ❌ |
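The tool-definition row of the table can be illustrated with a small conversion sketch. These helpers mirror the public wire formats of the three protocols; they are not the proxy's actual `converters.py` implementation, which handles many more edge cases.

```python
def anthropic_tool_to_openai(tool: dict) -> dict:
    """Map an Anthropic `tools` entry to an OpenAI `tools` function entry.

    Both protocols describe parameters as JSON Schema, so the schema
    carries over directly under a different key.
    """
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],
        },
    }

def anthropic_tool_to_gemini(tool: dict) -> dict:
    """Map the same entry to a Gemini `functionDeclarations` item."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "parameters": tool["input_schema"],
    }
```

The same idea applies in reverse for tool results: an Anthropic `tool_result` block becomes an OpenAI `tool`-role message or a Gemini `functionResponse` part.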
The Kiro API has an input length limit. When the conversation history becomes too long, it may return an error like:
Input is too long. (CONTENT_LENGTH_EXCEEDS_THRESHOLD)
The proxy includes built-in history management, configurable from the Settings page:
- Retry on error (default): automatically truncate and retry when a length error occurs
- Smart summarization: use AI to summarize earlier conversation while keeping key context
- Summary cache (default): reuse recent summaries when history changes only slightly, reducing repeated LLM calls
- Auto truncation: preserve the latest context and summarize earlier messages before each request; truncate by count / chars if needed
- Pre-check estimation: estimate token usage and truncate before hitting the limit
The summary cache can be tuned with the following config options (default values):
- `summary_cache_enabled`: `true`
- `summary_cache_min_delta_messages`: `3`
- `summary_cache_min_delta_chars`: `4000`
- `summary_cache_max_age_seconds`: `180`
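Under these defaults, a cached summary is reused only while it is fresh and the history has grown by less than the configured deltas. A minimal sketch of that decision follows; the cache-entry field names (`created_at`, `message_count`, `char_count`) are illustrative assumptions, not the proxy's actual cache schema.

```python
import time

def can_reuse_summary(cache_entry, history,
                      min_delta_messages=3, min_delta_chars=4000,
                      max_age_seconds=180):
    """Decide whether a cached summary still covers the given history.

    `history` is a list of {"content": str} messages; `cache_entry` records
    what the history looked like when the summary was generated.
    """
    if cache_entry is None:
        return False
    if time.time() - cache_entry["created_at"] > max_age_seconds:
        return False  # summary has aged out
    delta_messages = len(history) - cache_entry["message_count"]
    delta_chars = sum(len(m["content"]) for m in history) - cache_entry["char_count"]
    # Reuse only while the history changed less than both thresholds;
    # otherwise a fresh summarization call is worth the cost.
    return delta_messages < min_delta_messages and delta_chars < min_delta_chars
```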
- In Claude Code, enter `/clear` to clear the conversation history
- Tell the AI what you were working on previously; it can read code files to recover context
Download the package for your platform from Releases, extract it, and run it directly.
# Clone the project
git clone https://github.com/yourname/kiro-proxy.git
cd kiro-proxy
# Create a virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run
python run.py
# Or specify a port
python run.py 8081

After startup, visit:
http://localhost:8080
In headless environments, use the CLI to manage accounts and services:
# Account management
python run.py accounts list # List accounts
python run.py accounts export -o acc.json # Export accounts
python run.py accounts import acc.json # Import accounts
python run.py accounts add # Add token interactively
python run.py accounts scan --auto # Scan and auto-add local tokens
# Login
python run.py login google # Google login
python run.py login github # GitHub login
python run.py login remote --host myserver.com:8080 # Generate remote login link
# Service
python run.py serve # Start service (default: 8080)
python run.py serve -p 8081 # Specify port
python run.py status                        # Show status

Option 1: Online Login (Recommended)
- Open the Web UI and click Online Login
- Choose a login method: Google / GitHub / AWS Builder ID
- Complete authorization in the browser
- The account will be added automatically
Option 2: Scan Tokens
- Open Kiro IDE and sign in with a Google / GitHub account
- After login, tokens are automatically saved to `~/.aws/sso/cache/`
- Click Scan Tokens in the Web UI to add the account
| Kiro Model | Capability | Claude Code | Codex |
|---|---|---|---|
| `claude-sonnet-4` | ⭐⭐⭐ Recommended | `claude-sonnet-4` | `gpt-4o` |
| `claude-sonnet-4.5` | ⭐⭐⭐⭐ Stronger | `claude-sonnet-4.5` | `gpt-4o` |
| `claude-haiku-4.5` | ⚡ Faster | `claude-haiku-4.5` | `gpt-4o-mini` |
| `claude-opus-4.5` | ⭐⭐⭐⭐⭐ Best | `claude-opus-4.5` | `o1` |
Name: Kiro Proxy
API Key: any
Base URL: http://localhost:8080
Model: claude-sonnet-4
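Once pointed at the proxy, clients speak the standard Anthropic Messages API. As a sanity check, a minimal request of that shape can be built with only the standard library. The host, the `x-api-key` placeholder, and the prompt are illustrative; nothing is sent until `urlopen` is actually called against a running proxy.

```python
import json
from urllib import request

# Minimal Anthropic-style Messages body aimed at the proxy's /v1/messages
# endpoint; the proxy ignores the API key, so any placeholder works.
payload = {
    "model": "claude-sonnet-4",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello through the proxy!"}],
}
req = request.Request(
    "http://localhost:8080/v1/messages",
    data=json.dumps(payload).encode(),
    headers={"content-type": "application/json", "x-api-key": "any"},
    method="POST",
)
# request.urlopen(req) would send it once the proxy is running
```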
Codex CLI uses the OpenAI Responses API. Configure it like this:
# Set environment variables
export OPENAI_API_KEY=any
export OPENAI_BASE_URL=http://localhost:8080/v1
# Run Codex
codex

Or configure it in `~/.codex/config.toml`:
[providers.openai]
api_key = "any"
base_url = "http://localhost:8080/v1"

| Protocol | Endpoint | Purpose |
|---|---|---|
| OpenAI | `POST /v1/chat/completions` | Chat Completions API |
| OpenAI | `POST /v1/responses` | Responses API (Codex CLI) |
| OpenAI | `GET /v1/models` | Model list |
| Anthropic | `POST /v1/messages` | Claude Code |
| Anthropic | `POST /v1/messages/count_tokens` | Token counting |
| Gemini | `POST /v1/models/{model}:generateContent` | Gemini CLI |
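A quick way to smoke-test the OpenAI-compatible surface is to list models, since `GET /v1/models` is the cheapest endpoint to hit. This sketch builds the request with only the standard library; the host is an assumption (the default port), and nothing is sent until `urlopen` is called against a running proxy.

```python
from urllib import request

# GET /v1/models verifies the proxy is reachable and shows which model
# names it will accept; the proxy accepts any bearer token.
req = request.Request(
    "http://localhost:8080/v1/models",
    headers={"authorization": "Bearer any"},
)
# with request.urlopen(req) as resp:
#     print(resp.read())  # JSON body listing available models
```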
| Endpoint | Method | Description |
|---|---|---|
| `/api/accounts` | GET | Get all account states |
| `/api/accounts/{id}` | GET | Get account details |
| `/api/accounts/{id}/usage` | GET | Get account usage info |
| `/api/accounts/{id}/refresh` | POST | Refresh account token |
| `/api/accounts/{id}/restore` | POST | Restore account from cooldown state |
| `/api/accounts/refresh-all` | POST | Refresh all soon-to-expire tokens |
| `/api/flows` | GET | Get traffic logs |
| `/api/flows/stats` | GET | Get traffic statistics |
| `/api/flows/{id}` | GET | Get traffic detail |
| `/api/quota` | GET | Get quota status |
| `/api/stats` | GET | Get statistics |
| `/api/health-check` | POST | Trigger health check manually |
| `/api/browsers` | GET | Get available browsers |
| `/api/docs` | GET | Get documentation list |
| `/api/docs/{id}` | GET | Get documentation content |
kiro_proxy/
├── main.py # FastAPI app entrypoint
├── config.py # Global configuration
├── converters.py # Protocol conversion
│
├── core/ # Core modules
│ ├── account.py # Account management
│ ├── state.py # Global state
│ ├── persistence.py # Persistent config storage
│ ├── scheduler.py # Background task scheduler
│ ├── stats.py # Request statistics
│ ├── retry.py # Retry mechanism
│ ├── browser.py # Browser detection
│ ├── flow_monitor.py # Traffic monitoring
│ └── usage.py # Usage query
│
├── credential/ # Credential management
│ ├── types.py # KiroCredentials
│ ├── fingerprint.py # Machine ID generation
│ ├── quota.py # Quota manager
│ └── refresher.py # Token refresh
│
├── auth/ # Authentication modules
│ └── device_flow.py # Device Code Flow / Social Auth
│
├── handlers/ # API handlers
│ ├── anthropic.py # /v1/messages
│ ├── openai.py # /v1/chat/completions
│ ├── responses.py # /v1/responses (Codex CLI)
│ ├── gemini.py # /v1/models/{model}:generateContent
│ └── admin.py # Admin API
│
├── cli.py # Command-line interface
│
├── docs/ # Built-in documentation
│ ├── 01-quickstart.md # Quick start
│ ├── 02-features.md # Features
│ ├── 03-faq.md # FAQ
│ └── 04-api.md # API reference
│
└── web/
└── html.py # Web UI (componentized single file)
# Install build dependency
pip install pyinstaller
# Build
python build.py

The output files will be generated in the `dist/` directory.
- Connect Kiro-related capabilities to clients such as Claude Code, Codex CLI, and Gemini CLI
- Centralize request routing and account management in multi-account environments
- Maintain token refresh, quota status, and health checks in one place
- Provide a unified compatibility layer and observability surface for developer workflows
This project is for learning and research purposes only. Please use it in compliance with the applicable terms of service and relevant usage rules. Any consequences arising from the use of this project are the sole responsibility of the user.
This project is not officially affiliated with Kiro, AWS, Anthropic, Google, or OpenAI.