
Kiro API Proxy

An open-source compatibility and routing layer for developer workflows, featuring multi-account rotation, automatic token refresh, quota management, and protocol adaptation for OpenAI / Anthropic / Gemini clients.



Kiro API Proxy is an open-source compatibility and request routing layer for developer tooling workflows. It is designed to connect Kiro-related capabilities with common LLM client workflows, with a focus on protocol compatibility, authentication management, request routing, quota control, and operational stability in multi-account environments.

It can serve as a unified integration layer for tools such as Claude Code, Codex CLI, and Gemini CLI, making it easier to debug, switch, monitor, and maintain real-world developer workflows.

⚠️ Testing Note

This project currently supports Claude Code, Codex CLI, and Gemini CLI, with full tool-calling support.

Features

Core Features

  • Multi-protocol support - Compatible with OpenAI / Anthropic / Gemini protocols
  • Full tool-calling support - Complete tool-calling support across all three protocols
  • Image understanding - Supports image input for Claude Code / Codex CLI
  • Web search - Supports web search tools for Claude Code / Codex CLI
  • Multi-account rotation - Add multiple Kiro accounts with automatic load balancing
  • Session stickiness - Reuses the same account within 60 seconds for the same session to preserve context continuity
  • Web UI - A clean admin interface with monitoring, logs, and settings
  • Multilingual interface - Supports both Chinese and English UI switching
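The interaction between multi-account rotation and session stickiness can be sketched as follows. This is an illustrative model, not the project's actual implementation; the class and method names are assumptions.

```python
import time

# Hypothetical sketch: reuse the same account for a session seen within
# the last 60 seconds (preserving context continuity), otherwise rotate
# round-robin across accounts for load balancing.
STICKY_TTL = 60.0

class AccountPool:
    def __init__(self, accounts):
        self.accounts = list(accounts)
        self._next = 0
        self._sticky = {}  # session_id -> (account, last_used)

    def pick(self, session_id, now=None):
        now = time.time() if now is None else now
        entry = self._sticky.get(session_id)
        if entry and now - entry[1] < STICKY_TTL:
            account = entry[0]  # sticky reuse within 60 s
        else:
            account = self.accounts[self._next % len(self.accounts)]
            self._next += 1     # round-robin rotation
        self._sticky[session_id] = (account, now)
        return account
```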

What's New in v1.7.2

  • Multilingual support - Full Chinese / English switching in the Web UI
  • Bilingual launcher - Port / language settings with clearer launch actions
  • English documentation - All 5 built-in docs have been translated into English

What's New in v1.7.1

  • Improved Windows support - Registry browser detection + PATH fallback, including portable browser support
  • Packaging resource fixes - Icons and built-in docs now load correctly after PyInstaller packaging
  • More stable token scanning - Fixed Windows path encoding issues

What's New in v1.6.3

  • Command-line interface (CLI) - Easy management in headless or server environments
    • python run.py accounts list - List accounts
    • python run.py accounts export/import - Export / import accounts
    • python run.py accounts add - Add token interactively
    • python run.py accounts scan - Scan local tokens
    • python run.py login google/github - Log in from the command line
    • python run.py login remote - Generate a remote login link
  • Remote login links - Complete authorization on a browser-enabled machine and sync tokens automatically
  • Account import/export - Migrate account configurations across machines
  • Manual token input - Paste accessToken / refreshToken directly

What's New in v1.6.2

  • Full Codex CLI support - Uses the OpenAI Responses API (/v1/responses)
    • Full support for tool calls (shell, file, and all other tools)
    • Image input support (input_image type)
    • Web search support (web_search tool)
    • Error code mapping (rate_limit, context_length, etc.)
  • Enhanced Claude Code support - Full image understanding and web search support
    • Supports both Anthropic and OpenAI image formats
    • Supports web_search / web_search_20250305 tools

What's New in v1.6.1

  • Request rate limiting - Reduces account risk by controlling request frequency
    • Minimum interval per account
    • Maximum requests per minute per account
    • Global maximum requests per minute
    • Configurable in the Web UI settings page
  • Account anomaly detection - Automatically detects errors such as TEMPORARILY_SUSPENDED
    • Clear and user-friendly error logs
    • Automatically disables affected accounts
    • Automatically switches to another available account
  • Unified error handling - Shared error classification and handling logic across all three protocols
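The two per-account limits above (minimum interval and a per-minute cap) compose naturally into a single check. A minimal sketch, with names and defaults that are assumptions rather than the project's real code:

```python
# Illustrative per-account rate limiter: a request is allowed only if
# enough time has passed since the last allowed request AND fewer than
# max_per_minute requests fall in the trailing 60-second window.
class AccountRateLimiter:
    def __init__(self, min_interval=1.0, max_per_minute=30):
        self.min_interval = min_interval
        self.max_per_minute = max_per_minute
        self.last_request = float("-inf")
        self.window = []  # timestamps of allowed requests in the last 60 s

    def allow(self, now):
        self.window = [t for t in self.window if now - t < 60.0]
        if now - self.last_request < self.min_interval:
            return False  # too soon after the previous request
        if len(self.window) >= self.max_per_minute:
            return False  # per-minute cap reached
        self.last_request = now
        self.window.append(now)
        return True
```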

Features in v1.6.0

  • Conversation history management - Multiple strategies for handling context length limits, freely combinable
    • Auto truncation: preserve the most recent context and summarize earlier messages before sending; truncate by count / chars if necessary
    • Smart summarization: use AI to summarize earlier conversation while preserving key context
    • Summary cache: reuse recent summaries when history changes only slightly, reducing repeated LLM calls (enabled by default)
    • Retry on error: automatically truncate and retry on length errors (enabled by default)
    • Pre-check estimation: estimate token usage and truncate proactively before hitting the limit
  • Gemini tool-calling support - Full support for functionDeclarations / functionCall / functionResponse
  • Settings page - Added a settings tab in the Web UI for configuring conversation history management
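The "auto truncation" strategy above can be sketched as a count-and-character budget applied from the newest message backwards. This is a simplified model (the real proxy also runs smart summarization over the dropped messages; here that step is only a placeholder):

```python
# Keep the most recent messages within a message-count and character
# budget; dropped history is replaced by a placeholder marker where the
# summarization step would go.
def truncate_history(messages, max_messages=20, max_chars=16000):
    kept, chars = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        size = len(msg.get("content", ""))
        if len(kept) >= max_messages or chars + size > max_chars:
            break
        kept.append(msg)
        chars += size
    kept.reverse()
    dropped = len(messages) - len(kept)
    if dropped:
        # placeholder for the smart-summarization step
        kept.insert(0, {"role": "user",
                        "content": f"[{dropped} earlier messages truncated]"})
    return kept
```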

Features in v1.5.0

  • Usage tracking - Check quota usage, including used / remaining / utilization rate
  • Multiple login methods - Supports Google / GitHub / AWS Builder ID
  • Traffic monitoring - Full LLM request monitoring with search, filtering, and export
  • Browser selection - Automatically detects installed browsers and supports incognito mode
  • Documentation center - Built-in help docs with sidebar navigation and Markdown rendering

Features in v1.4.0

  • Token pre-refresh - Background checks every 5 minutes and refreshes tokens 15 minutes before expiry
  • Health checks - Verifies account availability every 10 minutes and updates status automatically
  • Enhanced request statistics - Stats by account / model, plus 24-hour trends
  • Retry mechanism - Automatic retry with exponential backoff for network errors / 5xx responses
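The retry mechanism can be sketched like this. The real implementation lives in core/retry.py; the function name, exception set, and delays here are illustrative assumptions:

```python
import time

# Retry transient failures (network errors / timeouts standing in for
# 5xx responses) with exponential backoff: 0.5 s, 1 s, 2 s, ...
def retry_with_backoff(fn, attempts=4, base_delay=0.5, sleep=time.sleep):
    for attempt in range(attempts):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper testable without real delays.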

Tool Calling Support

| Feature | Anthropic (Claude Code) | OpenAI (Codex CLI) | Gemini |
| --- | --- | --- | --- |
| Tool definitions | `tools` | `tools.function` | `functionDeclarations` |
| Tool call response | `tool_use` | `tool_calls` | `functionCall` |
| Tool result | `tool_result` | `tool` role message | `functionResponse` |
| Forced tool calling | `tool_choice` | `tool_choice` | `toolConfig.mode` |
| Tool count limit | ✅ 50 | ✅ 50 | ✅ 50 |
| History repair | ✅ | ✅ | ✅ |
| Image understanding | ✅ | ✅ | — |
| Web search | ✅ | ✅ | — |
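One leg of the mapping in the table above, converting an Anthropic `tool_use` content block into an OpenAI `tool_calls` entry, can be sketched as follows. Field names follow the public Anthropic / OpenAI schemas; the project's converters.py may differ in detail:

```python
import json

# Map an Anthropic tool_use block to the OpenAI tool_calls shape.
def anthropic_tool_use_to_openai(block):
    return {
        "id": block["id"],
        "type": "function",
        "function": {
            "name": block["name"],
            # OpenAI expects arguments as a JSON-encoded string,
            # whereas Anthropic provides a structured `input` object.
            "arguments": json.dumps(block["input"]),
        },
    }
```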

Known Limitations

Conversation Length Limit

The Kiro API has an input length limit. When the conversation history becomes too long, it may return an error like:

Input is too long. (CONTENT_LENGTH_EXCEEDS_THRESHOLD)

Automatic Handling (v1.6.0+)

The proxy includes built-in history management, configurable from the Settings page:

  • Retry on error (default): automatically truncate and retry when a length error occurs
  • Smart summarization: use AI to summarize earlier conversation while keeping key context
  • Summary cache (default): reuse recent summaries when history changes only slightly, reducing repeated LLM calls
  • Auto truncation: preserve the latest context and summarize earlier messages before each request; truncate by count / chars if needed
  • Pre-check estimation: estimate token usage and truncate before hitting the limit

The summary cache can be tuned with the following config options (default values):

  • summary_cache_enabled: true
  • summary_cache_min_delta_messages: 3
  • summary_cache_min_delta_chars: 4000
  • summary_cache_max_age_seconds: 180
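A plausible reading of how these options combine is: reuse a cached summary only while the history delta stays below both thresholds and the entry is still fresh. This sketch is an assumption about the logic; only the parameter names come from the config keys above:

```python
# Decide whether a cached summary can be reused for the current history.
def can_reuse_summary(cache_entry, history, now,
                      min_delta_messages=3, min_delta_chars=4000,
                      max_age_seconds=180):
    delta_msgs = len(history) - cache_entry["message_count"]
    delta_chars = (sum(len(m["content"]) for m in history)
                   - cache_entry["char_count"])
    age = now - cache_entry["created_at"]
    return (delta_msgs < min_delta_messages
            and delta_chars < min_delta_chars
            and age < max_age_seconds)
```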

Manual Handling

  1. In Claude Code, enter /clear to clear the conversation history
  2. Tell the AI what you were working on previously; it can read code files to recover context

Quick Start

Option 1: Download Prebuilt Release

Download the package for your platform from Releases, extract it, and run it directly.

Option 2: Run from Source

```bash
# Clone the project
git clone https://github.com/yourname/kiro-proxy.git
cd kiro-proxy

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run
python run.py

# Or specify a port
python run.py 8081
```

After startup, visit:

http://localhost:8080

Command-Line Interface (CLI)

In headless environments, use the CLI to manage accounts and services:

```bash
# Account management
python run.py accounts list                 # List accounts
python run.py accounts export -o acc.json   # Export accounts
python run.py accounts import acc.json      # Import accounts
python run.py accounts add                  # Add token interactively
python run.py accounts scan --auto          # Scan and auto-add local tokens

# Login
python run.py login google                  # Google login
python run.py login github                  # GitHub login
python run.py login remote --host myserver.com:8080  # Generate remote login link

# Service
python run.py serve                         # Start service (default: 8080)
python run.py serve -p 8081                 # Specify port
python run.py status                        # Show status
```

Getting Tokens

Option 1: Online Login (Recommended)

  1. Open the Web UI and click Online Login
  2. Choose a login method: Google / GitHub / AWS Builder ID
  3. Complete authorization in the browser
  4. The account will be added automatically

Option 2: Scan Tokens

  1. Open Kiro IDE and sign in with a Google / GitHub account
  2. After login, tokens are automatically saved to ~/.aws/sso/cache/
  3. Click Scan Tokens in the Web UI to add the account
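The scan step above amounts to reading the JSON files Kiro IDE leaves in `~/.aws/sso/cache/` and collecting any with an accessToken / refreshToken pair. A hedged sketch (the real scanner may check additional fields and handle more locations):

```python
import json
from pathlib import Path

# Collect token pairs from JSON files in the SSO cache directory,
# skipping files that are unreadable or not valid JSON.
def scan_sso_cache(cache_dir=Path.home() / ".aws" / "sso" / "cache"):
    found = []
    for path in sorted(Path(cache_dir).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue
        if "accessToken" in data and "refreshToken" in data:
            found.append({"file": path.name,
                          "accessToken": data["accessToken"],
                          "refreshToken": data["refreshToken"]})
    return found
```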

CLI Configuration

Model Mapping

| Kiro Model | Capability | Claude Code | Codex |
| --- | --- | --- | --- |
| claude-sonnet-4 | ⭐⭐⭐ Recommended | claude-sonnet-4 | gpt-4o |
| claude-sonnet-4.5 | ⭐⭐⭐⭐ Stronger | claude-sonnet-4.5 | gpt-4o |
| claude-haiku-4.5 | ⚡ Faster | claude-haiku-4.5 | gpt-4o-mini |
| claude-opus-4.5 | ⭐⭐⭐⭐⭐ Best | claude-opus-4.5 | o1 |

Claude Code Configuration

```text
Name: Kiro Proxy
API Key: any
Base URL: http://localhost:8080
Model: claude-sonnet-4
```

Codex Configuration

Codex CLI uses the OpenAI Responses API. Configure it like this:

```bash
# Set environment variables
export OPENAI_API_KEY=any
export OPENAI_BASE_URL=http://localhost:8080/v1

# Run Codex
codex
```

Or configure it in ~/.codex/config.toml:

```toml
[providers.openai]
api_key = "any"
base_url = "http://localhost:8080/v1"
```

API Endpoints

| Protocol | Endpoint | Purpose |
| --- | --- | --- |
| OpenAI | `POST /v1/chat/completions` | Chat Completions API |
| OpenAI | `POST /v1/responses` | Responses API (Codex CLI) |
| OpenAI | `GET /v1/models` | Model list |
| Anthropic | `POST /v1/messages` | Claude Code |
| Anthropic | `POST /v1/messages/count_tokens` | Token counting |
| Gemini | `POST /v1/models/{model}:generateContent` | Gemini CLI |
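A request against the OpenAI-compatible endpoint can be built with nothing but the standard library. The base URL and model name match the examples in this README; the API key is arbitrary, since the proxy does not validate it:

```python
import json
import urllib.request

# Build a POST request for the proxy's Chat Completions endpoint.
def build_chat_request(prompt, base_url="http://localhost:8080",
                       model="claude-sonnet-4"):
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer any"},
        method="POST",
    )

# With the proxy running:
#   urllib.request.urlopen(build_chat_request("hello"))
```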

Admin API

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/accounts` | GET | Get all account states |
| `/api/accounts/{id}` | GET | Get account details |
| `/api/accounts/{id}/usage` | GET | Get account usage info |
| `/api/accounts/{id}/refresh` | POST | Refresh account token |
| `/api/accounts/{id}/restore` | POST | Restore account from cooldown state |
| `/api/accounts/refresh-all` | POST | Refresh all soon-to-expire tokens |
| `/api/flows` | GET | Get traffic logs |
| `/api/flows/stats` | GET | Get traffic statistics |
| `/api/flows/{id}` | GET | Get traffic detail |
| `/api/quota` | GET | Get quota status |
| `/api/stats` | GET | Get statistics |
| `/api/health-check` | POST | Trigger health check manually |
| `/api/browsers` | GET | Get available browsers |
| `/api/docs` | GET | Get documentation list |
| `/api/docs/{id}` | GET | Get documentation content |

Project Structure

```text
kiro_proxy/
├── main.py                    # FastAPI app entrypoint
├── config.py                  # Global configuration
├── converters.py              # Protocol conversion
│
├── core/                      # Core modules
│   ├── account.py            # Account management
│   ├── state.py              # Global state
│   ├── persistence.py        # Persistent config storage
│   ├── scheduler.py          # Background task scheduler
│   ├── stats.py              # Request statistics
│   ├── retry.py              # Retry mechanism
│   ├── browser.py            # Browser detection
│   ├── flow_monitor.py       # Traffic monitoring
│   └── usage.py              # Usage query
│
├── credential/                # Credential management
│   ├── types.py              # KiroCredentials
│   ├── fingerprint.py        # Machine ID generation
│   ├── quota.py              # Quota manager
│   └── refresher.py          # Token refresh
│
├── auth/                      # Authentication modules
│   └── device_flow.py        # Device Code Flow / Social Auth
│
├── handlers/                  # API handlers
│   ├── anthropic.py          # /v1/messages
│   ├── openai.py             # /v1/chat/completions
│   ├── responses.py          # /v1/responses (Codex CLI)
│   ├── gemini.py             # /v1/models/{model}:generateContent
│   └── admin.py              # Admin API
│
├── cli.py                     # Command-line interface
│
├── docs/                      # Built-in documentation
│   ├── 01-quickstart.md      # Quick start
│   ├── 02-features.md        # Features
│   ├── 03-faq.md             # FAQ
│   └── 04-api.md             # API reference
│
└── web/
    └── html.py               # Web UI (componentized single file)
```

Build

```bash
# Install build dependency
pip install pyinstaller

# Build
python build.py
```

The output files will be generated in the dist/ directory.

Use Cases

  • Connect Kiro-related capabilities to clients such as Claude Code, Codex CLI, and Gemini CLI
  • Centralize request routing and account management in multi-account environments
  • Maintain token refresh, quota status, and health checks in one place
  • Provide a unified compatibility layer and observability surface for developer workflows

Disclaimer

This project is for learning and research purposes only. Please use it in compliance with the applicable terms of service and relevant usage rules. Any consequences arising from the use of this project are the sole responsibility of the user.

This project is not officially affiliated with Kiro, AWS, Anthropic, Google, or OpenAI.