33 changes: 25 additions & 8 deletions README.en.md
@@ -4,10 +4,16 @@
> Select a `route_id` from request intent, then resolve it to your local LiteLLM model group.

<p align="center">
<img alt="status: local validation" src="https://img.shields.io/badge/status-local_validation-f59e0b?style=for-the-badge">
<img alt="entry: semantic-router" src="https://img.shields.io/badge/entry-semantic--router-2563eb?style=for-the-badge">
<img alt="gateway: LiteLLM compatible" src="https://img.shields.io/badge/gateway-LiteLLM_compatible-16a34a?style=for-the-badge">
<img alt="logs: no prompts or tokens" src="https://img.shields.io/badge/logs-no_prompts_or_tokens-7c3aed?style=for-the-badge">
<img alt="runtime Python 3.11+" src="https://img.shields.io/badge/runtime-Python%203.11%2B-3776AB">
<img alt="entry semantic-router" src="https://img.shields.io/badge/entry-semantic--router-0EA5E9">
<img alt="gateway LiteLLM compatible" src="https://img.shields.io/badge/gateway-LiteLLM%20compatible-16A34A">
<img alt="logs no prompt or token" src="https://img.shields.io/badge/logs-no%20prompt%20%7C%20token-7C3AED">
</p>
<p align="center">
<img alt="built with FastAPI" src="https://img.shields.io/badge/built%20with-FastAPI-009688">
<img alt="config YAML" src="https://img.shields.io/badge/config-YAML-CB171E">
<img alt="tests pytest" src="https://img.shields.io/badge/tests-pytest-0A9EDC">
<img alt="package uv" src="https://img.shields.io/badge/package-uv-DE5FE9">
</p>

[中文](README.md)
@@ -37,7 +43,9 @@ model=semantic-router -> route_id -> target_model -> LiteLLM model group

All other model names pass through.

The default sample config uses product-level route ids such as `fast`, `strong`, and `experimental`, mapped to local LiteLLM model groups such as `cheap-router`, `pro-router`, and `free-probe-router`. These `target_model` values are deployment names, not product API names.
The default sample config uses product-level route ids such as `fast`, `strong`, and `experimental`, mapped to LiteLLM model groups such as `cheap-router`, `pro-router`, and `free-probe-router`. These `target_model` values are deployment names, not product API names.

Deploy IntentMux as a sidecar next to LiteLLM and keep provider secrets, tokens, `.env` files, and mounted LiteLLM data outside this repository.

## Quick Start

@@ -65,7 +73,7 @@ client -> LiteLLM :4000, model=semantic-router
-> LiteLLM model group
```

`semantic-router` is the compatibility entry name. It does not have to match the product name. LiteLLM's native `smart-router` should remain separate.
Configure `semantic-router` in LiteLLM as a model entry that points to the IntentMux sidecar. Requests that use that model name are routed by intent; other model names pass through unchanged.

## Verification

@@ -115,6 +123,15 @@ curl http://127.0.0.1:4001/v1/semantic-router/decision \

This returns the selected `route_id`, resolved `target_model`, `policy_id`, reason, rewrite flag, and scores without forwarding to LiteLLM.
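
A client that consumes this response might parse it into a typed record, as in the sketch below. The README names `route_id`, `target_model`, and `policy_id`; the JSON keys `reason`, `rewrite`, and `scores`, and the shape of `scores`, are assumptions made for illustration, so check them against a real response before relying on them.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    route_id: str
    target_model: str
    policy_id: str
    reason: str               # assumed key name
    rewrite: bool             # assumed key name for the rewrite flag
    scores: dict[str, float]  # assumed shape: label -> score


def parse_decision(body: str) -> Decision:
    """Parse a decision response body; raises KeyError on missing fields."""
    data = json.loads(body)
    return Decision(
        route_id=data["route_id"],
        target_model=data["target_model"],
        policy_id=data["policy_id"],
        reason=data["reason"],
        rewrite=bool(data["rewrite"]),
        scores={key: float(value) for key, value in data["scores"].items()},
    )
```

Because the endpoint does not forward to LiteLLM, a parser like this is enough to inspect routing decisions in tests without spending upstream tokens.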

## Status
## Runtime Behavior

Run IntentMux as a sidecar in the same deployment boundary as LiteLLM.

- The Docker health check uses `/health`, so readiness flapping does not trigger restart loops.
- `/ready` checks router, LiteLLM, and embedding availability.
- When embeddings are unavailable, chat requests fail open to `fallback_route_id` and log `reason=embedding_error`.
- LiteLLM/upstream `5xx` responses or connection errors fail closed as redacted `502` responses and log `route_error`.

## Current Capabilities

IntentMux is built for a real local deployment and already includes routing, preflight, LiteLLM-entry E2E, structured logs, and route-error budget gates. It is still in production validation and documentation polish; public-release packaging, license polish, local-path cleanup, and release metadata should be handled after the operational baseline is stable.
IntentMux includes basic routing, preflight checks, LiteLLM-entry E2E, structured logs, and route-error budget gates for lightweight intent-routing validation in local or private gateway deployments.
32 changes: 16 additions & 16 deletions README.md
@@ -4,10 +4,16 @@
> Select a `route_id` from request intent, then map it to your local LiteLLM model group.

<p align="center">
<img alt="status: local validation" src="https://img.shields.io/badge/status-local_validation-f59e0b?style=for-the-badge">
<img alt="entry: semantic-router" src="https://img.shields.io/badge/entry-semantic--router-2563eb?style=for-the-badge">
<img alt="gateway: LiteLLM compatible" src="https://img.shields.io/badge/gateway-LiteLLM_compatible-16a34a?style=for-the-badge">
<img alt="logs: no prompts or tokens" src="https://img.shields.io/badge/logs-no_prompts_or_tokens-7c3aed?style=for-the-badge">
<img alt="runtime Python 3.11+" src="https://img.shields.io/badge/runtime-Python%203.11%2B-3776AB">
<img alt="entry semantic-router" src="https://img.shields.io/badge/entry-semantic--router-0EA5E9">
<img alt="gateway LiteLLM compatible" src="https://img.shields.io/badge/gateway-LiteLLM%20compatible-16A34A">
<img alt="logs no prompt or token" src="https://img.shields.io/badge/logs-no%20prompt%20%7C%20token-7C3AED">
</p>
<p align="center">
<img alt="built with FastAPI" src="https://img.shields.io/badge/built%20with-FastAPI-009688">
<img alt="config YAML" src="https://img.shields.io/badge/config-YAML-CB171E">
<img alt="tests pytest" src="https://img.shields.io/badge/tests-pytest-0A9EDC">
<img alt="package uv" src="https://img.shields.io/badge/package-uv-DE5FE9">
</p>

[English](README.en.md)
@@ -37,9 +43,9 @@ model=semantic-router -> route_id -> target_model -> LiteLLM model group

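Other model names pass through to LiteLLM by default.

The current sample config uses three product-level route ids, `fast`, `strong`, and `experimental`, mapped to the local LiteLLM model groups `cheap-router`, `pro-router`, and `free-probe-router`. These `target_model` values are deployment names, not the product interface.
The default sample config uses three product-level route ids, `fast`, `strong`, and `experimental`, mapped to the LiteLLM model groups `cheap-router`, `pro-router`, and `free-probe-router`. These `target_model` values are deployment names, not the product interface.

Keep a clear boundary between this repository and `/home/raystorm/gateway/litellm`. Do not add the LiteLLM mount directory, tokens, `.env` files, or provider credentials to this repository.
When deploying, manage IntentMux independently as a sidecar alongside LiteLLM; do not add the LiteLLM mount directory, tokens, `.env` files, or provider credentials to this repository.

## When to Use It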

@@ -87,9 +93,7 @@ client -> LiteLLM :4000, model=semantic-router
-> LiteLLM model group
```

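`semantic-router` is the compatibility entry name, not the project brand name. The project is called IntentMux; the entry name stays `semantic-router` to reduce migration cost for existing deployments.

LiteLLM's native `smart-router` should remain separate: it still refers to LiteLLM's complexity router, while IntentMux's `semantic-router` is this project's intent-routing entry.
Once `semantic-router` is configured in LiteLLM as a model entry that points to the IntentMux sidecar, clients can trigger intent routing through that model name. Model names that do not match this entry pass through unchanged.

## Configuration Model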

@@ -220,19 +224,15 @@ uv run python scripts/import_review_samples.py \

Each JSONL record must set `redacted: true` and use a route id as `expect`.
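
A pre-import check for these two rules could look like the sketch below. It is not the logic of `scripts/import_review_samples.py`, which is not shown here; `validate_review_samples` and `KNOWN_ROUTE_IDS` are hypothetical names, and the route ids are taken from the sample config.

```python
import json

# Assumption: route ids from the sample config; a real check would read
# them from the deployed configuration instead.
KNOWN_ROUTE_IDS = ("fast", "strong", "experimental")


def validate_review_samples(lines: list[str]) -> list[str]:
    """Return one error message per rule violation; empty list means valid."""
    errors = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            errors.append(f"line {number}: not valid JSON")
            continue
        if not isinstance(record, dict):
            errors.append(f"line {number}: record must be a JSON object")
            continue
        if record.get("redacted") is not True:
            errors.append(f"line {number}: redacted must be true")
        if record.get("expect") not in KNOWN_ROUTE_IDS:
            errors.append(f"line {number}: expect must be a route id")
    return errors
```

Checking `redacted is not True` rather than truthiness rejects values such as `"yes"` or `1`, which keeps the redaction flag an explicit statement rather than an accident of formatting.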

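## Lifecycle
## Runtime Behavior

Run IntentMux as a parallel sidecar in the LiteLLM compose project rather than placing it inside the LiteLLM mount directory or service.

Current behavior:

- The Docker health check uses `/health`, so readiness flapping does not trigger restart loops.
- `/ready` checks three layers: router, LiteLLM, and embedding.
- When embeddings are unavailable, chat requests fail open to `fallback_route_id` and log `reason=embedding_error`.
- LiteLLM/upstream `5xx` responses or connection errors fail closed as redacted `502` responses and log `route_error`.

Whether to bind the sidecar lifecycle more tightly to the LiteLLM service itself is a separate design item and is not implemented implicitly in the current runtime.

## Project Status
## Current Capabilities

IntentMux currently serves a real local need and already has basic routing, preflight, E2E, structured logs, and an error-budget gate. The repository is still in production validation and documentation polish; the license, public-release documentation, local-path cleanup, and release packaging will be handled once it is stable.
IntentMux already has basic routing, preflight, LiteLLM-entry E2E, structured logs, and an error-budget gate, which makes it suitable for lightweight intent-routing validation in local or private gateway environments.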