From f0766b16be2a473e4087d896dae52a93d4ae6dd3 Mon Sep 17 00:00:00 2001
From: raystorm <2557058999@qq.com>
Date: Tue, 5 May 2026 23:11:26 +0800
Subject: [PATCH 1/4] docs: add README cards and tighten project overview
---
README.md | 93 +++++++++++++++++++++++++++++--------------------------
1 file changed, 49 insertions(+), 44 deletions(-)
diff --git a/README.md b/README.md
index e2ab404..7f8d4ff 100644
--- a/README.md
+++ b/README.md
@@ -3,28 +3,43 @@
> 轻量、可审计的 LiteLLM 意图分流 sidecar。
> 按请求意图选择 `route_id`,再映射到你的本地 LiteLLM 模型组。
+
[English](README.en.md)
-| 项目 | 内容 |
-| --- | --- |
-| 用途 | 在 LiteLLM / OpenAI-compatible gateway 前做轻量意图分流 |
-| 接入面 | 客户端保持打 LiteLLM,只把模型名切到 `semantic-router` |
-| 路由模型 | `semantic-router` 是兼容入口名;产品名是 IntentMux |
-| 决策输出 | `route_id -> target_model`,例如 `strong -> pro-router` |
-| 可审计性 | 结构化 `route_complete` / `route_error` 日志,不记录 prompt 或 bearer token |
-| 运行状态 | 本地生产验证中;暂不按 public-release 项目发布 |
+## 一句话
+
+IntentMux 是一个本地优先的 OpenAI/LiteLLM-compatible 路由 sidecar:客户端仍然请求原来的 LiteLLM 入口,只把模型名切到 `semantic-router`,IntentMux 根据请求意图选择 `route_id`,再映射到实际部署中的 `target_model`。
+
+| 能力 | 说明 |
+| --- | --- |
+| 意图分流 | 从请求内容判断 `fast` / `strong` / `experimental` 等 route id。 |
+| 低侵入接入 | 保留 LiteLLM 作为 provider、fallback、限流和鉴权层。 |
+| 可审计日志 | 结构化记录 `route_complete` / `route_error`,不记录 prompt、token 或 bearer token。 |
+| 生产前验证 | 提供 preflight、LiteLLM-entry E2E、日志 summary 和 route-error budget gate。 |
+
+## 项目边界
-IntentMux 不是模型提供商,也不是 LiteLLM 的替代品。它是一个本地优先的
-routing sidecar:只改写进入 sidecar 的请求 `model` 字段,把
-`model=semantic-router` 路由到配置里的 `route_id`,再解析到实际部署中的
-`target_model`。其他模型名默认透传给 LiteLLM。
+IntentMux 不是模型提供商,也不是 LiteLLM 的替代品。它只处理进入 sidecar 的兼容入口模型:
-当前示例配置使用 `fast`、`strong`、`experimental` 三个产品级 route id,并映射到本机
-LiteLLM 模型组 `cheap-router`、`pro-router`、`free-probe-router`。这些
-`target_model` 是部署名,不是产品接口。
+```text
+model=semantic-router -> route_id -> target_model -> LiteLLM model group
+```
+
+其他模型名默认透传给 LiteLLM。
-本仓库和 `/home/raystorm/gateway/litellm` 保持边界清晰。不要把 LiteLLM
-挂载目录、token、`.env` 或 provider 凭据加入本仓库。
+当前示例配置使用 `fast`、`strong`、`experimental` 三个产品级 route id,并映射到本机 LiteLLM 模型组 `cheap-router`、`pro-router`、`free-probe-router`。这些 `target_model` 是部署名,不是产品接口。
+
+本仓库和 `/home/raystorm/gateway/litellm` 保持边界清晰。不要把 LiteLLM 挂载目录、token、`.env` 或 provider 凭据加入本仓库。
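上面的改写与透传流程可以用一个极简 Python 示意表达。注意:这是假设实现,仅用于说明数据流;映射值取自本文示例配置,并非仓库实际代码:

```python
# 示意:入口模型改写与透传(假设实现,仅用于说明 README 中的数据流)。
ENTRY_MODEL = "semantic-router"

# route_id -> target_model,对应 README 示例配置(部署名,不是产品接口)。
ROUTE_TABLE = {
    "fast": "cheap-router",
    "strong": "pro-router",
    "experimental": "free-probe-router",
}

def resolve_model(model: str, route_id: str) -> str:
    """只改写入口模型;其他模型名原样透传给 LiteLLM。"""
    if model != ENTRY_MODEL:
        return model  # 透传
    return ROUTE_TABLE[route_id]

print(resolve_model("semantic-router", "strong"))  # pro-router
print(resolve_model("gpt-4o-mini", "fast"))        # gpt-4o-mini(透传)
```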
## 适合什么场景
@@ -33,8 +48,7 @@ LiteLLM 模型组 `cheap-router`、`pro-router`、`free-probe-router`。这些
- 你希望路由决策可回放、可审计、可用日志继续改进。
- 你不想引入一个大型调度平台,也不想让客户端大改端点。
-IntentMux 的差异化不是“再造一个复杂 router”,而是轻量、本地、快速部署、日志可读。
-成熟的 provider 路由、fallback、限流和鉴权仍交给 LiteLLM。
+IntentMux 的差异化不是“再造一个复杂 router”,而是轻量、本地、快速部署、日志可读。成熟的 provider 路由、fallback、限流和鉴权仍交给 LiteLLM。
## 快速运行
@@ -44,11 +58,13 @@ uv run python -m router.app
默认端点:
-- IntentMux sidecar: `http://127.0.0.1:4001`
-- LiteLLM upstream: `http://127.0.0.1:4000`
-- Embedding upstream: `http://127.0.0.1:1234/v1/embeddings`
+| 服务 | 地址 |
+| --- | --- |
+| IntentMux sidecar | `http://127.0.0.1:4001` |
+| LiteLLM upstream | `http://127.0.0.1:4000` |
+| Embedding upstream | `http://127.0.0.1:1234/v1/embeddings` |
-环境变量:
+常用环境变量:
- `ROUTER_HOST`
- `ROUTER_PORT`
@@ -61,8 +77,7 @@ uv run python -m router.app
## LiteLLM 接入方式
-低侵入接入方式是:客户端继续请求 LiteLLM `:4000`,只把模型名切到
-`semantic-router`。
+低侵入接入方式是:客户端继续请求 LiteLLM `:4000`,只把模型名切到 `semantic-router`。
```text
client -> LiteLLM :4000, model=semantic-router
@@ -72,11 +87,9 @@ client -> LiteLLM :4000, model=semantic-router
-> LiteLLM model group
```
-`semantic-router` 是兼容入口名,不等于项目品牌名。项目叫 IntentMux;入口名保留
-`semantic-router`,是为了降低现有部署迁移成本。
+`semantic-router` 是兼容入口名,不等于项目品牌名。项目叫 IntentMux;入口名保留 `semantic-router`,是为了降低现有部署迁移成本。
-LiteLLM 原生 `smart-router` 应保持独立:它仍表示 LiteLLM 的 complexity router;
-IntentMux 的 `semantic-router` 表示本项目的意图分流入口。
+LiteLLM 原生 `smart-router` 应保持独立:它仍表示 LiteLLM 的 complexity router;IntentMux 的 `semantic-router` 表示本项目的意图分流入口。
## 配置模型
@@ -100,8 +113,7 @@ routes:
- 这个线上 bug 为什么偶发
```
-运行时校验会阻止递归配置:入口模型本身不能作为 route id 或 target model,
-`fallback_route_id` 必须存在。
+运行时校验会阻止递归配置:入口模型本身不能作为 route id 或 target model,`fallback_route_id` 必须存在。
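这两条校验规则可以用如下示意表达(假设的校验逻辑,仅复述上文规则,非仓库实际实现):

```python
# 示意:配置校验(假设实现),对应上文两条规则:
# 1) 入口模型不能作为 route id 或 target model(防止递归);
# 2) fallback_route_id 必须是已声明的 route。
def validate_routes(entry_model: str, routes: dict, fallback_route_id: str) -> list:
    errors = []
    for route_id, target_model in routes.items():
        if entry_model in (route_id, target_model):
            errors.append(
                f"recursive config: {entry_model} used as {route_id!r} -> {target_model!r}"
            )
    if fallback_route_id not in routes:
        errors.append(f"fallback_route_id {fallback_route_id!r} is not a declared route")
    return errors

routes = {"fast": "cheap-router", "strong": "pro-router"}
print(validate_routes("semantic-router", routes, "fast"))     # []
print(validate_routes("semantic-router", routes, "missing"))  # 1 条错误
```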
## 验证
@@ -189,8 +201,7 @@ curl http://127.0.0.1:4001/v1/semantic-router/decision \
## 语义资产
-运行时保持轻依赖。更大的 route bank 从 `config/route_sources.yaml` 声明的来源离线生成,
-不把 Hugging Face 等构建依赖带进运行时。
+运行时保持轻依赖。更大的 route bank 从 `config/route_sources.yaml` 声明的来源离线生成,不把 Hugging Face 等构建依赖带进运行时。
```bash
uv sync --group assets
@@ -211,23 +222,17 @@ uv run python scripts/import_review_samples.py \
## 生命周期
-推荐把 IntentMux 作为 LiteLLM compose project 里的并列 sidecar,而不是塞进
-LiteLLM 挂载目录或服务内部。
+推荐把 IntentMux 作为 LiteLLM compose project 里的并列 sidecar,而不是塞进 LiteLLM 挂载目录或服务内部。
当前行为:
- Docker health 使用 `/health`,避免 readiness 抖动触发重启循环。
- `/ready` 检查 router、LiteLLM、embedding 三层。
-- embedding 不可用时,聊天请求 fail-open 到 `fallback_route_id`,并记录
- `reason=embedding_error`。
-- LiteLLM/upstream `5xx` 或连接异常 fail-closed 为脱敏 `502`,并记录
- `route_error`。
+- embedding 不可用时,聊天请求 fail-open 到 `fallback_route_id`,并记录 `reason=embedding_error`。
+- LiteLLM/upstream `5xx` 或连接异常 fail-closed 为脱敏 `502`,并记录 `route_error`。
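上述 fail-open / fail-closed 语义可以用一个极简示意表达(假设实现,非实际代码;事件与字段名取自上文日志说明,正常路径的 route_id 与 reason 仅为示意值):

```python
# 示意:fail-open / fail-closed 语义(假设实现)。
FALLBACK_ROUTE_ID = "fast"

def handle(embedding_ok: bool, upstream_status: int) -> dict:
    if not embedding_ok:
        # fail-open:仍然转发,走 fallback_route_id,记录 reason=embedding_error
        route_id, reason = FALLBACK_ROUTE_ID, "embedding_error"
    else:
        route_id, reason = "strong", "intent_match"  # 正常分流(示意值)
    if upstream_status >= 500:
        # fail-closed:对客户端返回脱敏 502,记录 route_error
        return {"status": 502, "event": "route_error", "route_id": route_id}
    return {"status": upstream_status, "event": "route_complete",
            "route_id": route_id, "reason": reason}

print(handle(False, 200))  # fail-open 到 fallback
print(handle(True, 503))   # fail-closed 为 502
```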
-未来是否把 sidecar 生命周期更强地绑定到 LiteLLM 本体服务,是单独的设计项,不在当前
-运行时里隐式实现。
+未来是否把 sidecar 生命周期更强地绑定到 LiteLLM 本体服务,是单独的设计项,不在当前运行时里隐式实现。
## 项目状态
-IntentMux 当前服务真实本地需求,已具备基本路由、preflight、E2E、结构化日志和
-error-budget gate。仓库仍处于生产验证和文档打磨阶段,许可证、public-release 文档、
-本地路径统一和发布包装会在稳定后再处理。
+IntentMux 当前服务真实本地需求,已具备基本路由、preflight、E2E、结构化日志和 error-budget gate。仓库仍处于生产验证和文档打磨阶段,许可证、public-release 文档、本地路径统一和发布包装会在稳定后再处理。
From a8c9c3b6666bc43f1ac89122f502df300bd522c6 Mon Sep 17 00:00:00 2001
From: raystorm <2557058999@qq.com>
Date: Tue, 5 May 2026 23:12:18 +0800
Subject: [PATCH 2/4] docs: sync English README overview with cards
---
README.en.md | 76 +++++++++++++++++++++++++++++-----------------------
1 file changed, 43 insertions(+), 33 deletions(-)
diff --git a/README.en.md b/README.en.md
index f5e52f1..ee5a472 100644
--- a/README.en.md
+++ b/README.en.md
@@ -3,25 +3,41 @@
> Lightweight, auditable intent-routing sidecar for LiteLLM.
> Select a `route_id` from request intent, then resolve it to your local LiteLLM model group.
+
[中文](README.md)
-| Area | Value |
-| --- | --- |
-| Purpose | Lightweight intent routing in front of LiteLLM / OpenAI-compatible gateways |
-| Integration | Keep clients on LiteLLM; opt in with `model=semantic-router` |
-| Entry model | `semantic-router` is the compatibility entry name; IntentMux is the product name |
-| Decision shape | `route_id -> target_model`, for example `strong -> pro-router` |
-| Auditability | Structured `route_complete` / `route_error` logs without prompts or bearer tokens |
-| Status | Local production validation; not packaged as a public release yet |
-
-IntentMux is not a model provider and does not replace LiteLLM. It is a
-local-first routing sidecar that rewrites only selected request `model` fields:
-`model=semantic-router` becomes a configured `route_id`, then resolves to a
-deployment-specific `target_model`. All other model names pass through.
-
-The default sample config uses product-level route ids such as `fast`, `strong`,
-and `experimental`, mapped to local LiteLLM model groups such as `cheap-router`,
-`pro-router`, and `free-probe-router`.
+## One Line
+
+IntentMux is a local-first OpenAI/LiteLLM-compatible routing sidecar. Clients keep using the existing LiteLLM endpoint and opt in with `model=semantic-router`; IntentMux selects a `route_id` from request intent, then resolves that route to the deployment-specific `target_model`.
+
+| Capability | Description |
+| --- | --- |
+| Intent routing | Select product-level routes such as `fast`, `strong`, and `experimental` from request content. |
+| Low-intrusion integration | Keep LiteLLM responsible for providers, fallback, rate limits, and authentication. |
+| Auditable logs | Record structured `route_complete` / `route_error` events without prompts, tokens, or bearer tokens. |
+| Operational gates | Ship with preflight, LiteLLM-entry E2E, log summaries, and route-error budget checks. |
+
+## Project Boundary
+
+IntentMux is not a model provider and does not replace LiteLLM. It only handles the configured compatibility entry model:
+
+```text
+model=semantic-router -> route_id -> target_model -> LiteLLM model group
+```
+
+All other model names pass through.
+
+The default sample config uses product-level route ids such as `fast`, `strong`, and `experimental`, mapped to local LiteLLM model groups such as `cheap-router`, `pro-router`, and `free-probe-router`. These `target_model` values are deployment names, not product API names.
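The `route_id -> target_model` decision shape can be sketched as a tiny scoring step. This is an illustrative assumption, not the repository's actual scorer: the `THRESHOLD` value is invented here, and the fallback rule only mirrors the `fallback_route_id` behavior this README describes.

```python
# Sketch: pick a route_id from per-route intent scores, falling back when no
# score clears the threshold. Illustrative only -- not the real scorer.
ROUTES = {"fast": "cheap-router", "strong": "pro-router", "experimental": "free-probe-router"}
FALLBACK_ROUTE_ID = "fast"
THRESHOLD = 0.5  # assumed value, for illustration only

def decide(scores: dict) -> dict:
    route_id, best = max(scores.items(), key=lambda kv: kv[1])
    if best < THRESHOLD:
        route_id = FALLBACK_ROUTE_ID  # no confident match: fall back
    return {"route_id": route_id, "target_model": ROUTES[route_id]}

print(decide({"fast": 0.2, "strong": 0.8, "experimental": 0.1}))
# {'route_id': 'strong', 'target_model': 'pro-router'}
```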
## Quick Start
@@ -31,14 +47,15 @@ uv run python -m router.app
Default endpoints:
-- IntentMux sidecar: `http://127.0.0.1:4001`
-- LiteLLM upstream: `http://127.0.0.1:4000`
-- Embedding upstream: `http://127.0.0.1:1234/v1/embeddings`
+| Service | URL |
+| --- | --- |
+| IntentMux sidecar | `http://127.0.0.1:4001` |
+| LiteLLM upstream | `http://127.0.0.1:4000` |
+| Embedding upstream | `http://127.0.0.1:1234/v1/embeddings` |
## LiteLLM Entry
-The low-intrusion path is to keep clients on LiteLLM `:4000` and change only the
-model name to `semantic-router`.
+The low-intrusion path is to keep clients on LiteLLM `:4000` and change only the model name to `semantic-router`.
```text
client -> LiteLLM :4000, model=semantic-router
@@ -48,8 +65,7 @@ client -> LiteLLM :4000, model=semantic-router
-> LiteLLM model group
```
-`semantic-router` is the compatibility entry name. It does not have to match the
-product name. LiteLLM's native `smart-router` should remain separate.
+`semantic-router` is the compatibility entry name. It does not have to match the product name. LiteLLM's native `smart-router` should remain separate.
## Verification
@@ -87,8 +103,7 @@ docker logs --since 12h gateway_semantic_router 2>&1 \
--max-upstream-status-rate 400=0
```
-Structured logs count `route_id`, `target_model`, `policy_id`, `reason`,
-`stream`, and `upstream_status`, while avoiding prompt and bearer-token logging.
+Structured logs count `route_id`, `target_model`, `policy_id`, `reason`, `stream`, and `upstream_status`, while avoiding prompt and bearer-token logging.
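The budget-gate input can be sketched as a small summary over JSON-lines logs. This is a hypothetical reader, assuming one JSON object per line with the fields listed above; it is not the repository's summary script.

```python
# Sketch: summarize route events from JSON-lines logs (hypothetical reader).
# Only counts structured fields; never touches prompt or token content.
import json
from collections import Counter

def summarize(lines) -> dict:
    events = Counter()
    statuses = Counter()
    for line in lines:
        rec = json.loads(line)
        events[rec["event"]] += 1
        if "upstream_status" in rec:
            statuses[rec["upstream_status"]] += 1
    total = sum(events.values())
    error_rate = events.get("route_error", 0) / total if total else 0.0
    return {"events": dict(events), "upstream_status": dict(statuses),
            "route_error_rate": error_rate}

logs = [
    '{"event": "route_complete", "route_id": "strong", "upstream_status": 200}',
    '{"event": "route_complete", "route_id": "fast", "upstream_status": 200}',
    '{"event": "route_error", "route_id": "strong", "upstream_status": 502}',
]
print(summarize(logs)["route_error_rate"])  # 0.3333333333333333
```

A gate like `--max-route-error-rate` can then compare this rate against a budget.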
## Decision Preview
@@ -98,13 +113,8 @@ curl http://127.0.0.1:4001/v1/semantic-router/decision \
-d '{"model":"semantic-router","messages":[{"role":"user","content":"Why is this production bug intermittent?"}]}'
```
-This returns the selected `route_id`, resolved `target_model`, `policy_id`,
-reason, rewrite flag, and scores without forwarding to LiteLLM.
+This returns the selected `route_id`, resolved `target_model`, `policy_id`, reason, rewrite flag, and scores without forwarding to LiteLLM.
## Status
-IntentMux is built for a real local deployment and already includes routing,
-preflight, LiteLLM-entry E2E, structured logs, and route-error budget gates. It
-is still in production validation and documentation polish; public-release
-packaging, license polish, local-path cleanup, and release metadata should be
-handled after the operational baseline is stable.
+IntentMux is built for a real local deployment and already includes routing, preflight, LiteLLM-entry E2E, structured logs, and route-error budget gates. It is still in production validation and documentation polish; public-release packaging, license polish, local-path cleanup, and release metadata should be handled after the operational baseline is stable.
From 2a51feabd4232ea3888b98aaf3660071606dee6d Mon Sep 17 00:00:00 2001
From: raystorm <2557058999@qq.com>
Date: Wed, 6 May 2026 12:35:58 +0800
Subject: [PATCH 3/4] docs: tighten README compatibility and log wording
---
README.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/README.md b/README.md
index 7f8d4ff..f484e80 100644
--- a/README.md
+++ b/README.md
@@ -7,14 +7,14 @@
-
+
[English](README.en.md)
## 一句话
-IntentMux 是一个本地优先的 OpenAI/LiteLLM-compatible 路由 sidecar:客户端仍然请求原来的 LiteLLM 入口,只把模型名切到 `semantic-router`,IntentMux 根据请求意图选择 `route_id`,再映射到实际部署中的 `target_model`。
+IntentMux 是一个本地优先的 OpenAI-compatible / LiteLLM-compatible 路由 sidecar:客户端仍然请求原来的 LiteLLM 入口,只把模型名切到 `semantic-router`,IntentMux 根据请求意图选择 `route_id`,再映射到实际部署中的 `target_model`。
@@ -157,7 +157,7 @@ IntentMux 只统计结构化 JSON 路由日志:
- `stream`
- `upstream_status`
-不会记录 prompt 或 bearer token。
+不会记录 prompt、completion、token usage 或 bearer token。`request_id` 只用于跨层关联,可能来自请求头、`metadata.semantic_router_request_id`、`user` 字段,或由 IntentMux 生成。
12 小时窗口 summary:
From 082ff2b358fe6a8b3db81e28da33f2608c692ecf Mon Sep 17 00:00:00 2001
From: raystorm <2557058999@qq.com>
Date: Wed, 6 May 2026 12:37:11 +0800
Subject: [PATCH 4/4] docs: tighten English README compatibility and log wording
---
README.en.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/README.en.md b/README.en.md
index ee5a472..fad42c5 100644
--- a/README.en.md
+++ b/README.en.md
@@ -7,14 +7,14 @@
-
+
[中文](README.md)
## One Line
-IntentMux is a local-first OpenAI/LiteLLM-compatible routing sidecar. Clients keep using the existing LiteLLM endpoint and opt in with `model=semantic-router`; IntentMux selects a `route_id` from request intent, then resolves that route to the deployment-specific `target_model`.
+IntentMux is a local-first OpenAI-compatible / LiteLLM-compatible routing sidecar. Clients keep using the existing LiteLLM endpoint and opt in with `model=semantic-router`; IntentMux selects a `route_id` from request intent, then resolves that route to the deployment-specific `target_model`.
@@ -103,7 +103,7 @@ docker logs --since 12h gateway_semantic_router 2>&1 \
--max-upstream-status-rate 400=0
```
-Structured logs count `route_id`, `target_model`, `policy_id`, `reason`, `stream`, and `upstream_status`, while avoiding prompt and bearer-token logging.
+Structured logs count `route_id`, `target_model`, `policy_id`, `reason`, `request_id`, `request_id_source`, `stream`, and `upstream_status`, while avoiding prompts, completions, token usage, and bearer tokens. `request_id` is only for cross-layer correlation and may come from headers, `metadata.semantic_router_request_id`, the `user` field, or IntentMux itself.
## Decision Preview