From f0766b16be2a473e4087d896dae52a93d4ae6dd3 Mon Sep 17 00:00:00 2001
From: raystorm <2557058999@qq.com>
Date: Tue, 5 May 2026 23:11:26 +0800
Subject: [PATCH 1/4] docs: add README cards and tighten project overview

---
 README.md | 93 +++++++++++++++++++++++++++++--------------------
 1 file changed, 49 insertions(+), 44 deletions(-)

diff --git a/README.md b/README.md
index e2ab404..7f8d4ff 100644
--- a/README.md
+++ b/README.md
@@ -3,28 +3,43 @@
 > 轻量、可审计的 LiteLLM 意图分流 sidecar。
 > 按请求意图选择 `route_id`,再映射到你的本地 LiteLLM 模型组。
+

+  status: local validation
+  entry: semantic-router
+  gateway: LiteLLM compatible
+  logs: no prompts
+

+
 [English](README.en.md)
 
-| 项目 | 内容 |
-| --- | --- |
-| 用途 | 在 LiteLLM / OpenAI-compatible gateway 前做轻量意图分流 |
-| 接入面 | 客户端保持打 LiteLLM,只把模型名切到 `semantic-router` |
-| 路由模型 | `semantic-router` 是兼容入口名;产品名是 IntentMux |
-| 决策输出 | `route_id -> target_model`,例如 `strong -> pro-router` |
-| 可审计性 | 结构化 `route_complete` / `route_error` 日志,不记录 prompt 或 bearer token |
-| 运行状态 | 本地生产验证中;暂不按 public-release 项目发布 |
+## 一句话
+
+IntentMux 是一个本地优先的 OpenAI/LiteLLM-compatible 路由 sidecar:客户端仍然请求原来的 LiteLLM 入口,只把模型名切到 `semantic-router`,IntentMux 根据请求意图选择 `route_id`,再映射到实际部署中的 `target_model`。
+
+<table>
+  <tr>
+    <td><b>意图分流</b><br/>从请求内容判断 `fast` / `strong` / `experimental` 等 route id。</td>
+    <td><b>低侵入接入</b><br/>保留 LiteLLM 作为 provider、fallback、限流和鉴权层。</td>
+  </tr>
+  <tr>
+    <td><b>可审计日志</b><br/>结构化记录 `route_complete` / `route_error`,不记录 prompt、token 或 bearer token。</td>
+    <td><b>生产前验证</b><br/>提供 preflight、LiteLLM-entry E2E、日志 summary 和 route-error budget gate。</td>
+  </tr>
+</table>
+ +## 项目边界 -IntentMux 不是模型提供商,也不是 LiteLLM 的替代品。它是一个本地优先的 -routing sidecar:只改写进入 sidecar 的请求 `model` 字段,把 -`model=semantic-router` 路由到配置里的 `route_id`,再解析到实际部署中的 -`target_model`。其他模型名默认透传给 LiteLLM。 +IntentMux 不是模型提供商,也不是 LiteLLM 的替代品。它只处理进入 sidecar 的兼容入口模型: -当前示例配置使用 `fast`、`strong`、`experimental` 三个产品级 route id,并映射到本机 -LiteLLM 模型组 `cheap-router`、`pro-router`、`free-probe-router`。这些 -`target_model` 是部署名,不是产品接口。 +```text +model=semantic-router -> route_id -> target_model -> LiteLLM model group +``` + +其他模型名默认透传给 LiteLLM。 -本仓库和 `/home/raystorm/gateway/litellm` 保持边界清晰。不要把 LiteLLM -挂载目录、token、`.env` 或 provider 凭据加入本仓库。 +当前示例配置使用 `fast`、`strong`、`experimental` 三个产品级 route id,并映射到本机 LiteLLM 模型组 `cheap-router`、`pro-router`、`free-probe-router`。这些 `target_model` 是部署名,不是产品接口。 + +本仓库和 `/home/raystorm/gateway/litellm` 保持边界清晰。不要把 LiteLLM 挂载目录、token、`.env` 或 provider 凭据加入本仓库。 ## 适合什么场景 @@ -33,8 +48,7 @@ LiteLLM 模型组 `cheap-router`、`pro-router`、`free-probe-router`。这些 - 你希望路由决策可回放、可审计、可用日志继续改进。 - 你不想引入一个大型调度平台,也不想让客户端大改端点。 -IntentMux 的差异化不是“再造一个复杂 router”,而是轻量、本地、快速部署、日志可读。 -成熟的 provider 路由、fallback、限流和鉴权仍交给 LiteLLM。 +IntentMux 的差异化不是“再造一个复杂 router”,而是轻量、本地、快速部署、日志可读。成熟的 provider 路由、fallback、限流和鉴权仍交给 LiteLLM。 ## 快速运行 @@ -44,11 +58,13 @@ uv run python -m router.app 默认端点: -- IntentMux sidecar: `http://127.0.0.1:4001` -- LiteLLM upstream: `http://127.0.0.1:4000` -- Embedding upstream: `http://127.0.0.1:1234/v1/embeddings` +| 服务 | 地址 | +| --- | --- | +| IntentMux sidecar | `http://127.0.0.1:4001` | +| LiteLLM upstream | `http://127.0.0.1:4000` | +| Embedding upstream | `http://127.0.0.1:1234/v1/embeddings` | -环境变量: +常用环境变量: - `ROUTER_HOST` - `ROUTER_PORT` @@ -61,8 +77,7 @@ uv run python -m router.app ## LiteLLM 接入方式 -低侵入接入方式是:客户端继续请求 LiteLLM `:4000`,只把模型名切到 -`semantic-router`。 +低侵入接入方式是:客户端继续请求 LiteLLM `:4000`,只把模型名切到 `semantic-router`。 ```text client -> LiteLLM :4000, model=semantic-router @@ -72,11 +87,9 @@ client -> LiteLLM :4000, model=semantic-router -> LiteLLM model group ``` 
-`semantic-router` 是兼容入口名,不等于项目品牌名。项目叫 IntentMux;入口名保留 -`semantic-router`,是为了降低现有部署迁移成本。 +`semantic-router` 是兼容入口名,不等于项目品牌名。项目叫 IntentMux;入口名保留 `semantic-router`,是为了降低现有部署迁移成本。 -LiteLLM 原生 `smart-router` 应保持独立:它仍表示 LiteLLM 的 complexity router; -IntentMux 的 `semantic-router` 表示本项目的意图分流入口。 +LiteLLM 原生 `smart-router` 应保持独立:它仍表示 LiteLLM 的 complexity router;IntentMux 的 `semantic-router` 表示本项目的意图分流入口。 ## 配置模型 @@ -100,8 +113,7 @@ routes: - 这个线上 bug 为什么偶发 ``` -运行时校验会阻止递归配置:入口模型本身不能作为 route id 或 target model, -`fallback_route_id` 必须存在。 +运行时校验会阻止递归配置:入口模型本身不能作为 route id 或 target model,`fallback_route_id` 必须存在。 ## 验证 @@ -189,8 +201,7 @@ curl http://127.0.0.1:4001/v1/semantic-router/decision \ ## 语义资产 -运行时保持轻依赖。更大的 route bank 从 `config/route_sources.yaml` 声明的来源离线生成, -不把 Hugging Face 等构建依赖带进运行时。 +运行时保持轻依赖。更大的 route bank 从 `config/route_sources.yaml` 声明的来源离线生成,不把 Hugging Face 等构建依赖带进运行时。 ```bash uv sync --group assets @@ -211,23 +222,17 @@ uv run python scripts/import_review_samples.py \ ## 生命周期 -推荐把 IntentMux 作为 LiteLLM compose project 里的并列 sidecar,而不是塞进 -LiteLLM 挂载目录或服务内部。 +推荐把 IntentMux 作为 LiteLLM compose project 里的并列 sidecar,而不是塞进 LiteLLM 挂载目录或服务内部。 当前行为: - Docker health 使用 `/health`,避免 readiness 抖动触发重启循环。 - `/ready` 检查 router、LiteLLM、embedding 三层。 -- embedding 不可用时,聊天请求 fail-open 到 `fallback_route_id`,并记录 - `reason=embedding_error`。 -- LiteLLM/upstream `5xx` 或连接异常 fail-closed 为脱敏 `502`,并记录 - `route_error`。 +- embedding 不可用时,聊天请求 fail-open 到 `fallback_route_id`,并记录 `reason=embedding_error`。 +- LiteLLM/upstream `5xx` 或连接异常 fail-closed 为脱敏 `502`,并记录 `route_error`。 -未来是否把 sidecar 生命周期更强地绑定到 LiteLLM 本体服务,是单独的设计项,不在当前 -运行时里隐式实现。 +未来是否把 sidecar 生命周期更强地绑定到 LiteLLM 本体服务,是单独的设计项,不在当前运行时里隐式实现。 ## 项目状态 -IntentMux 当前服务真实本地需求,已具备基本路由、preflight、E2E、结构化日志和 -error-budget gate。仓库仍处于生产验证和文档打磨阶段,许可证、public-release 文档、 -本地路径统一和发布包装会在稳定后再处理。 +IntentMux 当前服务真实本地需求,已具备基本路由、preflight、E2E、结构化日志和 error-budget gate。仓库仍处于生产验证和文档打磨阶段,许可证、public-release 文档、本地路径统一和发布包装会在稳定后再处理。 From 
a8c9c3b6666bc43f1ac89122f502df300bd522c6 Mon Sep 17 00:00:00 2001 From: raystorm <2557058999@qq.com> Date: Tue, 5 May 2026 23:12:18 +0800 Subject: [PATCH 2/4] docs: sync English README overview with cards --- README.en.md | 76 +++++++++++++++++++++++++++++----------------------- 1 file changed, 43 insertions(+), 33 deletions(-) diff --git a/README.en.md b/README.en.md index f5e52f1..ee5a472 100644 --- a/README.en.md +++ b/README.en.md @@ -3,25 +3,41 @@ > Lightweight, auditable intent-routing sidecar for LiteLLM.
 > Select a `route_id` from request intent, then resolve it to your local LiteLLM model group.
+

+  status: local validation
+  entry: semantic-router
+  gateway: LiteLLM compatible
+  logs: no prompts
+

+
 [中文](README.md)
 
-| Area | Value |
-| --- | --- |
-| Purpose | Lightweight intent routing in front of LiteLLM / OpenAI-compatible gateways |
-| Integration | Keep clients on LiteLLM; opt in with `model=semantic-router` |
-| Entry model | `semantic-router` is the compatibility entry name; IntentMux is the product name |
-| Decision shape | `route_id -> target_model`, for example `strong -> pro-router` |
-| Auditability | Structured `route_complete` / `route_error` logs without prompts or bearer tokens |
-| Status | Local production validation; not packaged as a public release yet |
-
-IntentMux is not a model provider and does not replace LiteLLM. It is a
-local-first routing sidecar that rewrites only selected request `model` fields:
-`model=semantic-router` becomes a configured `route_id`, then resolves to a
-deployment-specific `target_model`. All other model names pass through.
-
-The default sample config uses product-level route ids such as `fast`, `strong`,
-and `experimental`, mapped to local LiteLLM model groups such as `cheap-router`,
-`pro-router`, and `free-probe-router`.
+## One Line
+
+IntentMux is a local-first OpenAI/LiteLLM-compatible routing sidecar. Clients keep using the existing LiteLLM endpoint and opt in with `model=semantic-router`; IntentMux selects a `route_id` from request intent, then resolves that route to the deployment-specific `target_model`.
+
+<table>
+  <tr>
+    <td><b>Intent routing</b><br/>Select product-level routes such as `fast`, `strong`, and `experimental` from request content.</td>
+    <td><b>Low-intrusion integration</b><br/>Keep LiteLLM responsible for providers, fallback, rate limits, and authentication.</td>
+  </tr>
+  <tr>
+    <td><b>Auditable logs</b><br/>Record structured `route_complete` / `route_error` events without prompts, tokens, or bearer tokens.</td>
+    <td><b>Operational gates</b><br/>Ship with preflight, LiteLLM-entry E2E, log summaries, and route-error budget checks.</td>
+  </tr>
+</table>
+ +## Project Boundary + +IntentMux is not a model provider and does not replace LiteLLM. It only handles the configured compatibility entry model: + +```text +model=semantic-router -> route_id -> target_model -> LiteLLM model group +``` + +All other model names pass through. + +The default sample config uses product-level route ids such as `fast`, `strong`, and `experimental`, mapped to local LiteLLM model groups such as `cheap-router`, `pro-router`, and `free-probe-router`. These `target_model` values are deployment names, not product API names. ## Quick Start @@ -31,14 +47,15 @@ uv run python -m router.app Default endpoints: -- IntentMux sidecar: `http://127.0.0.1:4001` -- LiteLLM upstream: `http://127.0.0.1:4000` -- Embedding upstream: `http://127.0.0.1:1234/v1/embeddings` +| Service | URL | +| --- | --- | +| IntentMux sidecar | `http://127.0.0.1:4001` | +| LiteLLM upstream | `http://127.0.0.1:4000` | +| Embedding upstream | `http://127.0.0.1:1234/v1/embeddings` | ## LiteLLM Entry -The low-intrusion path is to keep clients on LiteLLM `:4000` and change only the -model name to `semantic-router`. +The low-intrusion path is to keep clients on LiteLLM `:4000` and change only the model name to `semantic-router`. ```text client -> LiteLLM :4000, model=semantic-router @@ -48,8 +65,7 @@ client -> LiteLLM :4000, model=semantic-router -> LiteLLM model group ``` -`semantic-router` is the compatibility entry name. It does not have to match the -product name. LiteLLM's native `smart-router` should remain separate. +`semantic-router` is the compatibility entry name. It does not have to match the product name. LiteLLM's native `smart-router` should remain separate. ## Verification @@ -87,8 +103,7 @@ docker logs --since 12h gateway_semantic_router 2>&1 \ --max-upstream-status-rate 400=0 ``` -Structured logs count `route_id`, `target_model`, `policy_id`, `reason`, -`stream`, and `upstream_status`, while avoiding prompt and bearer-token logging. 
+Structured logs count `route_id`, `target_model`, `policy_id`, `reason`, `stream`, and `upstream_status`, while avoiding prompt and bearer-token logging. ## Decision Preview @@ -98,13 +113,8 @@ curl http://127.0.0.1:4001/v1/semantic-router/decision \ -d '{"model":"semantic-router","messages":[{"role":"user","content":"Why is this production bug intermittent?"}]}' ``` -This returns the selected `route_id`, resolved `target_model`, `policy_id`, -reason, rewrite flag, and scores without forwarding to LiteLLM. +This returns the selected `route_id`, resolved `target_model`, `policy_id`, reason, rewrite flag, and scores without forwarding to LiteLLM. ## Status -IntentMux is built for a real local deployment and already includes routing, -preflight, LiteLLM-entry E2E, structured logs, and route-error budget gates. It -is still in production validation and documentation polish; public-release -packaging, license polish, local-path cleanup, and release metadata should be -handled after the operational baseline is stable. +IntentMux is built for a real local deployment and already includes routing, preflight, LiteLLM-entry E2E, structured logs, and route-error budget gates. It is still in production validation and documentation polish; public-release packaging, license polish, local-path cleanup, and release metadata should be handled after the operational baseline is stable. From 2a51feabd4232ea3888b98aaf3660071606dee6d Mon Sep 17 00:00:00 2001 From: raystorm <2557058999@qq.com> Date: Wed, 6 May 2026 12:35:58 +0800 Subject: [PATCH 3/4] docs: tighten README compatibility and log wording --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 7f8d4ff..f484e80 100644 --- a/README.md +++ b/README.md @@ -7,14 +7,14 @@ status: local validation entry: semantic-router gateway: LiteLLM compatible - logs: no prompts + logs: no prompts or tokens

 [English](README.en.md)
 
 ## 一句话
 
-IntentMux 是一个本地优先的 OpenAI/LiteLLM-compatible 路由 sidecar:客户端仍然请求原来的 LiteLLM 入口,只把模型名切到 `semantic-router`,IntentMux 根据请求意图选择 `route_id`,再映射到实际部署中的 `target_model`。
+IntentMux 是一个本地优先的 OpenAI-compatible / LiteLLM-compatible 路由 sidecar:客户端仍然请求原来的 LiteLLM 入口,只把模型名切到 `semantic-router`,IntentMux 根据请求意图选择 `route_id`,再映射到实际部署中的 `target_model`。
 
@@ -157,7 +157,7 @@ IntentMux 只统计结构化 JSON 路由日志:
 - `stream`
 - `upstream_status`
 
-不会记录 prompt 或 bearer token。
+不会记录 prompt、completion、token usage 或 bearer token。`request_id` 只用于跨层关联,可能来自请求头、`metadata.semantic_router_request_id`、`user` 字段,或由 IntentMux 生成。
 
 12 小时窗口 summary:

From 082ff2b358fe6a8b3db81e28da33f2608c692ecf Mon Sep 17 00:00:00 2001
From: raystorm <2557058999@qq.com>
Date: Wed, 6 May 2026 12:37:11 +0800
Subject: [PATCH 4/4] docs: tighten English README compatibility and log wording

---
 README.en.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.en.md b/README.en.md
index ee5a472..fad42c5 100644
--- a/README.en.md
+++ b/README.en.md
@@ -7,14 +7,14 @@
   status: local validation
   entry: semantic-router
   gateway: LiteLLM compatible
-  logs: no prompts
+  logs: no prompts or tokens

 [中文](README.md)
 
 ## One Line
 
-IntentMux is a local-first OpenAI/LiteLLM-compatible routing sidecar. Clients keep using the existing LiteLLM endpoint and opt in with `model=semantic-router`; IntentMux selects a `route_id` from request intent, then resolves that route to the deployment-specific `target_model`.
+IntentMux is a local-first OpenAI-compatible / LiteLLM-compatible routing sidecar. Clients keep using the existing LiteLLM endpoint and opt in with `model=semantic-router`; IntentMux selects a `route_id` from request intent, then resolves that route to the deployment-specific `target_model`.
@@ -103,7 +103,7 @@ docker logs --since 12h gateway_semantic_router 2>&1 \
    --max-upstream-status-rate 400=0
 ```
 
-Structured logs count `route_id`, `target_model`, `policy_id`, `reason`, `stream`, and `upstream_status`, while avoiding prompt and bearer-token logging.
+Structured logs count `route_id`, `target_model`, `policy_id`, `reason`, `request_id`, `request_id_source`, `stream`, and `upstream_status`, while avoiding prompts, completions, token usage, and bearer tokens. `request_id` is only for cross-layer correlation and may come from headers, `metadata.semantic_router_request_id`, the `user` field, or IntentMux itself.
 
 ## Decision Preview