diff --git a/README.en.md b/README.en.md
index fad42c5..08a4668 100644
--- a/README.en.md
+++ b/README.en.md
@@ -4,10 +4,16 @@
 > Select a `route_id` from request intent, then resolve it to your local LiteLLM model group.

-status: local validation
-entry: semantic-router
-gateway: LiteLLM compatible
-logs: no prompts or tokens
+runtime Python 3.11+
+entry semantic-router
+gateway LiteLLM compatible
+logs no prompts or tokens
+

+

+built with FastAPI
+config YAML
+tests pytest
+package uv

 [中文](README.md)
@@ -37,7 +43,9 @@
 model=semantic-router -> route_id -> target_model -> LiteLLM model group
 
 All other model names pass through.
 
-The default sample config uses product-level route ids such as `fast`, `strong`, and `experimental`, mapped to local LiteLLM model groups such as `cheap-router`, `pro-router`, and `free-probe-router`. These `target_model` values are deployment names, not product API names.
+The default sample config uses product-level route ids such as `fast`, `strong`, and `experimental`, mapped to LiteLLM model groups such as `cheap-router`, `pro-router`, and `free-probe-router`. These `target_model` values are deployment names, not product API names.
+
+Deploy IntentMux as a sidecar next to LiteLLM and keep provider secrets, tokens, `.env` files, and mounted LiteLLM data outside this repository.
 
 ## Quick Start
@@ -65,7 +73,7 @@
 client -> LiteLLM :4000, model=semantic-router -> LiteLLM model group
 ```
 
-`semantic-router` is the compatibility entry name. It does not have to match the product name. LiteLLM's native `smart-router` should remain separate.
+Configure `semantic-router` in LiteLLM as a model entry that points to the IntentMux sidecar. Requests that use that model name are routed by intent; other model names pass through unchanged.
 
 ## Verification
@@ -115,6 +123,15 @@
 curl http://127.0.0.1:4001/v1/semantic-router/decision \
 
 This returns the selected `route_id`, resolved `target_model`, `policy_id`, reason, rewrite flag, and scores without forwarding to LiteLLM.
 
-## Status
+## Runtime Behavior
+
+Run IntentMux as a sidecar in the same deployment boundary as LiteLLM.
+
+- Docker health checks use `/health` to avoid restart loops caused by readiness flapping.
+- `/ready` checks router, LiteLLM, and embedding availability.
+- When embeddings are unavailable, chat requests fail open to `fallback_route_id` and log `reason=embedding_error`.
+- LiteLLM/upstream `5xx` responses or connection errors fail closed as redacted `502` responses and log `route_error`.
+
+## Current Capabilities
 
-IntentMux is built for a real local deployment and already includes routing, preflight, LiteLLM-entry E2E, structured logs, and route-error budget gates. It is still in production validation and documentation polish; public-release packaging, license polish, local-path cleanup, and release metadata should be handled after the operational baseline is stable.
+IntentMux includes basic routing, preflight checks, LiteLLM-entry E2E, structured logs, and route-error budget gates for lightweight intent-routing validation in local or private gateway deployments.
diff --git a/README.md b/README.md
index f484e80..c4225f8 100644
--- a/README.md
+++ b/README.md
@@ -4,10 +4,16 @@
 > Select a `route_id` by request intent, then map it to your local LiteLLM model group.

-status: local validation
-entry: semantic-router
-gateway: LiteLLM compatible
-logs: no prompts or tokens
+runtime Python 3.11+
+entry semantic-router
+gateway LiteLLM compatible
+logs no prompts or tokens
+

+

+built with FastAPI
+config YAML
+tests pytest
+package uv

 [English](README.en.md)
@@ -37,9 +43,9 @@
 model=semantic-router -> route_id -> target_model -> LiteLLM model group
 
 All other model names pass through to LiteLLM by default.
 
-The current sample config uses the three product-level route ids `fast`, `strong`, and `experimental`, mapped to the local LiteLLM model groups `cheap-router`, `pro-router`, and `free-probe-router`. These `target_model` values are deployment names, not product APIs.
+The default sample config uses the three product-level route ids `fast`, `strong`, and `experimental`, mapped to the LiteLLM model groups `cheap-router`, `pro-router`, and `free-probe-router`. These `target_model` values are deployment names, not product APIs.
 
-This repository keeps a clear boundary from `/home/raystorm/gateway/litellm`. Do not add the LiteLLM mount directory, tokens, `.env` files, or provider credentials to this repository.
+For deployment, manage IntentMux independently as a sidecar alongside LiteLLM; do not add the LiteLLM mount directory, tokens, `.env` files, or provider credentials to this repository.
 
 ## When to Use It
 
@@ -87,9 +93,7 @@
 client -> LiteLLM :4000, model=semantic-router -> LiteLLM model group
 ```
 
-`semantic-router` is the compatibility entry name, not the project brand. The project is called IntentMux; the entry name stays `semantic-router` to reduce migration cost for existing deployments.
-
-LiteLLM's native `smart-router` should stay separate: it still refers to LiteLLM's complexity router, while IntentMux's `semantic-router` is this project's intent-routing entry.
+Once `semantic-router` is configured in LiteLLM as a model entry pointing at the IntentMux sidecar, clients can trigger intent routing through that model name. Model names that do not match this entry still pass through.
 
 ## Configuring Models
 
@@ -220,19 +224,15 @@
 uv run python scripts/import_review_samples.py \
 
 Every JSONL record must set `redacted: true` and use a route id as `expect`.
 
-## Lifecycle
+## Runtime Behavior
 
 Run IntentMux as a peer sidecar in the LiteLLM compose project rather than inside the LiteLLM mount directory or service.
 
-Current behavior:
-
 - Docker health checks use `/health` to avoid restart loops caused by readiness flapping.
 - `/ready` checks the router, LiteLLM, and embedding layers.
 - When embeddings are unavailable, chat requests fail open to `fallback_route_id` and log `reason=embedding_error`.
 - LiteLLM/upstream `5xx` responses or connection errors fail closed as redacted `502` responses and log `route_error`.
 
-Whether to bind the sidecar lifecycle more tightly to the LiteLLM service itself is a separate design item and is not implemented implicitly in the current runtime.
-
-## Project Status
+## Current Capabilities
 
-IntentMux currently serves a real local deployment and already includes basic routing, preflight, E2E, structured logs, and an error-budget gate. The repository is still in production validation and documentation polish; the license, public-release docs, local-path unification, and release packaging will be handled once things are stable.
+IntentMux includes basic routing, preflight checks, LiteLLM-entry E2E, structured logs, and an error-budget gate, suitable for lightweight intent-routing validation in local or private gateway environments.
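The route table that the hunks above describe, `fast`/`strong`/`experimental` mapped to `cheap-router`/`pro-router`/`free-probe-router` plus a `fallback_route_id` for embedding failures, could be sketched as YAML along these lines. The key names (`routes`, `route_id`, `target_model`, `fallback_route_id` as a top-level key) are hypothetical; the actual IntentMux config schema is not shown in this diff, and only the route ids and model group names come from the sample config it mentions.

```yaml
# Hypothetical IntentMux route config sketch; key layout is an assumption.
routes:
  - route_id: fast            # product-level route id
    target_model: cheap-router        # LiteLLM model group (deployment name)
  - route_id: strong
    target_model: pro-router
  - route_id: experimental
    target_model: free-probe-router
fallback_route_id: fast       # fail-open target when embeddings are unavailable
```

The `target_model` values are deployment names resolved by LiteLLM, so renaming a model group only requires touching this mapping, not the client-facing route ids.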
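Both READMEs now say to configure `semantic-router` in LiteLLM as a model entry that points at the IntentMux sidecar. A minimal sketch of that entry in a LiteLLM proxy `config.yaml`, assuming the sidecar is reachable at `intentmux:4001`, exposes an OpenAI-compatible `/v1` API, and needs no real provider key (all three are assumptions, not stated in this diff):

```yaml
# Hypothetical LiteLLM proxy config fragment; host, port, and key are assumptions.
model_list:
  - model_name: semantic-router        # compatibility entry name from the READMEs
    litellm_params:
      model: openai/semantic-router    # treat the sidecar as an OpenAI-compatible backend
      api_base: http://intentmux:4001/v1   # IntentMux sidecar (assumed address)
      api_key: none                    # sidecar assumed to ignore the key
```

Requests sent to LiteLLM with `model=semantic-router` would then be routed by intent inside IntentMux, while every other `model_name` keeps its normal LiteLLM behavior.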
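Both READMEs also recommend running IntentMux as a peer sidecar in the LiteLLM compose project, with Docker health bound to `/health` rather than `/ready`. A hedged compose sketch of that layout; the image names, service names, ports, and the curl-based health test are assumptions:

```yaml
# Hypothetical docker-compose.yml fragment; images and ports are assumptions.
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    ports:
      - "4000:4000"
  intentmux:
    image: intentmux:local             # locally built sidecar image (assumed name)
    ports:
      - "4001:4001"
    healthcheck:
      # Liveness only: /health avoids restart loops from readiness flapping;
      # /ready (router + LiteLLM + embeddings) stays out of the Docker check.
      test: ["CMD", "curl", "-f", "http://localhost:4001/health"]
      interval: 30s
      timeout: 5s
      retries: 3
```

Keeping the two services as peers in one compose project preserves the boundary the READMEs insist on: no LiteLLM mount directories, tokens, `.env` files, or provider credentials inside the IntentMux repository.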