From f4e98fe3ab4b35e60d115aa2111ef1204531c6d7 Mon Sep 17 00:00:00 2001 From: raystorm <2557058999@qq.com> Date: Tue, 5 May 2026 21:16:24 +0800 Subject: [PATCH 1/2] docs: refresh project identity and README --- README.md | 714 ++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 535 insertions(+), 179 deletions(-) diff --git a/README.md b/README.md index 801af0a..9ac00bd 100644 --- a/README.md +++ b/README.md @@ -1,190 +1,481 @@ -# Gateway Semantic Router +# Cynosure Router + +> 面向 LLM Gateway 的意图分流控制面。 +> Intent-aware routing sidecar for LiteLLM / OpenAI-compatible gateways. + +Cynosure Router 是一个轻量、本地优先、可审计的 LLM 路由 sidecar。它不替代 LiteLLM,也不重新发明模型网关;它只负责在请求进入模型执行层之前,根据用户意图选择合适的模型通道,并把语义入口模型改写为部署环境里的真实目标模型。 + +当前项目主要面向中文-heavy 的个人 / 小团队 agent 流量,例如代码审查、debug、架构分析、线上故障判断、模型探活、低风险问答和混合自动化工作流。 + +--- + +## 为什么需要它 + +很多通用 LLM router 更关注英文 benchmark、强弱模型成本优化,或者直接把路由、评估、服务、执行层打包成一套重系统。 + +但在本地 LiteLLM 网关场景里,真正的问题通常更具体: + +- 中文技术请求经常被低估复杂度; +- 简单闲聊、翻译、格式转换不应该消耗强模型额度; +- 代码审查、线上故障、架构权衡、权限安全类问题必须进强模型; +- 免费端点、实验模型、探活请求应该进入隔离通道; +- 路由决策必须能解释、能回放、能统计,而不是黑盒; +- LiteLLM 仍然应该保留 provider order、fallback、cooldown、key 管理和真实执行层职责。 + +Cynosure Router 的定位就是:**在 LiteLLM 前面增加一层可控、可观测、中文友好的意图分流层。** + +--- + +## 核心定位 + +```text +Client / Agent / IDE / Automation + │ + │ OpenAI-compatible request + ▼ +Cynosure Router + - 读取 latest user message + - 支持显式 route metadata + - 支持中文 hard rules + - 使用 embedding 做语义匹配 + - 低置信度安全回退 + - 改写 model 字段 + - 记录结构化路由日志 + │ + │ rewritten model + ▼ +LiteLLM Gateway + - provider order + - fallback + - cooldown + - auth / key management + - actual model execution + │ + ▼ +Model Providers +``` + +Cynosure Router 只做路由决策和 `model` rewrite。真实模型调用、密钥、provider fallback 和供应商编排仍然交给 LiteLLM。 + +--- + +## 当前能力 + +### OpenAI-compatible Chat Proxy + +支持: + +- `POST /v1/chat/completions` +- 非流式响应 +- `stream=true` SSE 流式响应 +- 仅改写请求中的 `model` 字段 +- 保留上游 LiteLLM 响应体 +- 注入路由观测 headers + +示例 headers: + +```text +x-router-request-id +x-router-target-model 
+x-router-reason +``` + +### 语义入口模型 + +客户端请求一个语义入口模型,例如: + +```json +{ + "model": "semantic-router", + "messages": [ + { + "role": "user", + "content": "帮我审一下这个 PR 有没有竞态问题" + } + ] +} +``` + +Cynosure Router 会把它改写成真实目标模型,例如: + +```text +pro-router +``` + +其他非入口模型会原样透传,不进入语义路由。LiteLLM 原生的 `smart-router` 也被刻意保留为单独的上游模型组,避免概念混淆。 + +### Route 抽象 + +默认示例 route: -Lightweight, local-first OpenAI/LiteLLM-compatible routing sidecar for -`/v1/chat/completions`. +| route_id | 目标模型示例 | 用途 | +|---|---|---| +| `fast` | `cheap-router` | 普通问答、解释、翻译、轻量总结 | +| `strong` | `pro-router` | 代码、debug、架构、多步推理、高风险判断 | +| `experimental` | `free-probe-router` | 免费端点探活、实验模型试探、低价值样例比较 | -It rewrites the configured semantic entry model, currently -`model=semantic-router`, by selecting a configured `route_id` and resolving that -route to a deployment-specific `target_model`. +这些目标模型只是当前 LiteLLM 部署里的示例名字。真正的目标模型由 `config/routes.yaml` 映射决定。 -The checked-in sample config uses route ids such as `fast`, `strong`, and -`experimental`, mapped to local example LiteLLM targets such as `cheap-router`, -`pro-router`, and `free-probe-router`. Those target names are examples from this -machine's LiteLLM setup, not product-level route names. +### 决策优先级 -Runtime config validation enforces that the semantic entry model itself cannot -appear as a route target and that the fallback route exists, which prevents -recursive forwarding back to `semantic-router`. +一次 routed 请求的决策顺序: -All other model names pass through unchanged. LiteLLM's native `smart-router` -is intentionally kept as a separate upstream model group. +1. 非入口模型:直接 passthrough; +2. `metadata.route` / `metadata.target_route` 显式指定 route; +3. 中文 hard rules 命中高风险关键词; +4. embedding 语义匹配; +5. 低置信度或 embedding 异常时回退到 `fallback_route_id`。 -Both non-streaming and `stream=true` SSE chat completions are proxied. The -sidecar rewrites only the request model field, then preserves the upstream -LiteLLM response body and routing headers. 
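上面的优先级可以用一个极简的 Python 草图表示。以下是假设性示意：函数名、配置字段与关键词表都只是说明用的占位，真实实现以 router 源码为准：

```python
# 假设性示意:按 README 描述的优先级选择 route(所有名字均为占位,非真实实现)
HARD_RULE_KEYWORDS = ("线上", "故障", "权限", "安全")  # 示例关键词,真实规则更完整

def hits_hard_rules(text):
    return any(k in text for k in HARD_RULE_KEYWORDS)

def embed_match(text, cfg):
    # 真实实现会调用 embedding 服务并与 route bank 比较;这里仅作占位
    raise RuntimeError("embedding unavailable in this sketch")

def decide_route(model, metadata, text, cfg):
    """返回 (route_id, reason);route_id 为 None 表示 passthrough。"""
    if model != cfg["route_model"]:
        return None, "passthrough"                          # 1. 非入口模型直接透传
    explicit = metadata.get("route") or metadata.get("target_route")
    if explicit in cfg["routes"]:
        return explicit, "metadata"                         # 2. 显式指定 route
    if hits_hard_rules(text):
        return "strong", "hard_rule"                        # 3. 中文 hard rules 命中
    try:
        route_id, score, second = embed_match(text, cfg)    # 4. embedding 语义匹配
        if score >= cfg["threshold"] and score - second >= cfg["margin"]:
            return route_id, "embedding"
        return cfg["fallback_route_id"], "low_confidence"   # 5. 低置信度回退
    except Exception:
        return cfg["fallback_route_id"], "embedding_error"  # 5. embedding 异常回退

cfg = {
    "route_model": "semantic-router",
    "routes": {"fast", "strong", "experimental"},
    "fallback_route_id": "fast",
    "threshold": 0.55,
    "margin": 0.04,
}
```

注意 embedding 异常与低置信度共用同一个 fallback route，但 reason 不同，便于在日志里区分两类回退。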
+这使路由行为既能自动判断,也能被上层 agent / workflow 显式控制。 -This repository is intentionally separate from `/home/raystorm/gateway/litellm`. -Do not add LiteLLM mount files, tokens, or `.env` material here. +### 安全回退 -The project is not public-release ready yet. Public repository visibility, -license polish, and release documentation are deferred until the configurable -route abstraction, observability contract, and redacted eval workflow have been -audited together. +Embedding 故障被视为可降级问题: -## Local Run +- `/ready` 会报告 embedding degraded; +- routed chat 请求不会直接失败; +- 请求会 fallback 到配置里的 `fallback_route_id`; +- 路由日志中记录 `reason=embedding_error`。 + +LiteLLM 或上游模型失败则不同:上游异常会被包装成受控的 `502`,并记录为 `route_error`。 + +--- + +## 本地运行 + +安装依赖: + +```bash +uv sync +``` + +启动 router: ```bash uv run python -m router.app ``` -## Container Lifecycle +默认端口: + +```text +Router: http://127.0.0.1:4001 +LiteLLM: http://127.0.0.1:4000 +Embedding: http://127.0.0.1:1234/v1/embeddings +``` + +--- + +## 容器运行 -The router is packaged with `Dockerfile` and is intended to run as a sibling -service in the LiteLLM compose project, not as an ad-hoc local process. +Router 带有 `Dockerfile`,建议作为 LiteLLM compose 项目的 sibling service 运行,而不是临时本地进程。 -It remains a third-party sidecar. Future lifecycle coupling may bind it more -closely to the LiteLLM service readiness/restart lifecycle, but that coupling is -still a design item rather than current behavior. See `docs/roadmap.md`. 
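一个最小的 sibling service compose 片段示意如下。这是假设性片段：service 名 `semantic-router` 与 `extra_hosts` 写法为假设，端口、URL 与挂载路径取自本 README 的示例约定：

```yaml
# 假设性 compose 片段:router 作为 LiteLLM compose 项目的 sibling service
services:
  semantic-router:
    build: /home/raystorm/gateway/gateway-semantic-router
    ports:
      - "4001:4001"
    environment:
      ROUTER_LITELLM_BASE_URL: http://litellm:4000
      ROUTER_EMBEDDING_URL: http://host.docker.internal:1234/v1/embeddings
    volumes:
      # 可选的 generated semantic asset 只读挂载
      - /home/raystorm/gateway/gateway-semantic-router/data/semantic_sets:/app/data/semantic_sets:ro
    extra_hosts:
      # Linux 上从容器访问宿主机 LM Studio 通常需要显式声明(假设)
      - "host.docker.internal:host-gateway"
```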
+推荐形态: -The compose service should use: +```text +LiteLLM :4000 +Cynosure Router :4001 +LM Studio Embedding :1234 +``` + +Compose service 通常需要: - build context: `/home/raystorm/gateway/gateway-semantic-router` - upstream LiteLLM URL: `http://litellm:4000` -- embedding URL from container to host LM Studio: - `http://host.docker.internal:1234/v1/embeddings` +- embedding URL from container to host LM Studio: `http://host.docker.internal:1234/v1/embeddings` - exposed router port: `4001` -- optional generated semantic asset mount: - `/home/raystorm/gateway/gateway-semantic-router/data/semantic_sets:/app/data/semantic_sets:ro` +- optional generated semantic asset mount: `/home/raystorm/gateway/gateway-semantic-router/data/semantic_sets:/app/data/semantic_sets:ro` + +当前仍然是第三方 sidecar。未来可以更紧密地绑定 LiteLLM service readiness / restart lifecycle,但这是 roadmap 项,不是当前行为。 + +--- + +## 配置 + +主配置文件: + +```text +config/routes.yaml +``` + +关键配置: + +```yaml +route_model: semantic-router +fallback_route_id: fast +threshold: 0.55 +margin: 0.04 + +embedding_url: http://127.0.0.1:1234/v1/embeddings +embedding_model: text-embedding-jina-embeddings-v5-text-small-retrieval@q8_0 + +litellm_base_url: http://127.0.0.1:4000 +listen_host: 127.0.0.1 +listen_port: 4001 +``` + +环境变量覆盖: + +```text +ROUTER_HOST +ROUTER_PORT +ROUTER_LITELLM_BASE_URL +ROUTER_LITELLM_TIMEOUT +ROUTER_EMBEDDING_URL +ROUTER_EMBEDDING_MODEL +ROUTER_ACCESS_LOG +ROUTER_READINESS_TIMEOUT +``` + +`ROUTER_ACCESS_LOG` 默认为 `false`。只有确实需要原始 HTTP access log 时才建议打开。 + +--- + +## 与 LiteLLM 的关系 + +Cynosure Router 是 LiteLLM 的旁路控制面,不是 LiteLLM fork。 + +两种接入方式: + +### 方式一:客户端直接打 Router + +客户端 base URL 指向: + +```text +http://127.0.0.1:4001 +``` + +请求: + +```text +model=semantic-router +``` + +### 方式二:作为 LiteLLM model entry + +低侵入生产方向是保留客户端 base URL 为 LiteLLM: + +```text +http://127.0.0.1:4000 +``` + +然后在 LiteLLM 中暴露一个模型入口,让 `model=semantic-router` 进入 sidecar。这样客户端只需要改 model,不需要改 base URL。 + +LiteLLM 的原生 `smart-router` 应保持独立: + +- 
`smart-router`:LiteLLM 内置 complexity router; +- `semantic-router`:Cynosure Router 的语义任务路由入口。 + +当前证明和验收标准见: + +```text +docs/superpowers/specs/2026-05-03-litellm-semantic-router-entry-design.md +``` + +--- + +## 健康检查 + +本地 liveness: + +```bash +curl http://127.0.0.1:4001/health +``` + +分层 readiness: + +```bash +curl http://127.0.0.1:4001/ready +``` + +`/ready` 会分别检查: -Default endpoints: +- router +- LiteLLM upstream +- embedding upstream -- Router: `http://127.0.0.1:4001` -- LiteLLM upstream: `http://127.0.0.1:4000` -- Embedding upstream: `http://127.0.0.1:1234/v1/embeddings` +Docker health check 建议使用 `/health`,避免 embedding 或 LiteLLM 短暂 degraded 导致容器反复重启。`/ready` 更适合人工检查、部署门禁和运行状态观测。 -Environment overrides: +--- -- `ROUTER_HOST` -- `ROUTER_PORT` -- `ROUTER_LITELLM_BASE_URL` -- `ROUTER_LITELLM_TIMEOUT` -- `ROUTER_EMBEDDING_URL` -- `ROUTER_EMBEDDING_MODEL` -- `ROUTER_ACCESS_LOG` (`false` by default; set `true` only when raw HTTP - access logs are needed) -- `ROUTER_READINESS_TIMEOUT` +## 决策预览 -## LiteLLM Entry Design +只查看路由决策,不转发到 LiteLLM: -The low-intrusion production direction is to keep upstream clients on the -LiteLLM base URL and expose the sidecar as a LiteLLM model entry named -`semantic-router`. In that shape, clients keep `http://127.0.0.1:4000` and opt -in by changing only the model name. 
+```bash +curl http://127.0.0.1:4001/v1/semantic-router/decision \ + -H "Content-Type: application/json" \ + -d '{"model":"semantic-router","messages":[{"role":"user","content":"这个线上 bug 为什么偶发?"}]}' +``` + +返回内容包括: + +- `source_model` +- `route_id` +- `target_model` +- `policy_id` +- `reason` +- `rewrite` +- `score` +- `second_score` + +这个 endpoint 适合做 route 质量审查、灰度前验证、agent workflow 调试、eval case 复核,以及不消耗模型调用的 dry-run。 + +--- + +## 可观测性 + +每个 routed 请求都会写入结构化日志,例如: + +```json +{ + "event": "route_complete", + "request_id": "...", + "request_id_source": "x-request-id", + "source_model": "semantic-router", + "route_id": "strong", + "target_model": "pro-router", + "policy_id": "embedding", + "reason": "embedding", + "rewrite": true, + "stream": false, + "upstream_status": 200, + "score": 0.812341, + "second_score": 0.421133, + "duration_ms": 123.45 +} +``` + +日志不会记录 prompt 或 bearer token。 + +Sidecar 会接受以下 request identity sources: + +- `x-request-id` +- `x-correlation-id` +- W3C `traceparent` +- `metadata.semantic_router_request_id` +- `user` -LiteLLM's native `smart-router` should remain separate. It continues to mean -LiteLLM's built-in complexity router, while `semantic-router` means this -sidecar's semantic task router. +最终 request id 会注入到上游 `x-request-id` header,方便 sidecar 到 LiteLLM 的跨层关联。 -Current proof and acceptance criteria are documented in -`docs/superpowers/specs/2026-05-03-litellm-semantic-router-entry-design.md`. +--- -## Verification +## 验证 + +基础测试: ```bash uv run python -m pytest -q uv run python scripts/eval_routes.py --mock-embeddings ``` -## CI - -GitHub Actions PR CI runs only the baseline automated checks: +CI 当前只运行基线自动化检查: - `uv run python -m pytest -q` - `uv run python scripts/eval_routes.py --mock-embeddings` -This CI is intentionally minimal and does not claim full production validation. -Live preflight, LiteLLM-entry E2E, Docker log summary/review, and route-error -budget checks remain operator/local-production checks. 
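对应的 workflow 大致如下。这是假设性片段：文件路径、action 名称与版本均为假设，不代表仓库当前 CI 配置的细节：

```yaml
# 假设性 GitHub Actions 片段,仅覆盖上面两条基线检查
name: pr-ci
on: [pull_request]
jobs:
  baseline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5   # 假设使用官方 setup-uv action
      - run: uv sync
      - run: uv run python -m pytest -q
      - run: uv run python scripts/eval_routes.py --mock-embeddings
```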
+这套 CI 只证明基础行为没有回归,不宣称完整生产验证。Live preflight、LiteLLM-entry E2E、Docker log summary/review 和 route-error budget 仍然是 operator / local-production 检查。 + +--- + +## Production Preflight + +对运行中的 router 做 preflight: + +```bash +uv run python scripts/preflight.py \ + --router-base-url http://127.0.0.1:4001 +``` + +需要设置: + +```bash +export LITELLM_MASTER_KEY=... +``` -Production preflight against a running router: +也可以显式传入: ```bash -uv run python scripts/preflight.py --router-base-url http://127.0.0.1:4001 +uv run python scripts/preflight.py \ + --router-base-url http://127.0.0.1:4001 \ + --api-key "$LITELLM_MASTER_KEY" +``` + +Preflight 会检查: + +- `/health` +- `/ready` +- 非流式 chat route +- 流式 SSE route +- 路由 headers +- 基础响应形状 + +当 readiness degraded 时,脚本会打印 degraded component detail,例如: + +```text +ready=False degraded=embedding:ConnectError ``` -The preflight requires `LITELLM_MASTER_KEY` in the environment or `--api-key`. -It checks liveness, layered readiness, non-streaming route headers, streaming -route headers, and SSE body shape without printing secrets or prompts. When -readiness is degraded, it prints the degraded component detail, for example -`ready=False degraded=embedding:ConnectError`. Readiness is retried briefly by -default; use `--ready-attempts` and `--ready-interval` to tune that gate. +--- + +## LiteLLM Entry E2E + +通过 LiteLLM 入口验证 `model=semantic-router`: + +```bash +uv run python scripts/e2e_litellm_entry.py \ + --litellm-base-url http://127.0.0.1:4000 +``` -Runtime probes: +E2E 会验证: -- `/health` is a local liveness check for container health. -- `/ready` is a layered readiness check. It reports `router`, `litellm`, and - `embedding` components separately and returns `503` when any layer is - degraded. Docker health intentionally still uses `/health` so readiness can be - observed without causing restart loops. 
+- 非流式响应; +- 流式响应; +- sidecar route logs; +- 示例 `fast` / `strong` / `experimental` route; +- route id 与 configured target model 是否一致。 -Embedding degraded mode is intentionally fail-open for routed chat requests: -when the embedding component is unavailable, `/ready` returns `503`, but -`model=semantic-router` requests fall back to `fallback_route_id` with -`reason=embedding_error`. LiteLLM/upstream proxy failures are different: they -fail closed as redacted `502` responses and are logged as `route_error`. +LiteLLM model-entry 路径目前不一定保留 client-supplied correlation id 到 sidecar,所以脚本会先尝试 request-id matching,再 fallback 到 recent route shape matching。 -Production E2E through the LiteLLM entrypoint: +如果验证路径预期必须端到端保留 request id,可使用: ```bash -uv run python scripts/e2e_litellm_entry.py --litellm-base-url http://127.0.0.1:4000 -``` - -The E2E checks `model=semantic-router` through LiteLLM `:4000`, verifies -non-streaming and streaming responses, and confirms sidecar route logs for the -sample `fast`, `strong`, and `experimental` route ids plus their configured -target models. LiteLLM's model-entry -path does not currently preserve client-supplied correlation IDs to the sidecar, -so the script first tries request-id matching and then falls back to recent route -shape matching. The script prints `RUN` lines before each probe so slow upstream -requests can be localized; use `--timeout` to tune HTTP timeouts or -`--quiet-progress` to suppress progress lines. Use -`--require-request-id-log-match` when validating a deployment path that is -expected to preserve request IDs end to end; failed probes are not allowed to -pass route-log checks by matching old route-shape logs. - -Within the sidecar, route logs include `route_id`, `target_model`, `policy_id`, -`request_id`, and `request_id_source`. 
The sidecar accepts `x-request-id`, -`x-correlation-id`, W3C `traceparent`, `metadata.semantic_router_request_id`, -and `user` as request identity sources, then injects the final value into the -upstream `x-request-id` header. This makes sidecar-to-upstream correlation -stable even when LiteLLM's model-entry layer does not preserve the original -client id. - -Production route-log summary from sidecar logs: +uv run python scripts/e2e_litellm_entry.py \ + --litellm-base-url http://127.0.0.1:4000 \ + --require-request-id-log-match +``` + +--- + +## Route Log Summary + +从 sidecar 日志生成路由统计: ```bash docker logs --since 12h gateway_semantic_router 2>&1 \ | uv run python scripts/router_log_summary.py ``` -The summary parser ignores uvicorn access lines and only counts structured -`route_complete` / `route_error` JSON records. Upstream route exceptions and -HTTP `5xx` statuses are returned as `502` with a redacted JSON error body and -are logged as `route_error`; HTTP status failures include `upstream_status` in -the structured log and `upstream_statuses` in the summary. Route ids, deployment -targets, and route reasons are counted separately, so degraded embedding -fallback shows up as -`reasons: embedding_error=N`. Prompts and bearer tokens are not logged. -When malformed JSON, missing-event JSON records, or unknown-event JSON records -are present after the first `{` in a log line, the summary adds an -`ignored_records` line so operators can distinguish parser/log-shape drift from -real routed traffic. Plain access-log lines without JSON objects are still -ignored silently. Non-200 upstream statuses are also grouped by status, target, -reason, and stream mode under `upstream_non_200` so an operator can quickly see -deployment patterns such as `status=400 target=cheap-router -reason=low_confidence`. 
- -Production route-error budget gate: +摘要会统计: + +- routed 请求总数; +- completed / error; +- stream / non-stream; +- route_id 分布; +- target_model 分布; +- reason 分布; +- error_type 分布; +- upstream_status 分布; +- 非 200 上游状态; +- 最大耗时; +- 被忽略的异常日志记录。 + +解析器会忽略 uvicorn access lines,只统计结构化 `route_complete` / `route_error` JSON 记录。Prompts 和 bearer tokens 不会进入日志。 + +--- + +## Route Error Budget + +上线前或灰度后检查路由错误预算: ```bash docker logs --since 12h gateway_semantic_router 2>&1 \ @@ -196,31 +487,19 @@ docker logs --since 12h gateway_semantic_router 2>&1 \ --max-upstream-status-rate 400=0 ``` -The budget gate prints a stable PASS/FAIL report and exits non-zero when the -selected log window has too few route events or exceeds the total/per-target -`route_error` thresholds. Optional `--max-reason-rate REASON=RATE` checks -bounded degradation such as `embedding_error` fallback even when requests still -complete. Optional `--max-upstream-status-rate STATUS=RATE` catches completed -requests where the upstream still returned a specific status such as `400`. Use -this after preflight/E2E and before keeping a new router build in production -traffic. - -For a live sidecar request, pass the same LiteLLM `Authorization` header to -`http://127.0.0.1:4001/v1/chat/completions`. +这个 gate 用来防止 router 看起来能跑,但实际已经出现: -Routing decision preview without upstream forwarding: +- 某个 target 持续失败; +- embedding_error 过多; +- 上游返回大量 400 / 500; +- 日志结构漂移; +- eval 没覆盖到的线上异常。 -```bash -curl http://127.0.0.1:4001/v1/semantic-router/decision \ - -H "Content-Type: application/json" \ - -d '{"model":"semantic-router","messages":[{"role":"user","content":"这个线上 bug 为什么偶发?"}]}' -``` +脚本会输出稳定的 PASS / FAIL 报告,并在预算超限时返回非零 exit code。 -Use this endpoint for route quality review and gray-mode evaluation. It returns -the selected `route_id`, resolved `target_model`, `policy_id`, reason, rewrite -flag, and scores, but does not call LiteLLM or any model backend. 
+--- -Streaming smoke test: +## Streaming Smoke Test ```bash curl -N http://127.0.0.1:4001/v1/chat/completions \ @@ -229,23 +508,24 @@ curl -N http://127.0.0.1:4001/v1/chat/completions \ -d '{"model":"semantic-router","stream":true,"messages":[{"role":"user","content":"这个线上 bug 为什么偶发?只回答 OK"}],"max_tokens":8}' ``` +--- + ## Semantic Assets -Runtime routing stays dependency-light. Larger semantic assets are built offline -from declared sources in `config/route_sources.yaml`. +Runtime routing 保持 dependency-light。较大的 semantic assets 通过离线脚本生成,来源声明在: + +```text +config/route_sources.yaml +``` -The initial source manifest references mature datasets rather than hand-written -keyword expansion: +当前 source manifest 包括: -- MASSIVE zh-CN / zh-TW official JSONL tarball for general assistant and - utility utterances. The current Hugging Face `datasets` loader cannot load - `AmazonScience/massive` directly because that dataset still uses a dataset - script, so the builder reads the official release archive instead. -- SWE-bench issue statements for repository-level software engineering tasks. -- MBPP and HumanEval for code-generation prompts. -- Local JSONL samples for model-probe traffic. +- MASSIVE zh-CN / zh-TW official JSONL tarball,用于通用 assistant 与 utility utterances; +- SWE-bench issue statements,用于 repository-level software engineering tasks; +- MBPP 和 HumanEval,用于 code-generation prompts; +- local JSONL samples,用于 model-probe traffic。 -Build dependencies are isolated from runtime: +构建依赖与 runtime 隔离: ```bash uv sync --group assets @@ -253,23 +533,18 @@ uv run python scripts/build_route_bank.py uv run python scripts/build_eval_bank.py --per-route-limit 100 ``` -Generated route banks should retain each utterance's source name so eval -failures remain auditable. 
- -Runtime loading is conservative: `config/routes.yaml` declares -`route_bank_path: data/semantic_sets/route_bank.yaml`, and `load_settings()` -merges that generated bank with the seed utterances only when the file exists. -If the bank is absent, the router keeps using the checked-in seed routes. - -The generated eval bank is also kept out of git. A 200+ case regression run can -be reproduced after building the route bank: +运行扩展 eval: ```bash -uv run python scripts/eval_routes.py --cases data/semantic_sets/eval_bank.yaml +uv run python scripts/eval_routes.py \ + --cases data/semantic_sets/eval_bank.yaml ``` -Redacted production review samples can be promoted into eval cases without -putting raw prompts in logs or git: +Runtime loading 是保守的:`config/routes.yaml` 声明 `route_bank_path: data/semantic_sets/route_bank.yaml`,`load_settings()` 仅在文件存在时合并 generated bank 和 checked-in seed utterances。没有生成资产时,router 继续使用 seed routes。 + +### Redacted Production Samples + +可把脱敏生产 review 样例提升为 eval cases: ```bash uv run python scripts/import_review_samples.py \ @@ -281,6 +556,87 @@ uv run python scripts/build_eval_bank.py \ --per-route-limit 100 ``` -Each JSONL sample must set `redacted: true`, include `text`, and set `expect` -to a configured route id such as `fast` or `strong`. The importer rejects -unredacted samples by default. 
+每条 JSONL sample 必须: + +- `redacted: true` +- 包含 `text` +- `expect` 指向已配置 route id,例如 `fast` 或 `strong` + +Importer 默认拒绝未脱敏样例。 + +--- + +## 项目边界 + +Cynosure Router 不做这些事: + +- 不保存 API key; +- 不管理 provider order; +- 不实现 provider fallback; +- 不替代 LiteLLM; +- 不记录原始 prompt; +- 不提交 LiteLLM mount、tokens 或 `.env`; +- 不追求训练一个通用 LLM router 模型; +- 不把 route 质量伪装成不可解释黑盒。 + +它只做: + +```text +intent → route_id → target_model → auditable rewrite +``` + +本仓库仍然应与 `/home/raystorm/gateway/litellm` 保持隔离。不要把本地 LiteLLM mount 文件、tokens、`.env` 或供应商密钥材料加入这里。 + +--- + +## 当前状态 + +项目处于本地生产化打磨阶段,尚未 public-release ready。公开发布前还需要统一审计: + +- configurable route abstraction; +- observability contract; +- redacted eval workflow; +- license 与 release documentation; +- README / GitHub metadata / repository name 的最终一致性。 + +已经具备: + +- OpenAI-compatible chat proxy; +- streaming / non-streaming 转发; +- 配置化 route; +- 显式 route metadata; +- 中文 hard rules; +- embedding semantic match; +- readiness / liveness; +- decision preview; +- structured route logs; +- route summary; +- route error budget gate; +- mock eval; +- preflight; +- Docker sidecar 运行形态。 + +仍需继续打磨: + +- 更严格的 LiteLLM model-entry E2E; +- 更完整的 route bank 生成和审查流程; +- 基于真实 redacted 样例的 eval 扩充; +- 生命周期耦合策略; +- 公共发布前的 license / release 文档整理。 + +--- + +## Name + +Cynosure 的含义是“指引方向的中心点”。这个名字对应本项目的职责:不执行模型、不替代网关,而是在模型流量进入执行层前,给出可解释、可审计、可回退的方向选择。 + +```text +Cynosure Router += the guiding point for model traffic +``` + +GitHub 仓库标题、描述、重命名等平台元数据建议不在本 PR 中直接修改;建议先作为文件记录进入 review。详见: + +```text +docs/PROJECT_IDENTITY.md +``` From 7cbd740f28a4b2d1ae4e9332de2de9d5636c6271 Mon Sep 17 00:00:00 2001 From: raystorm <2557058999@qq.com> Date: Tue, 5 May 2026 21:29:01 +0800 Subject: [PATCH 2/2] docs: add repository identity proposal --- docs/PROJECT_IDENTITY.md | 122 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 122 insertions(+) create mode 100644 docs/PROJECT_IDENTITY.md diff --git a/docs/PROJECT_IDENTITY.md b/docs/PROJECT_IDENTITY.md new file mode 100644 
index 0000000..bfe9432 --- /dev/null +++ b/docs/PROJECT_IDENTITY.md @@ -0,0 +1,122 @@ +# Project Identity Proposal + +本文档记录仓库名称、GitHub 标题和 About 描述建议。此文件只是项目内文档,不会修改 GitHub 仓库元数据。 + +## 推荐名称 + +```text +Cynosure Router +``` + +## 推荐仓库名 + +```text +cynosure-router +``` + +保留 `router` 后缀是为了降低识别成本:项目本质仍然是 LLM gateway 前面的 routing sidecar。`Cynosure` 提供品牌识别,`Router` 提供功能锚点。 + +不建议继续使用: + +```text +gateway-semantic-router +``` + +原因: + +- 名字过于描述性,缺少产品识别度; +- `semantic-router` 容易和 LiteLLM / 其他项目里的 generic semantic routing 概念混淆; +- 无法表达本项目真正的差异点:中文-heavy agent traffic、可审计结构化日志、decision preview、error budget gate、LiteLLM 控制面 sidecar; +- 未来如果加入 response/chat-completion shim、route quality workflow、traffic audit 等能力,旧名会显得过窄。 + +## GitHub repository title 建议 + +```text +Cynosure Router +``` + +## GitHub About description 建议 + +```text +Intent-aware model routing sidecar for LiteLLM/OpenAI-compatible gateways, built for Chinese-heavy agent traffic, auditable decisions, and safe fallback. +``` + +备选短版: + +```text +Auditable intent router for LiteLLM and OpenAI-compatible model gateways. +``` + +## 一句话定位 + +```text +Cynosure Router is the intent-aware control plane that decides where model traffic should go before LiteLLM executes it. +``` + +中文版本: + +```text +Cynosure Router 是 LiteLLM 执行模型前的一层意图分流控制面。 +``` + +## 命名理由 + +`Cynosure` 原意接近“指引方向的中心点”。这个词适合本项目,因为项目本身不执行模型、不管理 provider,也不替代 LiteLLM,而是在流量进入执行层前给出方向: + +```text +intent → route_id → target_model → auditable rewrite +``` + +这个名字比 `gateway-semantic-router` 更适合长期演进: + +- 不被 `semantic` 这个单一实现方式绑定; +- 不和 LiteLLM 原生 `smart-router` 或其他 semantic router 概念打架; +- 能容纳 hard rules、metadata override、embedding、eval、observability、error budget 等多种控制面能力; +- 有品牌感,但仍然通过 `Router` 保留功能可读性。 + +## 建议的 README 标题结构 + +```markdown +# Cynosure Router + +> 面向 LLM Gateway 的意图分流控制面。 +> Intent-aware routing sidecar for LiteLLM / OpenAI-compatible gateways. 
+``` + +## 建议的后续平台元数据修改 + +当本 PR 合并并确认文档方向后,可手动修改 GitHub 平台元数据: + +- Repository name: `cynosure-router` +- Repository title / display name: `Cynosure Router` +- About description: 使用本文推荐长版或短版 +- Topics 可考虑: + - `llm-gateway` + - `litellm` + - `openai-compatible` + - `model-routing` + - `semantic-routing` + - `agent-infra` + - `observability` + +本 PR 不执行这些平台级修改。 + +## 迁移注意事项 + +如果后续真正重命名仓库,需要同步检查: + +- README 中的本地路径示例; +- compose build context; +- Docker service name; +- CI badge 或 workflow 文案; +- 外部脚本、Codex / agent 配置里的 repo URL; +- 本地 clone 路径; +- LiteLLM compose 中指向 sidecar 的路径或服务名。 + +当前 README 仍保留部分本地路径示例,例如: + +```text +/home/raystorm/gateway/gateway-semantic-router +``` + +这些路径反映当前部署状态。仓库真正重命名后,再统一改为新的本地目录名会更安全。
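
重命名落地后，可以先用一个简单的 grep 扫描旧名残留。下面是示意脚本：为了可独立运行，先在演示用临时目录里构造一个含旧路径的文件；实际使用时应把扫描目标换成仓库根目录，并同时检查下划线形式的 `gateway_semantic_router`：

```shell
# 示意:扫描旧名残留;/tmp/rename-check-demo 仅为演示假设
mkdir -p /tmp/rename-check-demo
printf 'build: /home/raystorm/gateway/gateway-semantic-router\n' \
  > /tmp/rename-check-demo/compose.yml
# 列出仍引用旧名的文件
grep -rln "gateway-semantic-router" /tmp/rename-check-demo
```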