From f4e98fe3ab4b35e60d115aa2111ef1204531c6d7 Mon Sep 17 00:00:00 2001 From: raystorm <2557058999@qq.com> Date: Tue, 5 May 2026 21:16:24 +0800 Subject: [PATCH 1/2] docs: refresh project identity and README --- README.md | 714 ++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 535 insertions(+), 179 deletions(-) diff --git a/README.md b/README.md index 801af0a..9ac00bd 100644 --- a/README.md +++ b/README.md @@ -1,190 +1,481 @@ -# Gateway Semantic Router +# Cynosure Router + +> 面向 LLM Gateway 的意图分流控制面。 +> Intent-aware routing sidecar for LiteLLM / OpenAI-compatible gateways. + +Cynosure Router 是一个轻量、本地优先、可审计的 LLM 路由 sidecar。它不替代 LiteLLM,也不重新发明模型网关;它只负责在请求进入模型执行层之前,根据用户意图选择合适的模型通道,并把语义入口模型改写为部署环境里的真实目标模型。 + +当前项目主要面向中文-heavy 的个人 / 小团队 agent 流量,例如代码审查、debug、架构分析、线上故障判断、模型探活、低风险问答和混合自动化工作流。 + +--- + +## 为什么需要它 + +很多通用 LLM router 更关注英文 benchmark、强弱模型成本优化,或者直接把路由、评估、服务、执行层打包成一套重系统。 + +但在本地 LiteLLM 网关场景里,真正的问题通常更具体: + +- 中文技术请求经常被低估复杂度; +- 简单闲聊、翻译、格式转换不应该消耗强模型额度; +- 代码审查、线上故障、架构权衡、权限安全类问题必须进强模型; +- 免费端点、实验模型、探活请求应该进入隔离通道; +- 路由决策必须能解释、能回放、能统计,而不是黑盒; +- LiteLLM 仍然应该保留 provider order、fallback、cooldown、key 管理和真实执行层职责。 + +Cynosure Router 的定位就是:**在 LiteLLM 前面增加一层可控、可观测、中文友好的意图分流层。** + +--- + +## 核心定位 + +```text +Client / Agent / IDE / Automation + │ + │ OpenAI-compatible request + ▼ +Cynosure Router + - 读取 latest user message + - 支持显式 route metadata + - 支持中文 hard rules + - 使用 embedding 做语义匹配 + - 低置信度安全回退 + - 改写 model 字段 + - 记录结构化路由日志 + │ + │ rewritten model + ▼ +LiteLLM Gateway + - provider order + - fallback + - cooldown + - auth / key management + - actual model execution + │ + ▼ +Model Providers +``` + +Cynosure Router 只做路由决策和 `model` rewrite。真实模型调用、密钥、provider fallback 和供应商编排仍然交给 LiteLLM。 + +--- + +## 当前能力 + +### OpenAI-compatible Chat Proxy + +支持: + +- `POST /v1/chat/completions` +- 非流式响应 +- `stream=true` SSE 流式响应 +- 仅改写请求中的 `model` 字段 +- 保留上游 LiteLLM 响应体 +- 注入路由观测 headers + +示例 headers: + +```text +x-router-request-id +x-router-target-model 
+x-router-reason +``` + +### 语义入口模型 + +客户端请求一个语义入口模型,例如: + +```json +{ + "model": "semantic-router", + "messages": [ + { + "role": "user", + "content": "帮我审一下这个 PR 有没有竞态问题" + } + ] +} +``` + +Cynosure Router 会把它改写成真实目标模型,例如: + +```text +pro-router +``` + +其他非入口模型会原样透传,不进入语义路由。LiteLLM 原生的 `smart-router` 也被刻意保留为单独的上游模型组,避免概念混淆。 + +### Route 抽象 + +默认示例 route: -Lightweight, local-first OpenAI/LiteLLM-compatible routing sidecar for -`/v1/chat/completions`. +| route_id | 目标模型示例 | 用途 | +|---|---|---| +| `fast` | `cheap-router` | 普通问答、解释、翻译、轻量总结 | +| `strong` | `pro-router` | 代码、debug、架构、多步推理、高风险判断 | +| `experimental` | `free-probe-router` | 免费端点探活、实验模型试探、低价值样例比较 | -It rewrites the configured semantic entry model, currently -`model=semantic-router`, by selecting a configured `route_id` and resolving that -route to a deployment-specific `target_model`. +这些目标模型只是当前 LiteLLM 部署里的示例名字。真正的目标模型由 `config/routes.yaml` 映射决定。 -The checked-in sample config uses route ids such as `fast`, `strong`, and -`experimental`, mapped to local example LiteLLM targets such as `cheap-router`, -`pro-router`, and `free-probe-router`. Those target names are examples from this -machine's LiteLLM setup, not product-level route names. +### 决策优先级 -Runtime config validation enforces that the semantic entry model itself cannot -appear as a route target and that the fallback route exists, which prevents -recursive forwarding back to `semantic-router`. +一次 routed 请求的决策顺序: -All other model names pass through unchanged. LiteLLM's native `smart-router` -is intentionally kept as a separate upstream model group. +1. 非入口模型:直接 passthrough; +2. `metadata.route` / `metadata.target_route` 显式指定 route; +3. 中文 hard rules 命中高风险关键词; +4. embedding 语义匹配; +5. 低置信度或 embedding 异常时回退到 `fallback_route_id`。 -Both non-streaming and `stream=true` SSE chat completions are proxied. The -sidecar rewrites only the request model field, then preserves the upstream -LiteLLM response body and routing headers. 
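上面的优先级可以用一个极简的 Python 草图表示。以下是假设性示意：函数名、配置字段与关键词表都只是说明用的占位，真实实现以 router 源码为准：

```python
# 假设性示意:按 README 描述的优先级选择 route(所有名字均为占位,非真实实现)
HARD_RULE_KEYWORDS = ("线上", "故障", "权限", "安全")  # 示例关键词,真实规则更完整

def hits_hard_rules(text):
    return any(k in text for k in HARD_RULE_KEYWORDS)

def embed_match(text, cfg):
    # 真实实现会调用 embedding 服务并与 route bank 比较;这里仅作占位
    raise RuntimeError("embedding unavailable in this sketch")

def decide_route(model, metadata, text, cfg):
    """返回 (route_id, reason);route_id 为 None 表示 passthrough。"""
    if model != cfg["route_model"]:
        return None, "passthrough"                          # 1. 非入口模型直接透传
    explicit = metadata.get("route") or metadata.get("target_route")
    if explicit in cfg["routes"]:
        return explicit, "metadata"                         # 2. 显式指定 route
    if hits_hard_rules(text):
        return "strong", "hard_rule"                        # 3. 中文 hard rules 命中
    try:
        route_id, score, second = embed_match(text, cfg)    # 4. embedding 语义匹配
        if score >= cfg["threshold"] and score - second >= cfg["margin"]:
            return route_id, "embedding"
        return cfg["fallback_route_id"], "low_confidence"   # 5. 低置信度回退
    except Exception:
        return cfg["fallback_route_id"], "embedding_error"  # 5. embedding 异常回退

cfg = {
    "route_model": "semantic-router",
    "routes": {"fast", "strong", "experimental"},
    "fallback_route_id": "fast",
    "threshold": 0.55,
    "margin": 0.04,
}
```

注意 embedding 异常与低置信度共用同一个 fallback route，但 reason 不同，便于在日志里区分两类回退。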
+这使路由行为既能自动判断,也能被上层 agent / workflow 显式控制。 -This repository is intentionally separate from `/home/raystorm/gateway/litellm`. -Do not add LiteLLM mount files, tokens, or `.env` material here. +### 安全回退 -The project is not public-release ready yet. Public repository visibility, -license polish, and release documentation are deferred until the configurable -route abstraction, observability contract, and redacted eval workflow have been -audited together. +Embedding 故障被视为可降级问题: -## Local Run +- `/ready` 会报告 embedding degraded; +- routed chat 请求不会直接失败; +- 请求会 fallback 到配置里的 `fallback_route_id`; +- 路由日志中记录 `reason=embedding_error`。 + +LiteLLM 或上游模型失败则不同:上游异常会被包装成受控的 `502`,并记录为 `route_error`。 + +--- + +## 本地运行 + +安装依赖: + +```bash +uv sync +``` + +启动 router: ```bash uv run python -m router.app ``` -## Container Lifecycle +默认端口: + +```text +Router: http://127.0.0.1:4001 +LiteLLM: http://127.0.0.1:4000 +Embedding: http://127.0.0.1:1234/v1/embeddings +``` + +--- + +## 容器运行 -The router is packaged with `Dockerfile` and is intended to run as a sibling -service in the LiteLLM compose project, not as an ad-hoc local process. +Router 带有 `Dockerfile`,建议作为 LiteLLM compose 项目的 sibling service 运行,而不是临时本地进程。 -It remains a third-party sidecar. Future lifecycle coupling may bind it more -closely to the LiteLLM service readiness/restart lifecycle, but that coupling is -still a design item rather than current behavior. See `docs/roadmap.md`. 
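一个最小的 sibling service compose 片段示意如下。这是假设性片段：service 名 `semantic-router` 与 `extra_hosts` 写法为假设，端口、URL 与挂载路径取自本 README 的示例约定：

```yaml
# 假设性 compose 片段:router 作为 LiteLLM compose 项目的 sibling service
services:
  semantic-router:
    build: /home/raystorm/gateway/gateway-semantic-router
    ports:
      - "4001:4001"
    environment:
      ROUTER_LITELLM_BASE_URL: http://litellm:4000
      ROUTER_EMBEDDING_URL: http://host.docker.internal:1234/v1/embeddings
    volumes:
      # 可选的 generated semantic asset 只读挂载
      - /home/raystorm/gateway/gateway-semantic-router/data/semantic_sets:/app/data/semantic_sets:ro
    extra_hosts:
      # Linux 上从容器访问宿主机 LM Studio 通常需要显式声明(假设)
      - "host.docker.internal:host-gateway"
```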
+推荐形态: -The compose service should use: +```text +LiteLLM :4000 +Cynosure Router :4001 +LM Studio Embedding :1234 +``` + +Compose service 通常需要: - build context: `/home/raystorm/gateway/gateway-semantic-router` - upstream LiteLLM URL: `http://litellm:4000` -- embedding URL from container to host LM Studio: - `http://host.docker.internal:1234/v1/embeddings` +- embedding URL from container to host LM Studio: `http://host.docker.internal:1234/v1/embeddings` - exposed router port: `4001` -- optional generated semantic asset mount: - `/home/raystorm/gateway/gateway-semantic-router/data/semantic_sets:/app/data/semantic_sets:ro` +- optional generated semantic asset mount: `/home/raystorm/gateway/gateway-semantic-router/data/semantic_sets:/app/data/semantic_sets:ro` + +当前仍然是第三方 sidecar。未来可以更紧密地绑定 LiteLLM service readiness / restart lifecycle,但这是 roadmap 项,不是当前行为。 + +--- + +## 配置 + +主配置文件: + +```text +config/routes.yaml +``` + +关键配置: + +```yaml +route_model: semantic-router +fallback_route_id: fast +threshold: 0.55 +margin: 0.04 + +embedding_url: http://127.0.0.1:1234/v1/embeddings +embedding_model: text-embedding-jina-embeddings-v5-text-small-retrieval@q8_0 + +litellm_base_url: http://127.0.0.1:4000 +listen_host: 127.0.0.1 +listen_port: 4001 +``` + +环境变量覆盖: + +```text +ROUTER_HOST +ROUTER_PORT +ROUTER_LITELLM_BASE_URL +ROUTER_LITELLM_TIMEOUT +ROUTER_EMBEDDING_URL +ROUTER_EMBEDDING_MODEL +ROUTER_ACCESS_LOG +ROUTER_READINESS_TIMEOUT +``` + +`ROUTER_ACCESS_LOG` 默认为 `false`。只有确实需要原始 HTTP access log 时才建议打开。 + +--- + +## 与 LiteLLM 的关系 + +Cynosure Router 是 LiteLLM 的旁路控制面,不是 LiteLLM fork。 + +两种接入方式: + +### 方式一:客户端直接打 Router + +客户端 base URL 指向: + +```text +http://127.0.0.1:4001 +``` + +请求: + +```text +model=semantic-router +``` + +### 方式二:作为 LiteLLM model entry + +低侵入生产方向是保留客户端 base URL 为 LiteLLM: + +```text +http://127.0.0.1:4000 +``` + +然后在 LiteLLM 中暴露一个模型入口,让 `model=semantic-router` 进入 sidecar。这样客户端只需要改 model,不需要改 base URL。 + +LiteLLM 的原生 `smart-router` 应保持独立: + +- 
`smart-router`:LiteLLM 内置 complexity router; +- `semantic-router`:Cynosure Router 的语义任务路由入口。 + +当前证明和验收标准见: + +```text +docs/superpowers/specs/2026-05-03-litellm-semantic-router-entry-design.md +``` + +--- + +## 健康检查 + +本地 liveness: + +```bash +curl http://127.0.0.1:4001/health +``` + +分层 readiness: + +```bash +curl http://127.0.0.1:4001/ready +``` + +`/ready` 会分别检查: -Default endpoints: +- router +- LiteLLM upstream +- embedding upstream -- Router: `http://127.0.0.1:4001` -- LiteLLM upstream: `http://127.0.0.1:4000` -- Embedding upstream: `http://127.0.0.1:1234/v1/embeddings` +Docker health check 建议使用 `/health`,避免 embedding 或 LiteLLM 短暂 degraded 导致容器反复重启。`/ready` 更适合人工检查、部署门禁和运行状态观测。 -Environment overrides: +--- -- `ROUTER_HOST` -- `ROUTER_PORT` -- `ROUTER_LITELLM_BASE_URL` -- `ROUTER_LITELLM_TIMEOUT` -- `ROUTER_EMBEDDING_URL` -- `ROUTER_EMBEDDING_MODEL` -- `ROUTER_ACCESS_LOG` (`false` by default; set `true` only when raw HTTP - access logs are needed) -- `ROUTER_READINESS_TIMEOUT` +## 决策预览 -## LiteLLM Entry Design +只查看路由决策,不转发到 LiteLLM: -The low-intrusion production direction is to keep upstream clients on the -LiteLLM base URL and expose the sidecar as a LiteLLM model entry named -`semantic-router`. In that shape, clients keep `http://127.0.0.1:4000` and opt -in by changing only the model name. 
+```bash +curl http://127.0.0.1:4001/v1/semantic-router/decision \ + -H "Content-Type: application/json" \ + -d '{"model":"semantic-router","messages":[{"role":"user","content":"这个线上 bug 为什么偶发?"}]}' +``` + +返回内容包括: + +- `source_model` +- `route_id` +- `target_model` +- `policy_id` +- `reason` +- `rewrite` +- `score` +- `second_score` + +这个 endpoint 适合做 route 质量审查、灰度前验证、agent workflow 调试、eval case 复核,以及不消耗模型调用的 dry-run。 + +--- + +## 可观测性 + +每个 routed 请求都会写入结构化日志,例如: + +```json +{ + "event": "route_complete", + "request_id": "...", + "request_id_source": "x-request-id", + "source_model": "semantic-router", + "route_id": "strong", + "target_model": "pro-router", + "policy_id": "embedding", + "reason": "embedding", + "rewrite": true, + "stream": false, + "upstream_status": 200, + "score": 0.812341, + "second_score": 0.421133, + "duration_ms": 123.45 +} +``` + +日志不会记录 prompt 或 bearer token。 + +Sidecar 会接受以下 request identity sources: + +- `x-request-id` +- `x-correlation-id` +- W3C `traceparent` +- `metadata.semantic_router_request_id` +- `user` -LiteLLM's native `smart-router` should remain separate. It continues to mean -LiteLLM's built-in complexity router, while `semantic-router` means this -sidecar's semantic task router. +最终 request id 会注入到上游 `x-request-id` header,方便 sidecar 到 LiteLLM 的跨层关联。 -Current proof and acceptance criteria are documented in -`docs/superpowers/specs/2026-05-03-litellm-semantic-router-entry-design.md`. +--- -## Verification +## 验证 + +基础测试: ```bash uv run python -m pytest -q uv run python scripts/eval_routes.py --mock-embeddings ``` -## CI - -GitHub Actions PR CI runs only the baseline automated checks: +CI 当前只运行基线自动化检查: - `uv run python -m pytest -q` - `uv run python scripts/eval_routes.py --mock-embeddings` -This CI is intentionally minimal and does not claim full production validation. -Live preflight, LiteLLM-entry E2E, Docker log summary/review, and route-error -budget checks remain operator/local-production checks. 
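对应的 workflow 大致如下。这是假设性片段：文件路径、action 名称与版本均为假设，不代表仓库当前 CI 配置的细节：

```yaml
# 假设性 GitHub Actions 片段,仅覆盖上面两条基线检查
name: pr-ci
on: [pull_request]
jobs:
  baseline:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5   # 假设使用官方 setup-uv action
      - run: uv sync
      - run: uv run python -m pytest -q
      - run: uv run python scripts/eval_routes.py --mock-embeddings
```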
+这套 CI 只证明基础行为没有回归,不宣称完整生产验证。Live preflight、LiteLLM-entry E2E、Docker log summary/review 和 route-error budget 仍然是 operator / local-production 检查。 + +--- + +## Production Preflight + +对运行中的 router 做 preflight: + +```bash +uv run python scripts/preflight.py \ + --router-base-url http://127.0.0.1:4001 +``` + +需要设置: + +```bash +export LITELLM_MASTER_KEY=... +``` -Production preflight against a running router: +也可以显式传入: ```bash -uv run python scripts/preflight.py --router-base-url http://127.0.0.1:4001 +uv run python scripts/preflight.py \ + --router-base-url http://127.0.0.1:4001 \ + --api-key "$LITELLM_MASTER_KEY" +``` + +Preflight 会检查: + +- `/health` +- `/ready` +- 非流式 chat route +- 流式 SSE route +- 路由 headers +- 基础响应形状 + +当 readiness degraded 时,脚本会打印 degraded component detail,例如: + +```text +ready=False degraded=embedding:ConnectError ``` -The preflight requires `LITELLM_MASTER_KEY` in the environment or `--api-key`. -It checks liveness, layered readiness, non-streaming route headers, streaming -route headers, and SSE body shape without printing secrets or prompts. When -readiness is degraded, it prints the degraded component detail, for example -`ready=False degraded=embedding:ConnectError`. Readiness is retried briefly by -default; use `--ready-attempts` and `--ready-interval` to tune that gate. +--- + +## LiteLLM Entry E2E + +通过 LiteLLM 入口验证 `model=semantic-router`: + +```bash +uv run python scripts/e2e_litellm_entry.py \ + --litellm-base-url http://127.0.0.1:4000 +``` -Runtime probes: +E2E 会验证: -- `/health` is a local liveness check for container health. -- `/ready` is a layered readiness check. It reports `router`, `litellm`, and - `embedding` components separately and returns `503` when any layer is - degraded. Docker health intentionally still uses `/health` so readiness can be - observed without causing restart loops. 
+- 非流式响应; +- 流式响应; +- sidecar route logs; +- 示例 `fast` / `strong` / `experimental` route; +- route id 与 configured target model 是否一致。 -Embedding degraded mode is intentionally fail-open for routed chat requests: -when the embedding component is unavailable, `/ready` returns `503`, but -`model=semantic-router` requests fall back to `fallback_route_id` with -`reason=embedding_error`. LiteLLM/upstream proxy failures are different: they -fail closed as redacted `502` responses and are logged as `route_error`. +LiteLLM model-entry 路径目前不一定保留 client-supplied correlation id 到 sidecar,所以脚本会先尝试 request-id matching,再 fallback 到 recent route shape matching。 -Production E2E through the LiteLLM entrypoint: +如果验证路径预期必须端到端保留 request id,可使用: ```bash -uv run python scripts/e2e_litellm_entry.py --litellm-base-url http://127.0.0.1:4000 -``` - -The E2E checks `model=semantic-router` through LiteLLM `:4000`, verifies -non-streaming and streaming responses, and confirms sidecar route logs for the -sample `fast`, `strong`, and `experimental` route ids plus their configured -target models. LiteLLM's model-entry -path does not currently preserve client-supplied correlation IDs to the sidecar, -so the script first tries request-id matching and then falls back to recent route -shape matching. The script prints `RUN` lines before each probe so slow upstream -requests can be localized; use `--timeout` to tune HTTP timeouts or -`--quiet-progress` to suppress progress lines. Use -`--require-request-id-log-match` when validating a deployment path that is -expected to preserve request IDs end to end; failed probes are not allowed to -pass route-log checks by matching old route-shape logs. - -Within the sidecar, route logs include `route_id`, `target_model`, `policy_id`, -`request_id`, and `request_id_source`. 
The sidecar accepts `x-request-id`, -`x-correlation-id`, W3C `traceparent`, `metadata.semantic_router_request_id`, -and `user` as request identity sources, then injects the final value into the -upstream `x-request-id` header. This makes sidecar-to-upstream correlation -stable even when LiteLLM's model-entry layer does not preserve the original -client id. - -Production route-log summary from sidecar logs: +uv run python scripts/e2e_litellm_entry.py \ + --litellm-base-url http://127.0.0.1:4000 \ + --require-request-id-log-match +``` + +--- + +## Route Log Summary + +从 sidecar 日志生成路由统计: ```bash docker logs --since 12h gateway_semantic_router 2>&1 \ | uv run python scripts/router_log_summary.py ``` -The summary parser ignores uvicorn access lines and only counts structured -`route_complete` / `route_error` JSON records. Upstream route exceptions and -HTTP `5xx` statuses are returned as `502` with a redacted JSON error body and -are logged as `route_error`; HTTP status failures include `upstream_status` in -the structured log and `upstream_statuses` in the summary. Route ids, deployment -targets, and route reasons are counted separately, so degraded embedding -fallback shows up as -`reasons: embedding_error=N`. Prompts and bearer tokens are not logged. -When malformed JSON, missing-event JSON records, or unknown-event JSON records -are present after the first `{` in a log line, the summary adds an -`ignored_records` line so operators can distinguish parser/log-shape drift from -real routed traffic. Plain access-log lines without JSON objects are still -ignored silently. Non-200 upstream statuses are also grouped by status, target, -reason, and stream mode under `upstream_non_200` so an operator can quickly see -deployment patterns such as `status=400 target=cheap-router -reason=low_confidence`. 
- -Production route-error budget gate: +摘要会统计: + +- routed 请求总数; +- completed / error; +- stream / non-stream; +- route_id 分布; +- target_model 分布; +- reason 分布; +- error_type 分布; +- upstream_status 分布; +- 非 200 上游状态; +- 最大耗时; +- 被忽略的异常日志记录。 + +解析器会忽略 uvicorn access lines,只统计结构化 `route_complete` / `route_error` JSON 记录。Prompts 和 bearer tokens 不会进入日志。 + +--- + +## Route Error Budget + +上线前或灰度后检查路由错误预算: ```bash docker logs --since 12h gateway_semantic_router 2>&1 \ @@ -196,31 +487,19 @@ docker logs --since 12h gateway_semantic_router 2>&1 \ --max-upstream-status-rate 400=0 ``` -The budget gate prints a stable PASS/FAIL report and exits non-zero when the -selected log window has too few route events or exceeds the total/per-target -`route_error` thresholds. Optional `--max-reason-rate REASON=RATE` checks -bounded degradation such as `embedding_error` fallback even when requests still -complete. Optional `--max-upstream-status-rate STATUS=RATE` catches completed -requests where the upstream still returned a specific status such as `400`. Use -this after preflight/E2E and before keeping a new router build in production -traffic. - -For a live sidecar request, pass the same LiteLLM `Authorization` header to -`http://127.0.0.1:4001/v1/chat/completions`. +这个 gate 用来防止 router 看起来能跑,但实际已经出现: -Routing decision preview without upstream forwarding: +- 某个 target 持续失败; +- embedding_error 过多; +- 上游返回大量 400 / 500; +- 日志结构漂移; +- eval 没覆盖到的线上异常。 -```bash -curl http://127.0.0.1:4001/v1/semantic-router/decision \ - -H "Content-Type: application/json" \ - -d '{"model":"semantic-router","messages":[{"role":"user","content":"这个线上 bug 为什么偶发?"}]}' -``` +脚本会输出稳定的 PASS / FAIL 报告,并在预算超限时返回非零 exit code。 -Use this endpoint for route quality review and gray-mode evaluation. It returns -the selected `route_id`, resolved `target_model`, `policy_id`, reason, rewrite -flag, and scores, but does not call LiteLLM or any model backend. 
+--- -Streaming smoke test: +## Streaming Smoke Test ```bash curl -N http://127.0.0.1:4001/v1/chat/completions \ @@ -229,23 +508,24 @@ curl -N http://127.0.0.1:4001/v1/chat/completions \ -d '{"model":"semantic-router","stream":true,"messages":[{"role":"user","content":"这个线上 bug 为什么偶发?只回答 OK"}],"max_tokens":8}' ``` +--- + ## Semantic Assets -Runtime routing stays dependency-light. Larger semantic assets are built offline -from declared sources in `config/route_sources.yaml`. +Runtime routing 保持 dependency-light。较大的 semantic assets 通过离线脚本生成,来源声明在: + +```text +config/route_sources.yaml +``` -The initial source manifest references mature datasets rather than hand-written -keyword expansion: +当前 source manifest 包括: -- MASSIVE zh-CN / zh-TW official JSONL tarball for general assistant and - utility utterances. The current Hugging Face `datasets` loader cannot load - `AmazonScience/massive` directly because that dataset still uses a dataset - script, so the builder reads the official release archive instead. -- SWE-bench issue statements for repository-level software engineering tasks. -- MBPP and HumanEval for code-generation prompts. -- Local JSONL samples for model-probe traffic. +- MASSIVE zh-CN / zh-TW official JSONL tarball,用于通用 assistant 与 utility utterances; +- SWE-bench issue statements,用于 repository-level software engineering tasks; +- MBPP 和 HumanEval,用于 code-generation prompts; +- local JSONL samples,用于 model-probe traffic。 -Build dependencies are isolated from runtime: +构建依赖与 runtime 隔离: ```bash uv sync --group assets @@ -253,23 +533,18 @@ uv run python scripts/build_route_bank.py uv run python scripts/build_eval_bank.py --per-route-limit 100 ``` -Generated route banks should retain each utterance's source name so eval -failures remain auditable. 
- -Runtime loading is conservative: `config/routes.yaml` declares -`route_bank_path: data/semantic_sets/route_bank.yaml`, and `load_settings()` -merges that generated bank with the seed utterances only when the file exists. -If the bank is absent, the router keeps using the checked-in seed routes. - -The generated eval bank is also kept out of git. A 200+ case regression run can -be reproduced after building the route bank: +运行扩展 eval: ```bash -uv run python scripts/eval_routes.py --cases data/semantic_sets/eval_bank.yaml +uv run python scripts/eval_routes.py \ + --cases data/semantic_sets/eval_bank.yaml ``` -Redacted production review samples can be promoted into eval cases without -putting raw prompts in logs or git: +Runtime loading 是保守的:`config/routes.yaml` 声明 `route_bank_path: data/semantic_sets/route_bank.yaml`,`load_settings()` 仅在文件存在时合并 generated bank 和 checked-in seed utterances。没有生成资产时,router 继续使用 seed routes。 + +### Redacted Production Samples + +可把脱敏生产 review 样例提升为 eval cases: ```bash uv run python scripts/import_review_samples.py \ @@ -281,6 +556,87 @@ uv run python scripts/build_eval_bank.py \ --per-route-limit 100 ``` -Each JSONL sample must set `redacted: true`, include `text`, and set `expect` -to a configured route id such as `fast` or `strong`. The importer rejects -unredacted samples by default. 
+每条 JSONL sample 必须: + +- `redacted: true` +- 包含 `text` +- `expect` 指向已配置 route id,例如 `fast` 或 `strong` + +Importer 默认拒绝未脱敏样例。 + +--- + +## 项目边界 + +Cynosure Router 不做这些事: + +- 不保存 API key; +- 不管理 provider order; +- 不实现 provider fallback; +- 不替代 LiteLLM; +- 不记录原始 prompt; +- 不提交 LiteLLM mount、tokens 或 `.env`; +- 不追求训练一个通用 LLM router 模型; +- 不把 route 质量伪装成不可解释黑盒。 + +它只做: + +```text +intent → route_id → target_model → auditable rewrite +``` + +本仓库仍然应与 `/home/raystorm/gateway/litellm` 保持隔离。不要把本地 LiteLLM mount 文件、tokens、`.env` 或供应商密钥材料加入这里。 + +--- + +## 当前状态 + +项目处于本地生产化打磨阶段,尚未 public-release ready。公开发布前还需要统一审计: + +- configurable route abstraction; +- observability contract; +- redacted eval workflow; +- license 与 release documentation; +- README / GitHub metadata / repository name 的最终一致性。 + +已经具备: + +- OpenAI-compatible chat proxy; +- streaming / non-streaming 转发; +- 配置化 route; +- 显式 route metadata; +- 中文 hard rules; +- embedding semantic match; +- readiness / liveness; +- decision preview; +- structured route logs; +- route summary; +- route error budget gate; +- mock eval; +- preflight; +- Docker sidecar 运行形态。 + +仍需继续打磨: + +- 更严格的 LiteLLM model-entry E2E; +- 更完整的 route bank 生成和审查流程; +- 基于真实 redacted 样例的 eval 扩充; +- 生命周期耦合策略; +- 公共发布前的 license / release 文档整理。 + +--- + +## Name + +Cynosure 的含义是“指引方向的中心点”。这个名字对应本项目的职责:不执行模型、不替代网关,而是在模型流量进入执行层前,给出可解释、可审计、可回退的方向选择。 + +```text +Cynosure Router += the guiding point for model traffic +``` + +GitHub 仓库标题、描述、重命名等平台元数据建议不在本 PR 中直接修改;建议先作为文件记录进入 review。详见: + +```text +docs/PROJECT_IDENTITY.md +``` From 7cbd740f28a4b2d1ae4e9332de2de9d5636c6271 Mon Sep 17 00:00:00 2001 From: raystorm <2557058999@qq.com> Date: Tue, 5 May 2026 21:29:01 +0800 Subject: [PATCH 2/2] docs: add repository identity proposal --- docs/PROJECT_IDENTITY.md | 122 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 122 insertions(+) create mode 100644 docs/PROJECT_IDENTITY.md diff --git a/docs/PROJECT_IDENTITY.md b/docs/PROJECT_IDENTITY.md new file mode 100644 
index 0000000..bfe9432 --- /dev/null +++ b/docs/PROJECT_IDENTITY.md @@ -0,0 +1,122 @@ +# Project Identity Proposal + +本文档记录仓库名称、GitHub 标题和 About 描述建议。此文件只是项目内文档,不会修改 GitHub 仓库元数据。 + +## 推荐名称 + +```text +Cynosure Router +``` + +## 推荐仓库名 + +```text +cynosure-router +``` + +保留 `router` 后缀是为了降低识别成本:项目本质仍然是 LLM gateway 前面的 routing sidecar。`Cynosure` 提供品牌识别,`Router` 提供功能锚点。 + +不建议继续使用: + +```text +gateway-semantic-router +``` + +原因: + +- 名字过于描述性,缺少产品识别度; +- `semantic-router` 容易和 LiteLLM / 其他项目里的 generic semantic routing 概念混淆; +- 无法表达本项目真正的差异点:中文-heavy agent traffic、可审计结构化日志、decision preview、error budget gate、LiteLLM 控制面 sidecar; +- 未来如果加入 response/chat-completion shim、route quality workflow、traffic audit 等能力,旧名会显得过窄。 + +## GitHub repository title 建议 + +```text +Cynosure Router +``` + +## GitHub About description 建议 + +```text +Intent-aware model routing sidecar for LiteLLM/OpenAI-compatible gateways, built for Chinese-heavy agent traffic, auditable decisions, and safe fallback. +``` + +备选短版: + +```text +Auditable intent router for LiteLLM and OpenAI-compatible model gateways. +``` + +## 一句话定位 + +```text +Cynosure Router is the intent-aware control plane that decides where model traffic should go before LiteLLM executes it. +``` + +中文版本: + +```text +Cynosure Router 是 LiteLLM 执行模型前的一层意图分流控制面。 +``` + +## 命名理由 + +`Cynosure` 原意接近“指引方向的中心点”。这个词适合本项目,因为项目本身不执行模型、不管理 provider,也不替代 LiteLLM,而是在流量进入执行层前给出方向: + +```text +intent → route_id → target_model → auditable rewrite +``` + +这个名字比 `gateway-semantic-router` 更适合长期演进: + +- 不被 `semantic` 这个单一实现方式绑定; +- 不和 LiteLLM 原生 `smart-router` 或其他 semantic router 概念打架; +- 能容纳 hard rules、metadata override、embedding、eval、observability、error budget 等多种控制面能力; +- 有品牌感,但仍然通过 `Router` 保留功能可读性。 + +## 建议的 README 标题结构 + +```markdown +# Cynosure Router + +> 面向 LLM Gateway 的意图分流控制面。 +> Intent-aware routing sidecar for LiteLLM / OpenAI-compatible gateways. 
+``` + +## 建议的后续平台元数据修改 + +当本 PR 合并并确认文档方向后,可手动修改 GitHub 平台元数据: + +- Repository name: `cynosure-router` +- Repository title / display name: `Cynosure Router` +- About description: 使用本文推荐长版或短版 +- Topics 可考虑: + - `llm-gateway` + - `litellm` + - `openai-compatible` + - `model-routing` + - `semantic-routing` + - `agent-infra` + - `observability` + +本 PR 不执行这些平台级修改。 + +## 迁移注意事项 + +如果后续真正重命名仓库,需要同步检查: + +- README 中的本地路径示例; +- compose build context; +- Docker service name; +- CI badge 或 workflow 文案; +- 外部脚本、Codex / agent 配置里的 repo URL; +- 本地 clone 路径; +- LiteLLM compose 中指向 sidecar 的路径或服务名。 + +当前 README 仍保留部分本地路径示例,例如: + +```text +/home/raystorm/gateway/gateway-semantic-router +``` + +这些路径反映当前部署状态。仓库真正重命名后,再统一改为新的本地目录名会更安全。
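
重命名落地后，可以先用一个简单的 grep 扫描旧名残留。下面是示意脚本：为了可独立运行，先在演示用临时目录里构造一个含旧路径的文件；实际使用时应把扫描目标换成仓库根目录，并同时检查下划线形式的 `gateway_semantic_router`：

```shell
# 示意:扫描旧名残留;/tmp/rename-check-demo 仅为演示假设
mkdir -p /tmp/rename-check-demo
printf 'build: /home/raystorm/gateway/gateway-semantic-router\n' \
  > /tmp/rename-check-demo/compose.yml
# 列出仍引用旧名的文件
grep -rln "gateway-semantic-router" /tmp/rename-check-demo
```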