lifefloating · lifefloating · Apr 24, 2026 · Apr 24, 2026 · Apr 24, 2026 · Apr 24, 2026
diff --git a/.env.example b/.env.example
@@ -1,6 +1,27 @@
-# Copy to .env and fill in. .env is gitignored.
+# 复制为 .env 后填写；.env 已在 .gitignore 中忽略
 TCAPTCHA_BASE_URL=https://t.captcha.qq.com
-TCAPTCHA_LLM_API_KEY=sk-your-relay-key-here
-TCAPTCHA_LLM_BASE_URL=https://your-relay.example.com
+
+# --- LLM 视觉求解器 ---
+# 只有 image_select pipeline 需要；word_click 已切换到本地 YOLO + Siamese
+# （见 `--extra word-click`），除非你要用 image_select，否则下面几项留空即可。
+TCAPTCHA_LLM_API_KEY=
+TCAPTCHA_LLM_BASE_URL=
 TCAPTCHA_LLM_MODEL=gpt-5.4
 TCAPTCHA_LLM_TIMEOUT=30
+
+# --- word_click / ONNX Runtime 调优 ---
+# 执行后端：默认 "auto" 按 CUDA > ROCm > DML > CoreML > CPU 的顺序挑选。
+# macOS 下 CoreML 首次图编译较慢，通常固定为 "cpu" 更快。
+# 可选值：auto | cpu | cuda | rocm | dml | coreml
+# TCAPTCHA_ORT_BACKEND=cpu
+#
+# ORT intra-op 线程数，默认 min(4, os.cpu_count())。
+# 52×52 的 Siamese 模型过 4 线程后会反向变慢，没实测过就别动。
+# TCAPTCHA_ORT_INTRA_OP_THREADS=4
+
+# --- serve 模式（crack-tcaptcha serve） ---
+# POST /solve 的共享密钥；设置后客户端必须在请求头带 `X-SK`。
+# TCAPTCHA_SERVE_SK=change-me
+# TCAPTCHA_SERVE_HOST=127.0.0.1
+# TCAPTCHA_SERVE_PORT=9991
+# TCAPTCHA_SERVE_WORKERS=4
diff --git a/.gitattributes b/.gitattributes
@@ -0,0 +1,2 @@
+src/crack_tcaptcha/solvers/models/*.onnx filter=lfs diff=lfs merge=lfs -text
+src/crack_tcaptcha/solvers/models/*.ttf filter=lfs diff=lfs merge=lfs -text
diff --git a/.gitignore b/.gitignore
@@ -13,5 +13,4 @@ site
 *.so
 *.whl
 .env
-*.onnx
 origin_papers/
diff --git a/AGENTS.md b/AGENTS.md
@@ -24,7 +24,8 @@ Python >= 3.10, `uv` is the canonical package manager.
 uv sync
 
 # Install with optional extras
-uv sync --extra icon-click   # adds ddddocr + onnxruntime (needed for icon_click and word_click)
+uv sync --extra icon-click   # ddddocr + onnxruntime (icon_click pipeline)
+uv sync --extra word-click   # onnxruntime + opencv-headless + ddddocr (word_click pipeline, local YOLO+Siamese)
 uv sync --extra dev          # pytest, respx, ruff, hypothesis
 uv sync --extra docs         # mkdocs-material
 
@@ -40,9 +41,15 @@ uv run pytest tests/pipelines/ -q         # a single directory
 uv run ruff check .
 uv run ruff format .
 
-# CLI
+# CLI — one-shot
 uv run crack-tcaptcha solve --appid YOUR_APPID --entry-url https://your-site.example/login
 
+# CLI — long-running HTTP service (recommended for repeated use; models load once)
+uv run crack-tcaptcha serve --port 9991 --workers 4
+#   POST http://127.0.0.1:9991/solve  {"appid":"YOUR_APPID","retries":3}
+#   GET  http://127.0.0.1:9991/health
+#   set TCAPTCHA_SERVE_SK to require an X-SK header.
+
 # Docs
 uv run mkdocs serve
 ```
@@ -53,7 +60,8 @@ uv run mkdocs serve
 src/crack_tcaptcha/
 ├── __init__.py          # public API: solve()
 ├── captcha_type.py      # pure-function classifier (dyn_show_info → type)
-├── cli.py               # argparse entry point
+├── cli.py               # argparse entry point (solve / serve subcommands)
+├── server.py            # long-running HTTP service (stdlib http.server)
 ├── client.py            # HTTP three-phase + JSONP unwrap (scrapling / curl_cffi)
 ├── exceptions.py        # NetworkError, SolveError, PowError, TDCError
 ├── models.py            # pydantic models for prehandle / verify responses
@@ -64,10 +72,13 @@ src/crack_tcaptcha/
 │   ├── _common.py       # run_async, finish_with_verify (shared tail)
 │   ├── slide.py         # NCC template match
 │   ├── icon_click.py    # ddddocr detect + template match
-│   ├── word_click.py    # ddddocr detect + LLM vision (+ OCR fallback)
+│   ├── word_click.py    # YOLO detect + Siamese match (local ONNX); ddddocr OCR fallback
 │   └── image_select.py  # LLM region matching
 ├── solvers/
-│   └── llm_vision.py    # OpenAI-compatible vision client
+│   ├── ort_provider.py  # ONNX Runtime execution-provider selection
+│   ├── word_ocr.py      # YOLOv8 + Siamese solver for word_click (fast path)
+│   ├── llm_vision.py    # OpenAI-compatible vision client (image_select only)
+│   └── models/          # bundled ONNX models + font.ttf (force-included in wheel)
 └── tdc/
     ├── provider.py      # TDCProvider Protocol (DI point)
     ├── nodejs_jsdom.py  # Node.js subprocess implementation
@@ -77,6 +88,8 @@ src/crack_tcaptcha/
 Dependency direction is strictly top-down: `pipelines/` depends on
 `solvers/`, `tdc/`, `client.py`, `pow.py`, `trajectory.py`. `solvers/` and
 `tdc/` are independent of each other and must not import from `pipelines/`.
+`server.py` depends on `__init__.solve` and may trigger `solvers/word_ocr.warmup`
+at startup — it must not import from `pipelines/` directly.
 
 ## 4. Key Conventions
 
@@ -124,9 +137,19 @@ Dependency direction is strictly top-down: `pipelines/` depends on
   `DynAnswerType_UC`, `elem_id=""`, `data="<region_id>"`.
 - **Trajectory jitter.** Ease-in-out cubic with ±1 px jitter currently
   passes. Perfectly smooth trajectories get detected.
-- **LLM retry semantics.** `locate_chars` / `match_region` each retry once
+- **LLM retry semantics.** `match_region` (image_select) retries once
   internally on transport errors. Outer retries are the pipeline's
   `max_retries` (entire prehandle → verify loop).
+- **word_click model files are bundled.** `src/crack_tcaptcha/solvers/models/`
+  ships `word_click_detector.onnx` (YOLOv8, 10 MB),
+  `word_click_matcher.onnx` (Siamese, 29 MB), and `font.ttf` (4.6 MB).
+  These are `force-include`d into the wheel via hatch config. Don't
+  rename them without updating `word_ocr.py` and `pyproject.toml`.
+- **ORT cold-start hides behind warmup.** `crack-tcaptcha solve` spawns a
+  background thread that calls `solvers.word_ocr.warmup()` while the
+  first HTTP round-trip is in flight; `crack-tcaptcha serve` warms up at
+  boot. On macOS, `TCAPTCHA_ORT_BACKEND=cpu` is usually faster than the
+  default CoreML auto-pick because CoreML pays a one-off graph compile.
 
 ## 6. Testing Guidelines
 
@@ -173,11 +196,15 @@ Dependency direction is strictly top-down: `pipelines/` depends on
 
 - **Node.js >= 18** for the TDC.js bridge (`tdc/js/tdc_executor.js`,
   runs `tdc.js` inside jsdom). Install deps with `cd src/crack_tcaptcha/tdc/js && npm install`.
-- **`ddddocr`** (optional extra `icon-click`) for icon/character
-  detection. Required by `icon_click` and `word_click` pipelines. Pulls
-  in `onnxruntime`.
-- **OpenAI-compatible LLM relay** for `word_click` (recommended) and
-  `image_select` (required). Configure via `TCAPTCHA_LLM_API_KEY`,
+- **`ddddocr`** (optional extra `icon-click`, and part of `word-click`)
+  for icon / character detection. Required by `icon_click` and used as
+  the `word_click` fallback path. Pulls in `onnxruntime`.
+- **`onnxruntime` + `opencv-python-headless`** (optional extra
+  `word-click`, alongside `ddddocr`). Required for the primary
+  `word_click` path (local YOLOv8 detector + Siamese matcher shipped
+  under `solvers/models/`). No external API calls.
+- **OpenAI-compatible LLM relay** for `image_select` (required). No
+  longer required for `word_click`. Configure via `TCAPTCHA_LLM_API_KEY`,
   `TCAPTCHA_LLM_BASE_URL`, `TCAPTCHA_LLM_MODEL`, `TCAPTCHA_LLM_TIMEOUT`
   in `.env`. Any `/v1/chat/completions` endpoint that accepts
   `image_url` content blocks works.

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -65,3 +65,21 @@ after — don't batch updates.
   against live risk-control signals)
 - Deleting or renaming files under `tdc/js/` (tdc.js is vendored
   intentionally)
+- Deleting, renaming, or re-quantizing files under
+  `src/crack_tcaptcha/solvers/models/` (bundled ONNX models + font are
+  force-included into the wheel; a rename means editing `word_ocr.py`
+  and `pyproject.toml` in lockstep)
+
+### word_click / serve mode
+
+- Primary `word_click` path is local (YOLO detector + Siamese matcher
+  ONNX models under `solvers/models/`). LLM is no longer required.
+- When iterating on `solvers/word_ocr.py`, prefer the serve mode to
+  avoid per-run ONNX cold-start:
+  ```bash
+  uv run crack-tcaptcha serve --port 9991 --workers 2
+  # then hit POST /solve repeatedly
+  ```
+- On macOS, if solve feels slow, check provider selection: CoreML EP
+  pays a per-process graph-compile cost. Force CPU with
+  `TCAPTCHA_ORT_BACKEND=cpu` when benchmarking.
diff --git a/README.md b/README.md
@@ -5,26 +5,29 @@
 
 ![verify ok](images/word-click-success.png)
 
-> 上图为 word_click（文字点选）流水线的真实运行日志：从 `prehandle` → `getcapbysig` 下载背景图 → LLM 视觉给出点击坐标 → `nodejs_jsdom` 采集 TDC collect / eks / pow → `cap_union_new_verify` 一次通过，`ok=true`。
+> 上图为 word_click（文字点选）流水线的真实运行日志：从 `prehandle` → `getcapbysig` 下载背景图 → **本地 YOLO + Siamese 模型**给出点击坐标 → `nodejs_jsdom` 采集 TDC collect / eks / pow → `cap_union_new_verify` 一次通过，`ok=true`。
 
 ## 特性
 
 - **4 种验证类型**：`slider`（滑块）、`icon_click`（图标点击）、`word_click`（文字点选）、`image_select`（图像选择）
 - **无头浏览器依赖**：`nodejs_jsdom` 在 Node.js 进程里用 jsdom 跑官方 TDC.js，生成 `collect / eks / tokenid / pow_answer`
-- **策略化求解器**：滑块使用 OpenCV 模板匹配；点击类支持 `ddddocr` / 任意 OpenAI 兼容的 LLM vision
+- **策略化求解器**：滑块使用 OpenCV 模板匹配；`word_click` 走本地 **YOLOv8 检测 + Siamese 匹配**（纯 ONNX Runtime，单次 ~200 ms）；`icon_click` 使用 `ddddocr`；`image_select` 使用 OpenAI 兼容 LLM vision
+- **常驻 HTTP 服务**：`crack-tcaptcha serve` 让模型只加载一次，每次求解只付推理时间（零进程冷启动）
 - **工程化**：pydantic-settings 配置、结构化日志、CLI、pytest，类型完整
 
 ## 当前测试状态
 
 | 类型 | 状态 | 备注 |
 |---|---|---|
-| `word_click`（文字点选） | ✅ 已跑通（见上图） | LLM vision 映射字→bbox，一次通过 |
+| `word_click`（文字点选） | ✅ 已跑通（见上图） | 本地 YOLO + Siamese 模型，一次通过 |
 | `slider`（滑块） | 🧪 未充分验证 | pipeline 已实现，仅做过少量手工测试 |
 | `icon_click`（图标点击） | 🧪 未充分验证 | pipeline 已实现，依赖 `ddddocr`，待回归 |
 | `image_select`（图像选择） | 🧪 未充分验证 | pipeline 已实现，待回归 |
 
 > 目前项目重点打磨 `word_click`，其它类型欢迎 PR 补测试样本 / 回归用例。
 
+> 📎 **历史方案**：早期 `word_click` 使用 GPT / OpenAI 兼容 LLM vision 接口识别文字坐标的实现，已保留在 [`legacy-llm-vision`](../../tree/legacy-llm-vision) 分支，供参考或回退使用。主分支已切换为本地 YOLOv8 + Siamese 纯 ONNX 方案，无需任何外部大模型 API。
+
 ## 安装
 
 ### 按需求选择
@@ -33,26 +36,32 @@
 # 最小安装：仅 slider pipeline（HTTP + 轨迹生成，无 ML 依赖）
 uv add crack-tcaptcha
 
-# 推荐：图标点击 + 文字点选（word_click 也依赖 ddddocr）
+# 文字点选（本地 YOLO + Siamese 模型；含 ddddocr 作为 OCR 兜底）
+uv add "crack-tcaptcha[word-click]"
+
+# 图标点击（仅 ddddocr）
 uv add "crack-tcaptcha[icon-click]"
 
 # 中文图像选择（cn-clip / torch，下载模型约数百 MB）
 uv add "crack-tcaptcha[clip]"
 
-# 全功能一键装（= icon-click + clip）
+# 全功能一键装（= word-click + icon-click + clip）
 uv add "crack-tcaptcha[all]"
 ```
 
-也可以用 `pip` 替代 `uv add`，语法一致：`pip install 'crack-tcaptcha[icon-click]'`。
+也可以用 `pip` 替代 `uv add`，语法一致：`pip install 'crack-tcaptcha[word-click]'`。
 
 | Extra | 引入依赖 | 启用的 pipeline |
 |---|---|---|
-| _(none)_ | 仅 httpx / pydantic / numpy / Pillow | `slider` |
-| `icon-click` | `ddddocr`（+ onnxruntime） | `icon_click`、`word_click` |
+| _(none)_ | httpx / pydantic / numpy / Pillow / scrapling | `slider` |
+| `icon-click` | `ddddocr`（+ onnxruntime） | `icon_click` |
+| `word-click` | `onnxruntime` + `opencv-python-headless` + `ddddocr` | `word_click`（本地 YOLO + Siamese，OCR 兜底） |
 | `clip` | `cn2an`、`cn-clip`、`torch` | `image_select`（CLIP backend） |
 | `all` | 以上全部 | 所有 pipeline |
 
-> 运行 `word_click` / `icon_click` 前未装 `[icon-click]` 会得到清晰的 ModuleNotFoundError 提示。
+> 未装 `[word-click]` / `[icon-click]` 时，对应 pipeline 抛出清晰的 `SolveError` 提示应装哪个 extra。
+
+> `word_click` 的本地模型（`word_click_detector.onnx` 10 MB、`word_click_matcher.onnx` 29 MB、`font.ttf` 4.6 MB）已随 wheel 打包，安装后开箱即用，无需额外下载。
 
 ### 前置要求
 
@@ -83,13 +92,30 @@ if result.ok:
 ### 命令行
 
 ```bash
-# 通用求解：--appid 替换为你自己的 APP_ID
+# 一次性求解：--appid 替换为你自己的 APP_ID
 crack-tcaptcha solve --appid YOUR_APPID --retries 3 --json
 
 # 指定来源页（会带上对应 Referer / Origin）
 crack-tcaptcha solve --appid YOUR_APPID --entry-url https://example.com/login --json
 ```
 
+### 常驻 HTTP 服务（推荐用于重复调用）
+
+一次性 CLI 每次都要冷启动 Python + 加载 ONNX 模型，首次可能要花几秒。常驻模式模型只加载一次，后续请求只付推理时间：
+
+```bash
+# 启动（鉴权可选：导出 TCAPTCHA_SERVE_SK 后客户端需带 X-SK header）
+export TCAPTCHA_SERVE_SK=change-me
+crack-tcaptcha serve --port 9991 --workers 4
+
+# 客户端：POST /solve
+curl -H 'X-SK: change-me' -X POST http://127.0.0.1:9991/solve \
+    -d '{"appid":"YOUR_APPID","retries":3}'
+
+# 健康检查
+curl http://127.0.0.1:9991/health
+```
+
 > 命令行示例中的 `YOUR_APPID` 仅为占位符，请替换为你自己的 appid；仓库不提供任何真实业务 appid。
 
 ## 本地测试页
@@ -111,18 +137,24 @@ crack-tcaptcha solve --appid YOUR_APPID --entry-url http://localhost:8765/tcap2_
 
 ```
 src/crack_tcaptcha/
-├── client.py              # HTTPX 客户端：prehandle / getcapbysig / verify
+├── client.py              # HTTP 三段式：prehandle / getcapbysig / verify
+├── cli.py                 # argparse 入口（solve / serve 子命令）
+├── server.py              # 常驻 HTTP 服务
 ├── pow.py                 # PoW 求解
-├── trajectory.py          # 轨迹/点击序列合成
+├── trajectory.py          # 轨迹 / 点击序列合成
 ├── captcha_type.py        # 类型分发路由
 ├── pipelines/             # 每种验证类型一个 pipeline
 │   ├── slide.py
 │   ├── icon_click.py
-│   ├── word_click.py      # 文字点选（对应截图演示）
+│   ├── word_click.py      # 本地 YOLO 检测 + Siamese 匹配（含 ddddocr 兜底）
 │   └── image_select.py
-├── solvers/llm_vision.py  # OpenAI 兼容 LLM 视觉求解器
+├── solvers/
+│   ├── ort_provider.py    # ORT execution-provider 选择（CUDA/ROCm/DML/CoreML/CPU）
+│   ├── word_ocr.py        # YOLO + Siamese 求解器（word_click 主路径）
+│   ├── llm_vision.py      # OpenAI 兼容 LLM vision（image_select 用）
+│   └── models/            # 打包的 ONNX 模型 + font.ttf
 └── tdc/                   # TDC.js 桥
-    ├── js/                # npm install 后放 node_modules
+    ├── js/                # npm install 后的 node_modules
     └── nodejs_jsdom.py    # jsdom NodeProvider
 ```
 
@@ -141,9 +173,18 @@ src/crack_tcaptcha/
 | `TCAPTCHA_TDC_TIMEOUT` | `60.0` | TDC.js 桥超时 |
 | `TCAPTCHA_TDC_DEBUG` | `false` | 打开后保留 jsdom 调试日志 |
 | `TCAPTCHA_PROXY` | `None` | `http://user:pass@host:port` |
-| `TCAPTCHA_LLM_API_KEY` | `""` | LLM vision 求解器（`image_select` / `word_click`） |
+| `TCAPTCHA_LLM_API_KEY` | `""` | LLM vision 求解器（仅 `image_select` 需要） |
 | `TCAPTCHA_LLM_BASE_URL` | `""` | OpenAI 兼容接口根 |
 | `TCAPTCHA_LLM_MODEL` | `gpt-5.4` | 模型名 |
+| `TCAPTCHA_ORT_BACKEND` | `auto` | ONNX 执行后端：`auto` / `cpu` / `cuda` / `rocm` / `dml` / `coreml` |
+| `TCAPTCHA_ORT_INTRA_OP_THREADS` | `min(4, cpu_count)` | ORT 线程数（Siamese 在 >4 时反而更慢） |
+| `TCAPTCHA_SERVE_SK` | `""` | 常驻服务鉴权 secret；非空时请求必须带 `X-SK` header |
+| `TCAPTCHA_SERVE_HOST` | `127.0.0.1` | `serve` 子命令监听地址 |
+| `TCAPTCHA_SERVE_PORT` | `9991` | `serve` 子命令监听端口 |
+| `TCAPTCHA_SERVE_WORKERS` | `4` | `serve` 并发 solve 上限 |
+
+> macOS 下若首次求解明显慢，通常是 CoreML 后端的图编译开销；
+> 导出 `TCAPTCHA_ORT_BACKEND=cpu` 往往比默认更快。
 
 ## 开发
 
@@ -154,47 +195,6 @@ uv run pytest -x -ra
 uv run pytest -m "not network"   # 跳过联网用例
 ```
 
-## 推荐 — 用本地模型替换 LLM vision
-
-当前 `word_click` / `image_select` 走 OpenAI 兼容接口，单次推理 **1~3 s** 起步（受网络、排队、token 数影响），是整条链路里最慢的一步。
-本地模型可以把这一步压到 **≤200 ms**，且无调用成本 / 限流 / 数据出站风险。
-
-两类任务本质都是 **"把一张图映射到一个确定的类别 / 索引"**，不需要真正的生成式 VLM：
-
-### 方案 A：PaddleOCR + 轻量匹配（推荐）
-
-| 子任务 | 本地替代 |
-|---|---|
-| `word_click`：识别背景图 3 个 bbox 里各是什么汉字 | **PaddleOCR** (`ch_PP-OCRv4`)，单字裁剪后 OCR → 与指令中的字做字符串匹配 |
-| `image_select`：在 N 宫格里挑"哪个是苹果" | **PaddleClas PP-LCNet / PP-ShiTu** 或 **cn-clip ViT-B/16**（已列在 `[clip]` extras） |
-
-优点：CPU 可跑、模型 <20 MB、推理 10~50 ms；PaddleOCR 对中文场景文字鲁棒性很好。
-
-### 方案 B：CLIP 类零样本匹配
-
-直接复用仓库里已经声明过的 `cn-clip` 依赖：
-
-```bash
-uv add "crack-tcaptcha[clip]"
-```
-
-- `word_click`：把每个 bbox 裁剪图与 "一张写着'X'字的图" 做 image-text 相似度 argmax（但中文单字 CLIP 准确率一般，建议配合 OCR 投票）
-- `image_select`：把指令"请选出所有包含苹果的图片"直接作为 text query，对 N 个格子打分排序，取 top-k
-
-优点：一个模型吃下所有"图→文"匹配场景；缺点：模型 ~400 MB，冷启动有成本。
-
-### 方案 C：ddddocr + 本地分类头（最轻）
-
-- `icon_click` 已经在用 `ddddocr`；`word_click` 的 bbox 识别也可以换 `ddddocr.DdddOcr(det=False)`（纯 OCR 模式）
-- 对 `image_select` 训一个 **PP-LCNet** 分类头（常见类别就那几类：动物、交通工具、食物...）+ "其它"兜底走 CLIP
-
-### 落地建议
-
-1. 在 `solvers/` 下新增 `paddle_ocr.py` 和 `cn_clip.py`，实现与 `llm_vision.py` 同签名（`match_region` / `locate_chars`）
-2. 在 `settings.py` 加 `solver_backend: Literal["llm", "paddle", "clip", "ddddocr"] = "llm"`
-3. pipeline 启动时根据 backend 路由，保留 LLM 作为兜底（本地模型置信度 < 阈值时回退）
-4. 评估指标：单验证码平均耗时、端到端通过率、CPU / 显存占用，基准样本集可用 `tests/samples/`
-
 ## 免责声明
 
 本项目 **仅用于个人安全研究、技术学习与学术交流**，不代表任何商业机构的立场。
Original file line number	Diff line number	Diff line change
		@@ -0,0 +1,2 @@
		src/crack_tcaptcha/solvers/models/*.onnx filter=lfs diff=lfs merge=lfs -text
		src/crack_tcaptcha/solvers/models/*.ttf filter=lfs diff=lfs merge=lfs -text
-Original file line number
+Diff line change
@@ Expand Up / @@ -13,5 +13,4 @@ site @@
     *.so
     *.whl
     .env
-    *.onnx
     origin_papers/