Add plugin OCR 翻译 v1.0.0 #194
Open
MermaidFAR wants to merge 3 commits into ZToolsCenter:main from MermaidFAR:plugin/ocr-new
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| # Logs | ||
| logs | ||
| *.log | ||
| npm-debug.log* | ||
| yarn-debug.log* | ||
| yarn-error.log* | ||
| pnpm-debug.log* | ||
| lerna-debug.log* | ||
|
|
||
| node_modules | ||
| dist | ||
| dist.zip | ||
| dist-ssr | ||
| *.local | ||
| public/local-ocr-runtime/win32/ | ||
| local-ocr-runtime/win32/ | ||
| .runtime-build/ | ||
|
|
||
| # Editor directories and files | ||
| .vscode/* | ||
| !.vscode/extensions.json | ||
| .idea | ||
| .DS_Store | ||
| *.suo | ||
| *.ntvs* | ||
| *.njsproj | ||
| *.sln | ||
| *.sw? |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,146 @@ | ||
| # OCR 翻译 (OCR Translate) | ||
|
|
||
| Recognizes text in images and passes the result to a translation command in ZTools. | ||
|
|
||
| This is a ZTools plugin built with **Vue 3 + Vite + TypeScript**. | ||
|
|
||
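| ## Features | ||
|
|
||
| ### OCR Translation | ||
|
|
||
| - Trigger commands: `OCR` / `文字识别` (text recognition) / `图片翻译` (image translation) | ||
| - Trigger commands: `OCR 文字识别+复制` (recognize + copy) / `OCR 文字识别+翻译` (recognize + translate) | ||
| - Trigger commands: `截图文字识别` (screenshot text recognition) / `截图文字识别+复制` / `截图文字识别+翻译` | ||
| - Match commands: image → `OCR 文字识别` / `文字识别+复制` / `文字识别+翻译`; image file → `识别图片文字` (recognize text in image) | ||
| - In `OCR 设置` (OCR settings) you can pin a single engine, or choose "by priority" and configure ranks 1-4 | ||
| - When a single cloud engine is pinned, a failure is reported immediately; in priority mode the plugin tries the next engine in rank order | ||
| - Priority mode skips Baidu, Tencent, and OpenAI-compatible engines that have no key configured, to avoid pointless waiting | ||
| - Cloud requests support a timeout setting of 5-180 seconds, so a request does not hang on one provider for long | ||
| - If the API returns `unknown variant image_url`, the current endpoint or model does not support image input; switch to a vision model | ||
| - In `OCR 设置` you can switch to OCR.Space; leaving the API Key empty uses the OCR.Space test key | ||
| - In `OCR 设置` you can configure Baidu OCR and Tencent Cloud OCR keys, which fit the free trial quotas commonly available in China | ||
| - In `OCR 设置` you can configure an OpenAI-compatible vision endpoint, for cloud providers or a self-hosted DeepSeek-OCR/vLLM service | ||
| - The settings page can install and start the local RapidOCR service with one click; then select `本地 RapidOCR/PaddleOCR` (local RapidOCR/PaddleOCR) | ||
| - Local OCR recognizes Chinese and English by default; images are upscaled, converted to grayscale, and contrast-enhanced before recognition | ||
| - Text cleanup is enabled by default; it corrects common misrecognitions such as `ZTools Al`, `OpenAl`, and a line-initial `。` misread as a bullet; it can be turned off in settings | ||
| - The plain `OCR` entry only shows the recognition result; only the `+复制` / `+翻译` commands or buttons go on to copy or redirect | ||
| - The translation flow copies the recognized text and redirects to the translation command via `window.ztools.redirect()` | ||
| - The translation target defaults to `翻译`; the old `翻译文本` setting is migrated to `翻译` automatically | ||
| - The translation target can also be set to `plugin title/command name`, for example `翻译插件/翻译` | ||
| - The OCR engine uses `tesseract.js`; the first recognition loads OCR resources and language data | ||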
|
|
||
| ## Project Structure | ||
|
|
||
| ``` | ||
| . | ||
| ├── public/ | ||
| │   ├── logo.png              # Plugin icon | ||
| │   ├── local-ocr-server/     # One-click install service scripts bundled with the plugin | ||
| │   ├── plugin.json           # Plugin manifest | ||
| │   └── preload/              # Preload script directory | ||
| │       ├── package.json      # Preload dependencies | ||
| │       └── services.js       # Node.js capability extensions | ||
| ├── src/ | ||
| │   ├── main.ts               # Entry point | ||
| │   ├── main.css              # Global styles | ||
| │   ├── App.vue               # Root component | ||
| │   ├── env.d.ts              # Type declarations | ||
| │   └── Ocr/                  # OCR feature component | ||
| │       └── index.vue | ||
| ├── index.html                # HTML template | ||
| ├── vite.config.js            # Vite configuration | ||
| ├── tsconfig.json             # TypeScript configuration | ||
| ├── package.json              # Project dependencies | ||
| └── README.md                 # Project documentation | ||
| ``` | ||
|
|
||
| ## Quick Start | ||
|
|
||
| ### Install dependencies | ||
|
|
||
| ```bash | ||
| npm install | ||
| ``` | ||
|
|
||
| ### Development mode | ||
|
|
||
| ```bash | ||
| npm run dev | ||
| ``` | ||
|
|
||
| The dev server starts at `http://localhost:5173`. ZTools loads the development build automatically. | ||
|
|
||
| ### Production build | ||
|
|
||
| ```bash | ||
| npm run build | ||
| ``` | ||
|
|
||
| Build output is written to the `dist/` directory. | ||
|
|
||
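| ## How It Works | ||
|
|
||
| Images enter the plugin through the ZTools `img` / `files` match commands. File paths are read into Data URLs in `public/preload/services.js` and then handed to `tesseract.js` in the renderer process for recognition. The plugin also supports screenshots via `screenCapture()` and reading images from the Electron clipboard. | ||
|
|
||
| After recognition completes, the plugin calls: | ||
|
|
||
| ```ts | ||
| window.ztools.copyText(text) | ||
| window.ztools.redirect(target, text) | ||
| ``` | ||
|
|
||
| `target` can be a single command name, such as `翻译`, or `plugin title/command name` to redirect precisely to a specific plugin. | ||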
|
|
||
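| ## Cloud OCR Configuration | ||
|
|
||
| - ZTools AI: add an image-capable model in the ZTools AI model settings, then select `ZTools AI` on the plugin settings page. | ||
| - OCR.Space: register a free API Key at `ocr.space/ocrapi`; the plugin settings page provides a clickable link. Leaving the field empty uses the test key, which is only suitable for quick verification. | ||
| - Baidu OCR: enable OCR on Baidu AI Cloud, then enter the API Key and Secret Key. | ||
| - Tencent Cloud OCR: enable OCR on Tencent Cloud, then enter the SecretId, SecretKey, and region. | ||
| - OpenAI-compatible: choose a provider that supports vision input, then fill in `Base URL`, the model name, and the API Key. | ||
| - DeepSeek-OCR: this is an MIT-licensed open-source model, not an official free cloud API. After self-hosting a vLLM/OpenAI-compatible service, set the endpoint to something like `http://127.0.0.1:8000/v1` and the model name to `deepseek-ai/DeepSeek-OCR`. | ||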
|
|
||
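| ## Local RapidOCR/PaddleOCR | ||
|
|
||
| The project includes a lightweight local service whose source lives at [local-ocr-server](local-ocr-server). The current Windows build bundles the RapidOCR runtime into the plugin; the artifacts are in `local-ocr-runtime/win32/` and are copied to the ZTools plugin directory together with `dist/`. After installation, you can click `启动服务` (start service) and `停止服务` (stop service) directly on the settings page. | ||
|
|
||
| By default the local service stays resident in the background and does not exit when the plugin window closes, so that the model is not reloaded for every recognition. On the settings page you can also change `服务策略` (service policy) to `退出插件时自动停止` (stop automatically when the plugin exits). To shut the service down manually, click `停止服务`. | ||
|
|
||
| If the plugin already includes the built-in RapidOCR runtime, the user's machine does not need Python; the settings page shows `运行时已内置` (runtime bundled). Without a built-in runtime, the settings page prompts you to install Python 3 first and offers `一键安装` (one-click install) as a development/fallback path. Without Python you can still use OCR.Space, Baidu OCR, Tencent Cloud OCR, ZTools AI, or an OpenAI-compatible vision endpoint. | ||
|
|
||
| ```powershell | ||
| cd local-ocr-server | ||
| python -m venv .venv | ||
| .\.venv\Scripts\Activate.ps1 | ||
| pip install -r requirements.txt | ||
| python server.py | ||
| ``` | ||
|
|
||
| The default endpoint is `http://127.0.0.1:8765/ocr`. | ||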
|
|
||
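| ## Build and Release | ||
|
|
||
| ### 1. Build the plugin | ||
|
|
||
| ```bash | ||
| npm run build | ||
| ``` | ||
|
|
||
| To also package the built-in RapidOCR runtime on Windows: | ||
|
|
||
| ```powershell | ||
| npm run build:with-runtime | ||
| ``` | ||
|
|
||
| The current Windows runtime is about 265 MB and includes `rapidocr-server.exe`, ONNX Runtime, and the PP-OCRv4 detection/classification/recognition models. Models no longer need to be downloaded on first start. | ||
|
|
||
| ### 2. Test the build output | ||
|
|
||
| Copy all files in the `dist/` directory into the ZTools plugin directory to test. | ||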
|
|
||
| ## Related Resources | ||
|
|
||
| - [ZTools official repository](https://github.com/ZToolsCenter/ZTools) | ||
| - [ZTools API type package](https://www.npmjs.com/package/@ztools-center/ztools-api-types) | ||
| - [Vue 3 documentation](https://vuejs.org/) | ||
| - [Vite documentation](https://vitejs.dev/) |
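The priority mode described in the README above (try engines in rank order, skip engines with no key configured, report every failure if none succeeds) can be sketched roughly as follows. The plugin's own implementation is not part of the files shown here, so this Python sketch is only an illustration of the described behavior; `Engine`, `run_by_priority`, and the engine names used in any call are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Engine:
    """Hypothetical engine descriptor; not taken from the plugin source."""
    name: str
    recognize: Callable[[bytes], str]
    configured: bool = True  # False when a required API key is missing


def run_by_priority(engines: list[Engine], image: bytes) -> tuple[str, str]:
    """Try engines in rank order; return (engine name, text) from the first success."""
    errors: list[str] = []
    for engine in engines:
        if not engine.configured:
            # Skip engines without credentials instead of waiting on a doomed request.
            errors.append(f"{engine.name}: skipped (not configured)")
            continue
        try:
            return engine.name, engine.recognize(image)
        except Exception as error:  # noqa: BLE001 - any failure moves on to the next rank
            errors.append(f"{engine.name}: {error}")
    raise RuntimeError("all OCR engines failed: " + "; ".join(errors))
```

Pinning a single engine corresponds to passing a one-element list, in which case the first failure surfaces directly as the raised error.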
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| <!doctype html> | ||
| <html lang="en"> | ||
| <head> | ||
| <meta charset="UTF-8" /> | ||
| <meta name="viewport" content="width=device-width, initial-scale=1.0" /> | ||
| </head> | ||
| <body> | ||
| <div id="app"></div> | ||
| <script type="module" src="/src/main.ts"></script> | ||
| </body> | ||
| </html> |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
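| # Local OCR Service | ||
|
|
||
| This service backs the plugin's `本地 RapidOCR/PaddleOCR` (local RapidOCR/PaddleOCR) engine and listens on `http://127.0.0.1:8765/ocr` by default. | ||
|
|
||
| ## Install | ||
|
|
||
| ```powershell | ||
| cd local-ocr-server | ||
| python -m venv .venv | ||
| .\.venv\Scripts\Activate.ps1 | ||
| pip install -r requirements.txt | ||
| ``` | ||
|
|
||
| ## Start | ||
|
|
||
| ```powershell | ||
| python server.py | ||
| ``` | ||
|
|
||
| After starting the service, select `本地 RapidOCR/PaddleOCR` in the plugin settings and keep the endpoint at `http://127.0.0.1:8765/ocr`. A service started with the plugin's one-click button stays in the background by default; the settings page can change this so that it stops automatically when the plugin exits. | ||
|
|
||
| If the plugin includes the built-in `local-ocr-runtime` runtime, Python is not required and the settings page shows that the runtime is bundled. Without a built-in runtime, the settings page prompts you to install Python first; if you prefer not to install Python, use a cloud OCR such as OCR.Space, Baidu, or Tencent instead. | ||
|
|
||
| ## API | ||
|
|
||
| ```http | ||
| POST /ocr | ||
| Content-Type: application/json | ||
|
|
||
| { | ||
|   "image": "data:image/png;base64,...", | ||
|   "language": "chi_sim+eng" | ||
| } | ||
| ``` | ||
|
|
||
| Response: | ||
|
|
||
| ```json | ||
| { | ||
|   "ok": true, | ||
|   "text": "recognized text" | ||
| } | ||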
| ``` |
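A client call against this interface could look like the sketch below. It uses only the Python standard library; `build_ocr_request` and `call_local_ocr` are hypothetical helper names, the endpoint is the default quoted in the README, and `call_local_ocr` assumes the local service is already running.

```python
import base64
import json
import urllib.request

ENDPOINT = "http://127.0.0.1:8765/ocr"  # default endpoint from the README


def build_ocr_request(image_bytes: bytes, mime: str = "image/png",
                      endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build the POST /ocr request with the image embedded as a Data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime};base64,{encoded}"
    body = json.dumps({"image": data_url, "language": "chi_sim+eng"}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_local_ocr(image_bytes: bytes, timeout: float = 30.0) -> str:
    """Send the request and return the recognized text."""
    request = build_ocr_request(image_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.loads(response.read().decode("utf-8"))
    if not reply.get("ok"):
        raise RuntimeError(reply.get("error", "unknown OCR error"))
    return reply["text"]
```

Because the server in this PR answers failures with status 500, `urllib.request.urlopen` would normally surface them as `urllib.error.HTTPError`; the `ok` check only covers a 200 reply that still reports an error.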
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| rapidocr>=3.8.1 | ||
| onnxruntime>=1.18.0 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,116 @@ | ||
| from __future__ import annotations | ||
|
|
||
| import base64 | ||
| import json | ||
| import re | ||
| import tempfile | ||
| from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer | ||
| from pathlib import Path | ||
| from typing import Any | ||
|
|
||
|
|
||
| HOST = "127.0.0.1" | ||
| PORT = 8765 | ||
|
|
||
|
|
||
| def load_engine() -> Any: | ||
|     try: | ||
|         import onnxruntime  # noqa: F401 | ||
|         from rapidocr_onnxruntime import RapidOCR | ||
|     except ImportError: | ||
|         import onnxruntime  # noqa: F401 | ||
|         from rapidocr import RapidOCR | ||
|
|
||
|     return RapidOCR() | ||
|
|
||
|
|
||
| ENGINE = load_engine() | ||
|
|
||
|
|
||
| def decode_image(data_url: str) -> tuple[bytes, str]: | ||
|     match = re.match(r"^data:image/([a-zA-Z0-9.+-]+);base64,(.*)$", data_url, re.DOTALL) | ||
|     if match: | ||
|         suffix = match.group(1).lower().replace("jpeg", "jpg") | ||
|         payload = match.group(2) | ||
|     else: | ||
|         suffix = "png" | ||
|         payload = data_url | ||
|
|
||
|     return base64.b64decode(payload), suffix | ||
|
|
||
|
|
||
| def extract_text(result: Any) -> str: | ||
|     if hasattr(result, "txts"): | ||
|         return "\n".join(str(text) for text in result.txts if str(text).strip()) | ||
|
|
||
|     data = result[0] if isinstance(result, tuple) else result | ||
|     lines: list[str] = [] | ||
|
|
||
|     if isinstance(data, list): | ||
|         for item in data: | ||
|             if isinstance(item, (list, tuple)) and len(item) >= 2: | ||
|                 lines.append(str(item[1])) | ||
|             elif isinstance(item, dict): | ||
|                 text = item.get("text") or item.get("rec_text") or item.get("txt") | ||
|                 if text: | ||
|                     lines.append(str(text)) | ||
|
|
||
|     return "\n".join(line.strip() for line in lines if line.strip()) | ||
|
|
||
|
|
||
| def recognize(data_url: str) -> str: | ||
|     image_bytes, suffix = decode_image(data_url) | ||
|     with tempfile.NamedTemporaryFile(suffix=f".{suffix}", delete=False) as image_file: | ||
|         image_file.write(image_bytes) | ||
|         image_path = Path(image_file.name) | ||
|
|
||
|     try: | ||
|         return extract_text(ENGINE(str(image_path))) | ||
|     finally: | ||
|         image_path.unlink(missing_ok=True) | ||
|
|
||
|
|
||
| class Handler(BaseHTTPRequestHandler): | ||
|     def do_GET(self) -> None: | ||
|         if self.path == "/health": | ||
|             self.write_json({"ok": True, "engine": "rapidocr"}) | ||
|             return | ||
|         self.write_json({"ok": False, "error": "Not found"}, 404) | ||
|
|
||
|     def do_POST(self) -> None: | ||
|         if self.path != "/ocr": | ||
|             self.write_json({"ok": False, "error": "Not found"}, 404) | ||
|             return | ||
|
|
||
|         try: | ||
|             content_length = int(self.headers.get("Content-Length", "0")) | ||
|             body = self.rfile.read(content_length).decode("utf-8") | ||
|             payload = json.loads(body or "{}") | ||
|             image = payload.get("image") | ||
|             if not image: | ||
|                 raise ValueError("missing image") | ||
|
|
||
|             self.write_json({"ok": True, "text": recognize(str(image))}) | ||
|         except Exception as error:  # noqa: BLE001 | ||
|             self.write_json({"ok": False, "error": str(error)}, 500) | ||
|
|
||
|     def write_json(self, payload: dict[str, Any], status: int = 200) -> None: | ||
|         data = json.dumps(payload, ensure_ascii=False).encode("utf-8") | ||
|         self.send_response(status) | ||
|         self.send_header("Content-Type", "application/json; charset=utf-8") | ||
|         self.send_header("Content-Length", str(len(data))) | ||
|         self.end_headers() | ||
|         self.wfile.write(data) | ||
|
|
||
|     def log_message(self, format: str, *args: Any) -> None: | ||
|         print(f"[{self.log_date_time_string()}] {format % args}") | ||
|
|
||
|
|
||
| def main() -> None: | ||
|     server = ThreadingHTTPServer((HOST, PORT), Handler) | ||
|     print(f"Local OCR server listening on http://{HOST}:{PORT}/ocr") | ||
|     server.serve_forever() | ||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
|     main() | ||
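Converting `Content-Length` directly with `int()` can raise a `ValueError` when the header is missing or malformed. Consider adding validation or exception handling to make the server more robust.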
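One way to act on this suggestion is sketched below: parse the header in a small helper so the handler can answer bad values with a client error and bound the body size. `parse_content_length`, `MAX_BODY_BYTES`, and the 16 MiB cap are illustrative assumptions, not part of the PR. Note that in the PR as written the `int()` call already sits inside the handler's `try` block, so a malformed header produces a 500 reply rather than an unhandled exception.

```python
from typing import Optional

MAX_BODY_BYTES = 16 * 1024 * 1024  # assumed upper bound, not from the PR


def parse_content_length(raw: Optional[str], limit: int = MAX_BODY_BYTES) -> int:
    """Return the declared body size, or raise ValueError with a readable message."""
    if raw is None or not raw.strip():
        raise ValueError("missing Content-Length header")
    value = raw.strip()
    # Accept plain ASCII digits only; this rejects signs, decimals, and other text.
    if not value.isascii() or not value.isdigit():
        raise ValueError(f"invalid Content-Length: {raw!r}")
    length = int(value)
    if length > limit:
        raise ValueError(f"request body too large: {length} bytes")
    return length
```

Inside `do_POST`, the handler could call `parse_content_length(self.headers.get("Content-Length"))` before reading the body and reply with status 400 when it raises `ValueError`.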