feat(scripts): add notion-i18n bidirectional Chinese-English translation script #3993
Open
LQ458 wants to merge 3 commits into
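Provides automatic bidirectional sync between the two language databases of a multi-language NotionNext blog (e.g. Leo Qin's Blog and LeoQin的博客), so posts no longer have to be copied and translated by hand.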
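Core ideas
- The source language is determined by the database a page lives in, so no extra lang field has to be maintained on each post
- A paired_with text property records the UUID of the twin page in the other database (linked in both directions)
- source_hash (SHA-256) is used to detect drift; unchanged pages are skipped, so runs are idempotent and safe to repeat
- A translation_locked checkbox locks target pages that have been revised by hand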
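The drift check described above can be sketched roughly as follows. This is a minimal illustration, not the code in state.js: the helper names `computeSourceHash` and `decideAction`, the input shape, and the rule that a lock takes priority over `--force` are all assumptions made for the example.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: hash only the translatable text of a page.
// The { title, texts } input shape is an assumption for illustration.
function computeSourceHash({ title, texts }) {
  const hash = createHash('sha256');
  // JSON.stringify gives an unambiguous, order-preserving serialization.
  hash.update(JSON.stringify([title, ...texts]), 'utf8');
  return hash.digest('hex');
}

// Hypothetical decision rule for one target page.
// Assumption: a locked page is never overwritten, even when force is set.
function decideAction({ currentHash, storedHash, locked, force = false }) {
  if (locked) return 'skip-locked';
  if (!force && storedHash === currentHash) return 'skip-unchanged';
  return 'translate';
}

const page = { title: 'Hello', texts: ['First paragraph', 'Second paragraph'] };
const h1 = computeSourceHash(page);
const h2 = computeSourceHash({
  ...page,
  texts: ['First paragraph (edited)', 'Second paragraph'],
});

console.log(decideAction({ currentHash: h1, storedHash: h1, locked: false })); // skip-unchanged
console.log(decideAction({ currentHash: h2, storedHash: h1, locked: false })); // translate
console.log(decideAction({ currentHash: h2, storedHash: h1, locked: true }));  // skip-locked
```

Because the hash covers only translatable content, re-running the script on an untouched page is a no-op, which is what makes the batch command safe to repeat.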
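Database schema: three new properties are required (add them to both the Chinese and the English database, with identical names)
- paired_with Text UUID of the twin page in the other database
- translation_locked Checkbox once checked, the script no longer overwrites the page
- source_hash Text SHA-256 of the source page's translatable content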
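As a rough illustration of the schema requirement above, the three properties can be written in the shape the Notion API uses for database property definitions, where a "Text" column is a `rich_text` property. The `missingProperties` helper is hypothetical and only shows how a pre-flight check might look; the exact update call depends on the Notion API version in use, so it is left as a comment.

```javascript
// Property definitions in Notion API shape ("Text" columns are rich_text).
const i18nProperties = {
  paired_with: { rich_text: {} },
  translation_locked: { checkbox: {} },
  source_hash: { rich_text: {} },
};

// Hypothetical pre-flight check: given a retrieved database object,
// report which required properties are absent or have the wrong type.
function missingProperties(database) {
  return Object.entries(i18nProperties)
    .filter(([name, def]) => {
      const expectedType = Object.keys(def)[0];
      return database.properties?.[name]?.type !== expectedType;
    })
    .map(([name]) => name);
}

// Stand-in for a database object returned by the API (not a real response).
const fakeDb = {
  properties: {
    paired_with: { type: 'rich_text' },
    source_hash: { type: 'checkbox' }, // wrong type on purpose
  },
};

console.log(missingProperties(fakeDb)); // [ 'translation_locked', 'source_hash' ]

// With @notionhq/client the properties could be added along these lines
// (not executed here; the call shape is an assumption and varies by API version):
//   await notion.databases.update({ database_id: dbId, properties: i18nProperties });
```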
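CLI (package.json scripts)
- yarn translate <pageId|URL> translates a single page
- yarn translate:all translates in batch (supports --from / --include-drafts / --include-paired / --dry-run / --force)
- yarn translate:check lists pages that have drifted
- yarn translate:backfill interactively pairs existing human translations across the two databases
- yarn translate:diagnose checks whether the Notion integration can access both target databases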
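Translation providers
- DeepSeek (default, OpenAI-compatible; deepseek-chat ≈ $0.27/M input tokens, $1.10/M output tokens)
- Zhipu GLM-4 as an alternative; the interface under providers/ is small ({text, sourceLang, targetLang, glossary, hint} → {text, inputTokens, outputTokens}) and easy to extend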
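To make the provider contract above concrete, here is a minimal sketch of an object that satisfies the same request and response shape without any network call, plus a cost estimate based on the prices quoted above. `echoProvider`, `estimateCostUsd`, and the character-based token estimate are hypothetical and are not part of the PR.

```javascript
// USD per million tokens, taken from the deepseek-chat figures quoted above.
const DEEPSEEK_PRICE = { inputPerM: 0.27, outputPerM: 1.10 };

// Estimate the cost of a run from the token counts a provider reports.
function estimateCostUsd({ inputTokens, outputTokens }, price = DEEPSEEK_PRICE) {
  return (inputTokens / 1e6) * price.inputPerM + (outputTokens / 1e6) * price.outputPerM;
}

// Hypothetical stand-in provider with the same request/response shape.
// It does not translate anything; token counts are a crude length / 4 guess.
const echoProvider = {
  name: 'echo',
  // glossary is accepted to match the interface but ignored by this stand-in.
  async translate({ text, sourceLang, targetLang, glossary = [], hint = '' }) {
    const out = `[${sourceLang}->${targetLang}] ${text}`;
    return {
      text: out,
      inputTokens: Math.ceil((text.length + hint.length) / 4),
      outputTokens: Math.ceil(out.length / 4),
    };
  },
};

echoProvider
  .translate({ text: 'hello', sourceLang: 'en', targetLang: 'zh' })
  .then((result) => console.log(result, estimateCostUsd(result)));
```

A stub like this is also convenient in unit tests, since the pipeline can be exercised without an API key.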
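Block handling strategy
- Translated: paragraph, heading_1/2/3, bulleted_list_item, numbered_list_item, quote, callout, toggle, to_do (bold/italic/links on rich_text are preserved)
- Passed through unchanged: code, equation, image, video, file, embed, divider, bookmark, etc.
- Labels only: mermaid / plantuml code blocks (a syntax-preservation hint is injected into the LLM prompt)
- Skipped: column_list, column, table, table_row, synced_block (the create endpoint requires their children to be included in the request body, which a flat fetch cannot supply)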
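The four-way split above could be expressed as disjoint sets plus a small classifier. This is a sketch under assumptions, not the contents of state.js: the names are hypothetical, diagram blocks are assumed to be detected through the code block's `language` field, and block types not listed above are assumed to be copied unchanged.

```javascript
const TRANSLATE = new Set([
  'paragraph', 'heading_1', 'heading_2', 'heading_3',
  'bulleted_list_item', 'numbered_list_item',
  'quote', 'callout', 'toggle', 'to_do',
]);
const PASSTHROUGH = new Set([
  'code', 'equation', 'image', 'video', 'file', 'embed', 'divider', 'bookmark',
]);
const SKIP = new Set(['column_list', 'column', 'table', 'table_row', 'synced_block']);
// Assumption: diagram code blocks are recognized by their language field.
const DIAGRAM_LANGS = new Set(['mermaid', 'plantuml']);

// Returns 'translate' | 'labels-only' | 'passthrough' | 'skip' for a block object.
function classifyBlock(block) {
  if (block.type === 'code' && DIAGRAM_LANGS.has(block.code?.language)) {
    return 'labels-only';
  }
  if (TRANSLATE.has(block.type)) return 'translate';
  if (SKIP.has(block.type)) return 'skip';
  // Assumption: anything else, listed or not, is copied as-is.
  return 'passthrough';
}

// Sanity check that no block type appears in more than one set.
function setsAreDisjoint(...sets) {
  const seen = new Set();
  for (const s of sets) {
    for (const item of s) {
      if (seen.has(item)) return false;
      seen.add(item);
    }
  }
  return true;
}

console.log(classifyBlock({ type: 'paragraph' }));                          // translate
console.log(classifyBlock({ type: 'code', code: { language: 'mermaid' } })); // labels-only
console.log(classifyBlock({ type: 'code', code: { language: 'python' } }));  // passthrough
console.log(classifyBlock({ type: 'column_list' }));                        // skip
console.log(setsAreDisjoint(TRANSLATE, PASSTHROUGH, SKIP));                 // true
```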
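Engineering details
- Configuration follows the existing Next.js .env.local convention; a zero-dependency .env parser lives in load-env.js
- Block translation concurrency within a page is tunable (TRANSLATOR_CONCURRENCY, default 8)
- TRANSLATOR_BUDGET_TOKENS_PER_RUN sets a hard cap per run
- All Notion API calls go through exponential backoff (429/5xx/timeouts), so an occasional 502 does not abort a batch
- category-map.json provides a bidirectional select / multi_select mapping (an open-ended template, left empty)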
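The retry behaviour listed above might look roughly like this. It is a sketch with hypothetical names and defaults (`withRetry`, 500 ms base delay, 30 s cap, five retries); the error shape (`status`, `code`) is also an assumption, and the real wrapper in notion-client.js may differ, for example by adding jitter.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: errors carry an HTTP `status`, or a timeout marker in `code`/`name`.
function isRetryable(err) {
  const status = err?.status;
  if (status === 429 || (status >= 500 && status <= 599)) return true;
  return err?.code === 'ETIMEDOUT' || err?.name === 'AbortError';
}

// Delay doubles with each attempt (0-based) and is clamped to capMs.
function backoffDelayMs(attempt, baseMs = 500, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Run fn, retrying retryable failures up to `retries` times.
async function withRetry(fn, { retries = 5, baseMs = 500, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !isRetryable(err)) throw err;
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
}

// Usage: a fake call that fails twice with 502, then succeeds.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw Object.assign(new Error('Bad Gateway'), { status: 502 });
  return 'ok';
};

withRetry(flaky, { sleep: async () => {} }).then((value) => {
  console.log(value, 'after', calls, 'calls'); // ok after 3 calls
});
```

Non-retryable errors such as a 404 are rethrown immediately, so a missing page fails fast instead of consuming the retry budget.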
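Tests (__tests__/scripts/translate)
- state.test.js: SHA-256 determinism, drift detection, block-type classification sets do not overlap
- block-mapper.test.js: paragraph translation, code blocks preserved as-is, mermaid label translation, column_list skipped, image passthrough, rich_text reconstruction (12 cases in total)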
Dependencies
- Adds only @notionhq/client (as a devDependency); the Next.js production build is unaffected
@LQ458 is attempting to deploy a commit to the tangly1024's projects Team on Vercel. A member of the Team first needs to authorize it.
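Provides automatic bidirectional sync between the two language databases of a multi-language NotionNext blog (e.g. an English database and a Chinese database), so posts no longer have to be copied and translated by hand.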
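Known issue
- […] (in the form NOTION_PAGE_ID=id1,en:id2), but content sync between the two language databases relies entirely on manual copying and translation, and the maintenance cost becomes very high as the number of posts grows.
Solution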
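- Adds a CLI translation module, notion-i18n-translator, under scripts/translate/. It determines the source language from the database a page lives in and writes the translation to the database of the other language.
- Three new database properties:
  - paired_with (Text): UUID of the twin page in the other database, linked in both directions
  - translation_locked (Checkbox): once checked, the script no longer overwrites the page
  - source_hash (Text): SHA-256 of the source page's translatable content, used for drift detection
- Translation providers: DeepSeek (deepseek-chat ≈ $0.27/M input / $1.10/M output) and Zhipu GLM-4; the interface under providers/ is small and easy to extend.
- Block handling: paragraph / heading_* / list items / quote / callout / toggle / to_do are translated with rich_text annotations preserved; code / equation / image / embed and similar blocks are kept as-is; mermaid / plantuml code blocks get a dedicated syntax-preservation prompt that translates labels only and keeps the syntax; column_list / table / synced_block are skipped, because the create endpoint requires nested children that a flat fetch cannot provide.
- Configuration follows the existing .env.local convention; the new zero-dependency load-env.js reads .env.local / .env from the project root directly, without pulling in dotenv.
Benefits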
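- No impact on the Next.js production build (@notionhq/client is only in devDependencies).
- Consistent with the existing scripts/ style (reads process.env.* directly, a single root-level .env.local, no separate env file); adds only one devDependency.
- TRANSLATOR_BUDGET_TOKENS_PER_RUN provides a hard cap per run.
Changes in detail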
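- scripts/translate/ (13 new files)
  - index.js: CLI entry point (lightweight argument parsing)
  - pipeline.js: main flow (read → compare hash → translate → write → link in both directions)
  - notion-client.js: wrapper around @notionhq/client with exponential-backoff retries
  - block-mapper.js: per-block-type translation and field sanitization (including the mermaid/plantuml special case)
  - state.js: SHA-256 hashing and the block-type classification sets
  - config.js: language ↔ database resolution, category/tag mapping utilities
  - backfill.js: interactive pairing of existing human translations across databases (token-Jaccard similarity, with splitting at Chinese/English boundaries)
  - diagnose.js: lists every database the integration can access, to pinpoint permission problems
  - load-env.js: zero-dependency .env.local / .env loader
  - glossary.json: list of terms that must not be translated
  - category-map.json: bidirectional select / multi_select mapping template (open-ended, left empty)
  - providers/{deepseek,glm,_http,index}.js: translation provider implementations
  - README.md: full usage documentation in Chinese
- package.json: adds five npm scripts (translate / translate:all / translate:check / translate:backfill / translate:diagnose) and one devDependency, @notionhq/client
- .env.example: appends a section headed "# notion-i18n-translator(CLI 翻译脚本)" (CLI translation script) that lists all translation-related environment variables (NOTION_TOKEN / NOTION_DB_EN_ID / NOTION_DB_ZH_ID / DEEPSEEK_API_KEY, etc.), all commented out and disabled by default, following the upstream convention of a single .env.example split into sections by purpose
- __tests__/scripts/translate/ (12 new unit tests)
  - state.test.js: SHA-256 determinism, drift detection, block-type classification sets do not overlap, column_list skipped, mermaid detection (5 cases)
  - block-mapper.test.js: paragraph translation, code blocks preserved as-is, mermaid label translation, column_list skipped, image passthrough, rich_text reconstruction, string concatenation (7 cases)
Test confirmation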
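- Unit tests (under __tests__/scripts/translate/, no network/API dependencies): yarn jest __tests__/scripts/translate
- yarn build passes (the scripts are not part of the Next.js build graph, so the production build is unaffected)
- node scripts/translate/index.js --help prints all subcommands and options correctly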
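For readers unfamiliar with the token-Jaccard similarity mentioned for backfill.js in the change list, one plausible reading is sketched here. The tokenizer (lowercased Latin/digit runs kept whole, each CJK character treated as its own token) and the function names are assumptions; the actual boundary-splitting rules in the PR may differ.

```javascript
// Assumed tokenizer: Latin/digit runs stay whole, each CJK character is a token.
function tokenize(text) {
  const tokens = text.toLowerCase().match(/[a-z0-9]+|[\u4e00-\u9fff]/g);
  return new Set(tokens ?? []);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B| over the two token sets.
function jaccard(a, b) {
  const setA = tokenize(a);
  const setB = tokenize(b);
  if (setA.size === 0 && setB.size === 0) return 0;
  let intersection = 0;
  for (const token of setA) {
    if (setB.has(token)) intersection += 1;
  }
  return intersection / (setA.size + setB.size - intersection);
}

// Mixed-language titles still share their untranslated terms.
console.log([...tokenize('Notion API入门')]);                // [ 'notion', 'api', '入', '门' ]
console.log(jaccard('Notion API 入门', 'Notion API intro')); // 0.4
```

Splitting at the Chinese/English boundary matters because titles such as "Notion API入门" contain no whitespace between the two scripts; without it the shared terms would not line up as tokens.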