From 1b3fa2b13b45b8dc85e09740bc3aa8f830176fc5 Mon Sep 17 00:00:00 2001 From: huan-zz3 <2805033624@qq.com> Date: Wed, 18 Mar 2026 16:32:31 +0800 Subject: [PATCH 01/10] docs: add AGENTS.md project knowledge base for AI agents Add comprehensive project documentation tailored for AI agents and developers. Includes project overview, directory structure, code map, and development conventions. - Define core stack (TypeScript, WebSocket, Jest, pnpm) - Map tasks to specific file locations - List code symbols and their roles - Document TypeScript and testing configurations - Specify anti-patterns and unique styles (SSML, WebSocket) - Include SSML reference documentation for speech synthesis This ensures consistent understanding of the codebase architecture and constraints during automated development or refactoring. --- AGENTS.md | 138 ++++++++++++++++++++ docs/ssml-pronunciation.md | 199 +++++++++++++++++++++++++++++ docs/ssml-structure.md | 252 +++++++++++++++++++++++++++++++++++++ docs/ssml-voice.md | 226 +++++++++++++++++++++++++++++++++ 4 files changed, 815 insertions(+) create mode 100644 AGENTS.md create mode 100644 docs/ssml-pronunciation.md create mode 100644 docs/ssml-structure.md create mode 100644 docs/ssml-voice.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..4538d17 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,138 @@ +# PROJECT KNOWLEDGE BASE + +**Generated:** 2026-03-18 +**Commit:** main branch +**Branch:** main + +## OVERVIEW + +Microsoft Edge TTS 文本转语音库 - 使用 Azure Speech Service (Microsoft Edge Read Aloud API) 的 Node.js/TypeScript 模块。支持语音合成、SSML、多种音频格式输出。 + +**核心栈**: TypeScript, WebSocket, Jest (测试), pnpm (包管理器) + +## STRUCTURE + +``` +./ +├── src/ # 全部源代码 +│ ├── index.ts # 主入口点(barrel exports) +│ ├── MsEdgeTTS.ts # 核心 TTS 类(457 行) +│ ├── Output.ts # 音频输出格式枚举 +│ ├── Prosody.ts # 语速/音调/音量选项 +│ ├── utils.ts # 工具函数(路径拼接) +│ └── MsEdgeTTS.spec.ts # 单元测试 +├── .github/workflows/ +│ └── deploy_docs.yml # CI/CD:文档部署到 gh-pages +├── package.json 
# Dependencies + Jest config
+├── tsconfig.json # TypeScript compiler config
+└── README.md # API documentation
+```
+
+## WHERE TO LOOK
+
+| Task | Location | Notes |
+|------|----------|-------|
+| Add a new feature | `src/` | Create a `.ts` file directly alongside the existing ones |
+| Change core logic | `src/MsEdgeTTS.ts` | WebSocket communication, SSML handling |
+| Add an audio format | `src/Output.ts` | `OUTPUT_FORMAT` enum |
+| Change voice options | `src/Prosody.ts` | `ProsodyOptions` class |
+| Add tests | `src/*.spec.ts` | Tests live in the same directory as the source |
+| Change CI/CD | `.github/workflows/` | Docs deployment workflow only |
+| Configure Jest | `package.json` | Jest config is inlined in package.json |
+
+## CODE MAP
+
+| Symbol | Type | Location | Role |
+|--------|------|----------|------|
+| `MsEdgeTTS` | Class | `src/MsEdgeTTS.ts` | Main class: WebSocket connection, speech synthesis |
+| `OUTPUT_FORMAT` | Enum | `src/Output.ts` | Supported audio output formats (MP3, WEBM) |
+| `OUTPUT_EXTENSIONS` | Const | `src/Output.ts` | Mapping from format to file extension |
+| `ProsodyOptions` | Class | `src/Prosody.ts` | Rate/pitch/volume configuration options |
+| `RATE` | Enum | `src/Prosody.ts` | Rate presets (x-slow to x-fast) |
+| `PITCH` | Enum | `src/Prosody.ts` | Pitch presets (x-low to x-high) |
+| `VOLUME` | Enum | `src/Prosody.ts` | Volume presets (silent to x-LOUD) |
+| `Voice` | Type | `src/MsEdgeTTS.ts` | Voice metadata structure |
+| `MetadataOptions` | Class | `src/MsEdgeTTS.ts` | Boundary metadata options (sentence/word) |
+| `joinPath` | Function | `src/utils.ts` | Path-joining utility |
+
+## CONVENTIONS
+
+**TypeScript config**:
+- `target`: ESNext
+- `module`: CommonJS
+- `outDir`: dist/
+- Library type checking is skipped (skipLibCheck: true)
+
+**Testing conventions**:
+- Test files: `*.spec.ts`, in the same directory as the source
+- Jest config is inlined in `package.json`
+- Test timeout: 15 seconds
+
+**Package manager**:
+- `pnpm` is enforced (preinstall hook)
+- Version lock: pnpm-lock.yaml
+
+## ANTI-PATTERNS (THIS PROJECT)
+
+- ❌ **Do not** use npm/yarn - the project enforces pnpm
+- ❌ **Do not** move tests into a separate directory - keep `*.spec.ts` next to the source
+- ❌ **Do not** change `module`/`moduleResolution` in tsconfig - the project depends on CommonJS
+- ❌ **Do not** use in the browser - the API requires an Edge User-Agent (server-side only)
+- ❌ **Do not** delete files outside `dist/` - only the dist directory is published
+
+## UNIQUE STYLES
+
+**SSML template**:
+- Default template: `<speak>` → `<voice>` → `<prosody>`
+- Only the `speak`, `voice`, and `prosody` elements are supported
+- Full SSML is not supported
+
+**WebSocket communication**:
+- Uses `isomorphic-ws` for browser/Node compatibility
+- Custom UUID generation (not crypto.randomUUID)
+- Sec-MS-GEC hash authentication mechanism
+
+**Logging**:
+- 可选 logger(enableLogger 选项) +- 仅记录连接状态、消息收发 + +## COMMANDS + +```bash +# 安装依赖 +pnpm install + +# 开发(构建 + 运行测试) +pnpm run dev + +# 编译 TypeScript +pnpm run build + +# 运行测试 +pnpm test + +# 测试(监听模式) +pnpm run test:watch + +# 测试(覆盖率) +pnpm run test:cov + +# 发布到 npm +pnpm run publish +``` + +## NOTES + +**关键限制**: +- 2025 年 12 月更新:API 需要 Edge User-Agent,**浏览器中无法使用** +- 仅支持 Promise API,不支持回调 +- 语音列表需要可信客户端 Token(硬编码在源码中) + +**已知问题**: +- package.json 中的 `src/test/test.ts` 和 `src/test/jest-e2e.json` 不存在(遗留配置) +- CI 仅部署文档,不运行测试 + +**发布流程**: +1. `pnpm run build` 编译到 dist/ +2. `pnpm publish --access=public` +3. 文档自动部署到 gh-pages(通过 GitHub Actions) diff --git a/docs/ssml-pronunciation.md b/docs/ssml-pronunciation.md new file mode 100644 index 0000000..548245f --- /dev/null +++ b/docs/ssml-pronunciation.md @@ -0,0 +1,199 @@ +# 语音合成标记语言 (SSML) 的发音 - 语音服务 - Foundry Tools | Microsoft Learn + +可以将语音合成标记语言 (SSML) 与 text to speech 一起使用,以指定语音的发音方式。 例如,可以将 SSML 与音素和自定义词典配合使用来改进发音。 + +## 音素元素 + +`phoneme` 元素用于 SSML 文档中的发音。 始终提供人类可读的语音作为备用方案。 + +| Attribute | 说明 | 必需还是可选 | +| --- | --- | --- | +| `alphabet` | 音标字母表。 支持:`ipa`, `sapi`, `ups`, `x-sampa`。 | 可选 | +| `ph` | 包含用于指定单词发音的音素字符串。 | 必选 | + +### 音素示例 + +使用 IPA 字母表: + +```xml + + + tomato + + +``` + +使用 SAPI 字母表: + +```xml + + + en-US + + +``` + +使用 x-sampa 字母表: + +```xml + + + hello + + +``` + +## 自定义词典 + +使用 `lexicon` 元素引用自定义词典 XML 文件来定义多个实体的发音。 + +| Attribute | 说明 | 必需还是可选 | +| --- | --- | --- | +| `uri` | 自定义词典 XML 文件的 URI(`.xml` 或 `.pls`)。 | 必选 | + +### 自定义词典示例 + +```xml + + + + BTW, we will be there probably at 8:00 tomorrow morning. 
+ + +``` + +### 自定义词典文件格式 + +```xml + + + + BTW + By the way + + + Benigni + bɛˈniːnji + + + 😀 + test emoji + + +``` + +**限制**: +- 文件大小最大 100 KB +- 词典缓存 15 分钟刷新 +- 一个词典仅限一种区域设置 + +## Say-as 元素 + +指示元素文本的内容类型(如数字、日期等)。 + +| Attribute | 说明 | 必需还是可选 | +| --- | --- | --- | +| `interpret-as` | 内容类型。 支持:`characters`, `cardinal`, `ordinal`, `date`, `time`, `currency`, `telephone` 等。 | 必选 | +| `format` | 精确格式(如 `mdy`, `hms12` 等)。 | 可选 | +| `detail` | 朗读细节层次。 | 可选 | + +### Say-as 示例 + +```xml + + +

+ Your 1st request was for 1 room + on 10/19/2010 , + with early arrival at 12:35pm . +

+
+
+``` + +### 支持的 interpret-as 值 + +| interpret-as | 说明 | +| --- | --- | +| `characters`, `spell-out` | 逐字母拼写 | +| `alphanumeric` | 字母数字混合拼写 | +| `cardinal`, `number` | 基数 | +| `ordinal` | 序数 | +| `number_digit` | 单个数字序列 | +| `fraction` | 分数 | +| `date` | 日期 | +| `time` | 时间 | +| `duration` | 持续时间 | +| `telephone` | 电话号码 | +| `currency` | 货币 | +| `unit` | 单位 | +| `address` | 地址 | +| `name` | 人名 | + +## Sub 元素 + +使用 `sub` 元素指定别名文本代替原元素文本。 + +```xml + + + W3C + + +``` + +## 数学表达式的阅读 + +### 方法 1:纯文本数学表达式 + +```xml + + + + x = (-b ± √(b² - 4ac)) / 2a + + +``` + +读出括号: + +```xml + + + + x = (-b ± √(b² - 4ac)) / 2a + + +``` + +### 方法 2:使用 MathML + +```xml + + + + + a + 2 + + + + + b + 2 + + = + + c + 2 + + + + +``` + +输出:"a squared 加 b squared 等于 c squared" diff --git a/docs/ssml-structure.md b/docs/ssml-structure.md new file mode 100644 index 0000000..0bb966a --- /dev/null +++ b/docs/ssml-structure.md @@ -0,0 +1,252 @@ +# 语音合成标记语言 (SSML) 文档结构和事件 - 语音服务 - Foundry Tools | Microsoft Learn + +语音合成标记语言(SSML)连同输入文本一起,决定了文本转语音输出的结构、内容和其他特征。 例如,可以使用 SSML 来定义段落、句子、中断/暂停或静音。 可以使用事件标记(例如书签或视素)来包装文本,这些标记可以稍后由应用程序处理。 + +有关如何在 SSML 文档中构建元素的详细信息,请参阅以下部分。 + +注意 + +除了 Foundry Tools 中的 Azure 语音神经(非高清)语音外,你还可以使用 [Foundry Tools 中的 Azure 语音高清 (HD) 语音](high-definition-voices)和 [Azure OpenAI 神经(高清和非高清)语音](openai-voices)。 HD 语音为更多样化的场景提供更高的质量。 + +某些语音不支持所有 [语音合成标记语言 (SSML)](speech-synthesis-markup-structure) 标记。 这包括神经网络文本转语音高清语音、个性化语音和嵌入语音。 + +- 对于 Azure 高清(HD)语音,请检查此处的 SSML 支持。 +- 对于个人语音,可以在 [此处](personal-voice-how-to-use#supported-and-unsupported-ssml-elements-for-personal-voice) 找到 SSML 支持。 +- 有关嵌入式声音,请在 [此处](embedded-speech#embedded-voices-capabilities) 查看 SSML 支持。 + +## 文档结构 + +SSML 的语音服务实现基于万维网联合会的 [语音合成标记语言版本 1.0](https://www.w3.org/TR/2004/REC-speech-synthesis-20040907/)。 语音服务支持的元素可能与 W3C 标准不同。 + +每个 SSML 文档是使用 SSML 元素(或标记)创建的。 这些元素用于调整语音、风格、音节、韵律、音量等。 + +下面是 SSML 文档的基本结构和语法的子集: + +```xml + + + + + + + + + + + + + + + + +

+ + + + + +
+
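+```
+
+The following list describes some examples of the content allowed in each element:
+
+- `audio`: The body of the `audio` element can contain plain text or SSML markup that is spoken if the audio file is unavailable or unplayable. The `audio` element can also contain text and the following elements: `audio`, `break`, `p`, `s`, `phoneme`, `prosody`, `say-as`, and `sub`.
+- `bookmark`: This element can't contain text or any other elements.
+- `break`: This element can't contain text or any other elements.
+- `emphasis`: This element can contain text and the following elements: `audio`, `break`, `emphasis`, `lang`, `phoneme`, `prosody`, `say-as`, and `sub`.
+- `lang`: This element can contain all other elements except `mstts:backgroundaudio`, `voice`, and `speak`.
+- `lexicon`: This element can't contain text or any other elements.
+- `math`: This element can only contain text and MathML elements.
+- `mstts:audioduration`: This element can't contain text or any other elements.
+- `mstts:backgroundaudio`: This element can't contain text or any other elements.
+- ``: This element can't contain text or any other elements. It specifies the source audio URL for voice conversion.
+- `mstts:embedding`: This element can contain text and the following elements: `audio`, `break`, `emphasis`, `lang`, `phoneme`, `prosody`, `say-as`, and `sub`.
+- `mstts:express-as`: This element can contain text and the following elements: `audio`, `break`, `emphasis`, `lang`, `phoneme`, `prosody`, `say-as`, and `sub`.
+- `mstts:silence`: This element can't contain text or any other elements.
+- `mstts:viseme`: This element can't contain text or any other elements.
+- `p`: This element can contain text and the following elements: `audio`, `break`, `phoneme`, `prosody`, `say-as`, `sub`, `mstts:express-as`, and `s`.
+- `phoneme`: This element can only contain text and no other elements.
+- `prosody`: This element can contain text and the following elements: `audio`, `break`, `p`, `phoneme`, `prosody`, `say-as`, `sub`, and `s`.
+- `s`: This element can contain text and the following elements: `audio`, `break`, `phoneme`, `prosody`, `say-as`, `mstts:express-as`, and `sub`.
+- `say-as`: This element can only contain text and no other elements.
+- `sub`: This element can only contain text and no other elements.
+- `speak`: The root element of an SSML document. This element can contain the following elements: `mstts:backgroundaudio` and `voice`.
+- `voice`: This element can contain all other elements except `mstts:backgroundaudio` and `speak`.
+
+The Speech service handles punctuation automatically as appropriate, such as pausing briefly after a period or using the correct intonation when a sentence ends with a question mark.
+
+## Special characters
+
+To use the characters `&`, `<`, and `>` within an SSML element's value or text, you must use the entity format. Specifically, use `&amp;` in place of `&`, `&lt;` in place of `<`, and `&gt;` in place of `>`. Otherwise the SSML can't be parsed correctly.
+
+For example, specify `green &amp; yellow` instead of `green & yellow`. The following SSML is parsed as expected:
+
+```xml
+
+
+ My favorite colors are green &amp; yellow.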
+ + +``` + +特殊字符(例如引号、撇号和括号)必须经过转义。 有关详细信息,请参阅 [可扩展标记语言 (XML) 1.0:附录 D](https://www.w3.org/TR/xml/#sec-entexpand)。 + +属性值必须用双引号或单引号括起来。 例如,`` 和 `` 是格式正确的有效元素,但无法识别 ``。 + +## Speak 根元素 + +`speak` 元素包含版本、语言和标记词汇定义等信息。 `speak` 元素是所有 SSML 文档必需的根元素。 你必须在 `speak` 元素内指定默认语言,无论是否在其他地方调整该语言,例如在 [`lang`](speech-synthesis-markup-voice#use-voice-elements) 元素中。 + +下面是 `speak` 元素的语法: + +```xml + +``` + +| Attribute | 说明 | 必需还是可选 | +| --- | --- | --- | +| `version` | 指示用于解释文档标记的 SSML 规范的版本。 当前版本为"1.0"。 | 必选 | +| `xml:lang` | 根文档的语言。 该值可以包含语言代码(如 `en` (英语))或本地化信息,如 `en-US` (英语 - 美国)。 | 必选 | +| `xmlns` | 用于定义 SSML 文档的标记词汇(元素类型和属性名称)的文档的 URI。 当前 URI 为 "http://www.w3.org/2001/10/synthesis"。 | 必选 | + +`speak` 元素必须至少包含一个 [语音元素](speech-synthesis-markup-voice#use-voice-elements)。 + +### 演讲示例 + +`speak`介绍了 元素属性支持的值。 + +#### 单一声音示例 + +本示例使用 `en-US-Ava:DragonHDLatestNeural` 语音。 有关更多示例,请参阅 [语音示例](speech-synthesis-markup-voice#voice-examples)。 + +```xml + + + This is the text that is spoken. + + +``` + +## 添加停顿 + +使用 `break` 元素替代单词之间的默认中断或暂停行为。 否则,语音服务会自动插入暂停。 + +下表描述了 `break` 元素的属性用法。 + +| Attribute | 说明 | 必需还是可选 | +| --- | --- | --- | +| `strength` | 暂停的相对持续时间,使用以下值之一:
- x-weak
- weak
- medium (default)
- strong
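- x-strong | Optional |
+| `time` | The absolute duration of a pause, in seconds (such as `2s`) or milliseconds (such as `500ms`). Valid values range from 0 to 20000 milliseconds. If you set a value greater than the supported maximum, the service uses `20000ms`. If the `time` attribute is set, the `strength` attribute is ignored. | Optional |
+
+Here are more details about the `strength` attribute.
+
+| Strength | Relative duration |
+| --- | --- |
+| X-weak | 250 ms |
+| Weak | 500 ms |
+| Medium | 750 ms |
+| Strong | 1,000 ms |
+| X-strong | 1,250 ms |
+
+### Break examples
+
+The supported values for the attributes of the `break` element are described above. Each of the following three ways adds a 750 ms break.
+
+```xml
+
+
+ Welcome to text to speech.
+ Welcome to text to speech.
+ Welcome to text to speech.
+
+
+```
+
+## Add silence
+
+Use the `mstts:silence` element to insert pauses before or after text, or between two adjacent sentences.
+
+One of the differences between `mstts:silence` and `break` is that a `break` element can be inserted anywhere in the text. Silence only applies at the beginning or end of the input text, or at the boundary between two adjacent sentences.
+
+The silence setting applies to all input text within its enclosing `voice` element. To reset or change the silence setting again, you must use a new `voice` element with either the same voice or a different voice.
+
+The following table describes the usage of the `mstts:silence` element's attributes.
+
+| Attribute | Description | Required or optional |
+| --- | --- | --- |
+| `type` | Specifies where and how to add silence. The following silence types are supported: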
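- `Leading` – Extra silence at the beginning of the text. The value that you set is added to the natural silence before the start of the text.
- `Leading-exact` – Silence at the beginning of the text. The value is an absolute silence length.
- `Tailing` – Extra silence at the end of the text. The value that you set is added to the natural silence after the last word.
- `Tailing-exact` – Silence at the end of the text. The value is an absolute silence length.
- `Sentenceboundary` – Extra silence between adjacent sentences. The actual silence length for this type includes the natural silence after the last word in the previous sentence, the value that you set for this type, and the natural silence before the first word in the next sentence.
- `Sentenceboundary-exact` - Silence between adjacent sentences. The value is an absolute silence length.
- `Comma-exact` - Silence at a comma in half-width or full-width format. The value is an absolute silence length.
- `Semicolon-exact` - Silence at a semicolon in half-width or full-width format. The value is an absolute silence length.
- `Enumerationcomma-exact` - Silence at an enumeration comma in full-width format. The value is an absolute silence length.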

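Absolute silence types (with the `-exact` suffix) replace any otherwise natural leading or trailing silence. Absolute silence types take precedence over the corresponding non-absolute type. For example, if you set both the `Leading` and `Leading-exact` types, the `Leading-exact` type takes effect. The [WordBoundary event](how-to-speech-synthesis#subscribe-to-synthesizer-events) takes precedence over punctuation-related silence settings, including `Comma-exact`, `Semicolon-exact`, and `Enumerationcomma-exact`. When you use both the `WordBoundary` event and punctuation-related silence settings, the punctuation-related silence settings don't take effect. | Required |
+| `value` | The duration of a pause, in seconds (such as `2s`) or milliseconds (such as `500ms`). Valid values range from 0 to 20000 milliseconds. If you set a value greater than the supported maximum, the service uses `20000ms`. | Required |
+
+### MSTTS silence examples
+
+The supported values for the attributes of the `mstts:silence` element are described above.
+
+In this example, `mstts:silence` is used to add 200 ms of silence between two sentences.
+
+```xml
+
+
+
+If we're home schooling, the best we can do is roll with what each day brings and try to have fun along the way.
+A good place to start is by trying out the slew of educational apps that are helping children stay happy and smash their schooling at the same time.
+
+
+```
+
+In this example, `mstts:silence` is used to add 50 ms of silence at the comma, 100 ms of silence at the semicolon, and 150 ms of silence at the enumeration comma.
+
+```xml
+
+
+你好呀,云希、晓晓;你好呀。
+
+
+```
+
+## Specify paragraphs and sentences
+
+The `p` and `s` elements denote paragraphs and sentences, respectively. If these elements are absent, the Speech service automatically determines the structure of the SSML document.
+
+### Paragraph and sentence examples
+
+The following example defines two paragraphs that each contain sentences. In the second paragraph, the Speech service determines the sentence structure automatically, because the sentences aren't marked in the SSML document.
+
+```xml
+
+
+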

+ Introducing the sentence element. + Used to mark individual sentences. +

+

+ Another simple paragraph. + Sentence structure in this paragraph is not explicitly marked. +

+
+
+``` + +## Bookmark 元素 + +可以使用 SSML 中的 `bookmark` 元素来引用文本或标签序列中的特定位置。 然后使用语音 SDK 并订阅 `BookmarkReached` 事件以获取音频流中每个标记的偏移量。 `bookmark` 元素没有被读出。 有关详细信息,请参阅 [订阅合成器事件](how-to-speech-synthesis#subscribe-to-synthesizer-events)。 + +下表描述了 `bookmark` 元素的属性用法。 + +| Attribute | 说明 | 必需还是可选 | +| --- | --- | --- | +| `mark` | `bookmark` 元素的引用文本。 | 必选 | + +### Bookmark 示例 + +`bookmark`介绍了 元素属性支持的值。 + +你可能想知道以下代码片断中每个与花相关的词的时间偏移量: + +```xml + + + We are selling roses and daisies. + + +``` diff --git a/docs/ssml-voice.md b/docs/ssml-voice.md new file mode 100644 index 0000000..72db6ce --- /dev/null +++ b/docs/ssml-voice.md @@ -0,0 +1,226 @@ +# 语音合成标记语言 (SSML) 的语音和声音 - 语音服务 - Foundry Tools | Microsoft Learn + +可以使用语音合成标记语言 (SSML) 为语音输出指定文本转语音的声音、语言、名称、风格和角色。 还可以在单个 SSML 文档中使用多种语音,并调整重音、语速、音调和音量。 此外,SSML 还能够插入预先录制的音频,例如音效或音符。 + +本文介绍了如何使用 SSML 元素来指定语音和声音。 有关 SSML 语法的详细信息,请参阅 [SSML 文档结构和事件](speech-synthesis-markup-structure)。 + +## 使用语音元素 + +必须在每个 SSML `voice` 元素中至少指定一个 元素。 此元素可确定用于文本转语音的声音。 + +可以在单个 SSML 文档中包含多个 `voice` 元素。 每个 `voice` 元素可以指定不同的语音。 还可以通过不同的设置多次使用同一语音,例如,当 [更改句子之间的静音持续时间](speech-synthesis-markup-structure#add-silence) 时。 + +下表介绍 `voice` 元素的属性的用法: + +| Attribute | 说明 | 必需还是可选 | +| --- | --- | --- | +| `name` | 用于文本转语音输出的声音。 有关支持的标准语音的完整列表,请参阅 [语言支持](language-support?tabs=tts)。 | 必选 | +| `effect` | 音频效果处理器,用于在设备上针对特定方案优化合成语音输出的质量。 对于生产环境中的某些方案,听觉体验可能会因某些设备上的播放失真而降级。 例如,由于扬声器响应、房间混响和背景噪音等环境因素,来自汽车扬声器的合成语音可能会听起来迟钝而低沉。 乘客可能必须调高音量才能听得更清楚。 为了避免在这种情况下进行手动操作,音频效果处理器可以通过补偿播放失真来让声音更清晰。支持以下值:
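- `eq_car` - Optimizes the auditory experience when providing high-fidelity speech in cars, buses, and other enclosed vehicles.
- `eq_telecomhp8k` - Optimizes the auditory experience for narrowband speech in telecom or telephone scenarios. Use a sampling rate of 8 kHz. If the sampling rate isn't 8 kHz, the auditory quality of the output speech isn't optimized.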

如果值缺失或无效,则会忽略此属性,而不会应用任何效果。 | 可选 | + +### 语音示例 + +#### 单一声音示例 + +```xml + + + This is the text that is spoken. + + +``` + +#### 多个语音的示例 + +```xml + + + Good morning! + + + Good morning to you too Ava! + + +``` + +#### 音频效果示例 + +```xml + + + This is the text that is spoken. + + +``` + +#### 多讲话人语音示例 + +```xml + + + + Hello, Andrew! How's your day going? + Hey Ava! It's been great, just exploring some AI advancements in communication. + + + +``` + +## 使用说话风格和角色 + +默认情况下,神经网络声音采用中性讲话风格。 可在句子层面调整讲话风格、风格强度和角色。 + +下表介绍 `mstts:express-as` 元素的属性的用法: + +| Attribute | 说明 | 必需还是可选 | +| --- | --- | --- | +| `style` | 特定声音的说话风格。 可以表达快乐、同情和平静等情绪。 | 必选 | +| `styledegree` | 讲话风格的强度。 可接受值的范围为:`0.01` 到 `2`(含)。 默认值为 `1`。 | 可选 | +| `role` | 说话时的角色扮演。 声音可以模仿不同的年龄和性别。 | 可选 | + +### 支持的风格 (Style) + +| Style | 说明 | +| --- | --- | +| `advertisement_upbeat` | 用兴奋和精力充沛的语气推广产品或服务。 | +| `affectionate` | 以较高的音调和音量表达温暖而亲切的语气。 | +| `angry` | 表达生气和厌恶的语气。 | +| `assistant` | 以温暖且轻松的语气说话,用于数字助手。 | +| `calm` | 以沉着冷静的态度说话。 | +| `chat` | 表达轻松随意的语气。 | +| `cheerful` | 表达积极愉快的语气。 | +| `customerservice` | 以友好热情的语气为客户提供支持。 | +| `depressed` | 调低音调和音量来表达忧郁、沮丧的语气。 | +| `documentary-narration` | 用轻松、感兴趣和信息丰富的风格讲述纪录片。 | +| `empathetic` | 表达关心和理解。 | +| `excited` | 表达乐观和充满希望的语气。 | +| `fearful` | 以较高的音调、较高的音量和较快的语速来表达恐惧。 | +| `friendly` | 表达一种愉快、怡人且温暖的语气。 | +| `gentle` | 以较低的音调和音量表达温和、礼貌和愉快的语气。 | +| `hopeful` | 以温暖和向往的语气说话。 | +| `lyrical` | 以优美又带感伤的方式表达情感。 | +| `narration-professional` | 以专业、客观的语气朗读内容。 | +| `narration-relaxed` | 以舒缓且悦耳的语气说话,用于内容朗读。 | +| `newscast` | 以正式专业的语气叙述新闻。 | +| `newscast-casual` | 以通用、随意的语气发布一般新闻。 | +| `newscast-formal` | 以正式、自信和权威的语气发布新闻。 | +| `poetry-reading` | 在读诗时表达出带情感和节奏的语气。 | +| `sad` | 表达悲伤语气。 | +| `serious` | 表达严肃和命令的语气。 | +| `shouting` | 以一种听起来好像语音在远处或在另一个位置说话。 | +| `sports_commentary` | 表达一种既轻松又感兴趣的语气,用于播报体育赛事。 | +| `sports_commentary_excited` | 用快速且充满活力的语气播报体育赛事精彩瞬间。 | +| `whispering` | 以试图发出轻柔、温和声音的柔和语气说话。 | +| `terrified` | 表达一种害怕的语气,语速快且声音颤抖。 | +| `unfriendly` | 表达一种冷淡无情的语气。 | 
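
The `style` and `styledegree` attributes described above can be combined in a small helper. The following is only a sketch under stated assumptions: `expressAs`, `clampStyleDegree`, `escapeXml`, and `KNOWN_STYLES` are hypothetical names that are not part of this repository or of the Speech service, and the style list is a small subset of the table. Only the element name `mstts:express-as`, its attribute names, and the 0.01 to 2 range come from the documentation above.

```typescript
// Hypothetical helpers for illustration; not part of this library's API.
// A small subset of the styles listed in the table above.
const KNOWN_STYLES = ["cheerful", "sad", "calm", "whispering"] as const;
type KnownStyle = (typeof KNOWN_STYLES)[number];

/** Escape XML special characters so plain text cannot break the SSML. */
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

/** Clamp a style degree into the documented range of 0.01 to 2. */
function clampStyleDegree(degree: number): number {
  return Math.min(2, Math.max(0.01, degree));
}

/** Wrap text in an mstts:express-as element; styledegree defaults to 1. */
function expressAs(text: string, style: KnownStyle, styleDegree = 1): string {
  const degree = clampStyleDegree(styleDegree);
  return `<mstts:express-as style="${style}" styledegree="${degree}">${escapeXml(text)}</mstts:express-as>`;
}

// The out-of-range degree 5 is clamped to 2 and the ampersand is escaped.
console.log(expressAs("Tom & Jerry", "cheerful", 5));
```

The fragment still has to be placed inside a `voice` element of a `speak` document that declares the `mstts` namespace, and whether a given voice honors a style depends on the voice. Clamping rather than rejecting out-of-range values is a design choice of this sketch, not documented service behavior.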
+ +### 支持的角色 (Role) + +| 角色 | 说明 | +| --- | --- | +| `Girl` | 声音模仿女孩。 | +| `Boy` | 声音模仿男孩。 | +| `YoungAdultFemale` | 声音模仿年轻的成年女性。 | +| `YoungAdultMale` | 声音模仿年轻的成年男性。 | +| `OlderAdultFemale` | 声音模仿年长的成年女性。 | +| `OlderAdultMale` | 声音模仿年长的成年男性。 | +| `SeniorFemale` | 声音模仿年老女性。 | +| `SeniorMale` | 声音模仿年老男性。 | + +### 风格和程度示例 + +```xml + + + + 快走吧,路上一定要注意安全,早去早回。 + + + +``` + +### 角色示例 + +```xml + + + 女儿看见父亲走了进来,问道: + + "您来的挺快的,怎么过来的?" + + 父亲放下手提包,说: + + "刚打车过来的,路上还挺顺畅。" + + + +``` + +## 调整讲话语言 + +使用 `` 元素调整多语言语音的说话语言。 + +```xml + + + + Wir freuen uns auf die Zusammenarbeit mit Ihnen! + + + +``` + +## 调整韵律 + +使用 `prosody` 元素指定音高、语调、范围、速率和音量的变化。 + +| Attribute | 说明 | +| --- | --- | +| `contour` | 升降曲线表示音高的变化。 | +| `pitch` | 基线音节。 可用值:`x-low`, `low`, `medium`, `high`, `x-high`, 或相对值(如 `+20Hz`, `-2st`)。 | +| `range` | 音节范围。 | +| `rate` | 语速。 可用值:`x-slow`, `slow`, `medium`, `fast`, `x-fast`, 或相对值(如 `+30%`)。 | +| `volume` | 音量。 可用值:`silent`, `x-soft`, `soft`, `medium`, `loud`, `x-loud`, 或相对值(如 `+20`)。 | + +### 韵律示例 + +```xml + + + + Enjoy using text to speech. + + + +``` + +## 添加录制的音频 + +```xml + + + + +``` + +## 添加背景音频 + +```xml + + + + The text provided in this document are spoken over the background audio. + + +``` + +## 语音转换元素 + +```xml + + + + + +``` From fcac7a32ceda36a98e3d26f9040a96c8970e5efd Mon Sep 17 00:00:00 2001 From: huan-zz3 <2805033624@qq.com> Date: Fri, 20 Mar 2026 16:37:13 +0800 Subject: [PATCH 02/10] feat: add example scripts demonstrating SSML and text substitution Add new TypeScript examples to the example directory to demonstrate core library features and API usage patterns. - Add .gitignore to exclude sensitive config and generated audio files - Add dialogue demo showing SSML structure for multi-role conversation - Add text substitution demo showcasing professional term handling These scripts serve as reference implementations for API integration and help users verify functionality with real-world scenarios. 
--- example/.gitignore | 4 + ...71\350\257\235\346\274\224\347\244\272.ts" | 95 +++++++++++ ...76\345\274\217\350\260\203\347\224\250.ts" | 112 +++++++++++++ ...- \345\207\275\346\225\260\345\274\217.ts" | 112 +++++++++++++ ...16\346\240\274\346\274\224\347\244\272.ts" | 97 +++++++++++ ...47\345\210\266\346\274\224\347\244\272.ts" | 127 +++++++++++++++ ...37\350\203\275\346\274\224\347\244\272.ts" | 150 ++++++++++++++++++ example/README.md | 145 +++++++++++++++++ example/config.example.json | 8 + example/run.sh | 61 +++++++ 10 files changed, 911 insertions(+) create mode 100644 example/.gitignore create mode 100644 "example/00-\347\256\200\345\215\225\345\257\271\350\257\235\346\274\224\347\244\272.ts" create mode 100644 "example/01-\345\244\232\350\257\264\350\257\235\344\272\272\345\257\271\350\257\235 - \351\223\276\345\274\217\350\260\203\347\224\250.ts" create mode 100644 "example/02-\345\244\232\350\257\264\350\257\235\344\272\272\345\257\271\350\257\235 - \345\207\275\346\225\260\345\274\217.ts" create mode 100644 "example/03-31 \347\247\215\346\203\205\346\204\237\351\243\216\346\240\274\346\274\224\347\244\272.ts" create mode 100644 "example/04-\346\203\205\346\204\237\345\274\272\345\272\246\346\216\247\345\210\266\346\274\224\347\244\272.ts" create mode 100644 "example/05-\346\226\207\346\234\254\346\233\277\346\215\242\345\212\237\350\203\275\346\274\224\347\244\272.ts" create mode 100644 example/README.md create mode 100644 example/config.example.json create mode 100755 example/run.sh diff --git a/example/.gitignore b/example/.gitignore new file mode 100644 index 0000000..ef0eb55 --- /dev/null +++ b/example/.gitignore @@ -0,0 +1,4 @@ +config.json +output/ +*.mp3 +.DS_Store diff --git "a/example/00-\347\256\200\345\215\225\345\257\271\350\257\235\346\274\224\347\244\272.ts" "b/example/00-\347\256\200\345\215\225\345\257\271\350\257\235\346\274\224\347\244\272.ts" new file mode 100644 index 0000000..c686d61 --- /dev/null +++ 
"b/example/00-\347\256\200\345\215\225\345\257\271\350\257\235\346\274\224\347\244\272.ts" @@ -0,0 +1,95 @@ +import * as fs from "fs"; +import * as path from "path"; + +/** + * 示例 0: 简单对话演示 + * 直接使用给定的 SSML 示例(女儿和父亲对话) + */ +async function main() { + // 输出装饰框 + console.log("╔═══════════════════════════════════════════════╗"); + console.log("║ 示例 0: 简单对话演示 ║"); + console.log("╚═══════════════════════════════════════════════╝"); + console.log(); + + // 读取配置 + const configPath = path.join(__dirname, "config.json"); + if (!fs.existsSync(configPath)) { + console.error("❌ 错误:config.json 不存在"); + console.error("📝 请复制 config.example.json 为 config.json 并填写邮箱和密码"); + console.error(`📁 示例文件位置:${configPath}`); + process.exit(1); + } + + const config = JSON.parse(fs.readFileSync(configPath, "utf-8")); + + // 给定的 SSML 示例:女儿和父亲对话 + const ssml = ` + + 女儿看见父亲走了进来,问道: + + "您来的挺快的,怎么过来的?" + + 父亲放下手提包,说: + + "刚打车过来的,路上还挺顺畅。" + + +`; + + // 显示完整的 SSML + console.log("使用的 SSML:"); + console.log("┌──────────────────────────────────────────────┐"); + const ssmlLines = ssml.split("\n"); + for (const line of ssmlLines) { + const truncated = line.length > 44 ? line.substring(0, 41) + "..." 
: line; + console.log(`│ ${truncated.padEnd(44)} │`); + } + console.log("└──────────────────────────────────────────────┘"); + console.log(); + + // 输出路径 + const outputDir = path.join(__dirname, "output"); + if (!fs.existsSync(outputDir)) { + fs.mkdirSync(outputDir, { recursive: true }); + } + const outputPath = path.join(outputDir, "00-简单对话演示.mp3"); + + // 调用 TTS API + console.log("正在调用 TTS API..."); + + try { + const response = await fetch(config.api_url, { + method: "POST", + headers: { "Content-Type": "application/x-www-form-urlencoded" }, + body: new URLSearchParams({ + user_email: config.user_email, + user_pass: config.user_pass, + ssml: ssml, + kbitrate: config.kbitrate || "audio-16khz-32kbitrate-mono-mp3", + }), + }); + + if (!response.ok) { + throw new Error(`API 请求失败:${response.status} ${response.statusText}`); + } + + // 保存文件 + const buffer = Buffer.from(await response.arrayBuffer()); + fs.writeFileSync(outputPath, buffer); + + // 计算文件大小 + const fileSizeKB = (buffer.length / 1024).toFixed(1); + + console.log("✅ 音频生成成功!"); + console.log(`📁 文件已保存:${outputPath}`); + console.log(`📊 文件大小:${fileSizeKB} KB`); + } catch (error) { + console.error("❌ 生成失败:", error instanceof Error ? 
error.message : error); + process.exit(1); + } +} + +main(); diff --git "a/example/01-\345\244\232\350\257\264\350\257\235\344\272\272\345\257\271\350\257\235 - \351\223\276\345\274\217\350\260\203\347\224\250.ts" "b/example/01-\345\244\232\350\257\264\350\257\235\344\272\272\345\257\271\350\257\235 - \351\223\276\345\274\217\350\260\203\347\224\250.ts" new file mode 100644 index 0000000..fcf180b --- /dev/null +++ "b/example/01-\345\244\232\350\257\264\350\257\235\344\272\272\345\257\271\350\257\235 - \351\223\276\345\274\217\350\260\203\347\224\250.ts" @@ -0,0 +1,112 @@ +import { DialogueBuilder, buildDialogueSSML, type DialogueTurn } from "../src"; +import * as fs from "fs"; +import * as path from "path"; + +/** + * 示例 1: 多说话人对话 - 链式调用 + * 使用 DialogueBuilder 构建中英混合播客对话 + */ +async function main() { + // 输出装饰框 + console.log("╔═══════════════════════════════════════════════╗"); + console.log("║ 示例 1: 多说话人对话 - 链式调用 ║"); + console.log("╚═══════════════════════════════════════════════╝"); + console.log(); + + // 读取配置 + const configPath = path.join(__dirname, "config.json"); + if (!fs.existsSync(configPath)) { + console.error("❌ 错误:config.json 不存在"); + console.error("📝 请复制 config.example.json 为 config.json 并填写邮箱和密码"); + console.error(`📁 示例文件位置:${configPath}`); + process.exit(1); + } + + const config = JSON.parse(fs.readFileSync(configPath, "utf-8")); + + // 构建对话:4 个说话人轮次(2 中文 + 2 英文) + const dialogue = new DialogueBuilder() + .addTurn({ + voice: "zh-CN-XiaoxiaoNeural", + text: "大家好!欢迎收听今天的科技播客。", + style: "cheerful", + }) + .addTurn({ + voice: "en-US-AndrewNeural", + text: "Hello everyone! Welcome to today's tech podcast.", + lang: "en-US", + style: "friendly", + }) + .addTurn({ + voice: "zh-CN-YunxiNeural", + text: "今天我们将探讨人工智能的最新发展。", + style: "documentary-narration", + }) + .addTurn({ + voice: "en-US-AriaNeural", + text: "That's right! 
AI is changing the world faster than ever.", + lang: "en-US", + style: "excited", + }) + .build(); + + console.log(`生成的对话轮次:${dialogue.turns.length} 个`); + console.log(); + + // 生成 SSML + const ssml = buildDialogueSSML(dialogue.turns); + + // SSML 预览 + console.log("SSML 预览:"); + console.log("┌──────────────────────────────────────────────┐"); + const ssmlLines = ssml.split("\n"); + for (const line of ssmlLines) { + const truncated = line.length > 44 ? line.substring(0, 41) + "..." : line; + console.log(`│ ${truncated.padEnd(44)} │`); + } + console.log("└──────────────────────────────────────────────┘"); + console.log(); + + // 输出路径 + const outputDir = path.join(__dirname, "output"); + if (!fs.existsSync(outputDir)) { + fs.mkdirSync(outputDir, { recursive: true }); + } + const outputPath = path.join(outputDir, "01-播客对话 - 链式调用.mp3"); + + // 调用 TTS API + console.log("正在调用 TTS API..."); + + try { + const response = await fetch(config.api_url, { + method: "POST", + headers: { "Content-Type": "application/x-www-form-urlencoded" }, + body: new URLSearchParams({ + user_email: config.user_email, + user_pass: config.user_pass, + ssml: ssml, + kbitrate: config.kbitrate || "audio-16khz-32kbitrate-mono-mp3", + }), + }); + + if (!response.ok) { + throw new Error(`API 请求失败:${response.status} ${response.statusText}`); + } + + // 保存文件 + const buffer = Buffer.from(await response.arrayBuffer()); + fs.writeFileSync(outputPath, buffer); + + // 计算文件大小 + const fileSizeKB = (buffer.length / 1024).toFixed(1); + + console.log("✅ 音频生成成功!"); + console.log(`📁 文件已保存:${outputPath}`); + console.log(`📊 文件大小:${fileSizeKB} KB`); + } catch (error) { + console.error("❌ 生成失败:", error instanceof Error ? 
error.message : error); + process.exit(1); + } +} + +main(); diff --git "a/example/02-\345\244\232\350\257\264\350\257\235\344\272\272\345\257\271\350\257\235 - \345\207\275\346\225\260\345\274\217.ts" "b/example/02-\345\244\232\350\257\264\350\257\235\344\272\272\345\257\271\350\257\235 - \345\207\275\346\225\260\345\274\217.ts" new file mode 100644 index 0000000..3c86c0f --- /dev/null +++ "b/example/02-\345\244\232\350\257\264\350\257\235\344\272\272\345\257\271\350\257\235 - \345\207\275\346\225\260\345\274\217.ts" @@ -0,0 +1,112 @@ +import { buildDialogueSSML, type DialogueTurn } from "../src"; +import * as fs from "fs"; +import * as path from "path"; + +/** + * 示例 2: 多说话人对话 - 函数式 + * 使用 buildDialogueSSML 函数构建中英混合客服对话 + */ +async function main() { + // 输出装饰框 + console.log("╔═══════════════════════════════════════════════╗"); + console.log("║ 示例 2: 多说话人对话 - 函数式 ║"); + console.log("╚═══════════════════════════════════════════════╝"); + console.log(); + + // 读取配置 + const configPath = path.join(__dirname, "config.json"); + if (!fs.existsSync(configPath)) { + console.error("❌ 错误:config.json 不存在"); + console.error("📝 请复制 config.example.json 为 config.json 并填写邮箱和密码"); + console.error(`📁 示例文件位置:${configPath}`); + process.exit(1); + } + + const config = JSON.parse(fs.readFileSync(configPath, "utf-8")); + + // 构建对话:4 个说话人轮次(2 中文客服 + 2 英文客服) + const turns: DialogueTurn[] = [ + { + voice: "zh-CN-XiaoxiaoNeural", + text: "您好!欢迎联系客户服务中心。", + style: "customerservice", + }, + { + voice: "en-US-JennyNeural", + text: "Hello! 
Welcome to customer service.", + lang: "en-US", + style: "friendly", + }, + { + voice: "zh-CN-YunjianNeural", + text: "请问有什么可以帮助您的?", + style: "assistant", + }, + { + voice: "en-US-GuyNeural", + text: "How can I help you today?", + lang: "en-US", + style: "assistant", + }, + ]; + + console.log(`构建的对话轮次:${turns.length} 个`); + console.log(); + + // 生成 SSML + const ssml = buildDialogueSSML(turns); + + // SSML 预览 + console.log("SSML 预览:"); + console.log("┌──────────────────────────────────────────────┐"); + const ssmlLines = ssml.split("\n"); + for (const line of ssmlLines) { + const truncated = line.length > 44 ? line.substring(0, 41) + "..." : line; + console.log(`│ ${truncated.padEnd(44)} │`); + } + console.log("└──────────────────────────────────────────────┘"); + console.log(); + + // 输出路径 + const outputDir = path.join(__dirname, "output"); + if (!fs.existsSync(outputDir)) { + fs.mkdirSync(outputDir, { recursive: true }); + } + const outputPath = path.join(outputDir, "02-客服对话 - 函数式.mp3"); + + // 调用 TTS API + console.log("正在调用 TTS API..."); + + try { + const response = await fetch(config.api_url, { + method: "POST", + headers: { "Content-Type": "application/x-www-form-urlencoded" }, + body: new URLSearchParams({ + user_email: config.user_email, + user_pass: config.user_pass, + ssml: ssml, + kbitrate: config.kbitrate || "audio-16khz-32kbitrate-mono-mp3", + }), + }); + + if (!response.ok) { + throw new Error(`API 请求失败:${response.status} ${response.statusText}`); + } + + // 保存文件 + const buffer = Buffer.from(await response.arrayBuffer()); + fs.writeFileSync(outputPath, buffer); + + // 计算文件大小 + const fileSizeKB = (buffer.length / 1024).toFixed(1); + + console.log("✅ 音频生成成功!"); + console.log(`📁 文件已保存:${outputPath}`); + console.log(`📊 文件大小:${fileSizeKB} KB`); + } catch (error) { + console.error("❌ 生成失败:", error instanceof Error ? 
error.message : error);
+    process.exit(1);
+  }
+}
+
+main();
diff --git "a/example/03-31 \347\247\215\346\203\205\346\204\237\351\243\216\346\240\274\346\274\224\347\244\272.ts" "b/example/03-31 \347\247\215\346\203\205\346\204\237\351\243\216\346\240\274\346\274\224\347\244\272.ts"
new file mode 100644
index 0000000..146c363
--- /dev/null
+++ "b/example/03-31 \347\247\215\346\203\205\346\204\237\351\243\216\346\240\274\346\274\224\347\244\272.ts"
@@ -0,0 +1,97 @@
+import { MsEdgeTTS, OUTPUT_FORMAT, buildDialogueSSML, type DialogueTurn } from "../src";
+import * as fs from "fs";
+import * as path from "path";
+
+const allStyles = [
+  "advertisement_upbeat", "affectionate", "angry", "assistant",
+  "calm", "chat", "cheerful", "customerservice",
+  "depressed", "documentary-narration", "empathetic", "excited",
+  "fearful", "friendly", "gentle", "hopeful",
+  "lyrical", "narration-professional", "narration-relaxed", "newscast",
+  "newscast-casual", "newscast-formal", "poetry-reading", "sad",
+  "serious", "shouting", "sports_commentary", "sports_commentary_excited",
+  "terrified", "unfriendly", "whispering"
+];
+
+function printStyleTable(styles: string[]): void {
+  console.log("\n所有情感风格列表:");
+  console.log("┌────┬─────────────────────────────────────┐");
+  console.log("│ 序号 │ 风格名称 │");
+  console.log("├────┼─────────────────────────────────────┤");
+
+  styles.forEach((style, index) => {
+    const num = String(index + 1).padStart(2, ' ');
+    const paddedStyle = style.padEnd(35, ' ');
+    console.log(`│ ${num} │ ${paddedStyle}│`);
+  });
+
+  console.log("└────┴─────────────────────────────────────┘");
+}
+
+async function main(): Promise<void> {
+  console.log("╔═══════════════════════════════════════════════╗");
+  console.log("║ 示例 3: 31 种情感风格演示 ║");
+  console.log("╚═══════════════════════════════════════════════╝");
+
+  printStyleTable(allStyles);
+
+  const configPath = path.join(__dirname, "config.json");
+  let email: string;
+  let password: string;
+
+  try {
+    const config =
JSON.parse(fs.readFileSync(configPath, "utf-8")); + email = config.user_email; + password = config.user_pass; + } catch (error) { + console.error("错误:无法读取 config.json,请确保已创建配置文件"); + console.error("提示:复制 config.example.json 为 config.json 并填写邮箱密码"); + process.exit(1); + } + + const tts = new MsEdgeTTS(); + const voiceName = "zh-CN-XiaoxiaoNeural"; + const outputFormat = OUTPUT_FORMAT.AUDIO_24KHZ_48KBITRATE_MONO_MP3; + + console.log(`\n使用语音:${voiceName}`); + console.log(`输出格式:MP3`); + + const turns: DialogueTurn[] = allStyles.map((style, index) => ({ + voice: voiceName, + text: `这是第${index + 1}种情感风格,${style}。`, + style: style + })); + + const ssml = buildDialogueSSML(turns); + console.log(`\n生成的 SSML 长度:${ssml.length} 字符`); + + try { + await tts.setMetadata(voiceName, outputFormat); + + const outputDir = path.join(__dirname, "output"); + if (!fs.existsSync(outputDir)) { + fs.mkdirSync(outputDir, { recursive: true }); + } + + const outputPath = path.join(outputDir, "03-31 种情感风格演示.mp3"); + + console.log(`\n正在生成音频...`); + const { audioFilePath } = await tts.rawToFile(outputDir, ssml); + + fs.renameSync(audioFilePath, outputPath); + + console.log(`\n✅ 音频已保存到:${outputPath}`); + console.log(`✅ 共生成 ${allStyles.length} 种情感风格演示`); + + } catch (error) { + console.error("\n❌ 生成音频时出错:"); + if (error instanceof Error) { + console.error(error.message); + } else { + console.error(error); + } + process.exit(1); + } +} + +main().catch(console.error); diff --git "a/example/04-\346\203\205\346\204\237\345\274\272\345\272\246\346\216\247\345\210\266\346\274\224\347\244\272.ts" "b/example/04-\346\203\205\346\204\237\345\274\272\345\272\246\346\216\247\345\210\266\346\274\224\347\244\272.ts" new file mode 100644 index 0000000..d0f08bf --- /dev/null +++ "b/example/04-\346\203\205\346\204\237\345\274\272\345\272\246\346\216\247\345\210\266\346\274\224\347\244\272.ts" @@ -0,0 +1,127 @@ +import { buildDialogueSSML, type DialogueTurn } from "../src"; +import * as fs from "fs"; +import * as path 
from 
"path"; + +/** + * 示例 4: 情感强度控制演示 + * 演示 styleDegree 参数(范围 0.01-2.0)对情感表达的影响 + */ +async function main() { + // 输出装饰框 + console.log("╔═══════════════════════════════════════════════╗"); + console.log("║ 示例 4: 情感强度控制演示 ║"); + console.log("╚═══════════════════════════════════════════════╝"); + console.log(); + + // 读取配置 + const configPath = path.join(__dirname, "config.json"); + if (!fs.existsSync(configPath)) { + console.error("❌ 错误:config.json 不存在"); + console.error("📝 请复制 config.example.json 为 config.json 并填写邮箱和密码"); + console.error(`📁 示例文件位置:${configPath}`); + process.exit(1); + } + + const config = JSON.parse(fs.readFileSync(configPath, "utf-8")); + + // 输出 styleDegree 说明 + console.log("📖 styleDegree 参数说明:"); + console.log("┌──────────────────────────────────────────────┐"); + console.log("│ 范围:0.01 - 2.0 │"); + console.log("│ 0.5: 较弱的情感表达 │"); + console.log("│ 1.0: 正常情感表达(默认) │"); + console.log("│ 2.0: 最强情感表达 │"); + console.log("└──────────────────────────────────────────────┘"); + console.log(); + + // 构建对话:同一句话,三种不同强度 + const turns: DialogueTurn[] = [ + { + voice: "zh-CN-XiaomoNeural", + text: "这很正常", + style: "sad", + styleDegree: 0.5, // 较弱 + }, + { + voice: "zh-CN-XiaomoNeural", + text: "这真的很令人难过", + style: "sad", + styleDegree: 1.0, // 正常 + }, + { + voice: "zh-CN-XiaomoNeural", + text: "这简直太让人心碎了!", + style: "sad", + styleDegree: 2.0, // 最强 + }, + ]; + + // 显示对话内容 + console.log("📝 对话内容:"); + console.log("┌──────────────────────────────────────────────┐"); + turns.forEach((turn, index) => { + const intensity = turn.styleDegree === 0.5 ? "较弱" : turn.styleDegree === 1.0 ? "正常" : "最强"; + console.log(`│ ${index + 1}. 
[强度${intensity}] ${turn.text.padEnd(25)} │`); + }); + console.log("└──────────────────────────────────────────────┘"); + console.log(); + + // 生成 SSML + const ssml = buildDialogueSSML(turns); + + // SSML 预览 + console.log("📄 SSML 预览:"); + console.log("┌──────────────────────────────────────────────┐"); + const ssmlLines = ssml.split("\n"); + for (const line of ssmlLines) { + const truncated = line.length > 44 ? line.substring(0, 41) + "..." : line; + console.log(`│ ${truncated.padEnd(44)} │`); + } + console.log("└──────────────────────────────────────────────┘"); + console.log(); + + // 输出路径 + const outputDir = path.join(__dirname, "output"); + if (!fs.existsSync(outputDir)) { + fs.mkdirSync(outputDir, { recursive: true }); + } + const outputPath = path.join(outputDir, "04-情感强度控制演示.mp3"); + + // 调用 TTS API + console.log("🎙️ 正在调用 TTS API..."); + + try { + const response = await fetch(config.api_url, { + method: "POST", + headers: { "Content-Type": "application/x-www-form-urlencoded" }, + body: new URLSearchParams({ + user_email: config.user_email, + user_pass: config.user_pass, + ssml: ssml, + kbitrate: config.kbitrate || "audio-16khz-32kbitrate-mono-mp3", + }), + }); + + if (!response.ok) { + throw new Error(`API 请求失败:${response.status} ${response.statusText}`); + } + + // 保存文件 + const buffer = Buffer.from(await response.arrayBuffer()); + fs.writeFileSync(outputPath, buffer); + + // 计算文件大小 + const fileSizeKB = (buffer.length / 1024).toFixed(1); + + console.log("✅ 音频生成成功!"); + console.log(`📁 文件已保存:${outputPath}`); + console.log(`📊 文件大小:${fileSizeKB} KB`); + console.log(); + console.log("💡 提示:播放音频对比三种情感强度的差异"); + } catch (error) { + console.error("❌ 生成失败:", error instanceof Error ? 
error.message : error); + process.exit(1); + } +} + +main(); diff --git "a/example/05-\346\226\207\346\234\254\346\233\277\346\215\242\345\212\237\350\203\275\346\274\224\347\244\272.ts" "b/example/05-\346\226\207\346\234\254\346\233\277\346\215\242\345\212\237\350\203\275\346\274\224\347\244\272.ts" new file mode 100644 index 0000000..358f54c --- /dev/null +++ "b/example/05-\346\226\207\346\234\254\346\233\277\346\215\242\345\212\237\350\203\275\346\274\224\347\244\272.ts" @@ -0,0 +1,150 @@ +import { buildDialogueSSML, type DialogueTurn } from "../src"; +import * as fs from "fs"; +import * as path from "path"; + +/** + * 示例 5: 文本替换功能演示 + * 演示 substitutions 参数,展示专业术语替换(W3C, HTTP, CEO 等) + */ +async function main() { + // 输出装饰框 + console.log("╔═══════════════════════════════════════════════╗"); + console.log("║ 示例 5: 文本替换功能演示 ║"); + console.log("╚═══════════════════════════════════════════════╝"); + console.log(); + + // 读取配置 + const configPath = path.join(__dirname, "config.json"); + if (!fs.existsSync(configPath)) { + console.error("❌ 错误:config.json 不存在"); + console.error("📝 请复制 config.example.json 为 config.json 并填写邮箱和密码"); + console.error(`📁 示例文件位置:${configPath}`); + process.exit(1); + } + + const config = JSON.parse(fs.readFileSync(configPath, "utf-8")); + + // 输出 substitutions 说明 + console.log("📖 substitutions 参数说明:"); + console.log("┌──────────────────────────────────────────────┐"); + console.log("│ 格式:{ text: string, alias: string } │"); + console.log("│ text: 原文中的词 │"); + console.log("│ alias: 朗读时使用的别名 │"); + console.log("│ SSML 生成 text 标签 │"); + console.log("└──────────────────────────────────────────────┘"); + console.log(); + + // 构建对话:演示专业术语替换 + const turns: DialogueTurn[] = [ + { + voice: "zh-CN-XiaoxiaoNeural", + text: "W3C 制定了 Web 标准,API 基于 HTTP 协议", + substitutions: [ + { text: "W3C", alias: "万维网联盟" }, + { text: "Web", alias: "万维网" }, + { text: "HTTP", alias: "超文本传输协议" }, + ], + style: "narration-professional", + }, + { + voice: 
"en-US-AndrewNeural", + text: "The CEO said: innovation drives success", + substitutions: [ + { text: "CEO", alias: "Chief Executive Officer" }, + ], + style: "newscast-formal", + lang: "en-US", + }, + ]; + + // 显示替换前后的对比 + console.log("📝 替换前后对比:"); + console.log("┌──────────────────────────────────────────────┐"); + console.log("│ 【中文部分】 │"); + console.log("│ 原文:W3C 制定了 Web 标准,API 基于 HTTP 协议 │"); + console.log("│ 朗读:万维网联盟制定了万维网标准,API 基于超文本 │"); + console.log("│ 传输协议 │"); + console.log("├──────────────────────────────────────────────┤"); + console.log("│ 【英文部分】 │"); + console.log("│ 原文:The CEO said: innovation drives success │"); + console.log("│ 朗读:The Chief Executive Officer said: │"); + console.log("│ innovation drives success │"); + console.log("└──────────────────────────────────────────────┘"); + console.log(); + + // 显示替换规则列表 + console.log("📋 替换规则列表:"); + console.log("┌──────────────────────────────────────────────┐"); + console.log("│ 中文部分替换规则: │"); + turns[0].substitutions?.forEach((sub) => { + const line = `│ "${sub.text}" → "${sub.alias}"`.padEnd(47) + "│"; + console.log(line); + }); + console.log("├──────────────────────────────────────────────┤"); + console.log("│ 英文部分替换规则: │"); + turns[1].substitutions?.forEach((sub) => { + const line = `│ "${sub.text}" → "${sub.alias}"`.padEnd(47) + "│"; + console.log(line); + }); + console.log("└──────────────────────────────────────────────┘"); + console.log(); + + // 生成 SSML + const ssml = buildDialogueSSML(turns); + + // SSML 预览 + console.log("📄 SSML 预览:"); + console.log("┌──────────────────────────────────────────────┐"); + const ssmlLines = ssml.split("\n"); + for (const line of ssmlLines) { + const truncated = line.length > 44 ? line.substring(0, 41) + "..." 
: line; + console.log(`│ ${truncated.padEnd(44)} │`); + } + console.log("└──────────────────────────────────────────────┘"); + console.log(); + + // Output path + const outputDir = path.join(__dirname, "output"); + if (!fs.existsSync(outputDir)) { + fs.mkdirSync(outputDir, { recursive: true }); + } + const outputPath = path.join(outputDir, "05-文本替换功能演示.mp3"); + + // Call the TTS API + console.log("🎙️ 正在调用 TTS API..."); + + try { + const response = await fetch(config.api_url, { + method: "POST", + headers: { "Content-Type": "application/x-www-form-urlencoded" }, + body: new URLSearchParams({ + user_email: config.user_email, + user_pass: config.user_pass, + ssml: ssml, + kbitrate: config.kbitrate || "audio-16khz-32kbitrate-mono-mp3", + }), + }); + + if (!response.ok) { + throw new Error(`API 请求失败:${response.status} ${response.statusText}`); + } + + // Save the file + const buffer = Buffer.from(await response.arrayBuffer()); + fs.writeFileSync(outputPath, buffer); + + // Compute the file size + const fileSizeKB = (buffer.length / 1024).toFixed(1); + + console.log("✅ 音频生成成功!"); + console.log(`📁 文件已保存:${outputPath}`); + console.log(`📊 文件大小:${fileSizeKB} KB`); + console.log(); + console.log("💡 提示:播放音频对比替换前后的朗读效果"); + } catch (error) { + console.error("❌ 生成失败:", error instanceof Error ? error.message : error); + process.exit(1); + } +} + +main(); diff --git a/example/README.md b/example/README.md new file mode 100644 index 0000000..bcd03da --- /dev/null +++ b/example/README.md @@ -0,0 +1,145 @@ +# TTS Pro API Examples + +## Quick Start + +### 1. Configure your account + +Copy the configuration template and fill in your email and password: + +```bash +cp config.example.json config.json +``` + +Edit `config.json`: +```json +{ + "user_email": "your-email@example.com", + "user_pass": "your-password", + "api_url": "https://ttspro.cn/getSpeek.php", + "kbitrate": "audio-16khz-32kbitrate-mono-mp3", + "output_format": "binary" +} +``` + +### 2. 
Build the project + +```bash +pnpm run build +``` + +### 3. 
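Run the examples + +```bash +# Example 1: multi-speaker dialogue, chained calls +(cd example && ./run.sh 1) + +# Example 2: multi-speaker dialogue, functional +(cd example && ./run.sh 2) + +# Example 3: demo of the 31 emotional styles +(cd example && ./run.sh 3) + +# Example 4: style degree demo +(cd example && ./run.sh 4) + +# Example 5: text substitution demo +(cd example && ./run.sh 5) +``` + +## Examples + +### Example 1: Multi-Speaker Dialogue (Chained Calls) + +Builds a dialogue with chained calls on the `DialogueBuilder` class. + +**Features**: +- Chained-call syntax +- Mixed Chinese and English podcast scenario +- 4 speaker turns + +**Output**: `example/output/01-播客对话 - 链式调用.mp3` + +### Example 2: Multi-Speaker Dialogue (Functional) + +Builds a dialogue directly with the `buildDialogueSSML()` function. + +**Features**: +- Functional syntax +- Multilingual customer-service dialogue +- 4 dialogue turns + +**Output**: `example/output/02-客服对话 - 函数式.mp3` + +### Example 3: 31 Emotional Styles + +Iterates over all 31 emotional styles supported by Microsoft Azure. + +**Features**: +- Complete list of the 31 styles +- One sample sentence per style +- Styles printed as a table + +**Output**: `example/output/03-31 种情感风格演示.mp3` + +### Example 4: Style Degree Control + +Demonstrates the `styleDegree` parameter (range 0.01 to 2.0). + +**Features**: +- Compares three degrees: 0.5, 1.0 and 2.0 +- Uses the `sad` style +- Same voice at different degrees + +**Output**: `example/output/04-情感强度控制演示.mp3` + +### Example 5: Text Substitution + +Demonstrates replacing technical terms with the `substitutions` parameter. 
+ +**Features**: +- W3C → 万维网联盟 +- HTTP → 超文本传输协议 +- CEO → Chief Executive Officer + +**Output**: `example/output/05-文本替换功能演示.mp3` + +## API Parameters + +| Parameter | Required | Description | Default | +|--------|------|------|--------| +| `user_email` | ✅ | User email | - | +| `user_pass` | ✅ | User password | - | +| `type` | ❌ | `getSpeek`/`getBig`/`setBig` | `getSpeek` | +| `ssml` | ✅ | SSML content | - | +| `kbitrate` | ❌ | Audio quality | `audio-16khz-32kbitrate-mono-mp3` | +| `output_format` | ❌ | Return type: `binary`/`url` | `binary` | + +## Output Directory + +All generated audio files are saved in: +``` +example/output/ +├── 01-播客对话 - 链式调用.mp3 +├── 02-客服对话 - 函数式.mp3 +├── 03-31 种情感风格演示.mp3 +├── 04-情感强度控制演示.mp3 +└── 05-文本替换功能演示.mp3 +``` + +## Notes + +1. **Account security**: `config.json` is listed in `.gitignore` and is not committed to Git +2. **Network**: the examples need a network connection to call the API +3. **Build requirement**: run `pnpm run build` before running the examples +4. **Node version**: Node.js 18+ is required (for the `fetch` API) + +## FAQ + +### Q: I get the error "config.json 不存在" (config.json does not exist) +A: Copy `config.example.json` to `config.json` and fill in your email and password + +### Q: Audio generation fails +A: Check your network connection and confirm that the email and password are correct + +### Q: How do I change the audio quality? 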
+A: Edit the `kbitrate` field in `config.json` diff --git a/example/config.example.json b/example/config.example.json new file mode 100644 index 0000000..1631e13 --- /dev/null +++ b/example/config.example.json @@ -0,0 +1,8 @@ +{ + "// NOTE": "Copy this file to config.json and fill in your email and password", + "user_email": "your-email@example.com", + "user_pass": "your-password", + "api_url": "https://ttspro.cn/getSpeek.php", + "kbitrate": "audio-16khz-32kbitrate-mono-mp3", + "output_format": "binary" +} diff --git a/example/run.sh b/example/run.sh new file mode 100755 index 0000000..fec09da --- /dev/null +++ b/example/run.sh @@ -0,0 +1,61 @@ +#!/bin/bash +# Example runner script (run it from the example/ directory) +# Works around ts-node failing to handle Chinese file names + +# Check the configuration file +if [ ! -f "config.json" ]; then + echo "❌ 错误:config.json 不存在" + echo "📝 请复制 config.example.json 为 config.json 并填写邮箱和密码" + exit 1 +fi + +# Build the project; stop if the build fails +echo "🔨 正在编译项目..." +pnpm run build || exit 1 + +# Copy config.json to dist/example +echo "📋 复制配置文件到输出目录..." 
+cp config.json ../dist/example/ + +# Switch to dist/example to run the examples +cd ../dist/example || exit 1 + + # Run the selected example + case "$1" in + 0) + echo "🎙️ 运行示例 0: 简单对话演示" + node "00-简单对话演示.js" + ;; + 1) + echo "🎙️ 运行示例 1: 多说话人对话 - 链式调用" + node "01-多说话人对话 - 链式调用.js" + ;; + 2) + echo "🎙️ 运行示例 2: 多说话人对话 - 函数式" + node "02-多说话人对话 - 函数式.js" + ;; + 3) + echo "🎙️ 运行示例 3: 31 种情感风格演示" + node "03-31 种情感风格演示.js" + ;; + 4) + echo "🎙️ 运行示例 4: 情感强度控制演示" + node "04-情感强度控制演示.js" + ;; + 5) + echo "🎙️ 运行示例 5: 文本替换功能演示" + node "05-文本替换功能演示.js" + ;; + *) + echo "用法:./run.sh <示例编号>" + echo "" + echo "可用示例:" + echo " 0 - 简单对话演示" + echo " 1 - 多说话人对话 - 链式调用" + echo " 2 - 多说话人对话 - 函数式" + echo " 3 - 31 种情感风格演示" + echo " 4 - 情感强度控制演示" + echo " 5 - 文本替换功能演示" + exit 1 + ;; +esac From 7df85a5239e225a0868b8794e1c89502041ba442 Mon Sep 17 00:00:00 2001 From: huan-zz3 <2805033624@qq.com> Date: Fri, 20 Mar 2026 17:01:29 +0800 Subject: [PATCH 03/10] chore: remove root-level test file Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus --- .gitignore | 7 ++- AGENTS.md | 33 +++++++---- 
README.md | 145 +++++++++++++++++++++++++++++++++++++++++++++++ src/MsEdgeTTS.ts | 42 ++++++++++++++ src/index.ts | 5 +- tsconfig.json | 3 +- 6 files changed, 220 insertions(+), 15 deletions(-) diff --git a/.gitignore b/.gitignore index 074fc6f..e6a6396 100644 --- a/.gitignore +++ b/.gitignore @@ -1,5 +1,5 @@ # Build output -dist +dist/ # General files node_modules @@ -28,4 +28,7 @@ package-lock.json example_audio.webm example_audio_pitched.webm -msedgetts-test/ \ No newline at end of file +# Generated test files and AI-generated content +msedgetts-test/ +.sisyphus/ +.github/ \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md index 4538d17..14ccf58 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -14,18 +14,29 @@ Microsoft Edge TTS 文本转语音库 - 使用 Azure Speech Service (Microsoft E ``` ./ -├── src/ # 全部源代码 -│ ├── index.ts # 主入口点(barrel exports) -│ ├── MsEdgeTTS.ts # 核心 TTS 类(457 行) -│ ├── Output.ts # 音频输出格式枚举 -│ ├── Prosody.ts # 语速/音调/音量选项 -│ ├── utils.ts # 工具函数(路径拼接) -│ └── MsEdgeTTS.spec.ts # 单元测试 +├── src/ # 全部源代码(9 个 TypeScript 文件) +│ ├── index.ts # 主入口点(barrel exports,6 个导出) +│ ├── MsEdgeTTS.ts # 核心 TTS 类(457 行,WebSocket 通信) +│ ├── MsEdgeTTS.spec.ts # 单元测试 +│ ├── Output.ts # 音频输出格式枚举 + 扩展名映射 +│ ├── Prosody.ts # 语速/音调/音量选项类 +│ ├── DialogueTurn.ts # 对话轮次类型定义 +│ ├── DialogueBuilder.ts # 对话构建器类 + SSML 构建函数 +│ ├── SSMLUtils.ts # SSML 工具函数(转义、验证) +│ └── utils.ts # 路径拼接工具 +├── example/ # 示例演示代码(6 个中文命名文件) +│ ├── 00-简单对话演示.ts +│ ├── 01-多说话人对话 - 链式调用.ts +│ ├── 02-多说话人对话 - 函数式.ts +│ ├── 03-31 种情感风格演示.ts +│ ├── 04-情感强度控制演示.ts +│ └── 05-文本替换功能演示.ts ├── .github/workflows/ -│ └── deploy_docs.yml # CI/CD:文档部署到 gh-pages -├── package.json # 依赖 + Jest 配置 -├── tsconfig.json # TypeScript 编译配置 -└── README.md # API 文档 +│ └── deploy_docs.yml # CI/CD:仅文档部署到 gh-pages +├── docs/ # 手动编写的 SSML 文档 +├── package.json # 依赖 + Jest 配置(内联) +├── tsconfig.json # TypeScript 编译配置 +└── README.md # API 文档 ``` ## WHERE TO LOOK diff --git a/README.md b/README.md index b55c3be..aebdb38 100644 --- 
a/README.md +++ b/README.md @@ -134,3 +134,148 @@ import {MsEdgeTTS, OUTPUT_FORMAT} from "msedge-tts"; For the full documentation check out the [API Documentation](https://migushthe2nd.github.io/MsEdgeTTS). This library only supports promises. + +## Multi-Speaker Dialogue + +The library supports multi-speaker dialogue synthesis, which makes it easy to create audio that features several voices. + +### Simple Example (Functional) + +Build a dialogue quickly with the `buildDialogueSSML()` utility function. It returns a complete SSML document, so pass the result to `rawToStream()` rather than `toStream()`: + +```js +import {MsEdgeTTS, OUTPUT_FORMAT, buildDialogueSSML} from "msedge-tts"; + +(async () => { + const tts = new MsEdgeTTS(); + await tts.setMetadata("zh-CN-XiaoxiaoNeural", OUTPUT_FORMAT.WEBM_24KHZ_16BIT_MONO_OPUS); + + const ssml = buildDialogueSSML([ + { voice: "zh-CN-XiaoxiaoNeural", text: "Hello", style: "cheerful" }, + { voice: "en-US-AndrewNeural", text: "Hello", lang: "en-US" } + ]); + + const {audioStream} = await tts.rawToStream(ssml); + + audioStream.on("data", (data) => { + console.log("DATA RECEIVED", data); + }); +})(); +``` + +### Chained Call Example + +Build a dialogue with chained calls on the `DialogueBuilder` class: + +```js +import {MsEdgeTTS, OUTPUT_FORMAT, DialogueBuilder} from "msedge-tts"; + +(async () => { + const tts = new MsEdgeTTS(); + await tts.setMetadata("zh-CN-XiaoxiaoNeural", OUTPUT_FORMAT.WEBM_24KHZ_16BIT_MONO_OPUS); + + const dialogue = new DialogueBuilder() + .addTurn({ voice: "zh-CN-XiaoxiaoNeural", text: "Hello everyone!" }) + .addTurn({ voice: "en-US-AndrewNeural", text: "Hi everyone!" 
}) + .build(); + + const {audioStream} = await tts.toStreamDialogue(dialogue); + + audioStream.on("data", (data) => { + console.log("DATA RECEIVED", data); + }); +})(); +``` + +### Chinese-English Mixed Example + +Several languages can be mixed within the same dialogue: + +```js +import {MsEdgeTTS, OUTPUT_FORMAT, buildDialogueSSML} from "msedge-tts"; + +(async () => { + const tts = new MsEdgeTTS(); + await tts.setMetadata("zh-CN-XiaoxiaoNeural", OUTPUT_FORMAT.WEBM_24KHZ_16BIT_MONO_OPUS); + + const ssml = buildDialogueSSML([ + { + voice: "zh-CN-XiaoxiaoNeural", + text: "Welcome to our meeting", + style: "friendly" + }, + { + voice: "en-US-AndrewNeural", + text: "Welcome to our conference", + style: "friendly", + lang: "en-US" + }, + { + voice: "zh-CN-YunxiNeural", + text: "Today we will discuss the future of artificial intelligence", + style: "documentary-narration" + } + ]); + + const {audioStream} = await tts.rawToStream(ssml); + + audioStream.on("data", (data) => { + console.log("DATA RECEIVED", data); + }); +})(); +``` + +### Supported Emotional Styles + +The table below lists the 31 emotional styles of Microsoft Azure Speech Service covered by this library: + +| Style | Description | +| --- | --- | +| `advertisement_upbeat` | Promote products or services with an excited and energetic tone | +| `affectionate` | Express warm and affectionate tone with higher pitch and volume | +| `angry` | Express angry and disgusted tone | +| `assistant` | Speak in a warm and relaxed tone, used for digital assistants | +| `calm` | Speak with composure and calmness | +| `chat` | Express a relaxed and casual tone | +| `cheerful` | Express a positive and pleasant tone | +| `customerservice` | Provide support to customers with a friendly and enthusiastic tone | +| `depressed` | Express melancholy and depressed tone with lower pitch and volume | +| `documentary-narration` | Narrate documentaries in a relaxed, interested, and informative style | +| `empathetic` | Express care and 
understanding | +| 
`excited` | Express an optimistic and hopeful tone | +| `fearful` | Express fear with higher pitch, higher volume, and faster speech rate | +| `friendly` | Express a pleasant, charming, and warm tone | +| `gentle` | Express a mild, polite, and pleasant tone with lower pitch and volume | +| `hopeful` | Speak in a warm and longing tone | +| `lyrical` | Express emotions in a graceful and slightly sentimental way | +| `narration-professional` | Read content in a professional and objective tone | +| `narration-relaxed` | Speak in a soothing and pleasant tone, used for content narration | +| `newscast` | Narrate news in a formal and professional tone | +| `newscast-casual` | Deliver general news in a common, casual tone | +| `newscast-formal` | Deliver news in a formal, confident, and authoritative tone | +| `poetry-reading` | Express emotional and rhythmic tone when reading poetry | +| `sad` | Express a sorrowful tone | +| `serious` | Express a serious and commanding tone | +| `shouting` | Sound as if speaking from a distance or in another location | +| `sports_commentary` | Express a relaxed yet interested tone for broadcasting sports events | +| `sports_commentary_excited` | Broadcast sports event highlights with a fast and energetic tone | +| `terrified` | Express a fearful tone with fast speech rate and trembling voice | +| `unfriendly` | Express a cold and indifferent tone | +| `whispering` | Speak in a soft tone trying to produce a gentle and mild sound | + +### Using Style Degree + +You can adjust the emotional intensity through the `styleDegree` parameter (range: 0.01 to 2.0, default is 1): + +```js +const ssml = buildDialogueSSML([ + { + voice: "zh-CN-XiaomoNeural", + text: "Hurry up, be careful on the road", + style: "sad", + styleDegree: 2.0 // Stronger sadness emotion + } +]); +``` + +For more detailed information, please refer to the [Microsoft official 
documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup-voice). diff --git a/src/MsEdgeTTS.ts b/src/MsEdgeTTS.ts index 5f86370..b756c2c 100644 --- a/src/MsEdgeTTS.ts +++ b/src/MsEdgeTTS.ts @@ -7,6 +7,8 @@ import * as fs from "fs" import {Agent} from "http" import {ProsodyOptions} from "./Prosody" import {joinPath} from "./utils"; +import { Dialogue, DialogueTurn } from "./DialogueTurn"; +import { buildDialogueSSML } from "./DialogueBuilder"; export type Voice = { Name: string; @@ -307,6 +309,27 @@ export class MsEdgeTTS { return this._rawSSMLRequestToFile(dirPath, this._SSMLTemplate(input, options)) } + /** + * Writes raw audio synthesised from dialogue to a file. Supports multi-speaker conversations. + * + * @param dirPath a valid output directory path + * @param dialogue a {@link Dialogue} object or an array of {@link DialogueTurn} objects + * @param options (optional) {@link ProsodyOptions} - Note: prosody options are applied globally and may conflict with per-turn settings in dialogue + @returns {Promise<{audioFilePath: string, metadataFilePath: string | null}>} - a `Promise` with the full filepaths + */ + toFileDialogue(dirPath: string, dialogue: Dialogue | DialogueTurn[], options?: ProsodyOptions): Promise<{ + audioFilePath: string, + metadataFilePath: string | null + }> { + let ssml: string; + if (dialogue instanceof Dialogue) { + ssml = dialogue.toSSML(); + } else { + ssml = buildDialogueSSML(dialogue); + } + return this.rawToFile(dirPath, ssml); + } + /** * Writes raw audio synthesised from text in real-time to a {@link Readable}. Uses a basic {@link _SSMLTemplate SML template}. * @@ -321,6 +344,25 @@ export class MsEdgeTTS { return this._rawSSMLRequest(this._SSMLTemplate(input, options)) } + /** + * Writes raw audio synthesised from dialogue in real-time to a {@link Readable}. Supports multi-speaker conversations. 
+ * + * @param dialogue a {@link Dialogue} object or an array of {@link DialogueTurn} objects + @returns {Promise<{audioStream: Readable, metadataStream: Readable | null}>} - a `Promise` with the streams + */ + toStreamDialogue(dialogue: Dialogue | DialogueTurn[]): { + audioStream: Readable, + metadataStream: Readable | null, + } { + let ssml: string; + if (dialogue instanceof Dialogue) { + ssml = dialogue.toSSML(); + } else { + ssml = buildDialogueSSML(dialogue); + } + return this.rawToStream(ssml); + } + /** * Writes raw audio synthesised from text to a file. Has no SSML template. Basic SSML should be provided in the request. * diff --git a/src/index.ts b/src/index.ts index a46a6ee..9e2cde2 100644 --- a/src/index.ts +++ b/src/index.ts @@ -1,3 +1,6 @@ export * from "./MsEdgeTTS" export * from "./Output" -export * from "./Prosody" \ No newline at end of file +export * from "./Prosody" +export * from "./DialogueTurn" +export * from "./DialogueBuilder" +export * from "./SSMLUtils" \ No newline at end of file diff --git a/tsconfig.json b/tsconfig.json index 08ba2c1..9926d11 100644 --- a/tsconfig.json +++ b/tsconfig.json @@ -1,6 +1,7 @@ { "include": [ - "src/**/*" + "src/**/*", + "example/**/*.ts" ], "exclude": [ "node_modules", From 9b422942e11bb0dada209e663fed9e3410f6cd84 Mon Sep 17 00:00:00 2001 From: huan-zz3 <2805033624@qq.com> Date: Fri, 20 Mar 2026 17:02:44 +0800 Subject: [PATCH 04/10] chore: remove root-level test file and add AGENTS.md documentation - Delete test-multi-speaker-demo.ts (non-standard location) - Add src/AGENTS.md for core source code documentation - Update root AGENTS.md with complete project structure - Add new source files: DialogueBuilder.ts, DialogueTurn.ts, SSMLUtils.ts --- src/AGENTS.md | 145 ++++++++++++++++++++++++++++++++++++++ src/DialogueBuilder.ts | 155 +++++++++++++++++++++++++++++++++++++++++ src/DialogueTurn.ts | 45 ++++++++++++ src/SSMLUtils.ts | 82 ++++++++++++++++++++++ 4 files changed, 427 insertions(+) create mode 100644 
src/AGENTS.md create mode 100644 src/DialogueBuilder.ts create mode 100644 src/DialogueTurn.ts create mode 100644 src/SSMLUtils.ts diff --git a/src/AGENTS.md b/src/AGENTS.md new file mode 100644 index 0000000..991391c --- /dev/null +++ b/src/AGENTS.md @@ -0,0 +1,145 @@ +# src/ Directory Knowledge Base + +**Module**: core TTS implementation + +--- + +## OVERVIEW + +Core source directory of MsEdgeTTS. It contains the full implementation: WebSocket communication, SSML generation and audio output control. + +--- + +## WHERE TO LOOK + +| Task | File | Notes | +|------|------|------| +| Change the WebSocket communication logic | `MsEdgeTTS.ts` | Connection setup, message send/receive, boundary metadata handling | +| Add a new audio format | `Output.ts` | `OUTPUT_FORMAT` enum + `OUTPUT_EXTENSIONS` mapping | +| Change prosody options | `Prosody.ts` | `ProsodyOptions` class (rate/pitch/volume) | +| Change the dialogue builder | `DialogueBuilder.ts` | Chained builder + `buildDialogueSSML()` function | +| Add SSML utilities | `SSMLUtils.ts` | Escaping functions, emotional style validation | +| Change type definitions | `DialogueTurn.ts` | `DialogueTurn`, `Dialogue`, `TextSegment`, `Substitution` | +| Add unit tests | `*.spec.ts` | Next to the source files; Jest config lives in package.json | + +--- + +## FILE STRUCTURE + +``` +src/ +├── index.ts # Barrel export (6 exports) +├── MsEdgeTTS.ts # Core class (457 lines) +├── MsEdgeTTS.spec.ts # Unit tests +├── Output.ts # OUTPUT_FORMAT enum + OUTPUT_EXTENSIONS +├── Prosody.ts # ProsodyOptions class + RATE/PITCH/VOLUME enums +├── DialogueTurn.ts # DialogueTurn/Dialogue/TextSegment/Substitution types +├── DialogueBuilder.ts # DialogueBuilder class + buildDialogueSSML() function +├── SSMLUtils.ts # escapeSSML/replaceText/validateStyle/validateStyleDegree +└── utils.ts # joinPath() path-joining helper +``` + +--- + +## CODE MAP + +| Symbol | Type | File | Role | +|--------|------|------|------| +| `MsEdgeTTS` | Class | `MsEdgeTTS.ts` | Main 
class: WebSocket connection, speech synthesis, stream handling | +| `OUTPUT_FORMAT` | Enum | `Output.ts` | Supported audio formats (MP3/WEBM at several bitrates) | +| `OUTPUT_EXTENSIONS` | Const | `Output.ts` | Format to file extension mapping (`.mp3`/`.webm`) | +| `ProsodyOptions` | Class | `Prosody.ts` | Rate/pitch/volume configuration options | +| `RATE` | Enum | `Prosody.ts` | Rate presets (x-slow to x-fast) | +| `PITCH` | Enum | `Prosody.ts` | Pitch presets (x-low to x-high) | +| `VOLUME` | Enum | `Prosody.ts` | Volume presets (silent to x-LOUD) | +| `DialogueBuilder` | 
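Class | `DialogueBuilder.ts` | Chained dialogue builder | +| `buildDialogueSSML` | Function | `DialogueBuilder.ts` | Functional SSML generation | +| `validateStyle` | Function | `SSMLUtils.ts` | Validates the 28 official emotional styles | +| `escapeSSML` | Function | `SSMLUtils.ts` | XML escaping (& < > " ') | + +--- + +## CONVENTIONS + +**TypeScript configuration**: +- `module`: CommonJS (not ESM, for compatibility) +- `target`: ESNext +- `skipLibCheck`: true +- Excluded from compilation: `src/**/*.spec.ts` + +**Testing conventions**: +- Test files sit next to the source: `*.spec.ts` +- Jest configuration is inlined in `package.json` +- Test timeout: 15000ms + +**Export pattern**: +- Barrel export (everything is exported through `index.ts`) +- 6 public APIs: `MsEdgeTTS`, `OUTPUT_FORMAT`, `ProsodyOptions`, `DialogueTurn`, `DialogueBuilder`, `buildDialogueSSML` + +**SSML handling**: +- Only the `speak`/`voice`/`prosody`/`mstts:express-as`/`lang`/`sub` elements are supported +- The full SSML specification is not supported + +--- + +## ANTI-PATTERNS (SRC) + +- ❌ **Do not** use it in the browser - the API requires an Edge User-Agent (server side only) +- ❌ **Do not** modify the Sec-MS-GEC hash algorithm in `MsEdgeTTS.ts` - the Azure authentication mechanism depends on it +- ❌ **Do not** remove the `isomorphic-ws` dependency - it provides cross-environment compatibility +- ❌ **Do not** use callback APIs - only Promises are supported + +--- + +## UNIQUE STYLES + +**WebSocket communication**: +- Sec-MS-GEC hash authentication (SHA-256 + Windows tick timestamp) +- Custom UUID generation (not `crypto.randomUUID`) +- Message separator: `\r\n\r\n` + +**Logging**: +- Optional logger (`enableLogger` option) +- Logs only connection state and message send/receive + +**Multi-speaker dialogue support**: +- `DialogueBuilder` chained calls +- `buildDialogueSSML()` functional API +- Supports 28 emotional styles + degree control (0.01-2.0) +- Supports text substitution (`<sub>` tags) +- Supports 
mixing languages (`<lang>`) + +--- + +## COMMANDS + +```bash +# Compile src/ to dist/ +pnpm run build + +# Run the tests (src/*.spec.ts) +pnpm test + +# Test watch mode +pnpm run test:watch + +# Test coverage +pnpm run test:cov +``` + +--- + +## NOTES + +**Key limitations**: +- December 2025 update: the API requires an Edge User-Agent and **cannot be used in the browser** +- The voice list requires a trusted client token (hard-coded: `6A5AA1D4EAFF4E9FB37E23D68491D6F4`) + +**Known issues**: +- `MsEdgeTTS.ts` is about 457 lines - its complexity is high and it should be split up + +**Workflow for adding a feature**: +1. Create the `.ts` file in `src/` +2. Add the export to `index.ts` +3. Create a `.spec.ts` test file with the same name +4. 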
Run `pnpm test` to verify diff --git a/src/DialogueBuilder.ts b/src/DialogueBuilder.ts new file mode 100644 index 0000000..23cd05e --- /dev/null +++ b/src/DialogueBuilder.ts @@ -0,0 +1,155 @@ +import { Dialogue, type DialogueTurn } from "./DialogueTurn"; +import { escapeSSML, replaceText, validateStyle, validateStyleDegree } from "./SSMLUtils"; + +/** + * Dialogue builder class for building multi-speaker dialogues with chained calls + */ +export class DialogueBuilder { + private turns: DialogueTurn[] = []; + + /** + * Creates a dialogue builder + */ + constructor() {} + + /** + * Adds a dialogue turn (chainable) + * @param turn the dialogue turn object + * @returns the current builder instance (supports chaining) + * @throws when the turn argument is invalid + */ + addTurn(turn: DialogueTurn): DialogueBuilder { + // Strict validation + if (!turn.voice || turn.voice.trim() === "") { + throw new Error("voice name is required and cannot be empty"); + } + + if (turn.text !== undefined && turn.text !== null && turn.text.trim() === "") { + throw new Error("text cannot be empty string"); + } + + if (turn.style !== undefined && turn.style !== null) { + validateStyle(turn.style); + } + + if (turn.styleDegree !== undefined && turn.styleDegree !== null) { + validateStyleDegree(turn.styleDegree); + } + + this.turns.push(turn); + return this; + } + + /** + * Builds the Dialogue object + * @returns a Dialogue object containing every turn added so far + */ + build(): Dialogue { + const dialogue = new Dialogue(); + dialogue.turns = [...this.turns]; + return dialogue; + } + + /** + * Resets the builder state and clears all added turns + * @returns the current builder instance (supports chaining) + */ + reset(): DialogueBuilder { + this.turns = []; + return this; + } +} + +/** + * Builds the SSML string for a multi-speaker dialogue + * @param turns array of dialogue turns + * @returns the complete SSML string + */ +export function buildDialogueSSML(turns: DialogueTurn[]): string { + 
const voiceElements: string[] = []; + + for (const turn of turns) { + // Process the text: apply substitutions first, then SSML escaping + let processedText = turn.text || ""; + + // Apply text substitutions (generates <sub> tags) + if (turn.substitutions && turn.substitutions.length > 0) { + // Process in descending order of text length so longer terms are replaced first + const sortedSubs = [...turn.substitutions].sort((a, b) => b.text.length - a.text.length); + const placeholders: 
Map<string, string> = new Map(); + + for (let i = 0; i < sortedSubs.length; i++) { + const sub = sortedSubs[i]; + // 先对 alias 和 text 进行 SSML 转义 + const escapedAlias = escapeSSML(sub.alias); + const escapedText = escapeSSML(sub.text); + // 生成 <sub alias="alias">text</sub> 标签 + const subTag = `<sub alias="${escapedAlias}">${escapedText}</sub>`; + // 使用唯一占位符 + const placeholder = `__SUB_PLACEHOLDER_${i}__`; + placeholders.set(placeholder, subTag); + // 先替换为占位符 + processedText = processedText.replace( + new RegExp(sub.text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), "g"), + placeholder + ); + } + + // 应用 SSML 转义 + processedText = escapeSSML(processedText); + + // 恢复 <sub> 标签 + for (const [placeholder, subTag] of placeholders.entries()) { + processedText = processedText.split(placeholder).join(subTag); + } + } else { + // 没有替换时,直接应用 SSML 转义 + processedText = escapeSSML(processedText); + } + + // 处理 children(如果有) + let childrenContent = ""; + if (turn.children && turn.children.length > 0) { + childrenContent = turn.children + .map((segment) => { + let segmentText = escapeSSML(segment.text); + if (segment.substitution) { + segmentText = segment.substitution; + } + if (segment.lang) { + return `<lang xml:lang="${segment.lang}">${segmentText}</lang>`; + } + return segmentText; + }) + .join(""); + } + + // 构建 voice 元素内容 + let voiceContent = childrenContent || processedText; + + // 应用 lang(如果有) + if (turn.lang) { + voiceContent = `<lang xml:lang="${turn.lang}">${voiceContent}</lang>`; + } + + // 应用 style 和 styleDegree(如果有) + if (turn.style) { + const styleDegreeAttr = turn.styleDegree !== undefined && turn.styleDegree !== null + ?
` styledegree="${turn.styleDegree}"` + : ""; + voiceContent = `<mstts:express-as style="${turn.style}"${styleDegreeAttr}>${voiceContent}</mstts:express-as>`; + } + + // 构建完整的 voice 元素 + voiceElements.push(`<voice name="${turn.voice}">${voiceContent}</voice>`); + } + + // 推断主要语言(根据第一个 voice 名称) + const firstVoice = turns[0]?.voice || "zh-CN-XiaoxiaoNeural"; + const lang = firstVoice.split("-").slice(0, 2).join("-"); // 提取 "zh-CN" 或 "en-US" + + // 构建完整的 SSML + return `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="${lang}"> +${voiceElements.join("\n")} +</speak>`; +} diff --git a/src/DialogueTurn.ts b/src/DialogueTurn.ts new file mode 100644 index 0000000..2eb9cd5 --- /dev/null +++ b/src/DialogueTurn.ts @@ -0,0 +1,45 @@ +/** + * 文本替换接口,用于将文本中的特定字符串替换为别名 + */ +export interface Substitution { + text: string; + alias: string; +} + +/** + * 文本片段接口,支持语言指定和文本替换 + */ +export interface TextSegment { + text: string; + lang?: string; + substitution?: string; +} + +/** + * 对话轮次接口,定义单个说话者的语音参数和文本内容 + */ +export interface DialogueTurn { + speaker?: string; + voice: string; + text?: string; + children?: TextSegment[]; + style?: string; + styleDegree?: number; + lang?: string; + substitutions?: Substitution[]; +} + +/** + * 对话类,包含多个对话轮次并可转换为 SSML + */ +export class Dialogue { + turns: DialogueTurn[] = []; + + /** + * 将对话转换为 SSML 格式 + * @returns SSML 字符串(占位实现,后续任务会完善) + */ + toSSML(): string { + return ""; + } +} diff --git a/src/SSMLUtils.ts b/src/SSMLUtils.ts new file mode 100644 index 0000000..14b4e10 --- /dev/null +++ b/src/SSMLUtils.ts @@ -0,0 +1,82 @@ +import type { Substitution } from "./DialogueTurn"; + +/** + * 转义 SSML 特殊字符 + * 转义顺序:先 & 后其他,防止重复转义 + */ +export function escapeSSML(text: string): string { + return text + .replace(/&/g, "&amp;") + .replace(/</g, "&lt;") + .replace(/>/g, "&gt;") + .replace(/"/g, "&quot;") + .replace(/'/g, "&apos;"); +} + +/** + * 按顺序替换文本中的匹配项(单次遍历,非递归) + */ +export function replaceText(text: string, substitutions: Substitution[]): string { + let result = text; + for (const sub of substitutions) { + result = result.replace(new RegExp(sub.text, "g"), sub.alias); + } + return result; +} + +/** + * Microsoft Azure Speech Service 官方支持的 28 种情感风格 + */ +const
VALID_STYLES = [ + "advertisement_upbeat", + "affectionate", + "angry", + "assistant", + "calm", + "chat", + "cheerful", + "customerservice", + "depressed", + "documentary-narration", + "empathetic", + "excited", + "fearful", + "friendly", + "gentle", + "hopeful", + "lyrical", + "narration-professional", + "narration-relaxed", + "newscast", + "newscast-casual", + "newscast-formal", + "poetry-reading", + "sad", + "serious", + "shouting", + "sports_commentary", + "sports_commentary_excited", + "terrified", + "unfriendly", + "whispering", +] as const; + +/** + * 验证 style 是否为有效的 Microsoft 官方情感风格 + * 无效时抛出 Error + */ +export function validateStyle(style: string): void { + if (!VALID_STYLES.includes(style as any)) { + throw new Error(`Invalid style "${style}". Valid styles: ${VALID_STYLES.join(", ")}`); + } +} + +/** + * 验证 styleDegree 范围(0.01-2.0) + * 无效时抛出 Error + */ +export function validateStyleDegree(degree: number): void { + if (degree < 0.01 || degree > 2.0) { + throw new Error("styleDegree must be between 0.01 and 2.0"); + } +} From 9d32f595f0d9715d7e39ca7010fd9ef331344ae8 Mon Sep 17 00:00:00 2001 From: huan-zz3 <2805033624@qq.com> Date: Sun, 22 Mar 2026 17:38:11 +0800 Subject: [PATCH 05/10] docs: update documentation for multi-speaker dialogue and SSML features Update project documentation to reflect new API capabilities and conventions. 
- Add DialogueBuilder class and interfaces (DialogueTurn, TextSegment) - Document SSML utilities (escapeSSML, validateStyle, validateStyleDegree) - Update project overview with current code scale and feature list - Add sections for error handling, logging, and SSML processing conventions - List specific error scenarios and anti-patterns for contributors --- AGENTS.md | 45 +++- README.md | 722 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 762 insertions(+), 5 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 14ccf58..faae599 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,14 +1,16 @@ # PROJECT KNOWLEDGE BASE -**Generated:** 2026-03-18 +**Generated:** 2026-03-22 **Commit:** main branch **Branch:** main ## OVERVIEW -Microsoft Edge TTS 文本转语音库 - 使用 Azure Speech Service (Microsoft Edge Read Aloud API) 的 Node.js/TypeScript 模块。支持语音合成、SSML、多种音频格式输出。 +Microsoft Edge TTS 文本转语音库 - 使用 Azure Speech Service (Microsoft Edge Read Aloud API) 的 Node.js/TypeScript 模块。支持语音合成、SSML、多说话人对话、情感风格控制、多种音频格式输出。 **核心栈**: TypeScript, WebSocket, Jest (测试), pnpm (包管理器) +**代码规模**: ~1010 行 TypeScript (src/ 目录) +**更新时间**: 2026-03-22 ## STRUCTURE @@ -16,7 +18,7 @@ Microsoft Edge TTS 文本转语音库 - 使用 Azure Speech Service (Microsoft E ./ ├── src/ # 全部源代码(9 个 TypeScript 文件) │ ├── index.ts # 主入口点(barrel exports,6 个导出) -│ ├── MsEdgeTTS.ts # 核心 TTS 类(457 行,WebSocket 通信) +│ ├── MsEdgeTTS.ts # 核心 TTS 类(~499 行,WebSocket 通信) │ ├── MsEdgeTTS.spec.ts # 单元测试 │ ├── Output.ts # 音频输出格式枚举 + 扩展名映射 │ ├── Prosody.ts # 语速/音调/音量选项类 @@ -64,6 +66,11 @@ Microsoft Edge TTS 文本转语音库 - 使用 Azure Speech Service (Microsoft E | `VOLUME` | Enum | `src/Prosody.ts` | 音量预设(silent 到 x-LOUD) | | `Voice` | Type | `src/MsEdgeTTS.ts` | 语音元数据结构 | | `MetadataOptions` | Class | `src/MsEdgeTTS.ts` | 边界元数据选项(句子/单词) | +| `DialogueBuilder` | Class | `src/DialogueBuilder.ts` | 链式对话构建器 | +| `buildDialogueSSML` | Function | `src/DialogueBuilder.ts` | 函数式 SSML 生成 | +| `escapeSSML` | Function | `src/SSMLUtils.ts` | XML 转义(& < > " ') 
| +| `validateStyle` | Function | `src/SSMLUtils.ts` | 验证 28 种官方情感风格 | +| `validateStyleDegree` | Function | `src/SSMLUtils.ts` | 验证 styleDegree 范围(0.01-2.0) | | `joinPath` | Function | `src/utils.ts` | 路径拼接工具 | ## CONVENTIONS @@ -83,14 +90,40 @@ Microsoft Edge TTS 文本转语音库 - 使用 Azure Speech Service (Microsoft E - 强制使用 `pnpm`(preinstall 钩子) - 版本锁定:pnpm-lock.yaml +**错误处理约定**: +- 验证失败时抛出明确 Error(见 SSMLUtils.ts) +- 无效输入立即抛出,不调用 fallback + +**日志约定**: +- 可选 logger 通过 `enableLogger` 选项启用 +- 使用私有 `_log()` 方法记录 +- 仅记录连接状态、消息收发 + +**SSML 处理约定**: +- 转义顺序先 & 后其他,防止重复转义 +- 仅支持 `speak`, `voice`, `prosody` 元素 + ## ANTI-PATTERNS (THIS PROJECT) - ❌ **不要** 使用 npm/yarn - 项目强制使用 pnpm - ❌ **不要** 将测试移至独立目录 - 保持 `*.spec.ts` 与源码同级 - ❌ **不要** 修改 tsconfig 的 module/moduleResolution - 依赖 CommonJS -- ❌ **不要** 在浏览器中使用 - API 需要 Edge User-Agent(仅限服务器端) +- ❌ **不要** 修改 Sec-MS-GEC 哈希算法 - 依赖 Azure 认证机制 +- ❌ **不要** 删除 `isomorphic-ws` 依赖 - 实现跨环境兼容 +- ❌ **不要** 使用回调 API - 仅支持 Promise +- ❌ **不要** 在浏览器中使用 - API 需要 Edge User-Agent(仅服务器端) - ❌ **不要** 删除 `dist/` 外的文件 - 发布仅包含 dist 目录 +## ERROR HANDLING + +**抛出 Error 的场景**: +- 未配置 metadata:`"Speech synthesis not configured yet..."` +- 无效 voiceLocale:`"Could not infer voiceLocale from voiceName..."` +- 无效 style:`'Invalid style "xxx". Valid styles: ...'` +- styleDegree 越界:`"styleDegree must be between 0.01 and 2.0"` +- 空 voice 名称:`"voice name is required and cannot be empty"` +- 空文本:`"text cannot be empty string"` + ## UNIQUE STYLES **SSML 模板**: @@ -141,7 +174,9 @@ pnpm run publish **已知问题**: - package.json 中的 `src/test/test.ts` 和 `src/test/jest-e2e.json` 不存在(遗留配置) -- CI 仅部署文档,不运行测试 +- 测试覆盖率不足:仅 1 个测试文件(MsEdgeTTS.spec.ts),覆盖率 11% +- utils.ts 过于简化(仅 6 行代码),可考虑合并 +- example/ 目录混合非 TS 文件(config.json, run.sh 等) **发布流程**: 1. 
`pnpm run build` 编译到 dist/ diff --git a/README.md b/README.md index aebdb38..c6c416d 100644 --- a/README.md +++ b/README.md @@ -279,3 +279,725 @@ const ssml = buildDialogueSSML([ ``` For more detailed information, please refer to the [Microsoft official documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup-voice). + +--- + +## Complete API Reference + +### Class: `MsEdgeTTS` + +Main TTS class for speech synthesis via WebSocket. + +#### Constructor + +```ts +new MsEdgeTTS(options?: Options) +``` + +**Options:** +| Parameter | Type | Default | Description | +|-----------|------|---------|-------------| +| `agent` | `Agent` | `undefined` | Custom HTTP agent (proxy support, **not supported in browser**) | +| `enableLogger` | `boolean` | `false` | Enable built-in logger for connection status | + +#### Methods + +##### `getVoices(): Promise` + +Fetch the list of voices available in Microsoft Edge. + +**Returns:** Array of voice objects with properties: +- `Name`: Full voice name +- `ShortName`: Short identifier (e.g., `"en-US-AriaNeural"`) +- `Gender`: `"Male"` or `"Female"` +- `Locale`: Voice locale (e.g., `"en-US"`) +- `SuggestedCodec`: Recommended codec +- `FriendlyName`: Display name +- `Status`: Voice status + +**Example:** +```ts +const tts = new MsEdgeTTS(); +const voices = await tts.getVoices(); +console.log(voices.filter(v => v.Gender === "Female")); +``` + +--- + +##### `setMetadata(voiceName, outputFormat, metadataOptions?): Promise` + +Initialize speech synthesis parameters. 
**Must be called before `toStream` or `toFile`.** + +**Parameters:** +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `voiceName` | `string` | ✅ | Voice ShortName (e.g., `"en-US-AriaNeural"`) | +| `outputFormat` | `OUTPUT_FORMAT` | ✅ | Audio output format | +| `metadataOptions` | `MetadataOptions` | ❌ | Boundary metadata options | + +**MetadataOptions:** +| Property | Type | Default | Description | +|----------|------|---------|-------------| +| `voiceLocale` | `string` | Auto-inferred | Voice locale override | +| `sentenceBoundaryEnabled` | `boolean` | `false` | Enable sentence boundary metadata | +| `wordBoundaryEnabled` | `boolean` | `false` | Enable word boundary metadata | + +**Example:** +```ts +await tts.setMetadata( + "en-US-AriaNeural", + OUTPUT_FORMAT.WEBM_24KHZ_16BIT_MONO_OPUS, + { wordBoundaryEnabled: true, sentenceBoundaryEnabled: true } +); +``` + +--- + +##### `toStream(input, options?): { audioStream: Readable, metadataStream: Readable | null }` + +Synthesize text to audio stream (real-time). + +**Parameters:** +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `input` | `string` | ✅ | Text or SSML to synthesize | +| `options` | `ProsodyOptions` | ❌ | Voice prosody settings | + +**Returns:** +- `audioStream`: Node.js Readable stream with raw audio data +- `metadataStream`: Readable stream with boundary metadata (if enabled) + +**Example:** +```ts +const { audioStream, metadataStream } = await tts.toStream("Hello world", { + rate: RATE.FAST, + pitch: "+10Hz", + volume: VOLUME.LOUD +}); + +audioStream.on("data", (data) => { + console.log("Audio chunk:", data); +}); + +metadataStream?.on("data", (chunk) => { + const metadata = JSON.parse(chunk.toString()); + console.log("Metadata:", metadata); +}); +``` + +--- + +##### `toFile(dirPath, input, options?): Promise<{ audioFilePath: string, metadataFilePath: string | null }>` + +Synthesize text and save to file. 
+ +**Parameters:** +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `dirPath` | `string` | ✅ | Output directory path | +| `input` | `string` | ✅ | Text or SSML to synthesize | +| `options` | `ProsodyOptions` | ❌ | Voice prosody settings | + +**Example:** +```ts +const { audioFilePath, metadataFilePath } = await tts.toFile( + "./output", + "Hello world", + { rate: 0.8 } +); +console.log("Saved to:", audioFilePath); +``` + +--- + +##### `toStreamDialogue(dialogue): { audioStream: Readable, metadataStream: Readable | null }` + +Synthesize multi-speaker dialogue to stream. + +**Parameters:** +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `dialogue` | `Dialogue | DialogueTurn[]` | ✅ | Dialogue object or array of turns | + +**Example:** +```ts +const dialogue = new DialogueBuilder() + .addTurn({ voice: "zh-CN-XiaoxiaoNeural", text: "你好", style: "cheerful" }) + .addTurn({ voice: "en-US-AndrewNeural", text: "Hello", lang: "en-US" }) + .build(); + +const { audioStream } = await tts.toStreamDialogue(dialogue); +``` + +--- + +##### `toFileDialogue(dirPath, dialogue, options?): Promise<{ audioFilePath: string, metadataFilePath: string | null }>` + +Synthesize multi-speaker dialogue and save to file. + +**Parameters:** +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `dirPath` | `string` | ✅ | Output directory path | +| `dialogue` | `Dialogue | DialogueTurn[]` | ✅ | Dialogue object or array of turns | +| `options` | `ProsodyOptions` | ❌ | Global prosody settings | + +--- + +##### `rawToStream(requestSSML): { audioStream: Readable, metadataStream: Readable | null }` + +Synthesize custom SSML to stream (no template applied). 
+ +**Parameters:** +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `requestSSML` | `string` | ✅ | Complete SSML string | + +**Example:** +```ts +const customSSML = ` + + + + Hello world + + +`; + +const { audioStream } = await tts.rawToStream(customSSML); +``` + +--- + +##### `rawToFile(dirPath, requestSSML): Promise<{ audioFilePath: string, metadataFilePath: string | null }>` + +Synthesize custom SSML and save to file. + +--- + +##### `close(): void` + +Close the WebSocket connection. + +--- + +### Enum: `OUTPUT_FORMAT` + +Supported audio output formats. + +| Format | Codec | Bitrate | Extension | Use Case | +|--------|-------|---------|-----------|----------| +| `AUDIO_24KHZ_48KBITRATE_MONO_MP3` | MP3 | 48 kbps | `.mp3` | Standard quality | +| `AUDIO_24KHZ_96KBITRATE_MONO_MP3` | MP3 | 96 kbps | `.mp3` | High quality | +| `WEBM_24KHZ_16BIT_MONO_OPUS` | OPUS | ~64 kbps | `.webm` | Web streaming | + +**Usage:** +```ts +import { OUTPUT_FORMAT } from "msedge-tts"; + +await tts.setMetadata("en-US-AriaNeural", OUTPUT_FORMAT.WEBM_24KHZ_16BIT_MONO_OPUS); +``` + +--- + +### Class: `ProsodyOptions` + +Voice prosody configuration. + +```ts +class ProsodyOptions { + pitch?: PITCH | string = "+0Hz" + rate?: RATE | string | number = 1.0 + volume?: VOLUME | string | number = 100.0 +} +``` + +#### Properties + +##### `pitch` + +Baseline pitch for the voice. + +**Accepted values:** +- `PITCH` enum: `X_LOW`, `LOW`, `MEDIUM`, `HIGH`, `X_HIGH`, `DEFAULT` +- Relative frequency: `"+50Hz"`, `"-20Hz"` +- Relative semitones: `"+2st"`, `"-3st"` +- Relative percentage: `"+50%"`, `"-25%"` + +**Default:** `"+0Hz"` + +--- + +##### `rate` + +Speaking rate for the voice. 
+ +**Accepted values:** +- `RATE` enum: `X_SLOW`, `SLOW`, `MEDIUM`, `FAST`, `X_FAST`, `DEFAULT` +- Relative number: `0.5` (50%), `2.0` (200%) +- Relative percentage string: `"+50%"`, `"-25%"` + +**Default:** `1.0` (normal speed) + +--- + +##### `volume` + +Volume level for the voice. + +**Accepted values:** +- `VOLUME` enum: `SILENT`, `X_SOFT`, `SOFT`, `MEDIUM`, `LOUD`, `X_LOUD`, `DEFAULT` +- Absolute number: `0` to `100` +- Relative number: `"+10"`, `"-20"` +- Relative percentage: `"+50%"`, `"-30%"` + +**Default:** `100.0` + +--- + +### Enum: `RATE` + +Speaking rate presets. + +| Value | Description | +|-------|-------------| +| `X_SLOW` | Extra slow (0.3x) | +| `SLOW` | Slow (0.5x) | +| `MEDIUM` | Medium (0.8x) | +| `DEFAULT` | Normal (1.0x) | +| `FAST` | Fast (1.5x) | +| `X_FAST` | Extra fast (2.0x) | + +--- + +### Enum: `PITCH` + +Pitch presets. + +| Value | Description | +|-------|-------------| +| `X_LOW` | Extra low | +| `LOW` | Low | +| `MEDIUM` | Medium | +| `DEFAULT` | Normal | +| `HIGH` | High | +| `X_HIGH` | Extra high | + +--- + +### Enum: `VOLUME` + +Volume presets. + +| Value | Description | +|-------|-------------| +| `SILENT` | Silent | +| `X_SOFT` | Extra soft | +| `SOFT` | Soft | +| `MEDIUM` | Medium | +| `LOUD` | Loud | +| `X_LOUD` | Extra loud | + +--- + +### Interface: `DialogueTurn` + +Single speaker turn in a multi-speaker dialogue. + +```ts +interface DialogueTurn { + speaker?: string // Optional speaker name + voice: string // Voice ShortName (required) + text?: string // Text content + children?: TextSegment[] // Child text segments + style?: string // Emotional style + styleDegree?: number // Style intensity (0.01-2.0) + lang?: string // Language override (e.g., "en-US") + substitutions?: Substitution[] // Text replacements +} +``` + +--- + +### Interface: `TextSegment` + +Text segment with language or substitution. 
+ +```ts +interface TextSegment { + text: string + lang?: string // Language for this segment + substitution?: string // Custom SSML substitution +} +``` + +--- + +### Interface: `Substitution` + +Text substitution for pronunciation. + +```ts +interface Substitution { + text: string // Text to replace + alias: string // Replacement text (or pronunciation) +} +``` + +**Example:** +```ts +{ + text: "W3C", + alias: "World Wide Web Consortium" +} +``` + +--- + +### Class: `DialogueBuilder` + +Chainable builder for multi-speaker dialogues. + +```ts +class DialogueBuilder { + constructor() + addTurn(turn: DialogueTurn): DialogueBuilder + build(): Dialogue + reset(): DialogueBuilder +} +``` + +**Example:** +```ts +const dialogue = new DialogueBuilder() + .addTurn({ voice: "zh-CN-XiaoxiaoNeural", text: "你好", style: "friendly" }) + .addTurn({ voice: "en-US-AndrewNeural", text: "Hello", lang: "en-US" }) + .build(); +``` + +--- + +### Function: `buildDialogueSSML(turns: DialogueTurn[]): string` + +Functional API to build SSML from dialogue turns. + +**Parameters:** +| Parameter | Type | Description | +|-----------|------|-------------| +| `turns` | `DialogueTurn[]` | Array of dialogue turns | + +**Returns:** Complete SSML string + +**Example:** +```ts +const ssml = buildDialogueSSML([ + { voice: "zh-CN-XiaoxiaoNeural", text: "你好" }, + { voice: "en-US-AndrewNeural", text: "Hello", lang: "en-US" } +]); +``` + +--- + +### Function: `escapeSSML(text: string): string` + +Escape special XML characters in text. + +**Escapes:** +- `&` → `&amp;` +- `<` → `&lt;` +- `>` → `&gt;` +- `"` → `&quot;` +- `'` → `&apos;` + +**Example:** +```ts +escapeSSML("Tom & Jerry <Cat>") +// Returns: "Tom &amp; Jerry &lt;Cat&gt;" +``` + +--- + +### Function: `validateStyle(style: string): void` + +Validate emotional style name. Throws `Error` if invalid. + +**Valid styles:** All 28 Microsoft official styles (see table above) + +--- + +### Function: `validateStyleDegree(degree: number): void` + +Validate style intensity range.
Throws `Error` if outside 0.01-2.0. + +--- + +## Error Handling + +### Common Errors + +#### 1. Metadata Not Configured + +```ts +// ❌ Wrong: Calling toStream without setMetadata +const { audioStream } = await tts.toStream("Hello"); +// Throws: "Speech synthesis not configured yet..." + +// ✅ Correct: +await tts.setMetadata("en-US-AriaNeural", OUTPUT_FORMAT.WEBM_24KHZ_16BIT_MONO_OPUS); +const { audioStream } = await tts.toStream("Hello"); +``` + +--- + +#### 2. Invalid Voice Name + +```ts +// ❌ Wrong: Invalid voice name +await tts.setMetadata("invalid-voice", OUTPUT_FORMAT.WEBM_24KHZ_16BIT_MONO_OPUS); +// May throw: "Could not infer voiceLocale from voiceName..." + +// ✅ Correct: Use valid ShortName from getVoices() +const voices = await tts.getVoices(); +const validVoice = voices[0].ShortName; +await tts.setMetadata(validVoice, OUTPUT_FORMAT.WEBM_24KHZ_16BIT_MONO_OPUS); +``` + +--- + +#### 3. Invalid Style Name + +```ts +import { DialogueBuilder } from "msedge-tts"; + +// ❌ Wrong: Invalid style (validation happens in DialogueBuilder.addTurn) +new DialogueBuilder().addTurn( + { voice: "zh-CN-XiaoxiaoNeural", text: "Hello", style: "invalid-style" } +); +// Throws: 'Invalid style "invalid-style". Valid styles: ...' + +// ✅ Correct: Use valid style +const dialogue = new DialogueBuilder() + .addTurn({ voice: "zh-CN-XiaoxiaoNeural", text: "Hello", style: "cheerful" }) + .build(); +``` + +--- + +#### 4. Invalid styleDegree Range + +```ts +// ❌ Wrong: Out of range (validation happens in DialogueBuilder.addTurn) +new DialogueBuilder().addTurn( + { voice: "zh-CN-XiaoxiaoNeural", text: "Hello", style: "sad", styleDegree: 5.0 } +); +// Throws: 'styleDegree must be between 0.01 and 2.0' + +// ✅ Correct: Within range 0.01-2.0 +const dialogue = new DialogueBuilder() + .addTurn({ voice: "zh-CN-XiaoxiaoNeural", text: "Hello", style: "sad", styleDegree: 1.5 }) + .build(); +``` + +--- + +#### 5.
No Audio Data Received + +```ts +// May occur if: +// - Network connection lost +// - Invalid SSML syntax +// - Voice service unavailable + +try { + await tts.toFile("./output", "Hello"); +} catch (error) { + console.error("Generation failed:", error.message); + // Handle: "No audio data received" +} +``` + +--- + +## Performance Optimization + +### 1. Reuse MsEdgeTTS Instance + +```ts +// ❌ Inefficient: Create new instance for each request +for (const [i, text] of texts.entries()) { + const tts = new MsEdgeTTS(); + await tts.setMetadata(voice, format); + await tts.toFile(`./output/${i}`, text); // first argument is an output directory +} + +// ✅ Efficient: Reuse instance +const tts = new MsEdgeTTS(); +await tts.setMetadata(voice, format); +for (const [i, text] of texts.entries()) { + await tts.toFile(`./output/${i}`, text); +} +``` + +--- + +### 2. Batch Dialogue Turns + +```ts +// ❌ Inefficient: Separate requests +await tts.rawToFile("./output/1", buildDialogueSSML([turn1])); +await tts.rawToFile("./output/2", buildDialogueSSML([turn2])); + +// ✅ Efficient: Single request (rawToFile takes a complete SSML string) +await tts.rawToFile("./output/combined", buildDialogueSSML([turn1, turn2, turn3])); +``` + +--- + +### 3. Use Appropriate Bitrate + +| Use Case | Recommended Format | +|----------|-------------------| +| Podcast/Audiobook | `AUDIO_24KHZ_96KBITRATE_MONO_MP3` | +| Voice Assistant | `AUDIO_24KHZ_48KBITRATE_MONO_MP3` | +| Web Streaming | `WEBM_24KHZ_16BIT_MONO_OPUS` | + +--- + +### 4. Enable Logger for Debugging + +```ts +const tts = new MsEdgeTTS({ enableLogger: true }); +// Logs: connection status, message exchange, disconnection +``` + +--- + +## FAQ + +### Q: Can I use this library in the browser? + +**A:** No. As of December 2025, the API requires a Microsoft Edge User-Agent, which browsers other than Edge cannot provide. Use this library in server-side Node.js environments only. + +--- + +### Q: How do I get a list of all available voices?
+ +**A:** Use the `getVoices()` method: + +```ts +const tts = new MsEdgeTTS(); +const voices = await tts.getVoices(); +console.log(voices.map(v => ({ name: v.ShortName, gender: v.Gender, locale: v.Locale }))); +``` + +--- + +### Q: Can I mix multiple languages in one dialogue? + +**A:** Yes! Use the `lang` property in `DialogueTurn`: + +```ts +const ssml = buildDialogueSSML([ + { voice: "zh-CN-XiaoxiaoNeural", text: "Welcome to our meeting", style: "friendly" }, + { voice: "zh-CN-XiaoxiaoNeural", text: "欢迎参加我们的会议", lang: "zh-CN" }, + { voice: "en-US-AndrewNeural", text: "Today we will discuss AI", lang: "en-US" } +]); +``` + +--- + +### Q: How do I change the speaking speed? + +**A:** Use the `rate` option: + +```ts +// Using preset +await tts.toStream("Hello", { rate: RATE.FAST }); + +// Using custom value (0.5 = 50% speed, 2.0 = 200% speed) +await tts.toStream("Hello", { rate: 0.75 }); + +// Using percentage string +await tts.toStream("Hello", { rate: "+50%" }); // 150% speed +``` + +--- + +### Q: What is the maximum text length for synthesis? + +**A:** Microsoft Azure Speech Service has a limit of approximately 1000 characters per request. For longer texts: +1. Split into multiple requests +2. Use `DialogueBuilder` to chain segments +3. Concatenate audio files post-synthesis + +--- + +### Q: How do I use a proxy? + +**A:** Pass a custom HTTP agent: + +```ts +import { SocksProxyAgent } from "socks-proxy-agent"; +import { MsEdgeTTS, OUTPUT_FORMAT } from "msedge-tts"; + +const agent = new SocksProxyAgent("socks://user:pass@proxy-host:port"); +const tts = new MsEdgeTTS({ agent }); +await tts.setMetadata("en-US-AriaNeural", OUTPUT_FORMAT.WEBM_24KHZ_16BIT_MONO_OPUS); +``` + +--- + +### Q: Why am I getting "No audio data received"? + +**A:** Common causes: +1. **Network issues**: Check internet connection +2. **Invalid SSML**: Verify SSML syntax +3. **Voice service down**: Try a different voice +4.
**Rate limiting**: Wait and retry + +--- + +## Changelog + +### Version 2.0.4 (Current) + +**Features:** +- ✅ Multi-speaker dialogue support (`DialogueBuilder`, `buildDialogueSSML`) +- ✅ 28 emotional styles with intensity control (0.01-2.0) +- ✅ Text substitution (`<sub>` tags) +- ✅ Multi-language mixing (`<lang>` tags) +- ✅ Sentence/word boundary metadata +- ✅ Proxy support via custom HTTP agent +- ✅ 3 audio output formats (MP3 48/96 kbps, WEBM OPUS) + +**Breaking Changes:** +- ⚠️ December 2025: API now requires Edge User-Agent (browser support dropped) + +**Dependencies:** +- `axios`: ^1.11.0 +- `isomorphic-ws`: ^5.0.0 +- `ws`: ^8.14.1 +- `buffer`: ^6.0.3 +- `stream-browserify`: ^3.0.0 + +--- + +## Related Projects + +- [Azure Speech Service Documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/) +- [SSML Specification](https://www.w3.org/TR/speech-synthesis11/) +- [Microsoft Edge Read Aloud API](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech) + +--- + +## License + +MIT License - See LICENSE file for details.
+ +--- + +## Support + +- **Issues**: [GitHub Issues](https://github.com/Migushthe2nd/MsEdgeTTS/issues) +- **npm**: [msedge-tts](https://www.npmjs.com/package/msedge-tts) +- **Documentation**: [API Docs](https://migushthe2nd.github.io/MsEdgeTTS) From ad1e60ff58b8eb1330b9a7eeac356297cac3100c Mon Sep 17 00:00:00 2001 From: huan-zz3 <2805033624@qq.com> Date: Sun, 22 Mar 2026 18:15:43 +0800 Subject: [PATCH 06/10] docs: rename example files to English names and update references - Renamed 6 example TypeScript files to English names - Updated example/README.md with new filenames - All example files now use English naming convention - Git history preserved via git mv --- .../00-simple-dialogue-demo.ts | 0 .../01-multi-speaker-dialogue-chained.ts | 0 .../02-multi-speaker-dialogue-functional.ts | 0 .../03-31-emotional-styles-demo.ts | 0 .../04-style-degree-control-demo.ts | 0 .../05-text-substitution-demo.ts | 0 example/README.md | 98 +++++++++---------- 7 files changed, 49 insertions(+), 49 deletions(-) rename "example/00-\347\256\200\345\215\225\345\257\271\350\257\235\346\274\224\347\244\272.ts" => example/00-simple-dialogue-demo.ts (100%) rename "example/01-\345\244\232\350\257\264\350\257\235\344\272\272\345\257\271\350\257\235 - \351\223\276\345\274\217\350\260\203\347\224\250.ts" => example/01-multi-speaker-dialogue-chained.ts (100%) rename "example/02-\345\244\232\350\257\264\350\257\235\344\272\272\345\257\271\350\257\235 - \345\207\275\346\225\260\345\274\217.ts" => example/02-multi-speaker-dialogue-functional.ts (100%) rename "example/03-31 \347\247\215\346\203\205\346\204\237\351\243\216\346\240\274\346\274\224\347\244\272.ts" => example/03-31-emotional-styles-demo.ts (100%) rename "example/04-\346\203\205\346\204\237\345\274\272\345\272\246\346\216\247\345\210\266\346\274\224\347\244\272.ts" => example/04-style-degree-control-demo.ts (100%) rename "example/05-\346\226\207\346\234\254\346\233\277\346\215\242\345\212\237\350\203\275\346\274\224\347\244\272.ts" => 
example/05-text-substitution-demo.ts (100%) diff --git "a/example/00-\347\256\200\345\215\225\345\257\271\350\257\235\346\274\224\347\244\272.ts" b/example/00-simple-dialogue-demo.ts similarity index 100% rename from "example/00-\347\256\200\345\215\225\345\257\271\350\257\235\346\274\224\347\244\272.ts" rename to example/00-simple-dialogue-demo.ts diff --git "a/example/01-\345\244\232\350\257\264\350\257\235\344\272\272\345\257\271\350\257\235 - \351\223\276\345\274\217\350\260\203\347\224\250.ts" b/example/01-multi-speaker-dialogue-chained.ts similarity index 100% rename from "example/01-\345\244\232\350\257\264\350\257\235\344\272\272\345\257\271\350\257\235 - \351\223\276\345\274\217\350\260\203\347\224\250.ts" rename to example/01-multi-speaker-dialogue-chained.ts diff --git "a/example/02-\345\244\232\350\257\264\350\257\235\344\272\272\345\257\271\350\257\235 - \345\207\275\346\225\260\345\274\217.ts" b/example/02-multi-speaker-dialogue-functional.ts similarity index 100% rename from "example/02-\345\244\232\350\257\264\350\257\235\344\272\272\345\257\271\350\257\235 - \345\207\275\346\225\260\345\274\217.ts" rename to example/02-multi-speaker-dialogue-functional.ts diff --git "a/example/03-31 \347\247\215\346\203\205\346\204\237\351\243\216\346\240\274\346\274\224\347\244\272.ts" b/example/03-31-emotional-styles-demo.ts similarity index 100% rename from "example/03-31 \347\247\215\346\203\205\346\204\237\351\243\216\346\240\274\346\274\224\347\244\272.ts" rename to example/03-31-emotional-styles-demo.ts diff --git "a/example/04-\346\203\205\346\204\237\345\274\272\345\272\246\346\216\247\345\210\266\346\274\224\347\244\272.ts" b/example/04-style-degree-control-demo.ts similarity index 100% rename from "example/04-\346\203\205\346\204\237\345\274\272\345\272\246\346\216\247\345\210\266\346\274\224\347\244\272.ts" rename to example/04-style-degree-control-demo.ts diff --git 
"a/example/05-\346\226\207\346\234\254\346\233\277\346\215\242\345\212\237\350\203\275\346\274\224\347\244\272.ts" b/example/05-text-substitution-demo.ts similarity index 100% rename from "example/05-\346\226\207\346\234\254\346\233\277\346\215\242\345\212\237\350\203\275\346\274\224\347\244\272.ts" rename to example/05-text-substitution-demo.ts diff --git a/example/README.md b/example/README.md index bcd03da..dc275ba 100644 --- a/example/README.md +++ b/example/README.md @@ -30,78 +30,78 @@ pnpm run build ### 3. 运行示例 ```bash -# 示例 1: 多说话人对话 - 链式调用 -node example/01-多说话人对话 - 链式调用.ts +# Example 1: Multi-Speaker Dialogue (Chained) +node example/01-multi-speaker-dialogue-chained.ts -# 示例 2: 多说话人对话 - 函数式 -node example/02-多说话人对话 - 函数式.ts +# Example 2: Multi-Speaker Dialogue (Functional) +node example/02-multi-speaker-dialogue-functional.ts -# 示例 3: 31 种情感风格演示 -node example/03-31 种情感风格演示.ts +# Example 3: 31 Emotional Styles Demo +node example/03-31-emotional-styles-demo.ts -# 示例 4: 情感强度控制演示 -node example/04-情感强度控制演示.ts +# Example 4: Style Degree Control Demo +node example/04-style-degree-control-demo.ts -# 示例 5: 文本替换功能演示 -node example/05-文本替换功能演示.ts +# Example 5: Text Substitution Demo +node example/05-text-substitution-demo.ts ``` -## 示例说明 +## Example Descriptions -### 示例 1: 多说话人对话 - 链式调用 +### Example 1: Multi-Speaker Dialogue (Chained) -使用 `DialogueBuilder` 类以链式调用方式构建对话。 +Build dialogue using the `DialogueBuilder` class with chained calls. -**特点**: -- 链式调用语法 -- 中英混合播客场景 -- 4 个说话人轮次 +**Features**: +- Chained call syntax +- Chinese-English mixed podcast scenario +- 4 speaker turns -**输出**: `example/output/01-播客对话 - 链式调用.mp3` +**Output**: `example/output/01-multi-speaker-dialogue-chained.mp3` -### 示例 2: 多说话人对话 - 函数式 +### Example 2: Multi-Speaker Dialogue (Functional) -使用 `buildDialogueSSML()` 函数直接构建对话。 +Build dialogue using the `buildDialogueSSML()` function. 
-**特点**: -- 函数式语法 -- 多语言客服对话 -- 4 个对话轮次 +**Features**: +- Functional syntax +- Multi-language customer service dialogue +- 4 dialogue turns -**输出**: `example/output/02-客服对话 - 函数式.mp3` +**Output**: `example/output/02-multi-speaker-dialogue-functional.mp3` -### 示例 3: 31 种情感风格演示 +### Example 3: 31 Emotional Styles Demo -遍历所有 Microsoft Azure 支持的 31 种情感风格。 +Demonstrate all 31 emotional styles supported by Microsoft Azure. -**特点**: -- 完整的 31 种风格列表 -- 每种风格一句示例 -- 表格形式展示 +**Features**: +- Complete list of 31 styles +- One example sentence per style +- Table format presentation -**输出**: `example/output/03-31 种情感风格演示.mp3` +**Output**: `example/output/03-31-emotional-styles-demo.mp3` -### 示例 4: 情感强度控制演示 +### Example 4: Style Degree Control Demo -演示 `styleDegree` 参数(0.01-2.0 范围)。 +Demonstrate the `styleDegree` parameter (range: 0.01-2.0). -**特点**: -- 0.5/1.0/2.0 三种强度对比 -- 使用 `sad` 情感 -- 同一语音不同强度 +**Features**: +- Three intensity levels: 0.5/1.0/2.0 +- Uses `sad` emotional style +- Same voice with different intensities -**输出**: `example/output/04-情感强度控制演示.mp3` +**Output**: `example/output/04-style-degree-control-demo.mp3` -### 示例 5: 文本替换功能演示 +### Example 5: Text Substitution Demo -演示 `substitutions` 参数替换专业术语。 +Demonstrate the `substitutions` parameter for replacing technical terms. 
-**特点**:
+**Features**:
 - W3C → 万维网联盟
 - HTTP → 超文本传输协议
 - CEO → Chief Executive Officer

-**输出**: `example/output/05-文本替换功能演示.mp3`
+**Output**: `example/output/05-text-substitution-demo.mp3`

 ## API 参数说明

@@ -116,14 +116,14 @@ node example/05-文本替换功能演示.ts

 ## 输出目录

-所有生成的音频文件保存在:
+All generated audio files are saved in:

 ```
 example/output/
-├── 01-播客对话 - 链式调用.mp3
-├── 02-客服对话 - 函数式.mp3
-├── 03-31 种情感风格演示.mp3
-├── 04-情感强度控制演示.mp3
-└── 05-文本替换功能演示.mp3
+├── 01-multi-speaker-dialogue-chained.mp3
+├── 02-multi-speaker-dialogue-functional.mp3
+├── 03-31-emotional-styles-demo.mp3
+├── 04-style-degree-control-demo.mp3
+└── 05-text-substitution-demo.mp3
 ```

 ## 注意事项

From 62f36afbfde122a287b02fe6dae1cf12802dfe78 Mon Sep 17 00:00:00 2001
From: huan-zz3 <2805033624@qq.com>
Date: Sun, 22 Mar 2026 18:22:26 +0800
Subject: [PATCH 07/10] docs: translate src/ JSDoc comments to English

- Translated DialogueTurn.ts: interfaces and class comments
- Translated SSMLUtils.ts: function and constant comments
- Translated DialogueBuilder.ts: class and method comments
- Standardized terminology (Dialogue, Turn, Substitution, SSML, etc.)
- All Chinese characters removed from src/ JSDoc comments
- Build verification: pnpm run build passes
---
 src/DialogueBuilder.ts | 64 +++++++++++++++++++++----------------------
 src/DialogueTurn.ts | 12 ++++----
 src/SSMLUtils.ts | 16 +++++------
 3 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/src/DialogueBuilder.ts b/src/DialogueBuilder.ts
index 23cd05e..a271aee 100644
--- a/src/DialogueBuilder.ts
+++ b/src/DialogueBuilder.ts
@@ -2,24 +2,24 @@ import { Dialogue, type DialogueTurn } from "./DialogueTurn";
 import { escapeSSML, replaceText, validateStyle, validateStyleDegree } from "./SSMLUtils";
 /**
- * 对话构建器类,用于链式构建多说话人对话
+ * Dialogue builder class for chain-building multi-speaker dialogues
  */
 export class DialogueBuilder {
 private turns: DialogueTurn[] = [];
 /**
- * 创建对话构建器
+ * Create a dialogue builder
  */
 constructor() {}
 /**
- * 添加一个对话回合(链式调用)
- * @param turn 对话回合对象
- * @returns 当前构建器实例(支持链式调用)
- * @throws 当 turn 参数无效时抛出异常
+ * Add a dialogue turn (chained call)
+ * @param turn - Dialogue turn object
+ * @returns Current builder instance (supports chained calls)
+ * @throws Throws an error when turn parameter is invalid
  */
 addTurn(turn: DialogueTurn): DialogueBuilder {
- // 严格模式验证
+ // Strict mode validation
 if (!turn.voice || turn.voice.trim() === "") {
 throw new Error("voice name is required and cannot be empty");
 }
@@ -41,8 +41,8 @@ export class DialogueBuilder {
 }
 /**
- * 构建 Dialogue 对象
- * @returns 包含所有添加回合的 Dialogue 对象
+ * Build a Dialogue object
+ * @returns Dialogue object containing all added turns
  */
 build(): Dialogue {
 const dialogue = new Dialogue();
@@ -51,8 +51,8 @@ export class DialogueBuilder {
 }
 /**
- * 重置构建器状态,清空所有已添加的回合
- * @returns 当前构建器实例(支持链式调用)
+ * Reset builder state, clearing all added turns
+ * @returns Current builder instance (supports chained calls)
  */
 reset(): DialogueBuilder {
 this.turns = [];
@@ -61,53 +61,53 @@ export class DialogueBuilder {
 }
 /**
- * 构建多说话人对话的 SSML 字符串
- * @param turns 对话回合数组
- * @returns 完整的 SSML
字符串
+ * Build SSML string for multi-speaker dialogue
+ * @param turns - Array of dialogue turns
+ * @returns Complete SSML string
  */
 export function buildDialogueSSML(turns: DialogueTurn[]): string {
 const voiceElements: string[] = [];
 for (const turn of turns) {
- // 处理文本:先应用替换,后应用 SSML 转义
+ // Process text: apply substitutions first, then SSML escaping
 let processedText = turn.text || "";
- // 应用文本替换(生成 标签)
+ // Apply text substitution (generate tags)
 if (turn.substitutions && turn.substitutions.length > 0) {
- // 按文本长度降序处理,确保先替换长词
+ // Sort by text length descending to ensure longer words are replaced first
 const sortedSubs = [...turn.substitutions].sort((a, b) => b.text.length - a.text.length);
 const placeholders: Map = new Map();
 for (let i = 0; i < sortedSubs.length; i++) {
 const sub = sortedSubs[i];
- // 先对 alias 和 text 进行 SSML 转义
+ // First escape alias and text for SSML
 const escapedAlias = escapeSSML(sub.alias);
 const escapedText = escapeSSML(sub.text);
- // 生成 text 标签
+ // Generate text tag
 const subTag = `${escapedText}`;
- // 使用唯一占位符
+ // Use unique placeholder
 const placeholder = `__SUB_PLACEHOLDER_${i}__`;
 placeholders.set(placeholder, subTag);
- // 先替换为占位符
+ // First replace with placeholder
 processedText = processedText.replace(
 new RegExp(sub.text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), "g"),
 placeholder
 );
 }
- // 应用 SSML 转义
+ // Apply SSML escaping
 processedText = escapeSSML(processedText);
- // 恢复 标签
+ // Restore tags
 for (const [placeholder, subTag] of placeholders.entries()) {
 processedText = processedText.replace(placeholder, subTag);
 }
 } else {
- // 没有替换时,直接应用 SSML 转义
+ // When no substitutions, apply SSML escaping directly
 processedText = escapeSSML(processedText);
 }
- // 处理 children(如果有)
+ // Process children (if any)
 let childrenContent = "";
 if (turn.children && turn.children.length > 0) {
 childrenContent = turn.children
@@ -124,15 +124,15 @@ export function buildDialogueSSML(turns: DialogueTurn[]): string {
 .join("");
 }
- // 构建 voice
元素内容
+ // Build voice element content
 let voiceContent = childrenContent || processedText;
- // 应用 lang(如果有)
+ // Apply lang (if any)
 if (turn.lang) {
 voiceContent = `${voiceContent}`;
 }
- // 应用 style 和 styleDegree(如果有)
+ // Apply style and styleDegree (if any)
 if (turn.style) {
 const styleDegreeAttr = turn.styleDegree !== undefined && turn.styleDegree !== null
 ? ` styledegree="${turn.styleDegree}"`
@@ -140,15 +140,15 @@ export function buildDialogueSSML(turns: DialogueTurn[]): string {
 voiceContent = `${voiceContent}`;
 }
- // 构建完整的 voice 元素
+ // Build complete voice element
 voiceElements.push(`${voiceContent}`);
 }
- // 推断主要语言(根据第一个 voice 名称)
+ // Infer primary language (based on first voice name)
 const firstVoice = turns[0]?.voice || "zh-CN-XiaoxiaoNeural";
- const lang = firstVoice.split("-").slice(0, 2).join("-"); // 提取 "zh-CN" 或 "en-US"
+ const lang = firstVoice.split("-").slice(0, 2).join("-"); // Extract "zh-CN" or "en-US"
- // 构建完整的 SSML
+ // Build complete SSML
 return `
 ${voiceElements.join("\n")}
 `;
diff --git a/src/DialogueTurn.ts b/src/DialogueTurn.ts
index 2eb9cd5..7497794 100644
--- a/src/DialogueTurn.ts
+++ b/src/DialogueTurn.ts
@@ -1,5 +1,5 @@
 /**
- * 文本替换接口,用于将文本中的特定字符串替换为别名
+ * Text substitution interface for replacing specific strings in text with aliases
  */
 export interface Substitution {
 text: string;
@@ -7,7 +7,7 @@ export interface Substitution {
 }
 /**
- * 文本片段接口,支持语言指定和文本替换
+ * Text segment interface supporting language specification and text substitution
  */
 export interface TextSegment {
 text: string;
@@ -16,7 +16,7 @@ export interface TextSegment {
 }
 /**
- * 对话轮次接口,定义单个说话者的语音参数和文本内容
+ * Dialogue turn interface defining voice parameters and text content for a single speaker
  */
 export interface DialogueTurn {
 speaker?: string;
@@ -30,14 +30,14 @@ export interface DialogueTurn {
 }
 /**
- * 对话类,包含多个对话轮次并可转换为 SSML
+ * Dialogue class containing multiple dialogue turns and convertible to SSML
  */
 export class Dialogue {
 turns: DialogueTurn[] = [];
 /**
- * 将对话转换为 SSML 格式
- * @returns SSML 字符串(占位实现,后续任务会完善)
+ * Convert dialogue to SSML format
+ * @returns SSML string (placeholder implementation, will be completed in subsequent tasks)
  */
 toSSML(): string {
 return "";
diff --git a/src/SSMLUtils.ts b/src/SSMLUtils.ts
index 14b4e10..128f631 100644
--- a/src/SSMLUtils.ts
+++ b/src/SSMLUtils.ts
@@ -1,8 +1,8 @@
 import type { Substitution } from "./DialogueTurn";
 /**
- * 转义 SSML 特殊字符
- * 转义顺序:先 & 后其他,防止重复转义
+ * Escape SSML special characters
+ * Escape order: & first, then others to prevent double escaping
  */
 export function escapeSSML(text: string): string {
 return text
@@ -14,7 +14,7 @@ export function escapeSSML(text: string): string {
 }
 /**
- * 按顺序替换文本中的匹配项(单次遍历,非递归)
+ * Replace matches in text sequentially (single pass, non-recursive)
  */
 export function replaceText(text: string, substitutions: Substitution[]): string {
 let result = text;
@@ -25,7 +25,7 @@ export function replaceText(text: string, substitutions: Substitution[]): string
 }
 /**
- * Microsoft Azure Speech Service 官方支持的 28 种情感风格
+ * Officially supported 28 emotional styles by Microsoft Azure Speech Service
  */
 const VALID_STYLES = [
 "advertisement_upbeat",
@@ -62,8 +62,8 @@ const VALID_STYLES = [
 ] as const;
 /**
- * 验证 style 是否为有效的 Microsoft 官方情感风格
- * 无效时抛出 Error
+ * Validate if style is a valid Microsoft official emotional style
+ * Throws Error if invalid
  */
 export function validateStyle(style: string): void {
 if (!VALID_STYLES.includes(style as any)) {
@@ -72,8 +72,8 @@ export function validateStyle(style: string): void {
 }
 /**
- * 验证 styleDegree 范围(0.01-2.0)
- * 无效时抛出 Error
+ * Validate styleDegree range (0.01-2.0)
+ * Throws Error if invalid
  */
 export function validateStyleDegree(degree: number): void {
 if (degree < 0.01 || degree > 2.0) {

From 05c4cfc4a73b8c6c44d36fff95ba5cd8610a4d2d Mon Sep 17 00:00:00 2001
From: huan-zz3 <2805033624@qq.com>
Date: Sun, 22 Mar 2026 19:13:22 +0800
Subject: [PATCH 08/10] docs: translate example/*.ts files to
 English

- Translated all 6 example TypeScript files
- Updated all comments, console messages, and error messages to English
- Task 7: 00-simple-dialogue-demo.ts - Chinese SSML retained (multilingual demo)
- Task 8: 01-multi-speaker-dialogue-chained.ts - Chinese dialogue retained (multilingual demo)
- Task 9: 02-multi-speaker-dialogue-functional.ts - Chinese dialogue retained (multilingual demo)
- Task 10: 03-31-emotional-styles-demo.ts - Changed to English examples
- Task 11: 04-style-degree-control-demo.ts - Changed to English examples
- Task 12: 05-text-substitution-demo.ts - Changed substitution examples to English
- All output filenames updated to English
- Build verification: pnpm run build passes
---
 example/00-simple-dialogue-demo.ts | 44 ++++-----
 example/01-multi-speaker-dialogue-chained.ts | 48 ++++-----
 .../02-multi-speaker-dialogue-functional.ts | 48 ++++-----
 example/03-31-emotional-styles-demo.ts | 35 ++++---
 example/04-style-degree-control-demo.ts | 80 +++++++--------
 example/05-text-substitution-demo.ts | 99 ++++++++++---------
 6 files changed, 182 insertions(+), 172 deletions(-)

diff --git a/example/00-simple-dialogue-demo.ts b/example/00-simple-dialogue-demo.ts
index c686d61..91f727e 100644
--- a/example/00-simple-dialogue-demo.ts
+++ b/example/00-simple-dialogue-demo.ts
@@ -2,28 +2,28 @@ import * as fs from "fs";
 import * as path from "path";
 /**
- * 示例 0: 简单对话演示
- * 直接使用给定的 SSML 示例(女儿和父亲对话)
+ * Example 0: Simple Dialogue Demo
+ * Directly use the given SSML example (daughter-father conversation)
  */
 async function main() {
- // 输出装饰框
+ // Output decorative box
 console.log("╔═══════════════════════════════════════════════╗");
- console.log("║ 示例 0: 简单对话演示 ║");
+ console.log("║ Example 0: Simple Dialogue Demo ║");
 console.log("╚═══════════════════════════════════════════════╝");
 console.log();
- // 读取配置
+ // Read configuration
 const configPath = path.join(__dirname, "config.json");
 if (!fs.existsSync(configPath)) {
- console.error("❌ 错误:config.json
不存在");
- console.error("📝 请复制 config.example.json 为 config.json 并填写邮箱和密码");
- console.error(`📁 示例文件位置:${configPath}`);
+ console.error("❌ Error: config.json does not exist");
+ console.error("📝 Please copy config.example.json to config.json and fill in your email and password");
+ console.error(`📁 Example file location: ${configPath}`);
 process.exit(1);
 }
 const config = JSON.parse(fs.readFileSync(configPath, "utf-8"));
- // 给定的 SSML 示例:女儿和父亲对话
+ // Given SSML example: daughter-father conversation
 const ssml = `
@@ -39,8 +39,8 @@ async function main() {
 `;
- // 显示完整的 SSML
- console.log("使用的 SSML:");
+ // Display the complete SSML
+ console.log("SSML Used:");
 console.log("┌──────────────────────────────────────────────┐");
 const ssmlLines = ssml.split("\n");
 for (const line of ssmlLines) {
@@ -50,15 +50,15 @@ async function main() {
 console.log("└──────────────────────────────────────────────┘");
 console.log();
- // 输出路径
+ // Output path
 const outputDir = path.join(__dirname, "output");
 if (!fs.existsSync(outputDir)) {
 fs.mkdirSync(outputDir, { recursive: true });
 }
- const outputPath = path.join(outputDir, "00-简单对话演示.mp3");
+ const outputPath = path.join(outputDir, "00-simple-dialogue-demo.mp3");
- // 调用 TTS API
- console.log("正在调用 TTS API...");
+ // Call TTS API
+ console.log("Calling TTS API...");
 try {
 const response = await fetch(config.api_url, {
@@ -73,21 +73,21 @@ async function main() {
 });
 if (!response.ok) {
- throw new Error(`API 请求失败:${response.status} ${response.statusText}`);
+ throw new Error(`API request failed: ${response.status} ${response.statusText}`);
 }
- // 保存文件
+ // Save file
 const buffer = Buffer.from(await response.arrayBuffer());
 fs.writeFileSync(outputPath, buffer);
- // 计算文件大小
+ // Calculate file size
 const fileSizeKB = (buffer.length / 1024).toFixed(1);
- console.log("✅ 音频生成成功!");
- console.log(`📁 文件已保存:${outputPath}`);
- console.log(`📊 文件大小:${fileSizeKB} KB`);
+ console.log("✅ Audio generation successful!");
+ console.log(`📁 File saved: ${outputPath}`);
+ console.log(`📊 File size: ${fileSizeKB} KB`);
 } catch (error) {
- console.error("❌ 生成失败:", error instanceof Error ? error.message : error);
+ console.error("❌ Generation failed:", error instanceof Error ?
error.message : error);
 process.exit(1);
 }
 }
diff --git a/example/01-multi-speaker-dialogue-chained.ts b/example/01-multi-speaker-dialogue-chained.ts
index fcf180b..da7a1e5 100644
--- a/example/01-multi-speaker-dialogue-chained.ts
+++ b/example/01-multi-speaker-dialogue-chained.ts
@@ -3,28 +3,28 @@ import * as fs from "fs";
 import * as path from "path";
 /**
- * 示例 1: 多说话人对话 - 链式调用
- * 使用 DialogueBuilder 构建中英混合播客对话
+ * Example 1: Multi-Speaker Dialogue - Chained Call
+ * Build Chinese-English mixed podcast dialogue using DialogueBuilder
  */
 async function main() {
- // 输出装饰框
+ // Output decorative box
 console.log("╔═══════════════════════════════════════════════╗");
- console.log("║ 示例 1: 多说话人对话 - 链式调用 ║");
+ console.log("║ Example 1: Multi-Speaker Dialogue - Chained ║");
 console.log("╚═══════════════════════════════════════════════╝");
 console.log();
- // 读取配置
+ // Read configuration
 const configPath = path.join(__dirname, "config.json");
 if (!fs.existsSync(configPath)) {
- console.error("❌ 错误:config.json 不存在");
- console.error("📝 请复制 config.example.json 为 config.json 并填写邮箱和密码");
- console.error(`📁 示例文件位置:${configPath}`);
+ console.error("❌ Error: config.json does not exist");
+ console.error("📝 Please copy config.example.json to config.json and fill in email and password");
+ console.error(`📁 Example file location: ${configPath}`);
 process.exit(1);
 }
 const config = JSON.parse(fs.readFileSync(configPath, "utf-8"));
- // 构建对话:4 个说话人轮次(2 中文 + 2 英文)
+ // Build dialogue: 4 speaker turns (2 Chinese + 2 English)
 const dialogue = new DialogueBuilder()
 .addTurn({
 voice: "zh-CN-XiaoxiaoNeural",
@@ -50,14 +50,14 @@ async function main() {
 })
 .build();
- console.log(`生成的对话轮次:${dialogue.turns.length} 个`);
+ console.log(`Generated dialogue turns: ${dialogue.turns.length}`);
 console.log();
- // 生成 SSML
+ // Generate SSML
 const ssml = buildDialogueSSML(dialogue.turns);
- // SSML 预览
- console.log("SSML 预览:");
+ // SSML preview
+ console.log("SSML Preview:");
 console.log("┌──────────────────────────────────────────────┐");
 const ssmlLines = ssml.split("\n");
 for (const line of ssmlLines) {
@@ -67,15 +67,15 @@ async function main() {
 console.log("└──────────────────────────────────────────────┘");
 console.log();
- // 输出路径
+ // Output path
 const outputDir = path.join(__dirname, "output");
 if (!fs.existsSync(outputDir)) {
 fs.mkdirSync(outputDir, { recursive: true });
 }
- const outputPath = path.join(outputDir, "01-播客对话 - 链式调用.mp3");
+ const outputPath = path.join(outputDir, "podcast-dialogue-chained.mp3");
- // 调用 TTS API
- console.log("正在调用 TTS API...");
+ // Call TTS API
+ console.log("Calling TTS API...");
 try {
 const response = await fetch(config.api_url, {
@@ -90,21 +90,21 @@ async function main() {
 });
 if (!response.ok) {
- throw new Error(`API 请求失败:${response.status} ${response.statusText}`);
+ throw new Error(`API request failed: ${response.status} ${response.statusText}`);
 }
- // 保存文件
+ // Save file
 const buffer = Buffer.from(await response.arrayBuffer());
 fs.writeFileSync(outputPath, buffer);
- // 计算文件大小
+ // Calculate file size
 const fileSizeKB = (buffer.length / 1024).toFixed(1);
- console.log("✅ 音频生成成功!");
- console.log(`📁 文件已保存:${outputPath}`);
- console.log(`📊 文件大小:${fileSizeKB} KB`);
+ console.log("✅ Audio generation successful!");
+ console.log(`📁 File saved: ${outputPath}`);
+ console.log(`📊 File size: ${fileSizeKB} KB`);
 } catch (error) {
- console.error("❌ 生成失败:", error instanceof Error ? error.message : error);
+ console.error("❌ Generation failed:", error instanceof Error ?
error.message : error);
 process.exit(1);
 }
 }
diff --git a/example/02-multi-speaker-dialogue-functional.ts b/example/02-multi-speaker-dialogue-functional.ts
index 3c86c0f..af302e5 100644
--- a/example/02-multi-speaker-dialogue-functional.ts
+++ b/example/02-multi-speaker-dialogue-functional.ts
@@ -3,28 +3,28 @@ import * as fs from "fs";
 import * as path from "path";
 /**
- * 示例 2: 多说话人对话 - 函数式
- * 使用 buildDialogueSSML 函数构建中英混合客服对话
+ * Example 2: Multi-Speaker Dialogue - Functional
+ * Build Chinese-English mixed customer service dialogue using buildDialogueSSML function
  */
 async function main() {
- // 输出装饰框
+ // Output decorative box
 console.log("╔═══════════════════════════════════════════════╗");
- console.log("║ 示例 2: 多说话人对话 - 函数式 ║");
+ console.log("║ Example 2: Multi-Speaker Dialogue - Functional ║");
 console.log("╚═══════════════════════════════════════════════╝");
 console.log();
- // 读取配置
+ // Read configuration
 const configPath = path.join(__dirname, "config.json");
 if (!fs.existsSync(configPath)) {
- console.error("❌ 错误:config.json 不存在");
- console.error("📝 请复制 config.example.json 为 config.json 并填写邮箱和密码");
- console.error(`📁 示例文件位置:${configPath}`);
+ console.error("❌ Error: config.json does not exist");
+ console.error("📝 Please copy config.example.json to config.json and fill in your email and password");
+ console.error(`📁 Example file location: ${configPath}`);
 process.exit(1);
 }
 const config = JSON.parse(fs.readFileSync(configPath, "utf-8"));
- // 构建对话:4 个说话人轮次(2 中文客服 + 2 英文客服)
+ // Build dialogue: 4 speaker turns (2 Chinese customer service + 2 English customer service)
 const turns: DialogueTurn[] = [
 {
 voice: "zh-CN-XiaoxiaoNeural",
@@ -50,14 +50,14 @@ async function main() {
 },
 ];
- console.log(`构建的对话轮次:${turns.length} 个`);
+ console.log(`Building dialogue turns: ${turns.length}`);
 console.log();
- // 生成 SSML
+ // Generate SSML
 const ssml = buildDialogueSSML(turns);
- // SSML 预览
- console.log("SSML 预览:");
+ // SSML preview
+ console.log("SSML
Preview:");
 console.log("┌──────────────────────────────────────────────┐");
 const ssmlLines = ssml.split("\n");
 for (const line of ssmlLines) {
@@ -67,15 +67,15 @@ async function main() {
 console.log("└──────────────────────────────────────────────┘");
 console.log();
- // 输出路径
+ // Output path
 const outputDir = path.join(__dirname, "output");
 if (!fs.existsSync(outputDir)) {
 fs.mkdirSync(outputDir, { recursive: true });
 }
- const outputPath = path.join(outputDir, "02-客服对话 - 函数式.mp3");
+ const outputPath = path.join(outputDir, "02-customer-service-dialogue-functional.mp3");
- // 调用 TTS API
- console.log("正在调用 TTS API...");
+ // Call TTS API
+ console.log("Calling TTS API...");
 try {
 const response = await fetch(config.api_url, {
@@ -90,21 +90,21 @@ async function main() {
 });
 if (!response.ok) {
- throw new Error(`API 请求失败:${response.status} ${response.statusText}`);
+ throw new Error(`API request failed: ${response.status} ${response.statusText}`);
 }
- // 保存文件
+ // Save file
 const buffer = Buffer.from(await response.arrayBuffer());
 fs.writeFileSync(outputPath, buffer);
- // 计算文件大小
+ // Calculate file size
 const fileSizeKB = (buffer.length / 1024).toFixed(1);
- console.log("✅ 音频生成成功!");
- console.log(`📁 文件已保存:${outputPath}`);
- console.log(`📊 文件大小:${fileSizeKB} KB`);
+ console.log("✅ Audio generation successful!");
+ console.log(`📁 File saved: ${outputPath}`);
+ console.log(`📊 File size: ${fileSizeKB} KB`);
 } catch (error) {
- console.error("❌ 生成失败:", error instanceof Error ? error.message : error);
+ console.error("❌ Generation failed:", error instanceof Error ? error.message : error);
 process.exit(1);
 }
 }
diff --git a/example/03-31-emotional-styles-demo.ts b/example/03-31-emotional-styles-demo.ts
index 146c363..22863b8 100644
--- a/example/03-31-emotional-styles-demo.ts
+++ b/example/03-31-emotional-styles-demo.ts
@@ -1,3 +1,10 @@
+/**
+ * Example 3: 31 Emotional Styles Demo
+ *
+ * Demonstrates all 31 emotional styles supported by Microsoft Azure Speech Service.
+ * Each style is showcased with a sample sentence.
+ */
+
 import { MsEdgeTTS, OUTPUT_FORMAT, buildDialogueSSML, type DialogueTurn } from "../src";
 import * as fs from "fs";
 import * as path from "path";
@@ -14,9 +21,9 @@ const allStyles = [
 ];
 function printStyleTable(styles: string[]): void {
- console.log("\n所有情感风格列表:");
+ console.log("\nComplete Emotional Styles List:");
 console.log("┌────┬─────────────────────────────────────┐");
- console.log("│ 序号 │ 风格名称 │");
+ console.log("│ No. │ Style Name │");
 console.log("├────┼─────────────────────────────────────┤");
 styles.forEach((style, index) => {
@@ -30,7 +37,7 @@ function printStyleTable(styles: string[]): void {
 async function main(): Promise {
 console.log("╔═══════════════════════════════════════════════╗");
- console.log("║ 示例 3: 31 种情感风格演示 ║");
+ console.log("║ Example 3: 31 Emotional Styles Demo ║");
 console.log("╚═══════════════════════════════════════════════╝");
 printStyleTable(allStyles);
@@ -44,8 +51,8 @@ async function main(): Promise {
 email = config.email;
 password = config.password;
 } catch (error) {
- console.error("错误:无法读取 config.json,请确保已创建配置文件");
- console.error("提示:复制 config.example.json 为 config.json 并填写邮箱密码");
+ console.error("Error: Unable to read config.json.
Please ensure the config file exists.");
+ console.error("Tip: Copy config.example.json to config.json and fill in your email and password.");
 process.exit(1);
 }
@@ -53,17 +60,17 @@ async function main(): Promise {
 const voiceName = "zh-CN-XiaoxiaoNeural";
 const outputFormat = OUTPUT_FORMAT.AUDIO_24KHZ_48KBITRATE_MONO_MP3;
- console.log(`\n使用语音:${voiceName}`);
- console.log(`输出格式:MP3`);
+ console.log(`\nUsing voice: ${voiceName}`);
+ console.log(`Output format: MP3`);
 const turns: DialogueTurn[] = allStyles.map((style, index) => ({
 voice: voiceName,
- text: `这是第${index + 1}种情感风格,${style}。`,
+ text: `This is style number ${index + 1}: ${style}.`,
 style: style
 }));
 const ssml = buildDialogueSSML(turns);
- console.log(`\n生成的 SSML 长度:${ssml.length} 字符`);
+ console.log(`\nGenerated SSML length: ${ssml.length} characters`);
 try {
 await tts.setMetadata(voiceName, outputFormat);
@@ -73,18 +80,18 @@ async function main(): Promise {
 fs.mkdirSync(outputDir, { recursive: true });
 }
- const outputPath = path.join(outputDir, "03-31 种情感风格演示.mp3");
+ const outputPath = path.join(outputDir, "03-31-emotional-styles-demo.mp3");
- console.log(`\n正在生成音频...`);
+ console.log(`\nGenerating audio...`);
 const { audioFilePath } = await tts.toFile(outputDir, ssml);
 fs.renameSync(audioFilePath, outputPath);
- console.log(`\n✅ 音频已保存到:${outputPath}`);
- console.log(`✅ 共生成 ${allStyles.length} 种情感风格演示`);
+ console.log(`\n✅ Audio saved to: ${outputPath}`);
+ console.log(`✅ Generated ${allStyles.length} emotional style demonstrations`);
 } catch (error) {
- console.error("\n❌ 生成音频时出错:");
+ console.error("\n❌ Error generating audio:");
 if (error instanceof Error) {
 console.error(error.message);
 } else {
diff --git a/example/04-style-degree-control-demo.ts b/example/04-style-degree-control-demo.ts
index d0f08bf..5acfb66 100644
--- a/example/04-style-degree-control-demo.ts
+++ b/example/04-style-degree-control-demo.ts
@@ -3,74 +3,74 @@ import * as fs from "fs";
 import * as path from "path";
 /**
- * 示例
4: 情感强度控制演示
- * 演示 styleDegree 参数(范围 0.01-2.0)对情感表达的影响
+ * Example 4: Style Degree Control Demo
+ * Demonstrates the effect of styleDegree parameter (range 0.01-2.0) on emotional expression
  */
 async function main() {
- // 输出装饰框
+ // Output decorative box
 console.log("╔═══════════════════════════════════════════════╗");
- console.log("║ 示例 4: 情感强度控制演示 ║");
+ console.log("║ Example 4: Style Degree Control Demo ║");
 console.log("╚═══════════════════════════════════════════════╝");
 console.log();
- // 读取配置
+ // Read configuration
 const configPath = path.join(__dirname, "config.json");
 if (!fs.existsSync(configPath)) {
- console.error("❌ 错误:config.json 不存在");
- console.error("📝 请复制 config.example.json 为 config.json 并填写邮箱和密码");
- console.error(`📁 示例文件位置:${configPath}`);
+ console.error("❌ Error: config.json does not exist");
+ console.error("📝 Please copy config.example.json to config.json and fill in your email and password");
+ console.error(`📁 Example file location: ${configPath}`);
 process.exit(1);
 }
 const config = JSON.parse(fs.readFileSync(configPath, "utf-8"));
- // 输出 styleDegree 说明
- console.log("📖 styleDegree 参数说明:");
+ // Output styleDegree explanation
+ console.log("📖 styleDegree Parameter Explanation:");
 console.log("┌──────────────────────────────────────────────┐");
- console.log("│ 范围:0.01 - 2.0 │");
- console.log("│ 0.5: 较弱的情感表达 │");
- console.log("│ 1.0: 正常情感表达(默认) │");
- console.log("│ 2.0: 最强情感表达 │");
+ console.log("│ Range: 0.01 - 2.0 │");
+ console.log("│ 0.5: Weaker emotional expression │");
+ console.log("│ 1.0: Normal emotional expression (default) │");
+ console.log("│ 2.0: Strongest emotional expression │");
 console.log("└──────────────────────────────────────────────┘");
 console.log();
- // 构建对话:同一句话,三种不同强度
+ // Build dialogue: same sentence, three different intensities
 const turns: DialogueTurn[] = [
 {
 voice: "zh-CN-XiaomoNeural",
- text: "这很正常",
+ text: "This is normal",
 style: "sad",
- styleDegree: 0.5, // 较弱
+ styleDegree: 0.5, // Weaker
 },
 {
 voice: "zh-CN-XiaomoNeural",
- text: "这真的很令人难过",
+ text: "This is really sad",
 style: "sad",
- styleDegree: 1.0, // 正常
+ styleDegree: 1.0, // Normal
 },
 {
 voice: "zh-CN-XiaomoNeural",
- text: "这简直太让人心碎了!",
+ text: "This is absolutely heartbreaking!",
 style: "sad",
- styleDegree: 2.0, // 最强
+ styleDegree: 2.0, // Strongest
 },
 ];
- // 显示对话内容
- console.log("📝 对话内容:");
+ // Display dialogue content
+ console.log("📝 Dialogue Content:");
 console.log("┌──────────────────────────────────────────────┐");
 turns.forEach((turn, index) => {
- const intensity = turn.styleDegree === 0.5 ? "较弱" : turn.styleDegree === 1.0 ? "正常" : "最强";
- console.log(`│ ${index + 1}. [强度${intensity}] ${turn.text.padEnd(25)} │`);
+ const intensity = turn.styleDegree === 0.5 ? "Weaker" : turn.styleDegree === 1.0 ? "Normal" : "Strongest";
+ console.log(`│ ${index + 1}. [Intensity: ${intensity}] ${turn.text.padEnd(25)} │`);
 });
 console.log("└──────────────────────────────────────────────┘");
 console.log();
- // 生成 SSML
+ // Generate SSML
 const ssml = buildDialogueSSML(turns);
- // SSML 预览
- console.log("📄 SSML 预览:");
+ // SSML preview
+ console.log("📄 SSML Preview:");
 console.log("┌──────────────────────────────────────────────┐");
 const ssmlLines = ssml.split("\n");
 for (const line of ssmlLines) {
@@ -80,15 +80,15 @@ async function main() {
 console.log("└──────────────────────────────────────────────┘");
 console.log();
- // 输出路径
+ // Output path
 const outputDir = path.join(__dirname, "output");
 if (!fs.existsSync(outputDir)) {
 fs.mkdirSync(outputDir, { recursive: true });
 }
- const outputPath = path.join(outputDir, "04-情感强度控制演示.mp3");
+ const outputPath = path.join(outputDir, "04-style-degree-control-demo.mp3");
- // 调用 TTS API
- console.log("🎙️ 正在调用 TTS API...");
+ // Call TTS API
+ console.log("🎙️ Calling TTS API...");
 try {
 const response = await fetch(config.api_url, {
@@ -103,23 +103,23 @@ async function main() {
 });
 if (!response.ok) {
- throw new Error(`API 请求失败:${response.status}
${response.statusText}`);
+ throw new Error(`API request failed: ${response.status} ${response.statusText}`);
 }
- // 保存文件
+ // Save file
 const buffer = Buffer.from(await response.arrayBuffer());
 fs.writeFileSync(outputPath, buffer);
- // 计算文件大小
+ // Calculate file size
 const fileSizeKB = (buffer.length / 1024).toFixed(1);
- console.log("✅ 音频生成成功!");
- console.log(`📁 文件已保存:${outputPath}`);
- console.log(`📊 文件大小:${fileSizeKB} KB`);
+ console.log("✅ Audio generation successful!");
+ console.log(`📁 File saved: ${outputPath}`);
+ console.log(`📊 File size: ${fileSizeKB} KB`);
 console.log();
- console.log("💡 提示:播放音频对比三种情感强度的差异");
+ console.log("💡 Tip: Play the audio to compare the differences between the three emotional intensities");
 } catch (error) {
- console.error("❌ 生成失败:", error instanceof Error ? error.message : error);
+ console.error("❌ Generation failed:", error instanceof Error ? error.message : error);
 process.exit(1);
 }
 }
diff --git a/example/05-text-substitution-demo.ts b/example/05-text-substitution-demo.ts
index 358f54c..254f1dd 100644
--- a/example/05-text-substitution-demo.ts
+++ b/example/05-text-substitution-demo.ts
@@ -3,46 +3,46 @@ import * as fs from "fs";
 import * as path from "path";
 /**
- * 示例 5: 文本替换功能演示
- * 演示 substitutions 参数,展示专业术语替换(W3C, HTTP, CEO 等)
+ * Example 5: Text Substitution Demo
+ * Demonstrates the substitutions parameter with technical term replacements (W3C, HTTP, CEO, etc.)
  */
 async function main() {
- // 输出装饰框
+ // Output decorative box
 console.log("╔═══════════════════════════════════════════════╗");
- console.log("║ 示例 5: 文本替换功能演示 ║");
+ console.log("║ Example 5: Text Substitution Demo ║");
 console.log("╚═══════════════════════════════════════════════╝");
 console.log();
- // 读取配置
+ // Read configuration
 const configPath = path.join(__dirname, "config.json");
 if (!fs.existsSync(configPath)) {
- console.error("❌ 错误:config.json 不存在");
- console.error("📝 请复制 config.example.json 为 config.json 并填写邮箱和密码");
- console.error(`📁 示例文件位置:${configPath}`);
+ console.error("❌ Error: config.json does not exist");
+ console.error("📝 Please copy config.example.json to config.json and fill in email and password");
+ console.error(`📁 Example file location: ${configPath}`);
 process.exit(1);
 }
 const config = JSON.parse(fs.readFileSync(configPath, "utf-8"));
- // 输出 substitutions 说明
- console.log("📖 substitutions 参数说明:");
+ // Output substitutions explanation
+ console.log("📖 substitutions Parameter Explanation:");
 console.log("┌──────────────────────────────────────────────┐");
- console.log("│ 格式:{ text: string, alias: string } │");
- console.log("│ text: 原文中的词 │");
- console.log("│ alias: 朗读时使用的别名 │");
- console.log("│ SSML 生成 text 标签 │");
+ console.log("│ Format: { text: string, alias: string } │");
+ console.log("│ text: The word in the original text │");
+ console.log("│ alias: The alias used during reading │");
+ console.log("│ SSML generates text tag│");
 console.log("└──────────────────────────────────────────────┘");
 console.log();
- // 构建对话:演示专业术语替换
+ // Build dialogue: demonstrate technical term substitution
 const turns: DialogueTurn[] = [
 {
 voice: "zh-CN-XiaoxiaoNeural",
- text: "W3C 制定了 Web 标准,API 基于 HTTP 协议",
+ text: "W3C develops Web standards, API is based on HTTP protocol",
 substitutions: [
- { text: "W3C", alias: "万维网联盟" },
- { text: "Web", alias: "万维网" },
- { text: "HTTP", alias: "超文本传输协议" },
+ { text: "W3C", alias: "World Wide Web
Consortium" },
+ { text: "Web", alias: "World Wide Web" },
+ { text: "HTTP", alias: "Hypertext Transfer Protocol" },
 ],
 style: "narration-professional",
 },
@@ -57,31 +57,34 @@ async function main() {
 },
 ];
- // 显示替换前后的对比
- console.log("📝 替换前后对比:");
+ // Display before/after substitution comparison
+ console.log("📝 Before/After Substitution Comparison:");
 console.log("┌──────────────────────────────────────────────┐");
- console.log("│ 【中文部分】 │");
- console.log("│ 原文:W3C 制定了 Web 标准,API 基于 HTTP 协议 │");
- console.log("│ 朗读:万维网联盟制定了万维网标准,API 基于超文本 │");
- console.log("│ 传输协议 │");
+ console.log("│ [Chinese Part] │");
+ console.log("│ Original: W3C develops Web standards, API is │");
+ console.log("│ based on HTTP protocol │");
+ console.log("│ Reading: World Wide Web Consortium develops │");
+ console.log("│ World Wide Web standards, API is │");
+ console.log("│ based on Hypertext Transfer Protocol│");
 console.log("├──────────────────────────────────────────────┤");
- console.log("│ 【英文部分】 │");
- console.log("│ 原文:The CEO said: innovation drives success │");
- console.log("│ 朗读:The Chief Executive Officer said: │");
- console.log("│ innovation drives success │");
+ console.log("│ [English Part] │");
+ console.log("│ Original: The CEO said: innovation drives │");
+ console.log("│ success │");
+ console.log("│ Reading: The Chief Executive Officer said: │");
+ console.log("│ innovation drives success │");
 console.log("└──────────────────────────────────────────────┘");
 console.log();
- // 显示替换规则列表
- console.log("📋 替换规则列表:");
+ // Display substitution rules list
+ console.log("📋 Substitution Rules List:");
 console.log("┌──────────────────────────────────────────────┐");
- console.log("│ 中文部分替换规则: │");
+ console.log("│ Chinese Part Substitution Rules: │");
 turns[0].substitutions?.forEach((sub) => {
 const line = `│ "${sub.text}" → "${sub.alias}"`.padEnd(47) + "│";
 console.log(line);
 });
 console.log("├──────────────────────────────────────────────┤");
- console.log("│ 英文部分替换规则:
│"); + console.log("│ English Part Substitution Rules: │"); turns[1].substitutions?.forEach((sub) => { const line = `│ "${sub.text}" → "${sub.alias}"`.padEnd(47) + "│"; console.log(line); @@ -89,11 +92,11 @@ async function main() { console.log("└──────────────────────────────────────────────┘"); console.log(); - // 生成 SSML + // Generate SSML const ssml = buildDialogueSSML(turns); - // SSML 预览 - console.log("📄 SSML 预览:"); + // SSML Preview + console.log("📄 SSML Preview:"); console.log("┌──────────────────────────────────────────────┐"); const ssmlLines = ssml.split("\n"); for (const line of ssmlLines) { @@ -103,15 +106,15 @@ async function main() { console.log("└──────────────────────────────────────────────┘"); console.log(); - // 输出路径 + // Output path const outputDir = path.join(__dirname, "output"); if (!fs.existsSync(outputDir)) { fs.mkdirSync(outputDir, { recursive: true }); } - const outputPath = path.join(outputDir, "05-文本替换功能演示.mp3"); + const outputPath = path.join(outputDir, "05-text-substitution-demo.mp3"); - // 调用 TTS API - console.log("🎙️ 正在调用 TTS API..."); + // Call TTS API + console.log("🎙️ Calling TTS API..."); try { const response = await fetch(config.api_url, { @@ -126,23 +129,23 @@ async function main() { }); if (!response.ok) { - throw new Error(`API 请求失败:${response.status} ${response.statusText}`); + throw new Error(`API request failed: ${response.status} ${response.statusText}`); } - // 保存文件 + // Save file const buffer = Buffer.from(await response.arrayBuffer()); fs.writeFileSync(outputPath, buffer); - // 计算文件大小 + // Calculate file size const fileSizeKB = (buffer.length / 1024).toFixed(1); - console.log("✅ 音频生成成功!"); - console.log(`📁 文件已保存:${outputPath}`); - console.log(`📊 文件大小:${fileSizeKB} KB`); + console.log("✅ Audio generation successful!"); + console.log(`📁 File saved: ${outputPath}`); + console.log(`📊 File size: ${fileSizeKB} KB`); console.log(); - console.log("💡 提示:播放音频对比替换前后的朗读效果"); + console.log("💡 Tip: Play the audio to compare the 
reading effect before and after substitution"); } catch (error) { - console.error("❌ 生成失败:", error instanceof Error ? error.message : error); + console.error("❌ Generation failed:", error instanceof Error ? error.message : error); process.exit(1); } } From df07e0d5a993a403ce79be42d1ca8da868fdd2b9 Mon Sep 17 00:00:00 2001 From: huan-zz3 <2805033624@qq.com> Date: Sun, 22 Mar 2026 19:28:47 +0800 Subject: [PATCH 09/10] docs: translate documentation to English Wave 4 - Documentation Translation: - example/run.sh: Translated all shell comments and echo messages - example/README.md: Translated complete example documentation - AGENTS.md: Translated project knowledge base (184 lines) - docs/ssml-structure.md: Translated SSML structure documentation (252 lines) - docs/ssml-voice.md: Translated SSML voice documentation (226 lines) - docs/ssml-pronunciation.md: Translated SSML pronunciation docs (199 lines) All documentation now in English with: - Technical documentation style - Accurate SSML terminology - Microsoft documentation attribution retained - Build verification: pnpm run build passes --- AGENTS.md | 214 ++++++++++++++++++------------------- docs/ssml-pronunciation.md | 112 ++++++++++--------- docs/ssml-structure.md | 202 +++++++++++++++++----------------- docs/ssml-voice.md | 188 +++++++++++++++++--------------- example/README.md | 60 +++++------ example/run.sh | 50 ++++----- 6 files changed, 424 insertions(+), 402 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index faae599..30d1939 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -6,27 +6,27 @@ ## OVERVIEW -Microsoft Edge TTS 文本转语音库 - 使用 Azure Speech Service (Microsoft Edge Read Aloud API) 的 Node.js/TypeScript 模块。支持语音合成、SSML、多说话人对话、情感风格控制、多种音频格式输出。 +Microsoft Edge TTS Text-to-Speech Library - A Node.js/TypeScript module using Azure Speech Service (Microsoft Edge Read Aloud API). Supports speech synthesis, SSML, multi-speaker dialogue, emotional style control, and multiple audio format output. 
-**核心栈**: TypeScript, WebSocket, Jest (测试), pnpm (包管理器) -**代码规模**: ~1010 行 TypeScript (src/ 目录) -**更新时间**: 2026-03-22 +**Core Stack**: TypeScript, WebSocket, Jest (testing), pnpm (package manager) +**Code Size**: ~1010 lines of TypeScript (src/ directory) +**Last Updated**: 2026-03-22 ## STRUCTURE ``` ./ -├── src/ # 全部源代码(9 个 TypeScript 文件) -│ ├── index.ts # 主入口点(barrel exports,6 个导出) -│ ├── MsEdgeTTS.ts # 核心 TTS 类(~499 行,WebSocket 通信) -│ ├── MsEdgeTTS.spec.ts # 单元测试 -│ ├── Output.ts # 音频输出格式枚举 + 扩展名映射 -│ ├── Prosody.ts # 语速/音调/音量选项类 -│ ├── DialogueTurn.ts # 对话轮次类型定义 -│ ├── DialogueBuilder.ts # 对话构建器类 + SSML 构建函数 -│ ├── SSMLUtils.ts # SSML 工具函数(转义、验证) -│ └── utils.ts # 路径拼接工具 -├── example/ # 示例演示代码(6 个中文命名文件) +├── src/ # All source code (9 TypeScript files) +│ ├── index.ts # Main entry point (barrel exports, 6 exports) +│ ├── MsEdgeTTS.ts # Core TTS class (~499 lines, WebSocket communication) +│ ├── MsEdgeTTS.spec.ts # Unit tests +│ ├── Output.ts # Audio output format enum + extension mapping +│ ├── Prosody.ts # Rate/pitch/volume options class +│ ├── DialogueTurn.ts # Dialogue turn type definition +│ ├── DialogueBuilder.ts # Dialogue builder class + SSML builder function +│ ├── SSMLUtils.ts # SSML utility functions (escape, validate) +│ └── utils.ts # Path joining utility +├── example/ # Example demo code (6 Chinese-named files) │ ├── 00-简单对话演示.ts │ ├── 01-多说话人对话 - 链式调用.ts │ ├── 02-多说话人对话 - 函数式.ts @@ -34,151 +34,151 @@ Microsoft Edge TTS 文本转语音库 - 使用 Azure Speech Service (Microsoft E │ ├── 04-情感强度控制演示.ts │ └── 05-文本替换功能演示.ts ├── .github/workflows/ -│ └── deploy_docs.yml # CI/CD:仅文档部署到 gh-pages -├── docs/ # 手动编写的 SSML 文档 -├── package.json # 依赖 + Jest 配置(内联) -├── tsconfig.json # TypeScript 编译配置 -└── README.md # API 文档 +│ └── deploy_docs.yml # CI/CD: Documentation deployment to gh-pages only +├── docs/ # Manually written SSML documentation +├── package.json # Dependencies + Jest config (inline) +├── tsconfig.json # TypeScript compilation configuration +└── README.md # 
API documentation ``` ## WHERE TO LOOK -| 任务 | 位置 | 说明 | +| Task | Location | Description | |------|------|------| -| 添加新功能 | `src/` | 直接在同级创建 `.ts` 文件 | -| 修改核心逻辑 | `src/MsEdgeTTS.ts` | WebSocket 通信、SSML 处理 | -| 添加音频格式 | `src/Output.ts` | `OUTPUT_FORMAT` 枚举 | -| 修改语音选项 | `src/Prosody.ts` | `ProsodyOptions` 类 | -| 添加测试 | `src/*.spec.ts` | 测试与源码同目录 | -| 修改 CI/CD | `.github/workflows/` | 仅文档部署流程 | -| 配置 Jest | `package.json` | Jest 配置内联在 package.json | +| Add new feature | `src/` | Create `.ts` file at same level | +| Modify core logic | `src/MsEdgeTTS.ts` | WebSocket communication, SSML processing | +| Add audio format | `src/Output.ts` | `OUTPUT_FORMAT` enum | +| Modify voice options | `src/Prosody.ts` | `ProsodyOptions` class | +| Add tests | `src/*.spec.ts` | Tests in same directory as source | +| Modify CI/CD | `.github/workflows/` | Documentation deployment flow only | +| Configure Jest | `package.json` | Jest config inline in package.json | ## CODE MAP | Symbol | Type | Location | Role | |--------|------|----------|------| -| `MsEdgeTTS` | Class | `src/MsEdgeTTS.ts` | 主类:WebSocket 连接、语音合成 | -| `OUTPUT_FORMAT` | Enum | `src/Output.ts` | 支持的音频输出格式(MP3, WEBM) | -| `OUTPUT_EXTENSIONS` | Const | `src/Output.ts` | 格式到文件扩展名映射 | -| `ProsodyOptions` | Class | `src/Prosody.ts` | 语速/音调/音量配置选项 | -| `RATE` | Enum | `src/Prosody.ts` | 语速预设(x-slow 到 x-fast) | -| `PITCH` | Enum | `src/Prosody.ts` | 音调预设(x-low 到 x-high) | -| `VOLUME` | Enum | `src/Prosody.ts` | 音量预设(silent 到 x-LOUD) | -| `Voice` | Type | `src/MsEdgeTTS.ts` | 语音元数据结构 | -| `MetadataOptions` | Class | `src/MsEdgeTTS.ts` | 边界元数据选项(句子/单词) | -| `DialogueBuilder` | Class | `src/DialogueBuilder.ts` | 链式对话构建器 | -| `buildDialogueSSML` | Function | `src/DialogueBuilder.ts` | 函数式 SSML 生成 | -| `escapeSSML` | Function | `src/SSMLUtils.ts` | XML 转义(& < > " ') | -| `validateStyle` | Function | `src/SSMLUtils.ts` | 验证 28 种官方情感风格 | -| `validateStyleDegree` | Function | `src/SSMLUtils.ts` | 验证 styleDegree 范围(0.01-2.0) | -| 
`joinPath` | Function | `src/utils.ts` | 路径拼接工具 | +| `MsEdgeTTS` | Class | `src/MsEdgeTTS.ts` | Main class: WebSocket connection, speech synthesis | +| `OUTPUT_FORMAT` | Enum | `src/Output.ts` | Supported audio output formats (MP3, WEBM) | +| `OUTPUT_EXTENSIONS` | Const | `src/Output.ts` | Format to file extension mapping | +| `ProsodyOptions` | Class | `src/Prosody.ts` | Rate/pitch/volume configuration options | +| `RATE` | Enum | `src/Prosody.ts` | Speaking rate presets (x-slow to x-fast) | +| `PITCH` | Enum | `src/Prosody.ts` | Pitch presets (x-low to x-high) | +| `VOLUME` | Enum | `src/Prosody.ts` | Volume presets (silent to x-LOUD) | +| `Voice` | Type | `src/MsEdgeTTS.ts` | Voice metadata structure | +| `MetadataOptions` | Class | `src/MsEdgeTTS.ts` | Boundary metadata options (sentence/word) | +| `DialogueBuilder` | Class | `src/DialogueBuilder.ts` | Chained dialogue builder | +| `buildDialogueSSML` | Function | `src/DialogueBuilder.ts` | Functional SSML generation | +| `escapeSSML` | Function | `src/SSMLUtils.ts` | XML escape (& < > " ') | +| `validateStyle` | Function | `src/SSMLUtils.ts` | Validate 28 official emotional styles | +| `validateStyleDegree` | Function | `src/SSMLUtils.ts` | Validate styleDegree range (0.01-2.0) | +| `joinPath` | Function | `src/utils.ts` | Path joining utility | ## CONVENTIONS -**TypeScript 配置**: +**TypeScript Configuration**: - `target`: ESNext - `module`: CommonJS - `outDir`: dist/ -- 跳过库检查(skipLibCheck: true) +- Skip library check (skipLibCheck: true) -**测试约定**: -- 测试文件:`*.spec.ts` 与源码同目录 -- Jest 配置内联在 `package.json` -- 测试超时:15 秒 +**Testing Conventions**: +- Test files: `*.spec.ts` in same directory as source +- Jest config inline in `package.json` +- Test timeout: 15 seconds -**包管理器**: -- 强制使用 `pnpm`(preinstall 钩子) -- 版本锁定:pnpm-lock.yaml +**Package Manager**: +- pnpm required (preinstall hook) +- Version lock: pnpm-lock.yaml -**错误处理约定**: -- 验证失败时抛出明确 Error(见 SSMLUtils.ts) -- 无效输入立即抛出,不调用 fallback +**Error Handling 
Conventions**: +- Throw clear Error on validation failure (see SSMLUtils.ts) +- Invalid input throws immediately, no fallback -**日志约定**: -- 可选 logger 通过 `enableLogger` 选项启用 -- 使用私有 `_log()` 方法记录 -- 仅记录连接状态、消息收发 +**Logging Conventions**: +- Optional logger via `enableLogger` option +- Private `_log()` method for logging +- Log only connection status, message exchange -**SSML 处理约定**: -- 转义顺序先 & 后其他,防止重复转义 -- 仅支持 `speak`, `voice`, `prosody` 元素 +**SSML Processing Conventions**: +- Escape & first, then others, to prevent double escaping +- Only `speak`, `voice`, `prosody` elements supported ## ANTI-PATTERNS (THIS PROJECT) -- ❌ **不要** 使用 npm/yarn - 项目强制使用 pnpm -- ❌ **不要** 将测试移至独立目录 - 保持 `*.spec.ts` 与源码同级 -- ❌ **不要** 修改 tsconfig 的 module/moduleResolution - 依赖 CommonJS -- ❌ **不要** 修改 Sec-MS-GEC 哈希算法 - 依赖 Azure 认证机制 -- ❌ **不要** 删除 `isomorphic-ws` 依赖 - 实现跨环境兼容 -- ❌ **不要** 使用回调 API - 仅支持 Promise -- ❌ **不要** 在浏览器中使用 - API 需要 Edge User-Agent(仅服务器端) -- ❌ **不要** 删除 `dist/` 外的文件 - 发布仅包含 dist 目录 +- ❌ **Do NOT** use npm/yarn - project requires pnpm +- ❌ **Do NOT** move tests to separate directory - keep `*.spec.ts` alongside source +- ❌ **Do NOT** modify tsconfig module/moduleResolution - depends on CommonJS +- ❌ **Do NOT** modify Sec-MS-GEC hash algorithm - depends on Azure authentication +- ❌ **Do NOT** remove `isomorphic-ws` dependency - enables cross-environment compatibility +- ❌ **Do NOT** use callback API - Promise only +- ❌ **Do NOT** use in browser - API requires Edge User-Agent (server-side only) +- ❌ **Do NOT** delete files outside `dist/` - publish includes only dist directory ## ERROR HANDLING -**抛出 Error 的场景**: -- 未配置 metadata:`"Speech synthesis not configured yet..."` -- 无效 voiceLocale:`"Could not infer voiceLocale from voiceName..."` -- 无效 style:`'Invalid style "xxx". 
Valid styles: ...'` -- styleDegree 越界:`"styleDegree must be between 0.01 and 2.0"` -- 空 voice 名称:`"voice name is required and cannot be empty"` -- 空文本:`"text cannot be empty string"` +**Error Throwing Scenarios**: +- Metadata not configured: `"Speech synthesis not configured yet..."` +- Invalid voiceLocale: `"Could not infer voiceLocale from voiceName..."` +- Invalid style: `'Invalid style "xxx". Valid styles: ...'` +- styleDegree out of range: `"styleDegree must be between 0.01 and 2.0"` +- Empty voice name: `"voice name is required and cannot be empty"` +- Empty text: `"text cannot be empty string"` ## UNIQUE STYLES -**SSML 模板**: -- 默认模板:`` → `` → `` -- 仅支持 `speak`, `voice`, `prosody` 元素 -- 不支持完整 SSML +**SSML Template**: +- Default template: `` → `` → `` +- Only `speak`, `voice`, `prosody` elements supported +- Full SSML not supported -**WebSocket 通信**: -- 使用 `isomorphic-ws` 实现浏览器/Node 兼容 -- 自定义 UUID 生成(非 crypto.randomUUID) -- Sec-MS-GEC 哈希认证机制 +**WebSocket Communication**: +- Uses `isomorphic-ws` for browser/Node compatibility +- Custom UUID generation (not crypto.randomUUID) +- Sec-MS-GEC hash authentication mechanism -**日志系统**: -- 可选 logger(enableLogger 选项) -- 仅记录连接状态、消息收发 +**Logging System**: +- Optional logger (enableLogger option) +- Logs only connection status, message exchange ## COMMANDS ```bash -# 安装依赖 +# Install dependencies pnpm install -# 开发(构建 + 运行测试) +# Development (build + run tests) pnpm run dev -# 编译 TypeScript +# Compile TypeScript pnpm run build -# 运行测试 +# Run tests pnpm test -# 测试(监听模式) +# Tests (watch mode) pnpm run test:watch -# 测试(覆盖率) +# Tests (coverage) pnpm run test:cov -# 发布到 npm +# Publish to npm pnpm run publish ``` ## NOTES -**关键限制**: -- 2025 年 12 月更新:API 需要 Edge User-Agent,**浏览器中无法使用** -- 仅支持 Promise API,不支持回调 -- 语音列表需要可信客户端 Token(硬编码在源码中) +**Key Limitations**: +- December 2025 update: API requires Edge User-Agent, **cannot be used in browsers** +- Promise API only, no callback support +- Voice list requires trusted client Token 
(hardcoded in source) -**已知问题**: -- package.json 中的 `src/test/test.ts` 和 `src/test/jest-e2e.json` 不存在(遗留配置) -- 测试覆盖率不足:仅 1 个测试文件(MsEdgeTTS.spec.ts),覆盖率 11% -- utils.ts 过于简化(仅 6 行代码),可考虑合并 -- example/ 目录混合非 TS 文件(config.json, run.sh 等) +**Known Issues**: +- `src/test/test.ts` and `src/test/jest-e2e.json` in package.json do not exist (legacy config) +- Insufficient test coverage: only 1 test file (MsEdgeTTS.spec.ts), 11% coverage +- utils.ts is too simplified (only 6 lines), could be merged +- example/ directory mixes non-TS files (config.json, run.sh, etc.) -**发布流程**: -1. `pnpm run build` 编译到 dist/ +**Publish Flow**: +1. `pnpm run build` compiles to dist/ 2. `pnpm publish --access=public` -3. 文档自动部署到 gh-pages(通过 GitHub Actions) +3. Documentation auto-deploys to gh-pages (via GitHub Actions) diff --git a/docs/ssml-pronunciation.md b/docs/ssml-pronunciation.md index 548245f..52802b1 100644 --- a/docs/ssml-pronunciation.md +++ b/docs/ssml-pronunciation.md @@ -1,19 +1,19 @@ -# 语音合成标记语言 (SSML) 的发音 - 语音服务 - Foundry Tools | Microsoft Learn +# Pronunciation in Speech Synthesis Markup Language (SSML) - Speech Service - Foundry Tools | Microsoft Learn -可以将语音合成标记语言 (SSML) 与 text to speech 一起使用,以指定语音的发音方式。 例如,可以将 SSML 与音素和自定义词典配合使用来改进发音。 +Speech Synthesis Markup Language (SSML) can be used with text-to-speech to specify how speech should be pronounced. For example, SSML can be used with phonemes and custom dictionaries to improve pronunciation. -## 音素元素 +## Phoneme Element -`phoneme` 元素用于 SSML 文档中的发音。 始终提供人类可读的语音作为备用方案。 +The `phoneme` element is used for pronunciation in SSML documents. Always provide human-readable speech as a fallback. -| Attribute | 说明 | 必需还是可选 | +| Attribute | Description | Required or Optional | | --- | --- | --- | -| `alphabet` | 音标字母表。 支持:`ipa`, `sapi`, `ups`, `x-sampa`。 | 可选 | -| `ph` | 包含用于指定单词发音的音素字符串。 | 必选 | +| `alphabet` | Phonetic alphabet. Supported: `ipa`, `sapi`, `ups`, `x-sampa`. 
| Optional | +| `ph` | Phoneme string containing the pronunciation of the word. | Required | -### 音素示例 +### Phoneme Examples -使用 IPA 字母表: +Using the IPA alphabet: ```xml @@ -23,7 +23,7 @@ ``` -使用 SAPI 字母表: +Using the SAPI alphabet: ```xml @@ -33,7 +33,7 @@ ``` -使用 x-sampa 字母表: +Using the x-sampa alphabet: ```xml @@ -43,15 +43,15 @@ ``` -## 自定义词典 +## Custom Dictionary -使用 `lexicon` 元素引用自定义词典 XML 文件来定义多个实体的发音。 +Use the `lexicon` element to reference a custom dictionary XML file to define pronunciations for multiple entities. -| Attribute | 说明 | 必需还是可选 | +| Attribute | Description | Required or Optional | | --- | --- | --- | -| `uri` | 自定义词典 XML 文件的 URI(`.xml` 或 `.pls`)。 | 必选 | +| `uri` | URI of the custom dictionary XML file (`.xml` or `.pls`). | Required | -### 自定义词典示例 +### Custom Dictionary Example ```xml ``` -### 自定义词典文件格式 +### Custom Dictionary File Format ```xml @@ -87,22 +87,22 @@ ``` -**限制**: -- 文件大小最大 100 KB -- 词典缓存 15 分钟刷新 -- 一个词典仅限一种区域设置 +**Limitations**: +- Maximum file size: 100 KB +- Dictionary cache refreshes every 15 minutes +- One locale per dictionary -## Say-as 元素 +## Say-as Element -指示元素文本的内容类型(如数字、日期等)。 +Indicates the content type of the element text (such as numbers, dates, etc.). -| Attribute | 说明 | 必需还是可选 | +| Attribute | Description | Required or Optional | | --- | --- | --- | -| `interpret-as` | 内容类型。 支持:`characters`, `cardinal`, `ordinal`, `date`, `time`, `currency`, `telephone` 等。 | 必选 | -| `format` | 精确格式(如 `mdy`, `hms12` 等)。 | 可选 | -| `detail` | 朗读细节层次。 | 可选 | +| `interpret-as` | Content type. Supported: `characters`, `cardinal`, `ordinal`, `date`, `time`, `currency`, `telephone`, etc. | Required | +| `format` | Exact format (such as `mdy`, `hms12`, etc.). | Optional | +| `detail` | Level of detail for reading. 
| Optional | -### Say-as 示例 +### Say-as Examples ```xml @@ -116,28 +116,28 @@ ``` -### 支持的 interpret-as 值 +### Supported interpret-as Values -| interpret-as | 说明 | +| interpret-as | Description | | --- | --- | -| `characters`, `spell-out` | 逐字母拼写 | -| `alphanumeric` | 字母数字混合拼写 | -| `cardinal`, `number` | 基数 | -| `ordinal` | 序数 | -| `number_digit` | 单个数字序列 | -| `fraction` | 分数 | -| `date` | 日期 | -| `time` | 时间 | -| `duration` | 持续时间 | -| `telephone` | 电话号码 | -| `currency` | 货币 | -| `unit` | 单位 | -| `address` | 地址 | -| `name` | 人名 | - -## Sub 元素 - -使用 `sub` 元素指定别名文本代替原元素文本。 +| `characters`, `spell-out` | Spell out letter by letter | +| `alphanumeric` | Alphanumeric mixed spelling | +| `cardinal`, `number` | Cardinal numbers | +| `ordinal` | Ordinal numbers | +| `number_digit` | Sequence of individual digits | +| `fraction` | Fractions | +| `date` | Dates | +| `time` | Time | +| `duration` | Duration | +| `telephone` | Phone numbers | +| `currency` | Currency | +| `unit` | Units of measurement | +| `address` | Addresses | +| `name` | Personal names | + +## Sub Element + +Use the `sub` element to specify alias text to replace the original element text. ```xml @@ -147,9 +147,9 @@ ``` -## 数学表达式的阅读 +## Reading Mathematical Expressions -### 方法 1:纯文本数学表达式 +### Method 1: Plain Text Mathematical Expressions ```xml @@ -160,7 +160,7 @@ ``` -读出括号: +Read out parentheses: ```xml @@ -171,7 +171,7 @@ ``` -### 方法 2:使用 MathML +### Method 2: Using MathML ```xml @@ -196,4 +196,10 @@ ``` -输出:"a squared 加 b squared 等于 c squared" +Output: "a squared plus b squared equals c squared" + +--- + +**Note**: This documentation is based on Microsoft's official SSML documentation. For the most up-to-date information, please refer to the [Microsoft Azure Speech Service documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup-pronunciation). + +© Microsoft Corporation. All rights reserved. 
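The `sub` aliasing documented above, together with the "escape `&` first" convention listed in AGENTS.md, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's `escapeSSML` or `buildDialogueSSML`: `escapeXml`, `Substitution`, and `applySubstitutions` are hypothetical names introduced only for illustration, and plain substring matching is an assumed simplification.

```typescript
// Hypothetical shape of one substitution rule (mirrors the { text, alias } pairs in the demo).
interface Substitution {
  text: string;
  alias: string;
}

// Escape the five XML special characters. "&" is handled first so that the
// entities produced by the later replacements are not escaped a second time.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wrap every occurrence of a rule's text in <sub alias="...">, escaping everything else.
// Assumption: plain substring matching, so "Web" would also match inside "Website".
function applySubstitutions(input: string, subs: Substitution[]): string {
  const usable = subs.filter((s) => s.text.length > 0);
  if (usable.length === 0) {
    return escapeXml(input);
  }
  // Longest rule first so that a longer text wins over a shorter one sharing a prefix.
  const alternation = usable
    .map((s) => s.text)
    .sort((a, b) => b.length - a.length)
    .map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join("|");
  // Splitting on a capturing group keeps the matches: they land on the odd indices.
  const parts = input.split(new RegExp(`(${alternation})`));
  return parts
    .map((part, i) => {
      if (i % 2 === 0) {
        return escapeXml(part);
      }
      const rule = usable.filter((s) => s.text === part)[0];
      return `<sub alias="${escapeXml(rule.alias)}">${escapeXml(part)}</sub>`;
    })
    .join("");
}

const demo = applySubstitutions("W3C & HTTP", [
  { text: "W3C", alias: "World Wide Web Consortium" },
  { text: "HTTP", alias: "Hypertext Transfer Protocol" },
]);
console.log(demo);
```

Escaping `&` before the other characters matters because the later replacements emit `&` themselves; running them in the opposite order would turn `&lt;` into `&amp;lt;`.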
diff --git a/docs/ssml-structure.md b/docs/ssml-structure.md index 0bb966a..b12447c 100644 --- a/docs/ssml-structure.md +++ b/docs/ssml-structure.md @@ -1,26 +1,26 @@ -# 语音合成标记语言 (SSML) 文档结构和事件 - 语音服务 - Foundry Tools | Microsoft Learn +# Speech Synthesis Markup Language (SSML) Document Structure and Events - Speech Service - Foundry Tools | Microsoft Learn -语音合成标记语言(SSML)连同输入文本一起,决定了文本转语音输出的结构、内容和其他特征。 例如,可以使用 SSML 来定义段落、句子、中断/暂停或静音。 可以使用事件标记(例如书签或视素)来包装文本,这些标记可以稍后由应用程序处理。 +Speech Synthesis Markup Language (SSML), used together with input text, determines the structure, content, and other characteristics of text-to-speech output. For example, you can use SSML to define paragraphs, sentences, breaks/pauses, or silence. You can wrap text with event markers (such as bookmarks or visemes) that can be processed later by applications. -有关如何在 SSML 文档中构建元素的详细信息,请参阅以下部分。 +For more information on how to structure elements in an SSML document, see the following sections. -注意 +> **Note** +> +> In addition to Azure Neural (non-HD) voices in Foundry Tools, you can also use [Azure HD (High-Definition) Voices in Foundry Tools](high-definition-voices) and [Azure OpenAI Neural (HD and non-HD) Voices](openai-voices). HD voices provide higher quality for more diverse scenarios. -除了 Foundry Tools 中的 Azure 语音神经(非高清)语音外,你还可以使用 [Foundry Tools 中的 Azure 语音高清 (HD) 语音](high-definition-voices)和 [Azure OpenAI 神经(高清和非高清)语音](openai-voices)。 HD 语音为更多样化的场景提供更高的质量。 +Certain voices do not support all [Speech Synthesis Markup Language (SSML)](speech-synthesis-markup-structure) tags. This includes Neural Text-to-Speech HD voices, Personal Voices, and Embedded Voices. -某些语音不支持所有 [语音合成标记语言 (SSML)](speech-synthesis-markup-structure) 标记。 这包括神经网络文本转语音高清语音、个性化语音和嵌入语音。 +- For Azure HD voices, check SSML support [here](speech-synthesis-markup-voice). +- For Personal Voices, SSML support can be found [here](personal-voice-how-to-use#supported-and-unsupported-ssml-elements-for-personal-voice). 
+- For Embedded Voices, check SSML support [here](embedded-speech#embedded-voices-capabilities). -- 对于 Azure 高清(HD)语音,请检查此处的 SSML 支持。 -- 对于个人语音,可以在 [此处](personal-voice-how-to-use#supported-and-unsupported-ssml-elements-for-personal-voice) 找到 SSML 支持。 -- 有关嵌入式声音,请在 [此处](embedded-speech#embedded-voices-capabilities) 查看 SSML 支持。 +## Document Structure -## 文档结构 +The Speech Service implementation of SSML is based on the World Wide Web Consortium's [Speech Synthesis Markup Language Version 1.0](https://www.w3.org/TR/2004/REC-speech-synthesis-20040907/). The elements supported by Speech Service may differ from the W3C standard. -SSML 的语音服务实现基于万维网联合会的 [语音合成标记语言版本 1.0](https://www.w3.org/TR/2004/REC-speech-synthesis-20040907/)。 语音服务支持的元素可能与 W3C 标准不同。 +Each SSML document is created using SSML elements (or tags). These elements are used to adjust speech, style, syllables, prosody, volume, and more. -每个 SSML 文档是使用 SSML 元素(或标记)创建的。 这些元素用于调整语音、风格、音节、韵律、音量等。 - -下面是 SSML 文档的基本结构和语法的子集: +Below is a subset of the basic structure and syntax of an SSML document: ```xml @@ -49,38 +49,38 @@ SSML 的语音服务实现基于万维网联合会的 [语音合成标记语言 ``` -以下列表描述了每个元素中允许的一些内容示例: - -- `audio`:如果音频文件不存在或无法播放,可在 `audio` 元素的正文中包含可讲述的纯文本或 SSML 标记。 `audio` 元素还包含文本和以下元素:`audio`、`break`、`p`、`s`、`phoneme`、`prosody`、`say-as` 和 `sub`。 -- `bookmark`:此元素不能包含文本或任何其他元素。 -- `break`:此元素不能包含文本或任何其他元素。 -- `emphasis`:此元素可包含文本和以下元素:`audio`、`break`、`emphasis`、`lang`、`phoneme`、`prosody`、`say-as` 和 `sub`。 -- `lang`:此元素可包含除 `mstts:backgroundaudio`、`voice` 和 `speak` 以外的所有其他元素。 -- `lexicon`:此元素不能包含文本或任何其他元素。 -- `math`:此元素只能包含文本和 MathML 元素。 -- `mstts:audioduration`:此元素不能包含文本或任何其他元素。 -- `mstts:backgroundaudio`:此元素不能包含文本或任何其他元素。 -- ``:此元素不能包含文本或任何其他元素。 它指定语音转换的源音频 URL。 -- `mstts:embedding`:此元素可包含文本和以下元素:`audio`、`break`、`emphasis`、`lang`、`phoneme`、`prosody`、`say-as` 和 `sub`。 -- `mstts:express-as`:此元素可包含文本和以下元素:`audio`、`break`、`emphasis`、`lang`、`phoneme`、`prosody`、`say-as` 和 `sub`。 -- `mstts:silence`:此元素不能包含文本或任何其他元素。 -- 
`mstts:viseme`:此元素不能包含文本或任何其他元素。 -- `p`:此元素可包含文本和以下元素:`audio`、`break`、`phoneme`、`prosody`、`say-as`、`sub`、`mstts:express-as` 和 `s`。 -- `phoneme`:此元素只能包含文本,不能包含任何其他元素。 -- `prosody`:此元素可包含文本和以下元素:`audio`、`break`、`p`、`phoneme`、`prosody`、`say-as`、`sub` 和 `s`。 -- `s`:此元素可包含文本和以下元素:`audio`、`break`、`phoneme`、`prosody`、`say-as`、`mstts:express-as` 和 `sub`。 -- `say-as`:此元素只能包含文本,不能包含任何其他元素。 -- `sub`:此元素只能包含文本,不能包含任何其他元素。 -- `speak`:SSML 文档的根元素。 此元素可包含以下元素:`mstts:backgroundaudio` 和 `voice`。 -- `voice`:此元素可包含除 `mstts:backgroundaudio` 和 `speak` 以外的所有其他元素。 - -语音服务可自动适当处理停顿(例如,在句号后面暂停片刻),或者在以问号结尾的句子中使用正确的音调。 - -## 特殊字符 - -若要在 SSML 元素的值或文本中使用字符 `&`、`<` 和 `>`,则必须使用实体格式。 具体而言,必须使用 `&amp;` 而不是 `&`,使用 `&lt;` 而不是 `<`,使用 `&gt;` 而不是 `>`。 否则,无法正确分析 SSML。 - -例如,请指定 `green &amp; yellow` 而不是 `green & yellow`。 系统会正确分析以下 SSML: +The following list describes examples of some content allowed within each element: + +- `audio`: If the audio file doesn't exist or can't be played, you can include speakable plain text or SSML tags in the body of the `audio` element. The `audio` element also contains text and the following elements: `audio`, `break`, `p`, `s`, `phoneme`, `prosody`, `say-as`, and `sub`. +- `bookmark`: This element cannot contain text or any other elements. +- `break`: This element cannot contain text or any other elements. +- `emphasis`: This element can contain text and the following elements: `audio`, `break`, `emphasis`, `lang`, `phoneme`, `prosody`, `say-as`, and `sub`. +- `lang`: This element can contain all other elements except `mstts:backgroundaudio`, `voice`, and `speak`. +- `lexicon`: This element cannot contain text or any other elements. +- `math`: This element can contain only text and MathML elements. +- `mstts:audioduration`: This element cannot contain text or any other elements. +- `mstts:backgroundaudio`: This element cannot contain text or any other elements. +- ``: This element cannot contain text or any other elements. It specifies the source audio URL for voice conversion. 
+- `mstts:embedding`: This element can contain text and the following elements: `audio`, `break`, `emphasis`, `lang`, `phoneme`, `prosody`, `say-as`, and `sub`. +- `mstts:express-as`: This element can contain text and the following elements: `audio`, `break`, `emphasis`, `lang`, `phoneme`, `prosody`, `say-as`, and `sub`. +- `mstts:silence`: This element cannot contain text or any other elements. +- `mstts:viseme`: This element cannot contain text or any other elements. +- `p`: This element can contain text and the following elements: `audio`, `break`, `phoneme`, `prosody`, `say-as`, `sub`, `mstts:express-as`, and `s`. +- `phoneme`: This element can contain only text and cannot contain any other elements. +- `prosody`: This element can contain text and the following elements: `audio`, `break`, `p`, `phoneme`, `prosody`, `say-as`, `sub`, and `s`. +- `s`: This element can contain text and the following elements: `audio`, `break`, `phoneme`, `prosody`, `say-as`, `mstts:express-as`, and `sub`. +- `say-as`: This element can contain only text and cannot contain any other elements. +- `sub`: This element can contain only text and cannot contain any other elements. +- `speak`: The root element of an SSML document. This element can contain the following elements: `mstts:backgroundaudio` and `voice`. +- `voice`: This element can contain all other elements except `mstts:backgroundaudio` and `speak`. + +The Speech Service can automatically handle pauses appropriately (for example, pausing briefly after a period) or use the correct intonation for sentences ending with a question mark. + +## Special Characters + +To use the characters `&`, `<`, and `>` in the values or text of SSML elements, you must use entity formatting. Specifically, you must use `&amp;` instead of `&`, `&lt;` instead of `<`, and `&gt;` instead of `>`. Otherwise, the SSML will not be parsed correctly. + +For example, specify `green &amp; yellow` instead of `green & yellow`. 
The following SSML will be parsed correctly: ```xml @@ -90,35 +90,35 @@ SSML 的语音服务实现基于万维网联合会的 [语音合成标记语言 ``` -特殊字符(例如引号、撇号和括号)必须经过转义。 有关详细信息,请参阅 [可扩展标记语言 (XML) 1.0:附录 D](https://www.w3.org/TR/xml/#sec-entexpand)。 +Special characters such as quotation marks, apostrophes, and parentheses must be escaped. For more information, see [Extensible Markup Language (XML) 1.0: Appendix D](https://www.w3.org/TR/xml/#sec-entexpand). -属性值必须用双引号或单引号括起来。 例如,`` 和 `` 是格式正确的有效元素,但无法识别 ``。 +Attribute values must be enclosed in double or single quotation marks. For example, `` and `` are well-formed and valid elements, but `` will not be recognized. -## Speak 根元素 +## Speak Root Element -`speak` 元素包含版本、语言和标记词汇定义等信息。 `speak` 元素是所有 SSML 文档必需的根元素。 你必须在 `speak` 元素内指定默认语言,无论是否在其他地方调整该语言,例如在 [`lang`](speech-synthesis-markup-voice#use-voice-elements) 元素中。 +The `speak` element contains information such as version, language, and markup vocabulary definitions. The `speak` element is the required root element for all SSML documents. You must specify the default language within the `speak` element, regardless of whether you adjust that language elsewhere, such as in the [`lang`](speech-synthesis-markup-voice#use-voice-elements) element. -下面是 `speak` 元素的语法: +Below is the syntax for the `speak` element: ```xml ``` -| Attribute | 说明 | 必需还是可选 | +| Attribute | Description | Required or Optional | | --- | --- | --- | -| `version` | 指示用于解释文档标记的 SSML 规范的版本。 当前版本为"1.0"。 | 必选 | -| `xml:lang` | 根文档的语言。 该值可以包含语言代码(如 `en` (英语))或本地化信息,如 `en-US` (英语 - 美国)。 | 必选 | -| `xmlns` | 用于定义 SSML 文档的标记词汇(元素类型和属性名称)的文档的 URI。 当前 URI 为 "http://www.w3.org/2001/10/synthesis"。 | 必选 | +| `version` | Indicates the version of the SSML specification used to interpret the document markup. The current version is "1.0". | Required | +| `xml:lang` | The language of the root document. The value can contain a language code (such as `en` for English) or locale information such as `en-US` (English - United States). 
| Required | +| `xmlns` | The URI for the document that defines the markup vocabulary (element types and attribute names) of the SSML document. The current URI is "http://www.w3.org/2001/10/synthesis". | Required | -`speak` 元素必须至少包含一个 [语音元素](speech-synthesis-markup-voice#use-voice-elements)。 +The `speak` element must contain at least one [voice element](speech-synthesis-markup-voice#use-voice-elements). -### 演讲示例 +### Speak Examples -`speak`介绍了 元素属性支持的值。 +The following introduces the values supported by the `speak` element attributes. -#### 单一声音示例 +#### Single Voice Example -本示例使用 `en-US-Ava:DragonHDLatestNeural` 语音。 有关更多示例,请参阅 [语音示例](speech-synthesis-markup-voice#voice-examples)。 +This example uses the `en-US-Ava:DragonHDLatestNeural` voice. For more examples, see [Voice Examples](speech-synthesis-markup-voice#voice-examples). ```xml @@ -128,30 +128,30 @@ SSML 的语音服务实现基于万维网联合会的 [语音合成标记语言 ``` -## 添加停顿 +## Adding Breaks -使用 `break` 元素替代单词之间的默认中断或暂停行为。 否则,语音服务会自动插入暂停。 +Use the `break` element to override the default break or pause behavior between words. Otherwise, the Speech Service will automatically insert pauses. -下表描述了 `break` 元素的属性用法。 +The following table describes the attribute usage for the `break` element. -| Attribute | 说明 | 必需还是可选 | +| Attribute | Description | Required or Optional | | --- | --- | --- | -| `strength` | 暂停的相对持续时间,使用以下值之一:<br>- x-weak<br>- weak<br>- medium(默认值)<br>- strong<br>- x-strong | 可选 | -| `time` | 暂停的绝对持续时间,以秒为单位(例如 `2s`)或以毫秒为单位(例如 `500ms`)。 有效值的范围为 0 到 20000 毫秒。 如果设置的值大于支持的最大值,则服务将使用 `20000ms`。 如果设置了 `time` 属性,则会忽略 `strength` 属性。 | 可选 | +| `strength` | The relative duration of the pause, using one of the following values:<br>- x-weak<br>- weak<br>- medium (default)<br>- strong<br>- x-strong | Optional | +| `time` | The absolute duration of the pause, in seconds (for example `2s`) or milliseconds (for example `500ms`). Valid values range from 0 to 20000 milliseconds. If the set value is greater than the supported maximum, the service will use `20000ms`. If the `time` attribute is set, the `strength` attribute is ignored. | Optional | -下面是有关该 `strength` 属性的更多详细信息。 +Below are more details about the `strength` attribute. -| Strength | 相对持续时间 | +| Strength | Relative Duration | | --- | --- | -| X-weak | 250 毫秒 | -| Weak | 500 毫秒 | -| 中型 | 750 毫秒 | -| 非常 | 1,000 毫秒 | -| X-strong | 1,250 毫秒 | +| x-weak | 250 ms | +| weak | 500 ms | +| medium | 750 ms | +| strong | 1,000 ms | +| x-strong | 1,250 ms | -### 停顿示例 +### Break Examples -`break`介绍了 元素属性支持的值。 以下三种方式都会增加 750 毫秒的中断。 +The following introduces the values supported by the `break` element attributes. All three methods below add a 750ms break. ```xml @@ -163,26 +163,26 @@ SSML 的语音服务实现基于万维网联合会的 [语音合成标记语言 ``` -## 添加静音 +## Adding Silence -使用 `mstts:silence` 元素在文本前后,或者在两个相邻句子之间添加暂停。 +Use the `mstts:silence` element to add pauses before or after text, or between two adjacent sentences. -`mstts:silence` 和 `break` 之间的差别之一是,`break` 元素可以插入到文本中的任意位置。 静音仅适用于输入文本的开头或结尾,或者两个相邻句子的分界处。 +One difference between `mstts:silence` and `break` is that the `break` element can be inserted anywhere in the text. Silence applies only to the beginning or end of input text, or at the boundary between two adjacent sentences. -静默设置应用于其所在 `voice` 元素内的所有输入文本。 若要再次重置或更改静音设置,必须使用包含相同或不同语音的新 `voice` 元素。 +The silence setting applies to all input text within the `voice` element where it is located. To reset or change the silence setting again, you must use a new `voice` element containing the same or a different voice. -下表描述了 `mstts:silence` 元素的属性用法。 +The following table describes the attribute usage for the `mstts:silence` element. 
-| Attribute | 说明 | 必需还是可选 | +| Attribute | Description | Required or Optional | | --- | --- | --- | -| `type` | 指定添加静音的位置和方式。 支持以下静音类型:
- `Leading` – 文本开头的附加静音。 设置的值添加到文本开头前的自然静音。
- `Leading-exact` – 文本开头的静音。 该值是绝对静音长度。
- `Tailing` – 文本末尾的附加静音。 设置的值添加到最后一个单词后的自然静音中。
- `Tailing-exact` – 文本末尾的静音。 该值是绝对静音长度。
- `Sentenceboundary` – 相邻句子之间的附加静音。 此类型的实际静音长度包括上一个句子中最后一个单词后的自然静音、为此类型设置的值,以及下一个句子中起始单词之前的自然静音。
- `Sentenceboundary-exact` - 相邻句子之间的静音。 该值是绝对静音长度。
- `Comma-exact` - 半角或全角格式的逗号处的静音。 该值是绝对静音长度。
- `Semicolon-exact` - 半角或全角格式的分号处的静音。 该值是绝对静音长度。
- `Enumerationcomma-exact` - 全角格式的枚举逗号处的静音。 该值是绝对静音长度。

绝对静音类型(带有 `-exact` 后缀)会替换任何其他自然的前导或尾随静音。 绝对静音类型优先于相应的非绝对类型。 例如,如果同时设置了 `Leading` 和 `Leading-exact` 类型,则 `Leading-exact` 类型将生效。 [WordBoundary 事件](how-to-speech-synthesis#subscribe-to-synthesizer-events) 优先于标点符号相关的静音设置,包括 `Comma-exact`、`Semicolon-exact` 或 `Enumerationcomma-exact`。 同时使用 `WordBoundary` 事件和与标点符号相关的静音设置时,与标点符号相关的静音设置不会生效。 | 必选 | -| `value` | 暂停持续时间,以秒为单位(例如 `2s`)或以毫秒为单位(例如 `500ms`)。 有效值的范围为 0 到 20000 毫秒。 如果设置的值大于支持的最大值,则服务将使用 `20000ms`。 | 必选 | +| `type` | Specifies where and how silence is added. The following silence types are supported:
- `Leading` – Additional silence at the beginning of text. The set value is added to the natural silence before the beginning of the text.
- `Leading-exact` – Silence at the beginning of text. The value is the absolute silence length.
- `Tailing` – Additional silence at the end of text. The set value is added to the natural silence after the last word.
- `Tailing-exact` – Silence at the end of text. The value is the absolute silence length.
- `Sentenceboundary` – Additional silence between adjacent sentences. The actual silence length for this type includes the natural silence after the last word of the previous sentence, the value set for this type, and the natural silence before the starting word of the next sentence.
- `Sentenceboundary-exact` – Silence between adjacent sentences. The value is the absolute silence length.
- `Comma-exact` – Silence at half-width or full-width commas. The value is the absolute silence length.
- `Semicolon-exact` – Silence at half-width or full-width semicolons. The value is the absolute silence length.
- `Enumerationcomma-exact` – Silence at full-width enumeration commas. The value is the absolute silence length.

Absolute silence types (with the `-exact` suffix) replace any other natural leading or trailing silence. Absolute silence types take precedence over their corresponding non-absolute types. For example, if both `Leading` and `Leading-exact` types are set, the `Leading-exact` type takes effect. [WordBoundary events](how-to-speech-synthesis#subscribe-to-synthesizer-events) take precedence over punctuation-related silence settings, including `Comma-exact`, `Semicolon-exact`, and `Enumerationcomma-exact`. When `WordBoundary` events and punctuation-related silence settings are used together, the punctuation-related silence settings do not take effect. | Required | +| `value` | The pause duration, in seconds (for example `2s`) or milliseconds (for example `500ms`). Valid values range from 0 to 20000 milliseconds. If the set value is greater than the supported maximum, the service will use `20000ms`. | Required | -### MSTTS 静音示例 +### MSTTS Silence Examples -`mstts:silence`介绍了 元素属性支持的值。 +The supported values for the `mstts:silence` element attributes are described above. -在本例中,`mstts:silence` 用于在两个句子之间添加 200 毫秒的静音。 +In this example, `mstts:silence` is used to add 200ms of silence between two sentences. ```xml @@ -194,7 +194,7 @@ A good place to start is by trying out the slew of educational apps that are hel ``` -在此示例中,`mstts:silence` 用于在逗号处添加 50 毫秒的静音,在分号处添加 100 毫秒的静音,在枚举逗号处添加 150 毫秒的静音。 +In this example, `mstts:silence` is used to add 50ms of silence at commas, 100ms of silence at semicolons, and 150ms of silence at enumeration commas. ```xml @@ -204,13 +204,13 @@ A good place to start is by trying out the slew of educational apps that are hel ``` -## 指定段落和句子 +## Specifying Paragraphs and Sentences -`p` 和 `s` 元素分别用于表示段落和句子。 如果缺少这些元素,则语音服务会自动确定 SSML 文档的结构。 +The `p` and `s` elements are used to represent paragraphs and sentences, respectively. If these elements are omitted, the Speech Service automatically determines the structure of the SSML document. 
-### 段落和句子示例 +### Paragraphs and Sentences Example -以下示例定义了两个段落,其中每个段落包含句子。 在第二个段落中,语音服务会自动确定句子结构,因为它们未在 SSML 文档中定义。 +The following example defines two paragraphs, each containing sentences. In the second paragraph, the Speech Service automatically determines the sentence structure because the sentences are not explicitly defined in the SSML document. ```xml @@ -227,21 +227,21 @@ A good place to start is by trying out the slew of educational apps that are hel ``` -## Bookmark 元素 +## Bookmark Element -可以使用 SSML 中的 `bookmark` 元素来引用文本或标签序列中的特定位置。 然后使用语音 SDK 并订阅 `BookmarkReached` 事件以获取音频流中每个标记的偏移量。 `bookmark` 元素没有被读出。 有关详细信息,请参阅 [订阅合成器事件](how-to-speech-synthesis#subscribe-to-synthesizer-events)。 +You can use the `bookmark` element in SSML to reference a specific position in the text or tag sequence. Then use the Speech SDK and subscribe to the `BookmarkReached` event to get the offset of each bookmark in the audio stream. The `bookmark` element is not spoken aloud. For more information, see [Subscribe to Synthesizer Events](how-to-speech-synthesis#subscribe-to-synthesizer-events). -下表描述了 `bookmark` 元素的属性用法。 +The following table describes the attribute usage for the `bookmark` element. -| Attribute | 说明 | 必需还是可选 | +| Attribute | Description | Required or Optional | | --- | --- | --- | -| `mark` | `bookmark` 元素的引用文本。 | 必选 | +| `mark` | The reference text for the `bookmark` element. | Required | -### Bookmark 示例 +### Bookmark Example -`bookmark`介绍了 元素属性支持的值。 +The supported values for the `bookmark` element attributes are described above. -你可能想知道以下代码片断中每个与花相关的词的时间偏移量: +You might want to know the time offset of each flower-related word in the following code snippet: ```xml @@ -250,3 +250,7 @@ A good place to start is by trying out the slew of educational apps that are hel
``` + +--- + +*This documentation is adapted from Microsoft Azure Speech Service official documentation. All SSML specifications and element descriptions are based on Microsoft's technical documentation.* diff --git a/docs/ssml-voice.md b/docs/ssml-voice.md index 72db6ce..2f97a03 100644 --- a/docs/ssml-voice.md +++ b/docs/ssml-voice.md @@ -1,25 +1,25 @@ -# 语音合成标记语言 (SSML) 的语音和声音 - 语音服务 - Foundry Tools | Microsoft Learn +# Voice and Sounds in Speech Synthesis Markup Language (SSML) - Speech Service - Foundry Tools | Microsoft Learn -可以使用语音合成标记语言 (SSML) 为语音输出指定文本转语音的声音、语言、名称、风格和角色。 还可以在单个 SSML 文档中使用多种语音,并调整重音、语速、音调和音量。 此外,SSML 还能够插入预先录制的音频,例如音效或音符。 +You can use Speech Synthesis Markup Language (SSML) to specify the voice, language, name, style, and role for text-to-speech output. You can also use multiple voices in a single SSML document and adjust emphasis, speaking rate, pitch, and volume. Additionally, SSML lets you insert pre-recorded audio, such as sound effects or musical notes. -本文介绍了如何使用 SSML 元素来指定语音和声音。 有关 SSML 语法的详细信息,请参阅 [SSML 文档结构和事件](speech-synthesis-markup-structure)。 +This article describes how to use SSML elements to specify voice and sounds. For more information about SSML syntax, see [SSML document structure and events](speech-synthesis-markup-structure). -## 使用语音元素 +## Using the voice element -必须在每个 SSML `voice` 元素中至少指定一个 元素。 此元素可确定用于文本转语音的声音。 +Each SSML `speak` element must contain at least one `voice` element. This element determines the voice used for text-to-speech. -可以在单个 SSML 文档中包含多个 `voice` 元素。 每个 `voice` 元素可以指定不同的语音。 还可以通过不同的设置多次使用同一语音,例如,当 [更改句子之间的静音持续时间](speech-synthesis-markup-structure#add-silence) 时。 +You can include multiple `voice` elements in a single SSML document. Each `voice` element can specify a different voice. You can also use the same voice multiple times with different settings, for example, when [changing the duration of silence between sentences](speech-synthesis-markup-structure#add-silence). 
-下表介绍 `voice` 元素的属性的用法: +The following table describes the usage of `voice` element attributes: -| Attribute | 说明 | 必需还是可选 | +| Attribute | Description | Required or Optional | | --- | --- | --- | -| `name` | 用于文本转语音输出的声音。 有关支持的标准语音的完整列表,请参阅 [语言支持](language-support?tabs=tts)。 | 必选 | -| `effect` | 音频效果处理器,用于在设备上针对特定方案优化合成语音输出的质量。 对于生产环境中的某些方案,听觉体验可能会因某些设备上的播放失真而降级。 例如,由于扬声器响应、房间混响和背景噪音等环境因素,来自汽车扬声器的合成语音可能会听起来迟钝而低沉。 乘客可能必须调高音量才能听得更清楚。 为了避免在这种情况下进行手动操作,音频效果处理器可以通过补偿播放失真来让声音更清晰。支持以下值:
- `eq_car` - 在汽车、公共汽车和其他封闭车辆中提供高保真语音时,优化听觉体验。
- `eq_telecomhp8k` - 优化电信或电话方案中窄带语音的听觉体验。 应使用 8 kHz 的采样率。 如果采样率不是 8 kHz,则不会优化输出语音的听觉质量。

如果值缺失或无效,则会忽略此属性,而不会应用任何效果。 | 可选 | +| `name` | The voice used for text-to-speech output. For a complete list of supported standard voices, see [Language support](language-support?tabs=tts). | Required | +| `effect` | Audio effect processor used to optimize the quality of synthesized speech output on devices for specific scenarios. In certain production scenarios, the listening experience may be degraded due to playback distortion on certain devices. For example, synthesized speech from car speakers may sound dull and muffled due to environmental factors such as speaker response, room reverberation, and background noise. Passengers may have to turn up the volume to hear more clearly. To avoid manual operation in this situation, the audio effect processor can make the voice clearer by compensating for playback distortion. The following values are supported:
- `eq_car` - Optimizes the listening experience when delivering high-fidelity speech in cars, buses, and other enclosed vehicles.
- `eq_telecomhp8k` - Optimizes the listening experience for narrowband speech in telecommunications or telephony scenarios. A sample rate of 8 kHz should be used. If the sample rate is not 8 kHz, the listening quality of the output speech will not be optimized.

If the value is missing or invalid, this attribute is ignored and no effect is applied. | Optional | -### 语音示例 +### Voice examples -#### 单一声音示例 +#### Single voice example ```xml @@ -29,7 +29,7 @@ ``` -#### 多个语音的示例 +#### Multiple voices example ```xml @@ -42,7 +42,7 @@ ``` -#### 音频效果示例 +#### Audio effect example ```xml @@ -52,7 +52,7 @@ ``` -#### 多讲话人语音示例 +#### Multi-speaker voice example ```xml @@ -65,99 +65,99 @@ ``` -## 使用说话风格和角色 +## Using speaking styles and roles -默认情况下,神经网络声音采用中性讲话风格。 可在句子层面调整讲话风格、风格强度和角色。 +By default, neural voices use a neutral speaking style. You can adjust the speaking style, style intensity, and role at the sentence level. -下表介绍 `mstts:express-as` 元素的属性的用法: +The following table describes the usage of `mstts:express-as` element attributes: -| Attribute | 说明 | 必需还是可选 | +| Attribute | Description | Required or Optional | | --- | --- | --- | -| `style` | 特定声音的说话风格。 可以表达快乐、同情和平静等情绪。 | 必选 | -| `styledegree` | 讲话风格的强度。 可接受值的范围为:`0.01` 到 `2`(含)。 默认值为 `1`。 | 可选 | -| `role` | 说话时的角色扮演。 声音可以模仿不同的年龄和性别。 | 可选 | +| `style` | The speaking style for a specific voice. Can express emotions such as happiness, sympathy, and calmness. | Required | +| `styledegree` | The intensity of the speaking style. Acceptable values range from `0.01` to `2` (inclusive). Default value is `1`. | Optional | +| `role` | Role-playing when speaking. Voices can imitate different ages and genders. 
| Optional | -### 支持的风格 (Style) +### Supported styles -| Style | 说明 | +| Style | Description | | --- | --- | -| `advertisement_upbeat` | 用兴奋和精力充沛的语气推广产品或服务。 | -| `affectionate` | 以较高的音调和音量表达温暖而亲切的语气。 | -| `angry` | 表达生气和厌恶的语气。 | -| `assistant` | 以温暖且轻松的语气说话,用于数字助手。 | -| `calm` | 以沉着冷静的态度说话。 | -| `chat` | 表达轻松随意的语气。 | -| `cheerful` | 表达积极愉快的语气。 | -| `customerservice` | 以友好热情的语气为客户提供支持。 | -| `depressed` | 调低音调和音量来表达忧郁、沮丧的语气。 | -| `documentary-narration` | 用轻松、感兴趣和信息丰富的风格讲述纪录片。 | -| `empathetic` | 表达关心和理解。 | -| `excited` | 表达乐观和充满希望的语气。 | -| `fearful` | 以较高的音调、较高的音量和较快的语速来表达恐惧。 | -| `friendly` | 表达一种愉快、怡人且温暖的语气。 | -| `gentle` | 以较低的音调和音量表达温和、礼貌和愉快的语气。 | -| `hopeful` | 以温暖和向往的语气说话。 | -| `lyrical` | 以优美又带感伤的方式表达情感。 | -| `narration-professional` | 以专业、客观的语气朗读内容。 | -| `narration-relaxed` | 以舒缓且悦耳的语气说话,用于内容朗读。 | -| `newscast` | 以正式专业的语气叙述新闻。 | -| `newscast-casual` | 以通用、随意的语气发布一般新闻。 | -| `newscast-formal` | 以正式、自信和权威的语气发布新闻。 | -| `poetry-reading` | 在读诗时表达出带情感和节奏的语气。 | -| `sad` | 表达悲伤语气。 | -| `serious` | 表达严肃和命令的语气。 | -| `shouting` | 以一种听起来好像语音在远处或在另一个位置说话。 | -| `sports_commentary` | 表达一种既轻松又感兴趣的语气,用于播报体育赛事。 | -| `sports_commentary_excited` | 用快速且充满活力的语气播报体育赛事精彩瞬间。 | -| `whispering` | 以试图发出轻柔、温和声音的柔和语气说话。 | -| `terrified` | 表达一种害怕的语气,语速快且声音颤抖。 | -| `unfriendly` | 表达一种冷淡无情的语气。 | - -### 支持的角色 (Role) - -| 角色 | 说明 | +| `advertisement_upbeat` | Promote products or services with an excited and energetic tone. | +| `affectionate` | Express warm and affectionate tone with higher pitch and volume. | +| `angry` | Express angry and disgusted tone. | +| `assistant` | Speak in a warm and relaxed tone, used for digital assistants. | +| `calm` | Speak with composure and calmness. | +| `chat` | Express a relaxed and casual tone. | +| `cheerful` | Express a positive and pleasant tone. | +| `customerservice` | Provide support to customers with a friendly and enthusiastic tone. | +| `depressed` | Express melancholy and depressed tone with lower pitch and volume. 
| +| `documentary-narration` | Narrate documentaries in a relaxed, interested, and informative style. | +| `empathetic` | Express care and understanding. | +| `excited` | Express an optimistic and hopeful tone. | +| `fearful` | Express fear with higher pitch, higher volume, and faster speech rate. | +| `friendly` | Express a pleasant, charming, and warm tone. | +| `gentle` | Express a mild, polite, and pleasant tone with lower pitch and volume. | +| `hopeful` | Speak in a warm and longing tone. | +| `lyrical` | Express emotions in a graceful and slightly sentimental way. | +| `narration-professional` | Read content in a professional and objective tone. | +| `narration-relaxed` | Speak in a soothing and pleasant tone, used for content narration. | +| `newscast` | Narrate news in a formal and professional tone. | +| `newscast-casual` | Deliver general news in a common, casual tone. | +| `newscast-formal` | Deliver news in a formal, confident, and authoritative tone. | +| `poetry-reading` | Express emotional and rhythmic tone when reading poetry. | +| `sad` | Express a sorrowful tone. | +| `serious` | Express a serious and commanding tone. | +| `shouting` | Sound as if speaking from a distance or in another location. | +| `sports_commentary` | Express a relaxed yet interested tone for broadcasting sports events. | +| `sports_commentary_excited` | Broadcast sports event highlights with a fast and energetic tone. | +| `terrified` | Express a fearful tone with fast speech rate and trembling voice. | +| `unfriendly` | Express a cold and indifferent tone. | +| `whispering` | Speak in a soft tone trying to produce a gentle and mild sound. 
| + +### Supported roles + +| Role | Description | | --- | --- | -| `Girl` | 声音模仿女孩。 | -| `Boy` | 声音模仿男孩。 | -| `YoungAdultFemale` | 声音模仿年轻的成年女性。 | -| `YoungAdultMale` | 声音模仿年轻的成年男性。 | -| `OlderAdultFemale` | 声音模仿年长的成年女性。 | -| `OlderAdultMale` | 声音模仿年长的成年男性。 | -| `SeniorFemale` | 声音模仿年老女性。 | -| `SeniorMale` | 声音模仿年老男性。 | +| `Girl` | Voice imitates a girl. | +| `Boy` | Voice imitates a boy. | +| `YoungAdultFemale` | Voice imitates a young adult female. | +| `YoungAdultMale` | Voice imitates a young adult male. | +| `OlderAdultFemale` | Voice imitates an older adult female. | +| `OlderAdultMale` | Voice imitates an older adult male. | +| `SeniorFemale` | Voice imitates an elderly female. | +| `SeniorMale` | Voice imitates an elderly male. | -### 风格和程度示例 +### Style and style degree examples ```xml - 快走吧,路上一定要注意安全,早去早回。 + Hurry up, be careful on the road, and come back early. ``` -### 角色示例 +### Role examples ```xml - 女儿看见父亲走了进来,问道: + The daughter saw her father walk in and asked: - "您来的挺快的,怎么过来的?" + "You came pretty fast, how did you get here?" - 父亲放下手提包,说: + The father put down his bag and said: - "刚打车过来的,路上还挺顺畅。" + "I just took a taxi, the traffic was smooth." ``` -## 调整讲话语言 +## Adjusting speaking language -使用 `` 元素调整多语言语音的说话语言。 +Use the `` element to adjust the speaking language for multilingual voices. ```xml @@ -169,19 +169,19 @@ ``` -## 调整韵律 +## Adjusting prosody -使用 `prosody` 元素指定音高、语调、范围、速率和音量的变化。 +Use the `prosody` element to specify variations in pitch, intonation, range, speech rate, and volume. -| Attribute | 说明 | +| Attribute | Description | | --- | --- | -| `contour` | 升降曲线表示音高的变化。 | -| `pitch` | 基线音节。 可用值:`x-low`, `low`, `medium`, `high`, `x-high`, 或相对值(如 `+20Hz`, `-2st`)。 | -| `range` | 音节范围。 | -| `rate` | 语速。 可用值:`x-slow`, `slow`, `medium`, `fast`, `x-fast`, 或相对值(如 `+30%`)。 | -| `volume` | 音量。 可用值:`silent`, `x-soft`, `soft`, `medium`, `loud`, `x-loud`, 或相对值(如 `+20`)。 | +| `contour` | Contour curve representing pitch variations. 
| +| `pitch` | Baseline pitch. Available values: `x-low`, `low`, `medium`, `high`, `x-high`, or relative values (e.g., `+20Hz`, `-2st`). | +| `range` | Pitch range. | +| `rate` | Speech rate. Available values: `x-slow`, `slow`, `medium`, `fast`, `x-fast`, or relative values (e.g., `+30%`). | +| `volume` | Volume level. Available values: `silent`, `x-soft`, `soft`, `medium`, `loud`, `x-loud`, or relative values (e.g., `+20`). | -### 韵律示例 +### Prosody example ```xml @@ -193,7 +193,7 @@ ``` -## 添加录制的音频 +## Adding recorded audio ```xml @@ -204,7 +204,7 @@ ``` -## 添加背景音频 +## Adding background audio ```xml @@ -215,7 +215,7 @@ ``` -## 语音转换元素 +## Voice conversion element ```xml @@ -224,3 +224,15 @@
``` + +--- + +## Related Links + +- [Microsoft Azure Speech Service Documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/) +- [SSML Specification](https://www.w3.org/TR/speech-synthesis11/) +- [Language Support for Text-to-Speech](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support?tabs=tts) + +--- + +*This documentation is translated from Microsoft official documentation. All rights reserved to Microsoft.* diff --git a/example/README.md b/example/README.md index dc275ba..c5cb3ce 100644 --- a/example/README.md +++ b/example/README.md @@ -1,16 +1,16 @@ -# TTS Pro API 示例代码 +# TTS Pro API Example Code -## 快速开始 +## Quick Start -### 1. 配置账户信息 +### 1. Configure Account Information -复制配置模板并填写你的邮箱和密码: +Copy the configuration template and fill in your email and password: ```bash cp config.example.json config.json ``` -编辑 `config.json`: +Edit `config.json`: ```json { "user_email": "your-email@example.com", @@ -21,13 +21,13 @@ cp config.example.json config.json } ``` -### 2. 编译项目 +### 2. Build Project ```bash pnpm run build ``` -### 3. 运行示例 +### 3. Run Examples ```bash # Example 1: Multi-Speaker Dialogue (Chained) @@ -97,24 +97,24 @@ Demonstrate the `styleDegree` parameter (range: 0.01-2.0). Demonstrate the `substitutions` parameter for replacing technical terms. 
**Features**: -- W3C → 万维网联盟 -- HTTP → 超文本传输协议 +- W3C → World Wide Web Consortium +- HTTP → HyperText Transfer Protocol - CEO → Chief Executive Officer **Output**: `example/output/05-text-substitution-demo.mp3` -## API 参数说明 +## API Parameters -| 参数名 | 必填 | 说明 | 默认值 | -|--------|------|------|--------| -| `user_email` | ✅ | 用户邮箱 | - | -| `user_pass` | ✅ | 用户密码 | - | +| Parameter | Required | Description | Default | +|-----------|----------|-------------|---------| +| `user_email` | ✅ | User email | - | +| `user_pass` | ✅ | User password | - | | `type` | ❌ | `getSpeek`/`getBig`/`setBig` | `getSpeek` | -| `ssml` | ✅ | SSML 内容 | - | -| `kbitrate` | ❌ | 音频质量 | `audio-16khz-32kbitrate-mono-mp3` | -| `output_format` | ❌ | 返回类型:`二进制`/`url` | `二进制` | +| `ssml` | ✅ | SSML content | - | +| `kbitrate` | ❌ | Audio quality | `audio-16khz-32kbitrate-mono-mp3` | +| `output_format` | ❌ | Return type: `binary`/`url` | `binary` | -## 输出目录 +## Output Directory All generated audio files are saved in: ``` @@ -126,20 +126,20 @@ example/output/ └── 05-text-substitution-demo.mp3 ``` -## 注意事项 +## Notes -1. **账户安全**: `config.json` 已被 `.gitignore` 忽略,不会提交到 Git -2. **网络连接**: 运行示例需要网络连接以调用 API -3. **编译要求**: 运行前必须先执行 `pnpm run build` -4. **Node 版本**: 需要 Node.js 18+(支持 `fetch` API) +1. **Account Security**: `config.json` is ignored by `.gitignore` and will not be committed to Git +2. **Network Connection**: Running examples requires network connection to call the API +3. **Build Requirement**: You must run `pnpm run build` before running examples +4. 
**Node Version**: Requires Node.js 18+ (supports `fetch` API) -## 常见问题 +## FAQ -### Q: 提示 "config.json 不存在" -A: 请复制 `config.example.json` 为 `config.json` 并填写邮箱和密码 +### Q: It says "config.json does not exist" +A: Please copy `config.example.json` to `config.json` and fill in your email and password -### Q: 音频生成失败 -A: 检查网络连接,确认邮箱和密码正确 +### Q: Audio generation failed +A: Check network connection and verify that email and password are correct -### Q: 如何修改音频质量? -A: 编辑 `config.json` 中的 `kbitrate` 字段 +### Q: How to change audio quality? +A: Edit the `kbitrate` field in `config.json` diff --git a/example/run.sh b/example/run.sh index fec09da..e511d15 100755 --- a/example/run.sh +++ b/example/run.sh @@ -1,61 +1,61 @@ #!/bin/bash -# 示例运行脚本 -# 解决 ts-node 无法正确处理中文文件名的问题 +# Example run script +# Solve ts-node's inability to properly handle Chinese filenames -# 检查配置文件 +# Check configuration file if [ ! -f "config.json" ]; then - echo "❌ 错误:config.json 不存在" - echo "📝 请复制 config.example.json 为 config.json 并填写邮箱和密码" + echo "❌ Error: config.json does not exist" + echo "📝 Please copy config.example.json to config.json and fill in email and password" exit 1 fi -# 编译项目 -echo "🔨 正在编译项目..." +# Build project +echo "🔨 Building project..." pnpm run build -# 复制 config.json 到 dist/example -echo "📋 复制配置文件到输出目录..." +# Copy config.json to dist/example +echo "📋 Copying configuration file to output directory..." 
cp config.json ../dist/example/ -# 切换到 dist/example 目录运行示例 +# Switch to dist/example directory to run examples cd ../dist/example - # 运行示例 + # Run example case "$1" in 0) - echo "🎙️ 运行示例 0: 简单对话演示" + echo "🎙️ Running Example 0: Simple Dialogue Demo" node "00-简单对话演示.js" ;; 1) - echo "🎙️ 运行示例 1: 多说话人对话 - 链式调用" + echo "🎙️ Running Example 1: Multi-Speaker Dialogue - Chained" node "01-多说话人对话 - 链式调用.js" ;; 2) - echo "🎙️ 运行示例 2: 多说话人对话 - 函数式" + echo "🎙️ Running Example 2: Multi-Speaker Dialogue - Functional" node "02-多说话人对话 - 函数式.js" ;; 3) - echo "🎙️ 运行示例 3: 31 种情感风格演示" + echo "🎙️ Running Example 3: 31 Emotional Styles Demo" node "03-31 种情感风格演示.js" ;; 4) - echo "🎙️ 运行示例 4: 情感强度控制演示" + echo "🎙️ Running Example 4: Style Degree Control Demo" node "04-情感强度控制演示.js" ;; 5) - echo "🎙️ 运行示例 5: 文本替换功能演示" + echo "🎙️ Running Example 5: Text Substitution Demo" node "05-文本替换功能演示.js" ;; *) - echo "用法:./run.sh <示例编号>" + echo "Usage: ./run.sh " echo "" - echo "可用示例:" - echo " 0 - 简单对话演示" - echo " 1 - 多说话人对话 - 链式调用" - echo " 2 - 多说话人对话 - 函数式" - echo " 3 - 31 种情感风格演示" - echo " 4 - 情感强度控制演示" - echo " 5 - 文本替换功能演示" + echo "Available examples:" + echo " 0 - Simple Dialogue Demo" + echo " 1 - Multi-Speaker Dialogue - Chained" + echo " 2 - Multi-Speaker Dialogue - Functional" + echo " 3 - 31 Emotional Styles Demo" + echo " 4 - Style Degree Control Demo" + echo " 5 - Text Substitution Demo" exit 1 ;; esac From b3b97daef65c88012b3ccb919c88f88777af1fa7 Mon Sep 17 00:00:00 2001 From: huan-zz3 <2805033624@qq.com> Date: Sun, 22 Mar 2026 19:32:01 +0800 Subject: [PATCH 10/10] docs: translate src/AGENTS.md to English Additional translation - src/ directory knowledge base --- src/AGENTS.md | 150 +++++++++++++++++++++++++------------------------- 1 file changed, 75 insertions(+), 75 deletions(-) diff --git a/src/AGENTS.md b/src/AGENTS.md index 991391c..e8fd790 100644 --- a/src/AGENTS.md +++ b/src/AGENTS.md @@ -1,26 +1,26 @@ -# src/ 目录知识库 +# src/ Directory Knowledge Base -**所属模块**: 核心 TTS 功能实现 +**Module**: 
Core TTS Functionality Implementation --- ## OVERVIEW -MsEdgeTTS 核心源代码目录 - 包含 WebSocket 通信、SSML 生成、音频输出控制等全部功能实现。 +MsEdgeTTS core source code directory - Contains all functionality implementations including WebSocket communication, SSML generation, audio output control, etc. --- ## WHERE TO LOOK -| 任务 | 文件 | 说明 | +| Task | File | Description | |------|------|------| -| 修改 WebSocket 通信逻辑 | `MsEdgeTTS.ts` | 连接初始化、消息收发、边界元数据处理 | -| 添加新音频格式 | `Output.ts` | `OUTPUT_FORMAT` 枚举 + `OUTPUT_EXTENSIONS` 映射 | -| 修改语音选项 | `Prosody.ts` | `ProsodyOptions` 类(rate/pitch/volume) | -| 修改对话构建器 | `DialogueBuilder.ts` | 链式调用构建器 + `buildDialogueSSML()` 函数 | -| 添加 SSML 工具 | `SSMLUtils.ts` | 转义函数、情感风格验证 | -| 修改类型定义 | `DialogueTurn.ts` | `DialogueTurn`、`Dialogue`、`TextSegment`、`Substitution` | -| 添加单元测试 | `*.spec.ts` | 与源码同目录,Jest 配置在 package.json | +| Modify WebSocket communication logic | `MsEdgeTTS.ts` | Connection initialization, message exchange, boundary metadata processing | +| Add new audio format | `Output.ts` | `OUTPUT_FORMAT` enum + `OUTPUT_EXTENSIONS` mapping | +| Modify voice options | `Prosody.ts` | `ProsodyOptions` class (rate/pitch/volume) | +| Modify dialogue builder | `DialogueBuilder.ts` | Chained builder + `buildDialogueSSML()` function | +| Add SSML utilities | `SSMLUtils.ts` | Escape functions, emotional style validation | +| Modify type definitions | `DialogueTurn.ts` | `DialogueTurn`, `Dialogue`, `TextSegment`, `Substitution` | +| Add unit tests | `*.spec.ts` | Same directory as source, Jest config in package.json | --- @@ -28,102 +28,102 @@ MsEdgeTTS 核心源代码目录 - 包含 WebSocket 通信、SSML 生成、音频 ``` src/ -├── index.ts # Barrel export(6 个导出) -├── MsEdgeTTS.ts # 核心类(457 行) -├── MsEdgeTTS.spec.ts # 单元测试 -├── Output.ts # OUTPUT_FORMAT 枚举 + OUTPUT_EXTENSIONS -├── Prosody.ts # ProsodyOptions 类 + RATE/PITCH/VOLUME 枚举 -├── DialogueTurn.ts # DialogueTurn/Dialogue/TextSegment/Substitution 类型 -├── DialogueBuilder.ts # DialogueBuilder 类 + buildDialogueSSML() 函数 +├── index.ts # Barrel 
export (6 exports) +├── MsEdgeTTS.ts # Core class (457 lines) +├── MsEdgeTTS.spec.ts # Unit tests +├── Output.ts # OUTPUT_FORMAT enum + OUTPUT_EXTENSIONS +├── Prosody.ts # ProsodyOptions class + RATE/PITCH/VOLUME enums +├── DialogueTurn.ts # DialogueTurn/Dialogue/TextSegment/Substitution types +├── DialogueBuilder.ts # DialogueBuilder class + buildDialogueSSML() function ├── SSMLUtils.ts # escapeSSML/replaceText/validateStyle/validateStyleDegree -└── utils.ts # joinPath() 路径拼接工具 +└── utils.ts # joinPath() path joining utility ``` --- ## CODE MAP -| Symbol | Type | 文件 | 作用 | +| Symbol | Type | File | Role | |--------|------|------|------| -| `MsEdgeTTS` | Class | `MsEdgeTTS.ts` | 主类:WebSocket 连接、语音合成、流处理 | -| `OUTPUT_FORMAT` | Enum | `Output.ts` | 支持的音频格式(MP3/WEBM 多种比特率) | -| `OUTPUT_EXTENSIONS` | Const | `Output.ts` | 格式到文件扩展名映射(`.mp3`/`.webm`) | -| `ProsodyOptions` | Class | `Prosody.ts` | 语速/音调/音量配置选项 | -| `RATE` | Enum | `Prosody.ts` | 语速预设(x-slow 到 x-fast) | -| `PITCH` | Enum | `Prosody.ts` | 音调预设(x-low 到 x-high) | -| `VOLUME` | Enum | `Prosody.ts` | 音量预设(silent 到 x-LOUD) | -| `DialogueBuilder` | Class | `DialogueBuilder.ts` | 链式对话构建器 | -| `buildDialogueSSML` | Function | `DialogueBuilder.ts` | 函数式 SSML 生成 | -| `validateStyle` | Function | `SSMLUtils.ts` | 验证 28 种官方情感风格 | -| `escapeSSML` | Function | `SSMLUtils.ts` | XML 转义(& < > " ') | +| `MsEdgeTTS` | Class | `MsEdgeTTS.ts` | Main class: WebSocket connection, speech synthesis, stream processing | +| `OUTPUT_FORMAT` | Enum | `Output.ts` | Supported audio formats (MP3/WEBM multiple bitrates) | +| `OUTPUT_EXTENSIONS` | Const | `Output.ts` | Format to file extension mapping (`.mp3`/`.webm`) | +| `ProsodyOptions` | Class | `Prosody.ts` | Rate/pitch/volume configuration options | +| `RATE` | Enum | `Prosody.ts` | Speaking rate presets (x-slow to x-fast) | +| `PITCH` | Enum | `Prosody.ts` | Pitch presets (x-low to x-high) | +| `VOLUME` | Enum | `Prosody.ts` | Volume presets (silent to x-LOUD) | +| `DialogueBuilder` 
| Class | `DialogueBuilder.ts` | Chained dialogue builder | +| `buildDialogueSSML` | Function | `DialogueBuilder.ts` | Functional SSML generation | +| `validateStyle` | Function | `SSMLUtils.ts` | Validate 28 official emotional styles | +| `escapeSSML` | Function | `SSMLUtils.ts` | XML escape (& < > " ') | --- ## CONVENTIONS -**TypeScript 配置**: -- `module`: CommonJS(非 ESM,为兼容性) +**TypeScript Configuration**: +- `module`: CommonJS (not ESM, for compatibility) - `target`: ESNext - `skipLibCheck`: true -- 编译排除:`src/**/*.spec.ts` +- Compilation exclusion: `src/**/*.spec.ts` -**测试约定**: -- 测试文件与源码同目录:`*.spec.ts` -- Jest 配置内联在 `package.json` -- 测试超时:15000ms +**Testing Conventions**: +- Test files in same directory as source: `*.spec.ts` +- Jest config inline in `package.json` +- Test timeout: 15000ms -**导出模式**: -- 使用 barrel export(`index.ts` 统一导出) -- 6 个公共 API:`MsEdgeTTS`, `OUTPUT_FORMAT`, `ProsodyOptions`, `DialogueTurn`, `DialogueBuilder`, `buildDialogueSSML` +**Export Pattern**: +- Use barrel export (`index.ts` unified export) +- 6 public APIs: `MsEdgeTTS`, `OUTPUT_FORMAT`, `ProsodyOptions`, `DialogueTurn`, `DialogueBuilder`, `buildDialogueSSML` -**SSML 处理**: -- 仅支持 `speak`/`voice`/`prosody`/`mstts:express-as`/`lang`/`sub` 元素 -- 不支持完整 SSML 规范 +**SSML Processing**: +- Only supports `speak`/`voice`/`prosody`/`mstts:express-as`/`lang`/`sub` elements +- Full SSML specification not supported --- ## ANTI-PATTERNS (SRC) -- ❌ **不要** 在浏览器中使用 - API 需要 Edge User-Agent(仅服务器端) -- ❌ **不要** 修改 `MsEdgeTTS.ts` 中的 Sec-MS-GEC 哈希算法 - 依赖 Azure 认证机制 -- ❌ **不要** 删除 `isomorphic-ws` 依赖 - 实现跨环境兼容 -- ❌ **不要** 使用回调 API - 仅支持 Promise +- ❌ **Do NOT** use in browser - API requires Edge User-Agent (server-side only) +- ❌ **Do NOT** modify Sec-MS-GEC hash algorithm in `MsEdgeTTS.ts` - depends on Azure authentication mechanism +- ❌ **Do NOT** remove `isomorphic-ws` dependency - enables cross-environment compatibility +- ❌ **Do NOT** use callback API - Promise only --- ## UNIQUE STYLES -**WebSocket 
通信**: -- Sec-MS-GEC 哈希认证(SHA-256 + Windows Tick 时间戳) -- 自定义 UUID 生成(非 `crypto.randomUUID`) -- 消息分隔符:`\r\n\r\n` +**WebSocket Communication**: +- Sec-MS-GEC hash authentication (SHA-256 + Windows Tick timestamp) +- Custom UUID generation (not `crypto.randomUUID`) +- Message delimiter: `\r\n\r\n` -**日志系统**: -- 可选 logger(`enableLogger` 选项) -- 仅记录连接状态、消息收发 +**Logging System**: +- Optional logger (`enableLogger` option) +- Only logs connection status, message exchange -**多人对话支持**: -- `DialogueBuilder` 链式调用 -- `buildDialogueSSML()` 函数式 API -- 支持 28 种情感风格 + 强度控制(0.01-2.0) -- 支持文本替换(`` 标签) -- 支持多语言混合(``) +**Multi-Speaker Dialogue Support**: +- `DialogueBuilder` chained calls +- `buildDialogueSSML()` functional API +- Supports 28 emotional styles + intensity control (0.01-2.0) +- Supports text substitution (`` tags) +- Supports multi-language mixing (``) --- ## COMMANDS ```bash -# 编译 src/ 到 dist/ +# Compile src/ to dist/ pnpm run build -# 运行测试(src/*.spec.ts) +# Run tests (src/*.spec.ts) pnpm test -# 测试监听模式 +# Test watch mode pnpm run test:watch -# 测试覆盖率 +# Test coverage pnpm run test:cov ``` @@ -131,15 +131,15 @@ pnpm run test:cov ## NOTES -**关键限制**: -- 2025 年 12 月更新:API 需要 Edge User-Agent,**浏览器中无法使用** -- 语音列表需要可信客户端 Token(硬编码:`6A5AA1D4EAFF4E9FB37E23D68491D6F4`) +**Key Limitations**: +- December 2025 update: API requires Edge User-Agent, **cannot be used in browsers** +- Voice list requires trusted client Token (hardcoded: `6A5AA1D4EAFF4E9FB37E23D68491D6F4`) -**已知问题**: -- `MsEdgeTTS.ts` 约 457 行 - 复杂度较高,建议拆分 +**Known Issues**: +- `MsEdgeTTS.ts` approximately 457 lines - high complexity, recommended to split -**添加新功能流程**: -1. 在 `src/` 同级创建 `.ts` 文件 -2. 在 `index.ts` 添加导出 -3. 创建同名 `.spec.ts` 测试文件 -4. 运行 `pnpm test` 验证 +**Adding New Features Process**: +1. Create `.ts` file at same level in `src/` +2. Add export in `index.ts` +3. Create `.spec.ts` test file with same name +4. Run `pnpm test` to verify
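The code map above lists `escapeSSML` and `validateStyleDegree` by name only. The following is a minimal TypeScript sketch of what helpers of that kind might look like, assuming plain string replacement for the five XML characters (`& < > " '`) and an inclusive 0.01-2.0 range check. The names `escapeXmlText` and `isValidStyleDegree` are hypothetical and are not the project's actual exports; the real code in `SSMLUtils.ts` may differ.

```typescript
// Hypothetical sketch only: not the implementation found in SSMLUtils.ts.

/** Escape the five XML special characters so text can be embedded in SSML. */
function escapeXmlText(text: string): string {
  return text
    .replace(/&/g, "&amp;") // runs first so the entities added below are not escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

/** Return true when a style degree lies in the documented 0.01-2.0 range (inclusive). */
function isValidStyleDegree(degree: number): boolean {
  return Number.isFinite(degree) && degree >= 0.01 && degree <= 2;
}

// Usage: escape user text before placing it inside an SSML element.
const safeText = escapeXmlText(`Tom & "Jerry"`);
console.log(safeText); // Tom &amp; &quot;Jerry&quot;
console.log(isValidStyleDegree(1.5)); // true
console.log(isValidStyleDegree(3)); // false
```

Escaping the ampersand before the other characters is the one ordering constraint in this sketch; reversing it would double-escape the entities produced by the later replacements.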