绝对静音类型(带有 `-exact` 后缀)会替换任何其他自然的前导或尾随静音。 绝对静音类型优先于相应的非绝对类型。 例如,如果同时设置了 `Leading` 和 `Leading-exact` 类型,则 `Leading-exact` 类型将生效。 [WordBoundary 事件](how-to-speech-synthesis#subscribe-to-synthesizer-events) 优先于标点符号相关的静音设置,包括 `Comma-exact`、`Semicolon-exact` 或 `Enumerationcomma-exact`。 同时使用 `WordBoundary` 事件和与标点符号相关的静音设置时,与标点符号相关的静音设置不会生效。 | 必选 |
+| `value` | 暂停持续时间,以秒为单位(例如 `2s`)或以毫秒为单位(例如 `500ms`)。 有效值的范围为 0 到 20000 毫秒。 如果设置的值大于支持的最大值,则服务将使用 `20000ms`。 | 必选 |
+
+### MSTTS 静音示例
+
+`mstts:silence`介绍了 元素属性支持的值。
+
+在本例中,`mstts:silence` 用于在两个句子之间添加 200 毫秒的静音。
+
+```xml
+
+
+
+If we're home schooling, the best we can do is roll with what each day brings and try to have fun along the way.
+A good place to start is by trying out the slew of educational apps that are helping children stay happy and smash their schooling at the same time.
+
+
+```
+
+在此示例中,`mstts:silence` 用于在逗号处添加 50 毫秒的静音,在分号处添加 100 毫秒的静音,在枚举逗号处添加 150 毫秒的静音。
+
+```xml
+
+
+你好呀,云希、晓晓;你好呀。
+
+
+```
+
+## 指定段落和句子
+
+`p` 和 `s` 元素分别用于表示段落和句子。 如果缺少这些元素,则语音服务会自动确定 SSML 文档的结构。
+
+### 段落和句子示例
+
+以下示例定义了两个段落,其中每个段落包含句子。 在第二个段落中,语音服务会自动确定句子结构,因为它们未在 SSML 文档中定义。
+
+```xml
+
+
+
+ Introducing the sentence element.
+ Used to mark individual sentences.
+
+
+ Another simple paragraph.
+ Sentence structure in this paragraph is not explicitly marked.
+
绝对静音类型(带有 `-exact` 后缀)会替换任何其他自然的前导或尾随静音。 绝对静音类型优先于相应的非绝对类型。 例如,如果同时设置了 `Leading` 和 `Leading-exact` 类型,则 `Leading-exact` 类型将生效。 [WordBoundary 事件](how-to-speech-synthesis#subscribe-to-synthesizer-events) 优先于标点符号相关的静音设置,包括 `Comma-exact`、`Semicolon-exact` 或 `Enumerationcomma-exact`。 同时使用 `WordBoundary` 事件和与标点符号相关的静音设置时,与标点符号相关的静音设置不会生效。 | 必选 |
-| `value` | 暂停持续时间,以秒为单位(例如 `2s`)或以毫秒为单位(例如 `500ms`)。 有效值的范围为 0 到 20000 毫秒。 如果设置的值大于支持的最大值,则服务将使用 `20000ms`。 | 必选 |
+| `type` | Specifies where and how silence is added. The following silence types are supported: - `Leading` – Additional silence at the beginning of text. The set value is added to the natural silence before the beginning of the text. - `Leading-exact` – Silence at the beginning of text. The value is the absolute silence length. - `Tailing` – Additional silence at the end of text. The set value is added to the natural silence after the last word. - `Tailing-exact` – Silence at the end of text. The value is the absolute silence length. - `Sentenceboundary` – Additional silence between adjacent sentences. The actual silence length for this type includes the natural silence after the last word of the previous sentence, the value set for this type, and the natural silence before the starting word of the next sentence. - `Sentenceboundary-exact` – Silence between adjacent sentences. The value is the absolute silence length. - `Comma-exact` – Silence at half-width or full-width commas. The value is the absolute silence length. - `Semicolon-exact` – Silence at half-width or full-width semicolons. The value is the absolute silence length. - `Enumerationcomma-exact` – Silence at full-width enumeration commas. The value is the absolute silence length.
Absolute silence types (with the `-exact` suffix) replace any other natural leading or trailing silence. Absolute silence types take precedence over their corresponding non-absolute types. For example, if both `Leading` and `Leading-exact` types are set, the `Leading-exact` type takes effect. [WordBoundary events](how-to-speech-synthesis#subscribe-to-synthesizer-events) take precedence over punctuation-related silence settings, including `Comma-exact`, `Semicolon-exact`, or `Enumerationcomma-exact`. When using both `WordBoundary` events and punctuation-related silence settings, the punctuation-related silence settings will not take effect. | Required |
+| `value` | The pause duration, in seconds (for example `2s`) or milliseconds (for example `500ms`). Valid values range from 0 to 20000 milliseconds. If the set value is greater than the supported maximum, the service will use `20000ms`. | Required |
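The `value` rules in the table above (seconds or milliseconds, capped at 20000 ms) can be mirrored on the client before a request is sent. The following is a minimal sketch only: `normalizeSilence` and `MAX_SILENCE_MS` are hypothetical names that do not belong to any SDK, and accepting only whole numbers is an assumption made to keep the example short.

```typescript
// Hypothetical helper, not part of the Speech SDK or this repository.
// Upper bound taken from the table: values above it are treated as 20000ms.
const MAX_SILENCE_MS = 20000;

// Convert a duration such as "2s" or "500ms" to milliseconds and apply the cap.
function normalizeSilence(value: string): string {
  // Assumption: only whole numbers followed by "ms" or "s" are accepted here.
  const match = /^(\d+)(ms|s)$/.exec(value.trim());
  if (match === null) {
    throw new Error(`Unsupported silence value: ${value}`);
  }
  const amount = Number(match[1]);
  const milliseconds = match[2] === "s" ? amount * 1000 : amount;
  // Mirror the documented behavior: anything above the maximum becomes 20000ms.
  return `${Math.min(milliseconds, MAX_SILENCE_MS)}ms`;
}
```

Normalizing up front makes it visible when a requested pause would be shortened, rather than leaving the cap to be applied silently by the service.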
-### MSTTS 静音示例
+### MSTTS Silence Examples
-`mstts:silence`介绍了 元素属性支持的值。
+The supported values for the attributes of the `mstts:silence` element are described in the preceding table.
-在本例中,`mstts:silence` 用于在两个句子之间添加 200 毫秒的静音。
+In this example, `mstts:silence` is used to add 200ms of silence between two sentences.
```xml
@@ -194,7 +194,7 @@ A good place to start is by trying out the slew of educational apps that are hel
```
-在此示例中,`mstts:silence` 用于在逗号处添加 50 毫秒的静音,在分号处添加 100 毫秒的静音,在枚举逗号处添加 150 毫秒的静音。
+In this example, `mstts:silence` is used to add 50ms of silence at commas, 100ms of silence at semicolons, and 150ms of silence at enumeration commas.
```xml
@@ -204,13 +204,13 @@ A good place to start is by trying out the slew of educational apps that are hel
```
-## 指定段落和句子
+## Specifying Paragraphs and Sentences
-`p` 和 `s` 元素分别用于表示段落和句子。 如果缺少这些元素,则语音服务会自动确定 SSML 文档的结构。
+The `p` and `s` elements are used to represent paragraphs and sentences, respectively. If these elements are missing, the Speech Service will automatically determine the structure of the SSML document.
-### 段落和句子示例
+### Paragraphs and Sentences Example
-以下示例定义了两个段落,其中每个段落包含句子。 在第二个段落中,语音服务会自动确定句子结构,因为它们未在 SSML 文档中定义。
+The following example defines two paragraphs, each containing sentences. In the second paragraph, the Speech service determines the sentence structure automatically because the sentences are not marked up in the SSML document.
```xml
@@ -227,21 +227,21 @@ A good place to start is by trying out the slew of educational apps that are hel
```
-## Bookmark 元素
+## Bookmark Element
-可以使用 SSML 中的 `bookmark` 元素来引用文本或标签序列中的特定位置。 然后使用语音 SDK 并订阅 `BookmarkReached` 事件以获取音频流中每个标记的偏移量。 `bookmark` 元素没有被读出。 有关详细信息,请参阅 [订阅合成器事件](how-to-speech-synthesis#subscribe-to-synthesizer-events)。
+You can use the `bookmark` element in SSML to reference specific positions in text or a sequence of tags. Then use the Speech SDK and subscribe to the `BookmarkReached` event to get the offset of each bookmark in the audio stream. The `bookmark` element is not spoken aloud. For more information, see [Subscribe to Synthesizer Events](how-to-speech-synthesis#subscribe-to-synthesizer-events).
-下表描述了 `bookmark` 元素的属性用法。
+The following table describes the attribute usage for the `bookmark` element.
-| Attribute | 说明 | 必需还是可选 |
+| Attribute | Description | Required or Optional |
| --- | --- | --- |
-| `mark` | `bookmark` 元素的引用文本。 | 必选 |
+| `mark` | The reference text for the `bookmark` element. | Required |
-### Bookmark 示例
+### Bookmark Example
-`bookmark`介绍了 元素属性支持的值。
+The supported values for the attributes of the `bookmark` element are described in the preceding table.
-你可能想知道以下代码片断中每个与花相关的词的时间偏移量:
+You might want to know the time offset of each flower-related word in the following code snippet:
```xml
@@ -250,3 +250,7 @@ A good place to start is by trying out the slew of educational apps that are hel
```
+
+---
+
+*This documentation is adapted from Microsoft Azure Speech Service official documentation. All SSML specifications and element descriptions are based on Microsoft's technical documentation.*
diff --git a/docs/ssml-voice.md b/docs/ssml-voice.md
index 72db6ce..2f97a03 100644
--- a/docs/ssml-voice.md
+++ b/docs/ssml-voice.md
@@ -1,25 +1,25 @@
-# 语音合成标记语言 (SSML) 的语音和声音 - 语音服务 - Foundry Tools | Microsoft Learn
+# Voice and Sounds in Speech Synthesis Markup Language (SSML) - Speech Service - Foundry Tools | Microsoft Learn
-可以使用语音合成标记语言 (SSML) 为语音输出指定文本转语音的声音、语言、名称、风格和角色。 还可以在单个 SSML 文档中使用多种语音,并调整重音、语速、音调和音量。 此外,SSML 还能够插入预先录制的音频,例如音效或音符。
+You can use Speech Synthesis Markup Language (SSML) to specify the voice, language, name, style, and role for text-to-speech output. You can also use multiple voices in a single SSML document and adjust stress, speech rate, pitch, and volume. Additionally, SSML allows insertion of pre-recorded audio, such as sound effects or musical notes.
-本文介绍了如何使用 SSML 元素来指定语音和声音。 有关 SSML 语法的详细信息,请参阅 [SSML 文档结构和事件](speech-synthesis-markup-structure)。
+This article describes how to use SSML elements to specify voice and sounds. For more information about SSML syntax, see [SSML document structure and events](speech-synthesis-markup-structure).
-## 使用语音元素
+## Using the voice element
-必须在每个 SSML `voice` 元素中至少指定一个 元素。 此元素可确定用于文本转语音的声音。
+You must specify at least one `voice` element within each SSML `speak` element. This element determines the voice used for text-to-speech.
-可以在单个 SSML 文档中包含多个 `voice` 元素。 每个 `voice` 元素可以指定不同的语音。 还可以通过不同的设置多次使用同一语音,例如,当 [更改句子之间的静音持续时间](speech-synthesis-markup-structure#add-silence) 时。
+You can include multiple `voice` elements in a single SSML document. Each `voice` element can specify a different voice. You can also use the same voice multiple times with different settings, for example, when [changing the duration of silence between sentences](speech-synthesis-markup-structure#add-silence).
-下表介绍 `voice` 元素的属性的用法:
+The following table describes the usage of `voice` element attributes:
-| Attribute | 说明 | 必需还是可选 |
+| Attribute | Description | Required or Optional |
| --- | --- | --- |
-| `name` | 用于文本转语音输出的声音。 有关支持的标准语音的完整列表,请参阅 [语言支持](language-support?tabs=tts)。 | 必选 |
-| `effect` | 音频效果处理器,用于在设备上针对特定方案优化合成语音输出的质量。 对于生产环境中的某些方案,听觉体验可能会因某些设备上的播放失真而降级。 例如,由于扬声器响应、房间混响和背景噪音等环境因素,来自汽车扬声器的合成语音可能会听起来迟钝而低沉。 乘客可能必须调高音量才能听得更清楚。 为了避免在这种情况下进行手动操作,音频效果处理器可以通过补偿播放失真来让声音更清晰。支持以下值: - `eq_car` - 在汽车、公共汽车和其他封闭车辆中提供高保真语音时,优化听觉体验。 - `eq_telecomhp8k` - 优化电信或电话方案中窄带语音的听觉体验。 应使用 8 kHz 的采样率。 如果采样率不是 8 kHz,则不会优化输出语音的听觉质量。
如果值缺失或无效,则会忽略此属性,而不会应用任何效果。 | 可选 |
+| `name` | The voice used for text-to-speech output. For a complete list of supported standard voices, see [Language support](language-support?tabs=tts). | Required |
+| `effect` | Audio effect processor used to optimize the quality of synthesized speech output on devices for specific scenarios. In certain production scenarios, the listening experience may be degraded due to playback distortion on certain devices. For example, synthesized speech from car speakers may sound dull and muffled due to environmental factors such as speaker response, room reverberation, and background noise. Passengers may have to turn up the volume to hear more clearly. To avoid manual operation in this situation, the audio effect processor can make the voice clearer by compensating for playback distortion. The following values are supported: - `eq_car` - Optimizes the listening experience when delivering high-fidelity speech in cars, buses, and other enclosed vehicles. - `eq_telecomhp8k` - Optimizes the listening experience for narrowband speech in telecommunications or telephony scenarios. A sample rate of 8 kHz should be used. If the sample rate is not 8 kHz, the listening quality of the output speech will not be optimized.
If the value is missing or invalid, this attribute is ignored and no effect is applied. | Optional |
-### 语音示例
+### Voice examples
-#### 单一声音示例
+#### Single voice example
```xml
@@ -29,7 +29,7 @@
```
-#### 多个语音的示例
+#### Multiple voices example
```xml
@@ -42,7 +42,7 @@
```
-#### 音频效果示例
+#### Audio effect example
```xml
@@ -52,7 +52,7 @@
```
-#### 多讲话人语音示例
+#### Multi-speaker voice example
```xml
@@ -65,99 +65,99 @@
```
-## 使用说话风格和角色
+## Using speaking styles and roles
-默认情况下,神经网络声音采用中性讲话风格。 可在句子层面调整讲话风格、风格强度和角色。
+By default, neural voices use a neutral speaking style. You can adjust the speaking style, style intensity, and role at the sentence level.
-下表介绍 `mstts:express-as` 元素的属性的用法:
+The following table describes the usage of `mstts:express-as` element attributes:
-| Attribute | 说明 | 必需还是可选 |
+| Attribute | Description | Required or Optional |
| --- | --- | --- |
-| `style` | 特定声音的说话风格。 可以表达快乐、同情和平静等情绪。 | 必选 |
-| `styledegree` | 讲话风格的强度。 可接受值的范围为:`0.01` 到 `2`(含)。 默认值为 `1`。 | 可选 |
-| `role` | 说话时的角色扮演。 声音可以模仿不同的年龄和性别。 | 可选 |
+| `style` | The speaking style for a specific voice. Can express emotions such as happiness, sympathy, and calmness. | Required |
+| `styledegree` | The intensity of the speaking style. Acceptable values range from `0.01` to `2` (inclusive). Default value is `1`. | Optional |
+| `role` | Role-playing when speaking. Voices can imitate different ages and genders. | Optional |
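The `styledegree` constraints in the table above (range `0.01` to `2` inclusive, default `1`) can be checked before SSML is generated. This is a sketch under stated assumptions: `resolveStyleDegree` and the constants are hypothetical names, and throwing on out-of-range input is a design choice for the example, not documented service behavior.

```typescript
// Hypothetical constants reflecting the documented styledegree range and default.
const STYLE_DEGREE_MIN = 0.01;
const STYLE_DEGREE_MAX = 2;
const STYLE_DEGREE_DEFAULT = 1;

// Return the degree to use: the default when omitted, otherwise the validated input.
function resolveStyleDegree(degree?: number): number {
  if (degree === undefined) {
    return STYLE_DEGREE_DEFAULT;
  }
  // Assumption: reject invalid input locally instead of sending it to the service.
  if (!Number.isFinite(degree) || degree < STYLE_DEGREE_MIN || degree > STYLE_DEGREE_MAX) {
    throw new RangeError(
      `styledegree must be between ${STYLE_DEGREE_MIN} and ${STYLE_DEGREE_MAX}, got ${degree}`
    );
  }
  return degree;
}
```

For example, `resolveStyleDegree()` yields the default of `1`, while `resolveStyleDegree(2.5)` raises a `RangeError`.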
-### 支持的风格 (Style)
+### Supported styles
-| Style | 说明 |
+| Style | Description |
| --- | --- |
-| `advertisement_upbeat` | 用兴奋和精力充沛的语气推广产品或服务。 |
-| `affectionate` | 以较高的音调和音量表达温暖而亲切的语气。 |
-| `angry` | 表达生气和厌恶的语气。 |
-| `assistant` | 以温暖且轻松的语气说话,用于数字助手。 |
-| `calm` | 以沉着冷静的态度说话。 |
-| `chat` | 表达轻松随意的语气。 |
-| `cheerful` | 表达积极愉快的语气。 |
-| `customerservice` | 以友好热情的语气为客户提供支持。 |
-| `depressed` | 调低音调和音量来表达忧郁、沮丧的语气。 |
-| `documentary-narration` | 用轻松、感兴趣和信息丰富的风格讲述纪录片。 |
-| `empathetic` | 表达关心和理解。 |
-| `excited` | 表达乐观和充满希望的语气。 |
-| `fearful` | 以较高的音调、较高的音量和较快的语速来表达恐惧。 |
-| `friendly` | 表达一种愉快、怡人且温暖的语气。 |
-| `gentle` | 以较低的音调和音量表达温和、礼貌和愉快的语气。 |
-| `hopeful` | 以温暖和向往的语气说话。 |
-| `lyrical` | 以优美又带感伤的方式表达情感。 |
-| `narration-professional` | 以专业、客观的语气朗读内容。 |
-| `narration-relaxed` | 以舒缓且悦耳的语气说话,用于内容朗读。 |
-| `newscast` | 以正式专业的语气叙述新闻。 |
-| `newscast-casual` | 以通用、随意的语气发布一般新闻。 |
-| `newscast-formal` | 以正式、自信和权威的语气发布新闻。 |
-| `poetry-reading` | 在读诗时表达出带情感和节奏的语气。 |
-| `sad` | 表达悲伤语气。 |
-| `serious` | 表达严肃和命令的语气。 |
-| `shouting` | 以一种听起来好像语音在远处或在另一个位置说话。 |
-| `sports_commentary` | 表达一种既轻松又感兴趣的语气,用于播报体育赛事。 |
-| `sports_commentary_excited` | 用快速且充满活力的语气播报体育赛事精彩瞬间。 |
-| `whispering` | 以试图发出轻柔、温和声音的柔和语气说话。 |
-| `terrified` | 表达一种害怕的语气,语速快且声音颤抖。 |
-| `unfriendly` | 表达一种冷淡无情的语气。 |
-
-### 支持的角色 (Role)
-
-| 角色 | 说明 |
+| `advertisement_upbeat` | Promote products or services with an excited and energetic tone. |
+| `affectionate` | Express a warm and affectionate tone with higher pitch and volume. |
+| `angry` | Express an angry and disgusted tone. |
+| `assistant` | Speak in a warm and relaxed tone, used for digital assistants. |
+| `calm` | Speak with composure and calmness. |
+| `chat` | Express a relaxed and casual tone. |
+| `cheerful` | Express a positive and pleasant tone. |
+| `customerservice` | Provide support to customers with a friendly and enthusiastic tone. |
+| `depressed` | Express a melancholy and depressed tone with lower pitch and volume. |
+| `documentary-narration` | Narrate documentaries in a relaxed, interested, and informative style. |
+| `empathetic` | Express care and understanding. |
+| `excited` | Express an optimistic and hopeful tone. |
+| `fearful` | Express fear with higher pitch, higher volume, and faster speech rate. |
+| `friendly` | Express a pleasant, charming, and warm tone. |
+| `gentle` | Express a mild, polite, and pleasant tone with lower pitch and volume. |
+| `hopeful` | Speak in a warm and longing tone. |
+| `lyrical` | Express emotions in a graceful and slightly sentimental way. |
+| `narration-professional` | Read content in a professional and objective tone. |
+| `narration-relaxed` | Speak in a soothing and pleasant tone, used for content narration. |
+| `newscast` | Narrate news in a formal and professional tone. |
+| `newscast-casual` | Deliver general news in a versatile, casual tone. |
+| `newscast-formal` | Deliver news in a formal, confident, and authoritative tone. |
+| `poetry-reading` | Express an emotional and rhythmic tone when reading poetry. |
+| `sad` | Express a sorrowful tone. |
+| `serious` | Express a serious and commanding tone. |
+| `shouting` | Sound as if speaking from a distance or in another location. |
+| `sports_commentary` | Express a relaxed yet interested tone for broadcasting sports events. |
+| `sports_commentary_excited` | Broadcast sports event highlights with a fast and energetic tone. |
+| `terrified` | Express a fearful tone with fast speech rate and trembling voice. |
+| `unfriendly` | Express a cold and indifferent tone. |
+| `whispering` | Speak very softly, producing a quiet and gentle sound. |
+
+### Supported roles
+
+| Role | Description |
| --- | --- |
-| `Girl` | 声音模仿女孩。 |
-| `Boy` | 声音模仿男孩。 |
-| `YoungAdultFemale` | 声音模仿年轻的成年女性。 |
-| `YoungAdultMale` | 声音模仿年轻的成年男性。 |
-| `OlderAdultFemale` | 声音模仿年长的成年女性。 |
-| `OlderAdultMale` | 声音模仿年长的成年男性。 |
-| `SeniorFemale` | 声音模仿年老女性。 |
-| `SeniorMale` | 声音模仿年老男性。 |
+| `Girl` | Voice imitates a girl. |
+| `Boy` | Voice imitates a boy. |
+| `YoungAdultFemale` | Voice imitates a young adult female. |
+| `YoungAdultMale` | Voice imitates a young adult male. |
+| `OlderAdultFemale` | Voice imitates an older adult female. |
+| `OlderAdultMale` | Voice imitates an older adult male. |
+| `SeniorFemale` | Voice imitates an elderly female. |
+| `SeniorMale` | Voice imitates an elderly male. |
-### 风格和程度示例
+### Style and style degree examples
```xml
- 快走吧,路上一定要注意安全,早去早回。
+ Hurry up, be careful on the road, and come back early.
```
-### 角色示例
+### Role examples
```xml
- 女儿看见父亲走了进来,问道:
+ The daughter saw her father walk in and asked:
- "您来的挺快的,怎么过来的?"
+ "You came pretty fast, how did you get here?"
- 父亲放下手提包,说:
+ The father put down his bag and said:
- "刚打车过来的,路上还挺顺畅。"
+ "I just took a taxi, the traffic was smooth."
```
-## 调整讲话语言
+## Adjusting speaking language
-使用 `` 元素调整多语言语音的说话语言。
+Use the `lang` element to adjust the speaking language for multilingual voices.
```xml
@@ -169,19 +169,19 @@
```
-## 调整韵律
+## Adjusting prosody
-使用 `prosody` 元素指定音高、语调、范围、速率和音量的变化。
+Use the `prosody` element to specify variations in pitch, intonation, range, speech rate, and volume.
-| Attribute | 说明 |
+| Attribute | Description |
| --- | --- |
-| `contour` | 升降曲线表示音高的变化。 |
-| `pitch` | 基线音节。 可用值:`x-low`, `low`, `medium`, `high`, `x-high`, 或相对值(如 `+20Hz`, `-2st`)。 |
-| `range` | 音节范围。 |
-| `rate` | 语速。 可用值:`x-slow`, `slow`, `medium`, `fast`, `x-fast`, 或相对值(如 `+30%`)。 |
-| `volume` | 音量。 可用值:`silent`, `x-soft`, `soft`, `medium`, `loud`, `x-loud`, 或相对值(如 `+20`)。 |
+| `contour` | Contour curve representing pitch variations. |
+| `pitch` | Baseline pitch. Available values: `x-low`, `low`, `medium`, `high`, `x-high`, or relative values (e.g., `+20Hz`, `-2st`). |
+| `range` | Pitch range. |
+| `rate` | Speech rate. Available values: `x-slow`, `slow`, `medium`, `fast`, `x-fast`, or relative values (e.g., `+30%`). |
+| `volume` | Volume level. Available values: `silent`, `x-soft`, `soft`, `medium`, `loud`, `x-loud`, or relative values (e.g., `+20`). |
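The `pitch`, `rate`, and `volume` attributes in the table above map directly onto a `prosody` element. The sketch below shows one way to assemble that element; `ProsodySketch` and `wrapProsody` are hypothetical names (not the repository's `ProsodyOptions` class), and the text is assumed to be XML-escaped already.

```typescript
// Hypothetical option bag; each field is passed through as an attribute value.
interface ProsodySketch {
  pitch?: string;  // e.g. "high" or "-2st"
  rate?: string;   // e.g. "fast" or "+30%"
  volume?: string; // e.g. "loud" or "+20"
}

// Wrap already-escaped text in a <prosody> element, emitting only the attributes that are set.
function wrapProsody(text: string, options: ProsodySketch): string {
  const attributes = (["pitch", "rate", "volume"] as const)
    .filter((key) => options[key] !== undefined)
    .map((key) => `${key}="${options[key]}"`)
    .join(" ");
  // With no attributes set, the element would have no effect, so return the text unchanged.
  return attributes ? `<prosody ${attributes}>${text}</prosody>` : text;
}
```

Emitting attributes in a fixed order keeps the generated SSML stable, which makes it easier to compare in tests.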
-### 韵律示例
+### Prosody example
```xml
@@ -193,7 +193,7 @@
```
-## 添加录制的音频
+## Adding recorded audio
```xml
@@ -204,7 +204,7 @@
```
-## 添加背景音频
+## Adding background audio
```xml
@@ -215,7 +215,7 @@
```
-## 语音转换元素
+## Voice conversion element
```xml
@@ -224,3 +224,15 @@
```
+
+---
+
+## Related Links
+
+- [Microsoft Azure Speech Service Documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/)
+- [SSML Specification](https://www.w3.org/TR/speech-synthesis11/)
+- [Language Support for Text-to-Speech](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support?tabs=tts)
+
+---
+
+*This documentation is translated from Microsoft official documentation. All rights reserved to Microsoft.*
diff --git a/example/README.md b/example/README.md
index dc275ba..c5cb3ce 100644
--- a/example/README.md
+++ b/example/README.md
@@ -1,16 +1,16 @@
-# TTS Pro API 示例代码
+# TTS Pro API Example Code
-## 快速开始
+## Quick Start
-### 1. 配置账户信息
+### 1. Configure Account Information
-复制配置模板并填写你的邮箱和密码:
+Copy the configuration template and fill in your email and password:
```bash
cp config.example.json config.json
```
-编辑 `config.json`:
+Edit `config.json`:
```json
{
"user_email": "your-email@example.com",
@@ -21,13 +21,13 @@ cp config.example.json config.json
}
```
-### 2. 编译项目
+### 2. Build Project
```bash
pnpm run build
```
-### 3. 运行示例
+### 3. Run Examples
```bash
# Example 1: Multi-Speaker Dialogue (Chained)
@@ -97,24 +97,24 @@ Demonstrate the `styleDegree` parameter (range: 0.01-2.0).
Demonstrate the `substitutions` parameter for replacing technical terms.
**Features**:
-- W3C → 万维网联盟
-- HTTP → 超文本传输协议
+- W3C → World Wide Web Consortium
+- HTTP → HyperText Transfer Protocol
- CEO → Chief Executive Officer
**Output**: `example/output/05-text-substitution-demo.mp3`
-## API 参数说明
+## API Parameters
-| 参数名 | 必填 | 说明 | 默认值 |
-|--------|------|------|--------|
-| `user_email` | ✅ | 用户邮箱 | - |
-| `user_pass` | ✅ | 用户密码 | - |
+| Parameter | Required | Description | Default |
+|-----------|----------|-------------|---------|
+| `user_email` | ✅ | User email | - |
+| `user_pass` | ✅ | User password | - |
| `type` | ❌ | `getSpeek`/`getBig`/`setBig` | `getSpeek` |
-| `ssml` | ✅ | SSML 内容 | - |
-| `kbitrate` | ❌ | 音频质量 | `audio-16khz-32kbitrate-mono-mp3` |
-| `output_format` | ❌ | 返回类型:`二进制`/`url` | `二进制` |
+| `ssml` | ✅ | SSML content | - |
+| `kbitrate` | ❌ | Audio quality | `audio-16khz-32kbitrate-mono-mp3` |
+| `output_format` | ❌ | Return type: `binary`/`url` | `binary` |
-## 输出目录
+## Output Directory
All generated audio files are saved in:
```
@@ -126,20 +126,20 @@ example/output/
└── 05-text-substitution-demo.mp3
```
-## 注意事项
+## Notes
-1. **账户安全**: `config.json` 已被 `.gitignore` 忽略,不会提交到 Git
-2. **网络连接**: 运行示例需要网络连接以调用 API
-3. **编译要求**: 运行前必须先执行 `pnpm run build`
-4. **Node 版本**: 需要 Node.js 18+(支持 `fetch` API)
+1. **Account Security**: `config.json` is ignored by `.gitignore` and will not be committed to Git
+2. **Network Connection**: Running examples requires network connection to call the API
+3. **Build Requirement**: You must run `pnpm run build` before running examples
+4. **Node Version**: Requires Node.js 18+ (for the built-in `fetch` API)
-## 常见问题
+## FAQ
-### Q: 提示 "config.json 不存在"
-A: 请复制 `config.example.json` 为 `config.json` 并填写邮箱和密码
+### Q: It says "config.json does not exist"
+A: Please copy `config.example.json` to `config.json` and fill in your email and password
-### Q: 音频生成失败
-A: 检查网络连接,确认邮箱和密码正确
+### Q: Audio generation failed
+A: Check network connection and verify that email and password are correct
-### Q: 如何修改音频质量?
-A: 编辑 `config.json` 中的 `kbitrate` 字段
+### Q: How to change audio quality?
+A: Edit the `kbitrate` field in `config.json`
diff --git a/example/run.sh b/example/run.sh
index fec09da..e511d15 100755
--- a/example/run.sh
+++ b/example/run.sh
@@ -1,61 +1,61 @@
#!/bin/bash
-# 示例运行脚本
-# 解决 ts-node 无法正确处理中文文件名的问题
+# Example run script
+# Works around ts-node's inability to handle Chinese filenames correctly
-# 检查配置文件
+# Check configuration file
if [ ! -f "config.json" ]; then
- echo "❌ 错误:config.json 不存在"
- echo "📝 请复制 config.example.json 为 config.json 并填写邮箱和密码"
+ echo "❌ Error: config.json does not exist"
+ echo "📝 Please copy config.example.json to config.json and fill in email and password"
exit 1
fi
-# 编译项目
-echo "🔨 正在编译项目..."
+# Build project
+echo "🔨 Building project..."
pnpm run build
-# 复制 config.json 到 dist/example
-echo "📋 复制配置文件到输出目录..."
+# Copy config.json to dist/example
+echo "📋 Copying configuration file to output directory..."
cp config.json ../dist/example/
-# 切换到 dist/example 目录运行示例
+# Switch to dist/example directory to run examples
cd ../dist/example
- # 运行示例
+ # Run example
case "$1" in
0)
- echo "🎙️ 运行示例 0: 简单对话演示"
+ echo "🎙️ Running Example 0: Simple Dialogue Demo"
node "00-简单对话演示.js"
;;
1)
- echo "🎙️ 运行示例 1: 多说话人对话 - 链式调用"
+ echo "🎙️ Running Example 1: Multi-Speaker Dialogue - Chained"
node "01-多说话人对话 - 链式调用.js"
;;
2)
- echo "🎙️ 运行示例 2: 多说话人对话 - 函数式"
+ echo "🎙️ Running Example 2: Multi-Speaker Dialogue - Functional"
node "02-多说话人对话 - 函数式.js"
;;
3)
- echo "🎙️ 运行示例 3: 31 种情感风格演示"
+ echo "🎙️ Running Example 3: 31 Emotional Styles Demo"
node "03-31 种情感风格演示.js"
;;
4)
- echo "🎙️ 运行示例 4: 情感强度控制演示"
+ echo "🎙️ Running Example 4: Style Degree Control Demo"
node "04-情感强度控制演示.js"
;;
5)
- echo "🎙️ 运行示例 5: 文本替换功能演示"
+ echo "🎙️ Running Example 5: Text Substitution Demo"
node "05-文本替换功能演示.js"
;;
*)
- echo "用法:./run.sh <示例编号>"
+ echo "Usage: ./run.sh <example-number>"
echo ""
- echo "可用示例:"
- echo " 0 - 简单对话演示"
- echo " 1 - 多说话人对话 - 链式调用"
- echo " 2 - 多说话人对话 - 函数式"
- echo " 3 - 31 种情感风格演示"
- echo " 4 - 情感强度控制演示"
- echo " 5 - 文本替换功能演示"
+ echo "Available examples:"
+ echo " 0 - Simple Dialogue Demo"
+ echo " 1 - Multi-Speaker Dialogue - Chained"
+ echo " 2 - Multi-Speaker Dialogue - Functional"
+ echo " 3 - 31 Emotional Styles Demo"
+ echo " 4 - Style Degree Control Demo"
+ echo " 5 - Text Substitution Demo"
exit 1
;;
esac
From b3b97daef65c88012b3ccb919c88f88777af1fa7 Mon Sep 17 00:00:00 2001
From: huan-zz3 <2805033624@qq.com>
Date: Sun, 22 Mar 2026 19:32:01 +0800
Subject: [PATCH 10/10] docs: translate src/AGENTS.md to English
Additional translation - src/ directory knowledge base
---
src/AGENTS.md | 150 +++++++++++++++++++++++++-------------------------
1 file changed, 75 insertions(+), 75 deletions(-)
diff --git a/src/AGENTS.md b/src/AGENTS.md
index 991391c..e8fd790 100644
--- a/src/AGENTS.md
+++ b/src/AGENTS.md
@@ -1,26 +1,26 @@
-# src/ 目录知识库
+# src/ Directory Knowledge Base
-**所属模块**: 核心 TTS 功能实现
+**Module**: Core TTS Functionality Implementation
---
## OVERVIEW
-MsEdgeTTS 核心源代码目录 - 包含 WebSocket 通信、SSML 生成、音频输出控制等全部功能实现。
+MsEdgeTTS core source code directory - Contains all functionality implementations including WebSocket communication, SSML generation, audio output control, etc.
---
## WHERE TO LOOK
-| 任务 | 文件 | 说明 |
+| Task | File | Description |
|------|------|------|
-| 修改 WebSocket 通信逻辑 | `MsEdgeTTS.ts` | 连接初始化、消息收发、边界元数据处理 |
-| 添加新音频格式 | `Output.ts` | `OUTPUT_FORMAT` 枚举 + `OUTPUT_EXTENSIONS` 映射 |
-| 修改语音选项 | `Prosody.ts` | `ProsodyOptions` 类(rate/pitch/volume) |
-| 修改对话构建器 | `DialogueBuilder.ts` | 链式调用构建器 + `buildDialogueSSML()` 函数 |
-| 添加 SSML 工具 | `SSMLUtils.ts` | 转义函数、情感风格验证 |
-| 修改类型定义 | `DialogueTurn.ts` | `DialogueTurn`、`Dialogue`、`TextSegment`、`Substitution` |
-| 添加单元测试 | `*.spec.ts` | 与源码同目录,Jest 配置在 package.json |
+| Modify WebSocket communication logic | `MsEdgeTTS.ts` | Connection initialization, message exchange, boundary metadata processing |
+| Add new audio format | `Output.ts` | `OUTPUT_FORMAT` enum + `OUTPUT_EXTENSIONS` mapping |
+| Modify voice options | `Prosody.ts` | `ProsodyOptions` class (rate/pitch/volume) |
+| Modify dialogue builder | `DialogueBuilder.ts` | Chained builder + `buildDialogueSSML()` function |
+| Add SSML utilities | `SSMLUtils.ts` | Escape functions, emotional style validation |
+| Modify type definitions | `DialogueTurn.ts` | `DialogueTurn`, `Dialogue`, `TextSegment`, `Substitution` |
+| Add unit tests | `*.spec.ts` | Same directory as source, Jest config in package.json |
---
@@ -28,102 +28,102 @@ MsEdgeTTS 核心源代码目录 - 包含 WebSocket 通信、SSML 生成、音频
```
src/
-├── index.ts # Barrel export(6 个导出)
-├── MsEdgeTTS.ts # 核心类(457 行)
-├── MsEdgeTTS.spec.ts # 单元测试
-├── Output.ts # OUTPUT_FORMAT 枚举 + OUTPUT_EXTENSIONS
-├── Prosody.ts # ProsodyOptions 类 + RATE/PITCH/VOLUME 枚举
-├── DialogueTurn.ts # DialogueTurn/Dialogue/TextSegment/Substitution 类型
-├── DialogueBuilder.ts # DialogueBuilder 类 + buildDialogueSSML() 函数
+├── index.ts # Barrel export (6 exports)
+├── MsEdgeTTS.ts # Core class (457 lines)
+├── MsEdgeTTS.spec.ts # Unit tests
+├── Output.ts # OUTPUT_FORMAT enum + OUTPUT_EXTENSIONS
+├── Prosody.ts # ProsodyOptions class + RATE/PITCH/VOLUME enums
+├── DialogueTurn.ts # DialogueTurn/Dialogue/TextSegment/Substitution types
+├── DialogueBuilder.ts # DialogueBuilder class + buildDialogueSSML() function
├── SSMLUtils.ts # escapeSSML/replaceText/validateStyle/validateStyleDegree
-└── utils.ts # joinPath() 路径拼接工具
+└── utils.ts # joinPath() path joining utility
```
---
## CODE MAP
-| Symbol | Type | 文件 | 作用 |
+| Symbol | Type | File | Role |
|--------|------|------|------|
-| `MsEdgeTTS` | Class | `MsEdgeTTS.ts` | 主类:WebSocket 连接、语音合成、流处理 |
-| `OUTPUT_FORMAT` | Enum | `Output.ts` | 支持的音频格式(MP3/WEBM 多种比特率) |
-| `OUTPUT_EXTENSIONS` | Const | `Output.ts` | 格式到文件扩展名映射(`.mp3`/`.webm`) |
-| `ProsodyOptions` | Class | `Prosody.ts` | 语速/音调/音量配置选项 |
-| `RATE` | Enum | `Prosody.ts` | 语速预设(x-slow 到 x-fast) |
-| `PITCH` | Enum | `Prosody.ts` | 音调预设(x-low 到 x-high) |
-| `VOLUME` | Enum | `Prosody.ts` | 音量预设(silent 到 x-LOUD) |
-| `DialogueBuilder` | Class | `DialogueBuilder.ts` | 链式对话构建器 |
-| `buildDialogueSSML` | Function | `DialogueBuilder.ts` | 函数式 SSML 生成 |
-| `validateStyle` | Function | `SSMLUtils.ts` | 验证 28 种官方情感风格 |
-| `escapeSSML` | Function | `SSMLUtils.ts` | XML 转义(& < > " ') |
+| `MsEdgeTTS` | Class | `MsEdgeTTS.ts` | Main class: WebSocket connection, speech synthesis, stream processing |
+| `OUTPUT_FORMAT` | Enum | `Output.ts` | Supported audio formats (MP3/WEBM multiple bitrates) |
+| `OUTPUT_EXTENSIONS` | Const | `Output.ts` | Format to file extension mapping (`.mp3`/`.webm`) |
+| `ProsodyOptions` | Class | `Prosody.ts` | Rate/pitch/volume configuration options |
+| `RATE` | Enum | `Prosody.ts` | Speaking rate presets (x-slow to x-fast) |
+| `PITCH` | Enum | `Prosody.ts` | Pitch presets (x-low to x-high) |
+| `VOLUME` | Enum | `Prosody.ts` | Volume presets (silent to x-LOUD) |
+| `DialogueBuilder` | Class | `DialogueBuilder.ts` | Chained dialogue builder |
+| `buildDialogueSSML` | Function | `DialogueBuilder.ts` | Functional SSML generation |
+| `validateStyle` | Function | `SSMLUtils.ts` | Validate 28 official emotional styles |
+| `escapeSSML` | Function | `SSMLUtils.ts` | XML escape (& < > " ') |
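The code map lists `escapeSSML` as escaping the five XML special characters (`&`, `<`, `>`, `"`, `'`). The following sketch illustrates that technique; it is not the implementation in `SSMLUtils.ts`, and `escapeXmlText` and `XML_ENTITIES` are hypothetical names chosen for the example.

```typescript
// Hypothetical lookup table mapping each XML special character to its predefined entity.
const XML_ENTITIES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&apos;",
};

// Replace every special character in a single pass so that "&" is never escaped twice.
function escapeXmlText(text: string): string {
  return text.replace(/[&<>"']/g, (character) => XML_ENTITIES[character] ?? character);
}
```

A single-pass replacement avoids the ordering problem of chained `replace` calls, where escaping `&` after the other characters would corrupt entities that were already produced.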
---
## CONVENTIONS
-**TypeScript 配置**:
-- `module`: CommonJS(非 ESM,为兼容性)
+**TypeScript Configuration**:
+- `module`: CommonJS (not ESM, for compatibility)
- `target`: ESNext
- `skipLibCheck`: true
-- 编译排除:`src/**/*.spec.ts`
+- Compilation exclusion: `src/**/*.spec.ts`
-**测试约定**:
-- 测试文件与源码同目录:`*.spec.ts`
-- Jest 配置内联在 `package.json`
-- 测试超时:15000ms
+**Testing Conventions**:
+- Test files in same directory as source: `*.spec.ts`
+- Jest config inline in `package.json`
+- Test timeout: 15000ms
-**导出模式**:
-- 使用 barrel export(`index.ts` 统一导出)
-- 6 个公共 API:`MsEdgeTTS`, `OUTPUT_FORMAT`, `ProsodyOptions`, `DialogueTurn`, `DialogueBuilder`, `buildDialogueSSML`
+**Export Pattern**:
+- Use barrel export (`index.ts` unified export)
+- 6 public APIs: `MsEdgeTTS`, `OUTPUT_FORMAT`, `ProsodyOptions`, `DialogueTurn`, `DialogueBuilder`, `buildDialogueSSML`
-**SSML 处理**:
-- 仅支持 `speak`/`voice`/`prosody`/`mstts:express-as`/`lang`/`sub` 元素
-- 不支持完整 SSML 规范
+**SSML Processing**:
+- Only supports `speak`/`voice`/`prosody`/`mstts:express-as`/`lang`/`sub` elements
+- Full SSML specification not supported
---
## ANTI-PATTERNS (SRC)
-- ❌ **不要** 在浏览器中使用 - API 需要 Edge User-Agent(仅服务器端)
-- ❌ **不要** 修改 `MsEdgeTTS.ts` 中的 Sec-MS-GEC 哈希算法 - 依赖 Azure 认证机制
-- ❌ **不要** 删除 `isomorphic-ws` 依赖 - 实现跨环境兼容
-- ❌ **不要** 使用回调 API - 仅支持 Promise
+- ❌ **Do NOT** use in browser - API requires Edge User-Agent (server-side only)
+- ❌ **Do NOT** modify Sec-MS-GEC hash algorithm in `MsEdgeTTS.ts` - depends on Azure authentication mechanism
+- ❌ **Do NOT** remove `isomorphic-ws` dependency - enables cross-environment compatibility
+- ❌ **Do NOT** use callback API - Promise only
---
## UNIQUE STYLES
-**WebSocket 通信**:
-- Sec-MS-GEC 哈希认证(SHA-256 + Windows Tick 时间戳)
-- 自定义 UUID 生成(非 `crypto.randomUUID`)
-- 消息分隔符:`\r\n\r\n`
+**WebSocket Communication**:
+- Sec-MS-GEC hash authentication (SHA-256 + Windows Tick timestamp)
+- Custom UUID generation (not `crypto.randomUUID`)
+- Message delimiter: `\r\n\r\n`
-**日志系统**:
-- 可选 logger(`enableLogger` 选项)
-- 仅记录连接状态、消息收发
+**Logging System**:
+- Optional logger (`enableLogger` option)
+- Only logs connection status, message exchange
-**多人对话支持**:
-- `DialogueBuilder` 链式调用
-- `buildDialogueSSML()` 函数式 API
-- 支持 28 种情感风格 + 强度控制(0.01-2.0)
-- 支持文本替换(`` 标签)
-- 支持多语言混合(``)
+**Multi-Speaker Dialogue Support**:
+- `DialogueBuilder` chained calls
+- `buildDialogueSSML()` functional API
+- Supports 28 emotional styles + intensity control (0.01-2.0)
+- Supports text substitution (`sub` tags)
+- Supports multi-language mixing (`lang` tags)
---
## COMMANDS
```bash
-# 编译 src/ 到 dist/
+# Compile src/ to dist/
pnpm run build
-# 运行测试(src/*.spec.ts)
+# Run tests (src/*.spec.ts)
pnpm test
-# 测试监听模式
+# Test watch mode
pnpm run test:watch
-# 测试覆盖率
+# Test coverage
pnpm run test:cov
```
@@ -131,15 +131,15 @@ pnpm run test:cov
## NOTES
-**关键限制**:
-- 2025 年 12 月更新:API 需要 Edge User-Agent,**浏览器中无法使用**
-- 语音列表需要可信客户端 Token(硬编码:`6A5AA1D4EAFF4E9FB37E23D68491D6F4`)
+**Key Limitations**:
+- December 2025 update: API requires Edge User-Agent, **cannot be used in browsers**
+- Voice list requires trusted client Token (hardcoded: `6A5AA1D4EAFF4E9FB37E23D68491D6F4`)
-**已知问题**:
-- `MsEdgeTTS.ts` 约 457 行 - 复杂度较高,建议拆分
+**Known Issues**:
+- `MsEdgeTTS.ts` approximately 457 lines - high complexity, recommended to split
-**添加新功能流程**:
-1. 在 `src/` 同级创建 `.ts` 文件
-2. 在 `index.ts` 添加导出
-3. 创建同名 `.spec.ts` 测试文件
-4. 运行 `pnpm test` 验证
+**Adding New Features Process**:
+1. Create `.ts` file at same level in `src/`
+2. Add export in `index.ts`
+3. Create `.spec.ts` test file with same name
+4. Run `pnpm test` to verify