feat: Multi-Speaker Dialogue, 28 Emotional Styles#29

Open
huan-zz3 wants to merge 10 commits into
Migushthe2nd:mainfrom
huan-zz3:main

@huan-zz3

🙏 First PR Notice

🌟 This is my first Pull Request ever! 🌟

I'm incredibly excited (and a bit nervous) to be contributing to this project. While I've done my best to ensure everything is perfect, I understand there might be areas that need improvement.

If you find anything:

  • Incomplete translations
  • Awkward phrasing
  • Missing edge cases
  • Better ways to structure things
  • Any issues whatsoever

Please don't hesitate to let me know! I'm committed to making this PR as good as it needs to be. I'll respond quickly to any feedback and make changes until you're completely satisfied with the result.

Your guidance and review would mean a lot to me as I learn the contribution process. Thank you for your time and consideration! 🙏

✨ New Features Added

1. Multi-Speaker Dialogue Support

Build multi-speaker conversations with a chainable API:

import { DialogueBuilder, buildDialogueSSML } from "msedge-tts";

// Method 1: Chained builder
const dialogue = new DialogueBuilder()
  .addTurn({
    voice: "zh-CN-XiaoxiaoNeural",
    text: "Hello everyone!",
    style: "cheerful"
  })
  .addTurn({
    voice: "en-US-AndrewNeural",
    text: "Welcome to our podcast!",
    lang: "en-US"
  })
  .build();

// Method 2: Functional API
const ssml = buildDialogueSSML([
  { voice: "zh-CN-YunxiNeural", text: "Today we discuss AI", style: "documentary-narration" },
  { voice: "en-US-AriaNeural", text: "That's right!", style: "excited", lang: "en-US" }
]);
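
For readers unfamiliar with multi-voice SSML, the sketch below shows roughly what a dialogue like the ones above could serialize to: one `<voice>` element per turn inside a single `<speak>` root. It is not the PR's implementation. `TurnSketch`, `sketchDialogueSSML`, and the mapping of `lang` to a `<lang>` element are assumptions made for illustration, and the `escapeXml` helper is a hand-rolled stand-in kept only to make the example self-contained (a maintained XML-escaping library is preferable in real code).

```typescript
// Hypothetical shape of one dialogue turn; field names mirror the PR examples.
interface TurnSketch {
  voice: string;
  text: string;
  lang?: string;
}

// Stand-in XML escaping for text nodes and attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wraps each turn in its own <voice> element inside a single <speak> root.
// Assumption: a turn's `lang` becomes a nested <lang xml:lang="..."> element.
function sketchDialogueSSML(turns: TurnSketch[], defaultLang = "en-US"): string {
  const body = turns
    .map((turn) => {
      const text = escapeXml(turn.text);
      const inner = turn.lang
        ? `<lang xml:lang="${escapeXml(turn.lang)}">${text}</lang>`
        : text;
      return `<voice name="${escapeXml(turn.voice)}">${inner}</voice>`;
    })
    .join("");
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" ` +
    `xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="${escapeXml(defaultLang)}">` +
    `${body}</speak>`
  );
}

const exampleSSML = sketchDialogueSSML([
  { voice: "zh-CN-XiaoxiaoNeural", text: "Hello everyone!" },
  { voice: "en-US-AndrewNeural", text: "Welcome to our podcast!", lang: "en-US" },
]);
```

Whether the Edge read-aloud service accepts every element used here is a separate question that has to be checked against the live service.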

2. 28 Emotional Styles

Full support for Microsoft Azure's official emotional styles:

const ssml = buildDialogueSSML([
  { voice: "zh-CN-XiaomoNeural", text: "I'm so happy!", style: "cheerful" },
  { voice: "zh-CN-XiaoxiaoNeural", text: "Welcome to customer service", style: "customerservice" },
  { voice: "en-US-JennyNeural", text: "Breaking news!", style: "newscast-formal", lang: "en-US" }
]);

Supported Styles:

  • advertisement_upbeat, affectionate, angry, assistant
  • calm, chat, cheerful, customerservice
  • depressed, documentary-narration, empathetic, excited
  • fearful, friendly, gentle, hopeful
  • lyrical, narration-professional, narration-relaxed
  • newscast, newscast-casual, newscast-formal
  • poetry-reading, sad, serious, shouting
  • sports_commentary, sports_commentary_excited
  • terrified, unfriendly, whispering
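
One way to guard against typos in style names is a literal union type plus a runtime check, sketched below. The commit notes mention a `validateStyle` utility, but its actual signature is not shown in this PR description, so `SUPPORTED_STYLES`, `SpeakingStyle`, and `isSupportedStyle` are hypothetical names. The array simply copies the names listed above.

```typescript
// Style names copied from the list above; the constant name is hypothetical.
const SUPPORTED_STYLES = [
  "advertisement_upbeat", "affectionate", "angry", "assistant",
  "calm", "chat", "cheerful", "customerservice",
  "depressed", "documentary-narration", "empathetic", "excited",
  "fearful", "friendly", "gentle", "hopeful",
  "lyrical", "narration-professional", "narration-relaxed",
  "newscast", "newscast-casual", "newscast-formal",
  "poetry-reading", "sad", "serious", "shouting",
  "sports_commentary", "sports_commentary_excited",
  "terrified", "unfriendly", "whispering",
] as const;

// Union of the literal style names, derived from the array above.
type SpeakingStyle = (typeof SUPPORTED_STYLES)[number];

// Type guard: narrows an arbitrary string to SpeakingStyle when it is listed.
function isSupportedStyle(value: string): value is SpeakingStyle {
  return SUPPORTED_STYLES.some((style) => style === value);
}
```

A check like this only confirms that a name appears in the list. It says nothing about whether a given voice actually supports that style, since style availability varies by voice.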

3. Style Degree Control

Fine-tune emotional intensity from 0.01 to 2.0:

const ssml = buildDialogueSSML([
  { 
    voice: "zh-CN-XiaomoNeural", 
    text: "This is normal",
    style: "sad",
    styleDegree: 0.5  // Weaker emotion
  },
  { 
    voice: "zh-CN-XiaomoNeural", 
    text: "This is heartbreaking!",
    style: "sad",
    styleDegree: 2.0  // Strongest emotion
  }
]);
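
As a rough illustration of how the range above might be enforced, the sketch below rejects out-of-range values and renders the `styledegree` attribute on Azure's `mstts:express-as` element. The function names are hypothetical and the PR's own `validateStyleDegree` may behave differently (for example, it could clamp instead of throwing).

```typescript
const MIN_STYLE_DEGREE = 0.01;
const MAX_STYLE_DEGREE = 2;

// Returns the degree unchanged when it is within range; throws otherwise.
function checkStyleDegree(degree: number): number {
  if (!isFinite(degree) || degree < MIN_STYLE_DEGREE || degree > MAX_STYLE_DEGREE) {
    throw new RangeError(
      `styleDegree must be between ${MIN_STYLE_DEGREE} and ${MAX_STYLE_DEGREE}, got ${degree}`
    );
  }
  return degree;
}

// Renders an express-as element. `escapedText` must already be XML-escaped,
// and `style` is assumed to be a validated style name.
function sketchExpressAs(style: string, escapedText: string, degree?: number): string {
  const degreeAttr =
    degree === undefined ? "" : ` styledegree="${checkStyleDegree(degree)}"`;
  return `<mstts:express-as style="${style}"${degreeAttr}>${escapedText}</mstts:express-as>`;
}
```

Throwing on invalid input surfaces mistakes early. Clamping would be the more forgiving alternative if silent correction is preferred.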

4. Text Substitution

Replace abbreviations and technical terms with full pronunciations:

const ssml = buildDialogueSSML([
  { 
    voice: "zh-CN-XiaoxiaoNeural",
    text: "W3C制定了 Web 标准,API 基于 HTTP 协议",
    substitutions: [
      { text: "W3C", alias: "万维网联盟" },
      { text: "Web", alias: "万维网" },
      { text: "HTTP", alias: "超文本传输协议" }
    ],
    style: "narration-professional"
  }
]);
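
In SSML, substitutions of this kind are normally expressed with the standard `<sub alias="...">` element. The sketch below shows one possible way to turn a substitution list into such markup. It is an assumption about the approach, not the PR's code: `SubstitutionSketch`, `sketchSubstitutions`, and both helper functions are hypothetical, and the escaping helper is again a stand-in for a proper library.

```typescript
// Hypothetical shape of one substitution; field names mirror the PR example.
interface SubstitutionSketch {
  text: string;
  alias: string;
}

// Stand-in XML escaping for text nodes and attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Escapes regular-expression metacharacters so a term matches literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wraps every occurrence of each term in <sub alias="...">term</sub>
// and XML-escapes all remaining text.
function sketchSubstitutions(text: string, subs: SubstitutionSketch[]): string {
  const terms = subs.map((s) => s.text).filter((t) => t.length > 0);
  if (terms.length === 0) return escapeXml(text);
  // Longest terms first, so a longer term wins over a shorter prefix of it.
  const sorted = terms.slice().sort((a, b) => b.length - a.length);
  const pattern = new RegExp(`(${sorted.map(escapeRegExp).join("|")})`, "g");
  return text
    .split(pattern) // the capture group keeps matches at the odd indices
    .map((part, index) => {
      if (index % 2 === 0) return escapeXml(part); // plain text between matches
      const alias = subs.filter((s) => s.text === part)[0].alias;
      return `<sub alias="${escapeXml(alias)}">${escapeXml(part)}</sub>`;
    })
    .join("");
}
```

Matching is literal and case-sensitive here, and a term is replaced wherever it occurs, including inside longer words. Real code may want word-boundary handling.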

5. Comprehensive Examples

6 ready-to-run examples demonstrating all features:

| Example | Description | File |
| --- | --- | --- |
| 0 | Simple dialogue demo | 00-simple-dialogue-demo.ts |
| 1 | Multi-speaker (chained) | 01-multi-speaker-dialogue-chained.ts |
| 2 | Multi-speaker (functional) | 02-multi-speaker-dialogue-functional.ts |
| 3 | 31 emotional styles | 03-31-emotional-styles-demo.ts |
| 4 | Style degree control | 04-style-degree-control-demo.ts |
| 5 | Text substitution | 05-text-substitution-demo.ts |

huan-zz3 and others added 10 commits March 18, 2026 16:32
Add comprehensive project documentation tailored for AI agents and
developers. Includes project overview, directory structure, code map,
and development conventions.

- Define core stack (TypeScript, WebSocket, Jest, pnpm)
- Map tasks to specific file locations
- List code symbols and their roles
- Document TypeScript and testing configurations
- Specify anti-patterns and unique styles (SSML, WebSocket)
- Include SSML reference documentation for speech synthesis

This ensures consistent understanding of the codebase architecture
and constraints during automated development or refactoring.

Add new TypeScript examples to the example directory to demonstrate
core library features and API usage patterns.

- Add .gitignore to exclude sensitive config and generated audio files
- Add dialogue demo showing SSML structure for multi-role conversation
- Add text substitution demo showcasing professional term handling

These scripts serve as reference implementations for API integration
and help users verify functionality with real-world scenarios.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

- Delete test-multi-speaker-demo.ts (non-standard location)
- Add src/AGENTS.md for core source code documentation
- Update root AGENTS.md with complete project structure
- Add new source files: DialogueBuilder.ts, DialogueTurn.ts, SSMLUtils.ts

Update project documentation to reflect new API capabilities and conventions.

- Add DialogueBuilder class and interfaces (DialogueTurn, TextSegment)
- Document SSML utilities (escapeSSML, validateStyle, validateStyleDegree)
- Update project overview with current code scale and feature list
- Add sections for error handling, logging, and SSML processing conventions
- List specific error scenarios and anti-patterns for contributors

- Renamed 6 example TypeScript files to English names
- Updated example/README.md with new filenames
- All example files now use English naming convention
- Git history preserved via git mv

- Translated DialogueTurn.ts: interfaces and class comments
- Translated SSMLUtils.ts: function and constant comments
- Translated DialogueBuilder.ts: class and method comments
- Standardized terminology (Dialogue, Turn, Substitution, SSML, etc.)
- All Chinese characters removed from src/ JSDoc comments
- Build verification: pnpm run build passes

- Translated all 6 example TypeScript files
- Updated all comments, console messages, and error messages to English
- Task 7: 00-simple-dialogue-demo.ts - Chinese SSML retained (multilingual demo)
- Task 8: 01-multi-speaker-dialogue-chained.ts - Chinese dialogue retained (multilingual demo)
- Task 9: 02-multi-speaker-dialogue-functional.ts - Chinese dialogue retained (multilingual demo)
- Task 10: 03-31-emotional-styles-demo.ts - Changed to English examples
- Task 11: 04-style-degree-control-demo.ts - Changed to English examples
- Task 12: 05-text-substitution-demo.ts - Changed substitution examples to English
- All output filenames updated to English
- Build verification: pnpm run build passes

Wave 4 - Documentation Translation:
- example/run.sh: Translated all shell comments and echo messages
- example/README.md: Translated complete example documentation
- AGENTS.md: Translated project knowledge base (184 lines)
- docs/ssml-structure.md: Translated SSML structure documentation (252 lines)
- docs/ssml-voice.md: Translated SSML voice documentation (226 lines)
- docs/ssml-pronunciation.md: Translated SSML pronunciation docs (199 lines)

All documentation now in English with:
- Technical documentation style
- Accurate SSML terminology
- Microsoft documentation attribution retained
- Build verification: pnpm run build passes

Additional translation - src/ directory knowledge base
@Migushthe2nd
Owner

Migushthe2nd commented Mar 23, 2026

I like the idea. I could maybe add a utils feature that could provide this.
However, please:

  • remove the AI docs
  • remove the AI agent files
  • use an XML escape library, not a custom-made function (this library could provide automatic escaping in the non-raw functions)
  • review your code, examples, and config to remove any reliance on and mentions of your "https://ttspro.cn/" website, and do not use a config like this in the first place (see the test scripts for examples)
  • please verify for me that all your voice types, parameters, and SSML speech tags work. I feel like AI just assumed all of them work (while some are blocked; see readme.md)
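
The verification requested in the last point could be organized as a small probe that sends one SSML document per voice, style, or tag and records which ones the service accepts. The sketch below is only a scaffold under stated assumptions: `probeSSML`, `ProbeResult`, and the `SynthesizeFn` callback are hypothetical, and the callback would need to be wired to the library's real synthesis call, which is not shown here.

```typescript
// Hypothetical callback: resolves if the service accepted the SSML, rejects otherwise.
type SynthesizeFn = (ssml: string) => Promise<unknown>;

interface ProbeResult {
  label: string;
  ok: boolean;
  error?: string;
}

// Runs the cases one at a time (to avoid flooding the service) and
// collects a pass/fail record for each instead of stopping at the first failure.
async function probeSSML(
  cases: { label: string; ssml: string }[],
  synthesize: SynthesizeFn
): Promise<ProbeResult[]> {
  const results: ProbeResult[] = [];
  for (const testCase of cases) {
    try {
      await synthesize(testCase.ssml);
      results.push({ label: testCase.label, ok: true });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.push({ label: testCase.label, ok: false, error: message });
    }
  }
  return results;
}
```

The failed entries of such a run would give a concrete list of styles and tags to drop or document as unsupported.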

@huan-zz3
Author

Thanks for the guidance. I will make the changes you requested.
