7 changes: 5 additions & 2 deletions .gitignore
@@ -1,5 +1,5 @@
 # Build output
-dist
+dist/
 
 # General files
 node_modules
@@ -28,4 +28,7 @@ package-lock.json
 example_audio.webm
 example_audio_pitched.webm
 
-msedgetts-test/
+# Generated test files and AI-generated content
+msedgetts-test/
+.sisyphus/
+.github/
184 changes: 184 additions & 0 deletions AGENTS.md
@@ -0,0 +1,184 @@
# PROJECT KNOWLEDGE BASE

**Generated:** 2026-03-22
**Commit:** main branch
**Branch:** main

## OVERVIEW

Microsoft Edge Text-to-Speech (TTS) library: a Node.js/TypeScript module built on the Azure Speech Service through the Microsoft Edge Read Aloud API. It supports speech synthesis, SSML, multi-speaker dialogue, emotional style control, and multiple audio output formats.

**Core Stack**: TypeScript, WebSocket, Jest (testing), pnpm (package manager)
**Code Size**: ~1010 lines of TypeScript (src/ directory)
**Last Updated**: 2026-03-22

## STRUCTURE

```
./
├── src/                     # All source code (9 TypeScript files)
│   ├── index.ts             # Main entry point (barrel exports, 6 exports)
│   ├── MsEdgeTTS.ts         # Core TTS class (~499 lines, WebSocket communication)
│   ├── MsEdgeTTS.spec.ts    # Unit tests
│   ├── Output.ts            # Audio output format enum + extension mapping
│   ├── Prosody.ts           # Rate/pitch/volume options class
│   ├── DialogueTurn.ts      # Dialogue turn type definition
│   ├── DialogueBuilder.ts   # Dialogue builder class + SSML builder function
│   ├── SSMLUtils.ts         # SSML utility functions (escape, validate)
│   └── utils.ts             # Path joining utility
├── example/                 # Example demo code (6 Chinese-named files)
│   ├── 00-简单对话演示.ts
│   ├── 01-多说话人对话 - 链式调用.ts
│   ├── 02-多说话人对话 - 函数式.ts
│   ├── 03-31 种情感风格演示.ts
│   ├── 04-情感强度控制演示.ts
│   └── 05-文本替换功能演示.ts
├── .github/workflows/
│   └── deploy_docs.yml      # CI/CD: Documentation deployment to gh-pages only
├── docs/                    # Manually written SSML documentation
├── package.json             # Dependencies + Jest config (inline)
├── tsconfig.json            # TypeScript compilation configuration
└── README.md                # API documentation
```

## WHERE TO LOOK

| Task | Location | Description |
|------|------|------|
| Add new feature | `src/` | Create `.ts` file at same level |
| Modify core logic | `src/MsEdgeTTS.ts` | WebSocket communication, SSML processing |
| Add audio format | `src/Output.ts` | `OUTPUT_FORMAT` enum |
| Modify voice options | `src/Prosody.ts` | `ProsodyOptions` class |
| Add tests | `src/*.spec.ts` | Tests in same directory as source |
| Modify CI/CD | `.github/workflows/` | Documentation deployment flow only |
| Configure Jest | `package.json` | Jest config inline in package.json |

## CODE MAP

| Symbol | Type | Location | Role |
|--------|------|----------|------|
| `MsEdgeTTS` | Class | `src/MsEdgeTTS.ts` | Main class: WebSocket connection, speech synthesis |
| `OUTPUT_FORMAT` | Enum | `src/Output.ts` | Supported audio output formats (MP3, WEBM) |
| `OUTPUT_EXTENSIONS` | Const | `src/Output.ts` | Format to file extension mapping |
| `ProsodyOptions` | Class | `src/Prosody.ts` | Rate/pitch/volume configuration options |
| `RATE` | Enum | `src/Prosody.ts` | Speaking rate presets (x-slow to x-fast) |
| `PITCH` | Enum | `src/Prosody.ts` | Pitch presets (x-low to x-high) |
| `VOLUME` | Enum | `src/Prosody.ts` | Volume presets (silent to x-loud) |
| `Voice` | Type | `src/MsEdgeTTS.ts` | Voice metadata structure |
| `MetadataOptions` | Class | `src/MsEdgeTTS.ts` | Boundary metadata options (sentence/word) |
| `DialogueBuilder` | Class | `src/DialogueBuilder.ts` | Chained dialogue builder |
| `buildDialogueSSML` | Function | `src/DialogueBuilder.ts` | Functional SSML generation |
| `escapeSSML` | Function | `src/SSMLUtils.ts` | XML escape (& < > " ') |
| `validateStyle` | Function | `src/SSMLUtils.ts` | Validate 28 official emotional styles |
| `validateStyleDegree` | Function | `src/SSMLUtils.ts` | Validate styleDegree range (0.01-2.0) |
| `joinPath` | Function | `src/utils.ts` | Path joining utility |

## CONVENTIONS

**TypeScript Configuration**:
- `target`: ESNext
- `module`: CommonJS
- `outDir`: dist/
- Skip library check (skipLibCheck: true)

**Testing Conventions**:
- Test files: `*.spec.ts` in same directory as source
- Jest config inline in `package.json`
- Test timeout: 15 seconds

**Package Manager**:
- pnpm required (preinstall hook)
- Version lock: pnpm-lock.yaml

**Error Handling Conventions**:
- Throw clear Error on validation failure (see SSMLUtils.ts)
- Invalid input throws immediately, no fallback

**Logging Conventions**:
- Optional logger via `enableLogger` option
- Private `_log()` method for logging
- Log only connection status, message exchange
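
The opt-in logging convention above can be sketched as follows. This is a minimal illustration and not the library's code: `SketchClient`, `SketchOptions`, and the `lines` buffer are hypothetical names introduced here; only the `enableLogger` option and the private `_log()` method come from the conventions listed above.

```typescript
// Hypothetical sketch: `SketchClient` and `SketchOptions` are illustrative
// names, not the library's real classes.
interface SketchOptions {
  enableLogger?: boolean;
}

class SketchClient {
  private readonly enableLogger: boolean;
  // Kept only so the behaviour is easy to inspect in this sketch.
  readonly lines: string[] = [];

  constructor(options: SketchOptions = {}) {
    // Logging is off unless the caller asks for it.
    this.enableLogger = options.enableLogger ?? false;
  }

  // Single private entry point for all log output.
  private _log(...parts: unknown[]): void {
    if (!this.enableLogger) return;
    const line = parts.map(String).join(" ");
    this.lines.push(line);
    console.log(line);
  }

  connect(): void {
    // Only connection status and message exchange would be logged.
    this._log("connection status:", "open");
  }
}
```

Routing every message through one private method keeps the "log only connection status and message exchange" rule easy to audit.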

**SSML Processing Conventions**:
- Escape `&` first, then the other special characters, to prevent double escaping
- Only `speak`, `voice`, `prosody` elements supported
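
To see why the ampersand has to be handled first, here is a minimal sketch of the escaping order. It assumes the five characters listed for `escapeSSML` in the code map; `escapeSsmlSketch` and `escapeWrongOrder` are hypothetical names, and the real implementation in `src/SSMLUtils.ts` may differ.

```typescript
// Illustrative only: the real `escapeSSML` in src/SSMLUtils.ts may differ.
function escapeSsmlSketch(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wrong order, shown for contrast: the `&` introduced by `&lt;`
// is escaped a second time.
function escapeWrongOrder(text: string): string {
  return text.replace(/</g, "&lt;").replace(/&/g, "&amp;");
}

console.log(escapeSsmlSketch("a < b & c")); // a &lt; b &amp; c
console.log(escapeWrongOrder("a < b"));     // a &amp;lt; b
```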

## ANTI-PATTERNS (THIS PROJECT)

- ❌ **Do NOT** use npm/yarn - project requires pnpm
- ❌ **Do NOT** move tests to separate directory - keep `*.spec.ts` alongside source
- ❌ **Do NOT** modify tsconfig module/moduleResolution - depends on CommonJS
- ❌ **Do NOT** modify Sec-MS-GEC hash algorithm - depends on Azure authentication
- ❌ **Do NOT** remove `isomorphic-ws` dependency - enables cross-environment compatibility
- ❌ **Do NOT** use callback API - Promise only
- ❌ **Do NOT** use in browser - API requires Edge User-Agent (server-side only)
- ❌ **Do NOT** delete files outside `dist/` - publish includes only dist directory

## ERROR HANDLING

**Error Throwing Scenarios**:
- Metadata not configured: `"Speech synthesis not configured yet..."`
- Invalid voiceLocale: `"Could not infer voiceLocale from voiceName..."`
- Invalid style: `'Invalid style "xxx". Valid styles: ...'`
- styleDegree out of range: `"styleDegree must be between 0.01 and 2.0"`
- Empty voice name: `"voice name is required and cannot be empty"`
- Empty text: `"text cannot be empty string"`
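
The fail-fast behaviour behind these messages might look like the sketch below. The two error strings are taken from the list above; the function names, the `Number.isFinite` guard, and the exact checks are assumptions made for illustration, not a copy of `src/SSMLUtils.ts`.

```typescript
// Hypothetical re-creation; not copied from src/SSMLUtils.ts.
function validateStyleDegreeSketch(styleDegree: number): void {
  // Reject NaN/Infinity as well as values outside the documented range.
  if (!Number.isFinite(styleDegree) || styleDegree < 0.01 || styleDegree > 2.0) {
    throw new Error("styleDegree must be between 0.01 and 2.0");
  }
}

function validateTextSketch(text: string): void {
  // Invalid input throws immediately; there is no fallback value.
  if (text === "") {
    throw new Error("text cannot be empty string");
  }
}

// Usage: callers wrap synthesis in try/catch (or handle a rejected Promise).
try {
  validateStyleDegreeSketch(3);
} catch (err) {
  console.error((err as Error).message);
}
```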

## UNIQUE STYLES

**SSML Template**:
- Default template: `<speak>` → `<voice>` → `<prosody>`
- Only the `speak`, `voice`, and `prosody` elements are supported; full SSML is not
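
As a rough picture of that nesting, the sketch below assembles a `speak` → `voice` → `prosody` document. The attributes follow standard SSML; `buildSsmlSketch`, `ProsodySketch`, the `"default"` fallbacks, and the voice name in the usage line are assumptions for illustration, and the template the library actually emits may differ.

```typescript
// Hypothetical helper; not the template used by src/MsEdgeTTS.ts.
interface ProsodySketch {
  rate?: string;
  pitch?: string;
  volume?: string;
}

function buildSsmlSketch(
  voiceName: string,
  locale: string,
  escapedText: string, // assumed to be XML-escaped already
  prosody: ProsodySketch = {},
): string {
  const { rate = "default", pitch = "default", volume = "default" } = prosody;
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${locale}">` +
    `<voice name="${voiceName}">` +
    `<prosody rate="${rate}" pitch="${pitch}" volume="${volume}">` +
    escapedText +
    `</prosody></voice></speak>`
  );
}

// Example voice name and locale are placeholders.
console.log(buildSsmlSketch("en-US-AriaNeural", "en-US", "Hello there", { rate: "slow" }));
```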

**WebSocket Communication**:
- Uses `isomorphic-ws` for browser/Node compatibility
- Custom UUID generation (not crypto.randomUUID)
- Sec-MS-GEC hash authentication mechanism
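
For the custom UUID point, one common way to produce a version-4-style identifier without `crypto.randomUUID` is sketched below. This is an assumption about the general technique, not the project's generator: `uuidV4Sketch` is a hypothetical name, and `Math.random` is not cryptographically secure.

```typescript
// Hypothetical sketch; the generator in src/MsEdgeTTS.ts may differ.
function uuidV4Sketch(): string {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, (c) => {
    const r = Math.floor(Math.random() * 16); // random nibble 0..15
    // "x" takes any nibble; "y" is forced to the RFC 4122 variant (8, 9, a, b).
    const v = c === "x" ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

console.log(uuidV4Sketch());
```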

**Logging System**:
- Optional logger (enableLogger option)
- Logs only connection status, message exchange

## COMMANDS

```bash
# Install dependencies
pnpm install

# Development (build + run tests)
pnpm run dev

# Compile TypeScript
pnpm run build

# Run tests
pnpm test

# Tests (watch mode)
pnpm run test:watch

# Tests (coverage)
pnpm run test:cov

# Publish to npm
pnpm run publish
```

## NOTES

**Key Limitations**:
- Since the December 2025 update, the API requires an Edge User-Agent, so it **cannot be used in browsers**
- Promise API only, no callback support
- Voice list requires trusted client Token (hardcoded in source)

**Known Issues**:
- `src/test/test.ts` and `src/test/jest-e2e.json`, both referenced in package.json, do not exist (legacy config)
- Insufficient test coverage: only 1 test file (MsEdgeTTS.spec.ts), 11% coverage
- utils.ts is trivial (only 6 lines) and could be merged into another module
- example/ directory mixes in non-TS files (config.json, run.sh, etc.)

**Publish Flow**:
1. `pnpm run build` compiles to dist/
2. `pnpm publish --access=public`
3. Documentation auto-deploys to gh-pages (via GitHub Actions)