
feat: Chatterbox TTS engine with multilingual voice cloning#257

Merged
jamiepine merged 4 commits into main from feat/chatterbox on Mar 13, 2026


@jamiepine (Owner) commented Mar 13, 2026

Summary

Adds Chatterbox TTS as the third voice engine in Voicebox, supporting 23 languages including Hebrew. Also fixes download progress tracking for all non-Qwen engines and the GpuAcceleration crash on the server settings screen.

Chatterbox Engine

  • Backend: Full ChatterboxTTSBackend with CPU-forced loading on macOS (monkey-patched torch.load for CUDA-saved weights), eager attention fix for transformers >= 4.36, per-language generation defaults, and trim_tts_output() post-processing to cut trailing silence/hallucination
  • Frontend: Chatterbox added to the flat model dropdown; Hebrew (he) language support; instruct/model_size suppressed for non-Qwen engines; model management under "Other Voice Models"
  • Install: chatterbox-tts is installed with --no-deps because it pins numpy<1.26, torch==2.6.0, etc., which are incompatible with our stack; its sub-dependencies are listed explicitly in requirements.txt
  • HF Repo: ResembleAI/chatterbox (~3.2GB total: t3_mtl23ls_v2.safetensors 2.14GB, s3gen.pt 1.06GB, ve.pt 5.7MB)
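The CPU-forced loading in the Backend bullet can be sketched as a scoped monkey-patch. This is a minimal sketch, not the backend's code: force_cpu_load is a hypothetical helper, and it accepts any module exposing load(f, map_location=...) so it can be exercised without PyTorch installed; in the backend the target would presumably be torch itself, whose load() takes map_location as its second parameter.

```python
import contextlib
import functools
from typing import Any, Iterator


@contextlib.contextmanager
def force_cpu_load(module: Any) -> Iterator[None]:
    """Hypothetical helper: temporarily wrap module.load so every call
    deserializes onto the CPU, even for checkpoints saved from CUDA."""
    original = module.load

    @functools.wraps(original)
    def cpu_load(*args: Any, **kwargs: Any) -> Any:
        if len(args) >= 2:
            # map_location was passed positionally: load(f, map_location, ...)
            args = (args[0], "cpu", *args[2:])
        else:
            # Override (or add) the keyword form.
            kwargs["map_location"] = "cpu"
        return original(*args, **kwargs)

    module.load = cpu_load
    try:
        yield
    finally:
        # Always restore the original loader, even if model loading fails.
        module.load = original
```

Assuming PyTorch is installed, only the Chatterbox checkpoint loading would sit inside "with force_cpu_load(torch):", so other engines that load onto CUDA are unaffected once the block exits. Scoping the patch this way is a design choice of the sketch; the PR does not say whether its patch is global or scoped.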

Download Progress Fix

  • Added HFProgressTracker (tqdm interception) to both LuxTTS and Chatterbox backends — previously only Qwen piped file-level progress to the frontend
  • Extended ActiveDownloadTask with progress/current/total/filename fields so the /tasks/active polling endpoint carries progress data
  • Inline progress bar + byte counts in the model list items and detail modal
  • Polling interval drops to 1s during active downloads (5s otherwise)
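The new progress/current/total/filename fields on ActiveDownloadTask can be illustrated with a small aggregation sketch. It is hypothetical: the PR intercepts tqdm to obtain per-file updates, whereas here those updates arrive through an explicit update() call, and the model_name field and the ProgressAggregator class are assumptions rather than the PR's code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ActiveDownloadTask:
    """Mirrors the progress fields named in the PR; model_name is assumed."""
    model_name: str
    progress: float = 0.0           # percent complete, 0-100
    current: int = 0                # bytes downloaded across all files
    total: int = 0                  # bytes expected across files seen so far
    filename: Optional[str] = None  # file most recently reported


class ProgressAggregator:
    """Hypothetical stand-in for the tqdm hook: each intercepted progress bar
    reports (filename, bytes_done, bytes_total), and the per-file numbers are
    folded into one task that a polling endpoint can serialize."""

    def __init__(self, task: ActiveDownloadTask) -> None:
        self.task = task
        self._files: Dict[str, Tuple[int, int]] = {}

    def update(self, filename: str, done: int, total: int) -> None:
        self._files[filename] = (done, total)
        self.task.current = sum(d for d, _ in self._files.values())
        self.task.total = sum(t for _, t in self._files.values())
        self.task.filename = filename
        if self.task.total:
            self.task.progress = 100.0 * self.task.current / self.task.total
        else:
            self.task.progress = 0.0
```

Because total only counts files that have started, the percentage can move backward: finishing half of one 1000-byte file reads as 50%, and starting a second 1000-byte file drops it to 25%.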

GpuAcceleration Fix

  • Fixed the "ReferenceError: Cannot access uninitialized variable" crash: cudaStatusLoading was referenced inside its own useQuery declaration before it was initialized

Files Changed

New

  • backend/backends/chatterbox_backend.py — Core Chatterbox backend (315 lines)
  • backend/utils/audio.py: trim_tts_output() utility
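trim_tts_output() is described only as cutting trailing silence and hallucination. The sketch below covers just the silence half with a simple frame-energy rule; trim_trailing_silence, its -40 dBFS threshold, 20 ms frames, and 100 ms tail are illustrative assumptions, and it uses plain Python lists so it runs without NumPy. Detecting hallucinated speech would need more than an energy check.

```python
import math
from typing import List, Sequence


def trim_trailing_silence(
    audio: Sequence[float],
    sample_rate: int,
    threshold_db: float = -40.0,
    frame_ms: float = 20.0,
    tail_ms: float = 100.0,
) -> List[float]:
    """Hypothetical sketch: drop trailing frames whose RMS is below threshold_db
    (dBFS, full scale = 1.0), keeping tail_ms of audio after the last voiced frame."""
    frame = max(1, int(sample_rate * frame_ms / 1000))
    threshold = 10 ** (threshold_db / 20)  # dBFS -> linear amplitude
    last_voiced_end = 0
    for start in range(0, len(audio), frame):
        chunk = audio[start:start + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if rms >= threshold:
            last_voiced_end = start + len(chunk)
    if last_voiced_end == 0:
        # Nothing above the threshold: leave the clip untouched rather than emptying it.
        return list(audio)
    tail = int(sample_rate * tail_ms / 1000)
    return list(audio[: min(len(audio), last_voiced_end + tail)])
```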

Modified (Backend)

  • backend/backends/__init__.py — Engine registry
  • backend/backends/luxtts_backend.py — Added HFProgressTracker
  • backend/main.py — 5 endpoint dispatch points + progress in /tasks/active
  • backend/models.py — Engine/language regexes + ActiveDownloadTask fields
  • backend/requirements.txt — Chatterbox sub-dependencies
  • justfile / Makefile: --no-deps chatterbox-tts install
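The engine registry in backend/backends/__init__.py can be sketched as a name-to-factory map with lazily created, per-engine instances. TTS_ENGINES and get_tts_backend_for_engine are the names used in the walkthrough further down; the engine strings "qwen" and "luxtts", the stub classes, and the caching policy are assumptions for illustration.

```python
from typing import Callable, Dict


class TTSBackend:
    """Hypothetical minimal interface shared by the engine backends."""
    engine = "base"


class QwenBackend(TTSBackend):
    engine = "qwen"


class LuxTTSBackend(TTSBackend):
    engine = "luxtts"


class ChatterboxTTSBackend(TTSBackend):
    engine = "chatterbox"


# Engine name -> factory; registering an engine means adding one entry here.
TTS_ENGINES: Dict[str, Callable[[], TTSBackend]] = {
    "qwen": QwenBackend,
    "luxtts": LuxTTSBackend,
    "chatterbox": ChatterboxTTSBackend,
}

_instances: Dict[str, TTSBackend] = {}


def get_tts_backend_for_engine(engine: str) -> TTSBackend:
    """Return one lazily created backend instance per engine."""
    factory = TTS_ENGINES.get(engine)
    if factory is None:
        raise ValueError(f"Unknown TTS engine: {engine!r}")
    if engine not in _instances:
        _instances[engine] = factory()
    return _instances[engine]
```

Keeping one instance per engine, rather than a single global backend, is one way to address the "singleton backend" bottleneck mentioned later in the thread; the PR does not state that the real registry caches exactly like this.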

Modified (Frontend)

  • app/src/lib/api/types.ts — Engine union + ActiveDownloadTask fields
  • app/src/lib/constants/languages.ts — Hebrew
  • app/src/lib/hooks/useGenerationForm.ts — Chatterbox model mapping
  • app/src/components/Generation/FloatingGenerateBox.tsx — Dropdown
  • app/src/components/Generation/GenerationForm.tsx — Dropdown + description
  • app/src/components/ServerSettings/ModelManagement.tsx — Inline progress UI
  • app/src/components/ServerSettings/GpuAcceleration.tsx — Crash fix

Testing

  • Verified Chatterbox model loading + English generation (1.64s of audio, generated in ~13s on CPU)
  • Verified Hebrew generation (3.56s of audio, generated in ~21s on CPU)
  • TypeScript compilation clean (no new errors)
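The CPU timings above can be read as a real-time factor: generation time divided by audio duration. A quick sketch of that arithmetic using the approximate figures reported:

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute per second of audio; values above 1 are slower than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


# Approximate CPU timings from the Testing notes.
for label, gen_s, audio_s in [("English", 13.0, 1.64), ("Hebrew", 21.0, 3.56)]:
    print(f"{label}: RTF ~ {real_time_factor(gen_s, audio_s):.1f}")
```

This gives about 7.9 for English and 5.9 for Hebrew, i.e. roughly 6x to 8x slower than real time on CPU. Since the timings are themselves approximate ("~13s", "~21s"), the ratios are rough.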

Summary by CodeRabbit

  • New Features

    • Added Chatterbox TTS engine for multilingual voice generation with Hebrew language support.
    • Redesigned model management interface with detailed metadata display, download progress tracking, status badges, and action controls.
    • Enhanced download progress visibility with per-file tracking and improved error handling.
  • Bug Fixes

    • Fixed model status polling behavior to improve query responsiveness.

@coderabbitai bot (Contributor) commented Mar 13, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9829d0c2-96f3-4a29-8e30-29c5d20276df

📥 Commits

Reviewing files that changed from the base of the PR and between 3576521 and c54ee14.

⛔ Files ignored due to path filters (2)
  • tauri/src-tauri/gen/Assets.car is excluded by !**/gen/**
  • tauri/src-tauri/gen/voicebox.icns is excluded by !**/gen/**
📒 Files selected for processing (19)
  • Makefile
  • app/src/components/Generation/FloatingGenerateBox.tsx
  • app/src/components/Generation/GenerationForm.tsx
  • app/src/components/ModelsTab/ModelsTab.tsx
  • app/src/components/ServerSettings/GpuAcceleration.tsx
  • app/src/components/ServerSettings/ModelManagement.tsx
  • app/src/components/StoriesTab/StoriesTab.tsx
  • app/src/lib/api/types.ts
  • app/src/lib/constants/languages.ts
  • app/src/lib/hooks/useGenerationForm.ts
  • backend/backends/__init__.py
  • backend/backends/chatterbox_backend.py
  • backend/backends/luxtts_backend.py
  • backend/main.py
  • backend/models.py
  • backend/requirements.txt
  • backend/utils/audio.py
  • justfile
  • scripts/test_download_progress.py

📝 Walkthrough


This PR introduces comprehensive support for a new multilingual TTS engine called Chatterbox, including a complete backend implementation, frontend UI integration, type system extensions, Hebrew language support, and enhanced model management with HuggingFace metadata and progress tracking.

Changes

Cohort / File(s) Summary
Chatterbox Backend Implementation
backend/backends/chatterbox_backend.py, backend/backends/__init__.py
Introduces new ChatterboxTTSBackend class with device selection, async model loading, HF cache checking, voice prompt handling, and per-language generation defaults. Registers chatterbox in TTS_ENGINES mapping and get_tts_backend_for_engine routing.
Backend API Integration
backend/main.py
Extends /generate, /generate/stream, /models/status, /models/download, and /models/delete endpoints to support chatterbox engine with model caching, background downloads, post-processing via trim_tts_output, and progress tracking.
Type Definitions & Models
app/src/lib/api/types.ts, backend/models.py
Adds chatterbox to engine enum in GenerationRequest, introduces HuggingFaceModelInfo interface, extends ActiveDownloadTask with progress/current/total/filename fields, adds hf_repo_id to ModelStatus, and extends language patterns to include Hebrew (he).
Frontend Engine Selection
app/src/components/Generation/FloatingGenerateBox.tsx, app/src/components/Generation/GenerationForm.tsx
Adds chatterbox as a selectable engine option with conditional UI rendering, description text ("Multilingual, incl. Hebrew"), and engine mapping logic for model size selection.
Form & Hook Logic
app/src/lib/hooks/useGenerationForm.ts
Extends generation schema to recognize chatterbox engine, adds modelName/displayName resolution for chatterbox-tts, and adjusts payload construction to conditionally include model_size and instruct only for Qwen.
Language & Utility
app/src/lib/constants/languages.ts, backend/utils/audio.py
Adds Hebrew language support to SUPPORTED_LANGUAGES constant. Introduces trim_tts_output function for silence trimming and audio post-processing.
Model Management Refactor
app/src/components/ServerSettings/ModelManagement.tsx
Replaces flat list UI with sectioned, modal-driven detail view, adds HuggingFace metadata fetching, progress bar display, model status badges, error console panel, and dynamic action buttons (Download, Cancel, Delete, Retry).
Backend Progress Tracking
backend/backends/luxtts_backend.py
Wraps model loading with HFProgressTracker for structured progress reporting during HuggingFace downloads.
UI & Props
app/src/components/StoriesTab/StoriesTab.tsx, app/src/components/ModelsTab/ModelsTab.tsx
Adds isPlayerOpen prop to FloatingGenerateBox and integrates audioUrl state from player store. Updates ModelsTab layout from spaced column to padded full-height column.
Server & GPU Status
app/src/components/ServerSettings/GpuAcceleration.tsx
Replaces a refetchInterval that read a loading variable before it was initialized with a function that checks the query's own pending status.
Build & Setup
Makefile, justfile, backend/requirements.txt
Adds chatterbox-tts package installation (with --no-deps flag), includes sub-dependencies (conformer, diffusers, omegaconf, pykakasi, resemble-perth, s3tokenizer, spacy-pkuseg, pyloudnorm) and auxiliary libraries (numpy, numba, httpx, python-multipart, Pillow).
Testing Utilities
scripts/test_download_progress.py
New script for analyzing HuggingFace download progress via tqdm instrumentation; includes dispatch functions for Qwen, LuxTTS, and Chatterbox model downloads with progress event logging.
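The "HF cache checking" attributed to ChatterboxTTSBackend can be approximated with the standard Hugging Face hub cache layout, where each repo lives under models--<org>--<name>/snapshots/<revision>/. is_repo_cached is a hypothetical helper, and treating the three weight files from the PR description as the required set is an assumption (the repo may contain other files the model needs).

```python
from pathlib import Path
from typing import Iterable


def is_repo_cached(cache_dir: Path, repo_id: str, required_files: Iterable[str]) -> bool:
    """Return True if some snapshot of repo_id contains every required file.

    Follows the Hugging Face hub cache layout:
    <cache_dir>/models--<org>--<name>/snapshots/<revision>/<file>
    """
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return False
    required = list(required_files)
    return any(
        all((snap / name).exists() for name in required)
        for snap in snapshots.iterdir()
        if snap.is_dir()
    )


# Weight files listed in the PR description (assumed to be the required set).
CHATTERBOX_FILES = ("t3_mtl23ls_v2.safetensors", "s3gen.pt", "ve.pt")
```

Snapshot entries are normally symlinks into the cache's blobs directory; Path.exists() follows symlinks, so an entry whose blob is missing counts as not cached.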

Sequence Diagram(s)

sequenceDiagram
    participant Client as React Client
    participant Server as FastAPI Server
    participant HF as HuggingFace
    participant Cache as Local Cache
    participant CB as Chatterbox<br/>Backend

    Client->>Server: POST /generate (engine: chatterbox)
    Server->>Cache: Check if model cached
    alt Model Not Cached
        Server->>Client: Return 202 Accepted<br/>(downloading: true)
        Server->>HF: Download model (background)
        HF->>Cache: Store model files
        Server->>Server: Signal download complete
    end
    
    Server->>CB: load_model()
    CB->>Cache: Verify model files exist
    CB->>CB: Initialize ChatterboxMultilingualTTS
    
    Client->>Server: POST /generate (after model ready)
    Server->>CB: create_voice_prompt(audio_path)
    CB->>CB: Load reference audio
    Server->>CB: generate(text, voice_prompt, language)
    CB->>CB: Run generation via model.generate()
    CB->>CB: trim_tts_output(audio)
    Server->>Client: Return generated audio
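The 202 path in the first diagram, where the server starts a background download instead of blocking the request, can be sketched with asyncio. ModelGate is a hypothetical class; the 200/202 status codes and the downloading flag come from the diagram, while the single-download guard and the retry-after-failure behavior are assumptions about how such a gate might work. check() must be called from inside a running event loop, since it uses asyncio.create_task.

```python
import asyncio
from typing import Any, Callable, Coroutine, Dict, Optional, Tuple


class ModelGate:
    """Hypothetical sketch: if the model is not cached, start one background
    download and answer 202 instead of blocking the request."""

    def __init__(
        self,
        is_cached: Callable[[], bool],
        download: Callable[[], Coroutine[Any, Any, None]],
    ) -> None:
        self._is_cached = is_cached
        self._download = download
        self._task: Optional[asyncio.Task] = None

    def check(self) -> Tuple[int, Dict[str, bool]]:
        """Return (status_code, body) for an incoming /generate request."""
        if self._is_cached():
            return 200, {"downloading": False}
        if self._task is None or self._task.done():
            # At most one download in flight, however many requests arrive;
            # a finished (e.g. failed) task is replaced on the next request.
            self._task = asyncio.create_task(self._download())
        return 202, {"downloading": True}
```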
sequenceDiagram
    participant User as User
    participant UI as ModelManagement UI
    participant Server as Backend API
    participant HF as HuggingFace
    participant Store as Model Store

    User->>UI: Click on model card
    UI->>UI: Set selectedModel, open detail modal
    UI->>Server: GET hfModelInfo (if hf_repo_id exists)
    Server->>HF: Fetch model metadata
    HF->>Server: Return metadata
    Server->>UI: Return HuggingFaceModelInfo
    UI->>UI: Display license, downloads,<br/>tags, modified date

    alt Model Not Downloaded
        User->>UI: Click Download
        UI->>Server: POST /models/download
        Server->>Store: Background download via HF
        UI->>Server: Poll /models/status
        UI->>UI: Show progress bar
    else Model Downloaded
        User->>UI: Click Delete
        UI->>Server: POST /models/delete
        Server->>Store: Remove from cache
        UI->>UI: Update model status
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 Hoppy news, the chatterbox sings!
Hebrew whispers on multilingual wings,
Cache it, trim it, progress displays,
A fluffy new engine in all the right ways! ✨🎵


Commits

- New ChatterboxTTSBackend wrapping ChatterboxMultilingualTTS (ResembleAI/chatterbox)
- Supports 23 languages including Hebrew, forces CPU on macOS (MPS issue)
- Monkey-patches torch.load for CPU loading, forces eager attention for compatibility
- trim_tts_output utility cuts trailing silence/hallucination from Chatterbox output
- Full engine integration: /generate, /generate/stream, model status/download/delete
- Hebrew (he) added to supported languages in frontend and backend validation
- Single flat model dropdown extended with Chatterbox option in both generation UIs
- ModelManagement UI groups LuxTTS and Chatterbox under 'Other Voice Models' section

chatterbox-tts 0.1.6 pins numpy<1.26 and torch==2.6 which are
incompatible with Python 3.12+. Install with --no-deps and list
its sub-dependencies explicitly in requirements.txt.

Also removes HFProgressTracker from chatterbox backend to avoid
'generator didn't stop after throw()' errors from tqdm patching.

- Add HFProgressTracker to LuxTTS and Chatterbox backends so tqdm-based
  file-level download progress reaches the frontend (previously only Qwen
  had this, LuxTTS/Chatterbox showed a static spinner)
- Add progress/current/total/filename fields to ActiveDownloadTask so the
  /tasks/active polling endpoint carries progress data
- Show inline progress bar + bytes in the model list and detail modal,
  poll at 1s during active downloads (5s otherwise)
- Fix GpuAcceleration crash: cudaStatusLoading was referenced before
  initialization in its own useQuery declaration

… loaded models, fix generate box overlapping player on stories route
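Because --no-deps bypasses pip's dependency resolution for chatterbox-tts, nothing verifies that the explicitly listed sub-dependencies actually got installed. A quick sanity check could use importlib.metadata; missing_distributions is a hypothetical helper, not part of the PR, and the package list is copied from the Build & Setup row of the walkthrough.

```python
from importlib import metadata
from typing import Iterable, List

# Sub-dependencies named in the PR for the --no-deps chatterbox-tts install.
CHATTERBOX_SUBDEPS = (
    "conformer", "diffusers", "omegaconf", "pykakasi",
    "resemble-perth", "s3tokenizer", "spacy-pkuseg", "pyloudnorm",
)


def missing_distributions(names: Iterable[str]) -> List[str]:
    """Return the distribution names that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_distributions(("chatterbox-tts", *CHATTERBOX_SUBDEPS))
    print("missing:", ", ".join(gaps) if gaps else "none")
```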
@jamiepine changed the base branch from feat/luxtts to main March 13, 2026 09:10
@jamiepine merged commit 3e6513c into main Mar 13, 2026
1 check was pending
jamiepine added a commit that referenced this pull request Mar 13, 2026
- Reflects merged PRs: #254 (LuxTTS/multi-engine), #257 (Chatterbox), #252 (CUDA swap), #238 (download UI)
- Updated architecture diagram to show all 4 TTS engines
- Added TTS engine comparison table and multi-engine architecture section
- Marked resolved bottlenecks (singleton backend, frontend Qwen assumptions)
- Updated PR triage: marked #194 and #33 as superseded
- Added 'Adding a New Engine' guide (now ~1 day effort)
- Updated recommended priorities to reflect current state
- Added new API endpoints (CUDA, cancel, active tasks)