feat: Chatterbox TTS engine with multilingual voice cloning by jamiepine · Pull Request #257 · jamiepine/voicebox

jamiepine · 2026-03-13T08:36:03Z

Summary

Adds Chatterbox TTS as the third voice engine in Voicebox, supporting 23 languages including Hebrew. Also fixes download progress tracking for all non-Qwen engines and the GpuAcceleration crash on the server settings screen.

Chatterbox Engine

Backend: Full ChatterboxTTSBackend with CPU-forced loading on macOS (monkey-patched torch.load for CUDA-saved weights), eager attention fix for transformers >= 4.36, per-language generation defaults, and trim_tts_output() post-processing to cut trailing silence/hallucination
Frontend: Added to the flat model dropdown (Chatterbox), Hebrew (he) language support, instruct/model_size suppressed for non-Qwen engines, model management under "Other Voice Models"
Install: chatterbox-tts is installed with --no-deps because it pins numpy<1.26, torch==2.6.0, and other versions that are incompatible with our stack; its sub-dependencies are listed explicitly in requirements.txt
HF Repo: ResembleAI/chatterbox (~3.2GB total: t3_mtl23ls_v2.safetensors 2.14GB, s3gen.pt 1.06GB, ve.pt 5.7MB)

Download Progress Fix

Added HFProgressTracker (tqdm interception) to both LuxTTS and Chatterbox backends — previously only Qwen piped file-level progress to the frontend
Extended ActiveDownloadTask with progress/current/total/filename fields so the /tasks/active polling endpoint carries progress data
Inline progress bar + byte counts in the model list items and detail modal
Polling interval drops to 1s during active downloads (5s otherwise)
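
The progress plumbing in this section can be illustrated roughly as follows. This is a hedged sketch, not the PR's implementation: the field names `progress`, `current`, `total`, and `filename` come from the description above, but their types, the `model_name` field, the `ProgressTracker` class (a hypothetical stand-in for HFProgressTracker), and `poll_interval_seconds` are assumptions. The real polling happens in the frontend; the last function only restates the 1s/5s rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActiveDownloadTask:
    model_name: str                 # assumed identifier field
    progress: float = 0.0           # percent complete, 0-100 (assumed unit)
    current: int = 0                # bytes downloaded so far
    total: int = 0                  # total bytes expected
    filename: Optional[str] = None  # file currently being downloaded


class ProgressTracker:
    """Hypothetical receiver for tqdm-style byte updates."""

    def __init__(self, task: ActiveDownloadTask) -> None:
        self.task = task

    def start_file(self, filename: str, total_bytes: int) -> None:
        # Record the active file and grow the expected total.
        self.task.filename = filename
        self.task.total += total_bytes

    def update(self, n_bytes: int) -> None:
        # Accumulate bytes and recompute the percentage.
        self.task.current += n_bytes
        if self.task.total > 0:
            self.task.progress = 100.0 * self.task.current / self.task.total


def poll_interval_seconds(active_tasks: List[ActiveDownloadTask]) -> int:
    """1s while any download is active, 5s otherwise."""
    return 1 if active_tasks else 5


task = ActiveDownloadTask(model_name="chatterbox-tts")
tracker = ProgressTracker(task)
tracker.start_file("ve.pt", 1000)
tracker.update(250)
```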

GpuAcceleration Fix

Fixed ReferenceError: Cannot access uninitialized variable crash — cudaStatusLoading was used inside its own useQuery declaration before being initialized

Files Changed

New

backend/backends/chatterbox_backend.py — Core Chatterbox backend (315 lines)
backend/utils/audio.py — trim_tts_output() utility
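
For orientation, trailing-silence trimming of the kind trim_tts_output() performs can be sketched with a simple frame-energy scan. This is an assumption-laden illustration, not the utility in backend/utils/audio.py: the function name `trim_trailing_silence`, the RMS threshold, the frame size, and the padding are all hypothetical, and the sketch handles silence only, not hallucinated trailing audio.

```python
import math
from typing import List, Sequence


def trim_trailing_silence(
    samples: Sequence[float],
    sample_rate: int,
    threshold: float = 0.01,  # assumed RMS level below which a frame is silent
    frame_ms: int = 20,       # assumed analysis frame length
    pad_ms: int = 40,         # assumed tail kept after the last voiced frame
) -> List[float]:
    """Drop audio after the last frame whose RMS reaches the threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    pad_len = sample_rate * pad_ms // 1000

    last_voiced_end = 0
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            last_voiced_end = start + len(frame)

    # Keep a short pad so the final word is not clipped abruptly.
    end = min(len(samples), last_voiced_end + pad_len)
    return list(samples[:end])


sample_rate = 1000
voiced = [0.5, -0.5] * 100  # 200 samples of signal
silence = [0.0] * 300       # 300 samples of trailing silence
trimmed = trim_trailing_silence(voiced + silence, sample_rate)
```

In this example the 200 voiced samples are kept along with a 40-sample pad, and the remaining silence is discarded.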

Modified (Backend)

backend/backends/__init__.py — Engine registry
backend/backends/luxtts_backend.py — Added HFProgressTracker
backend/main.py — 5 endpoint dispatch points + progress in /tasks/active
backend/models.py — Engine/language regexes + ActiveDownloadTask fields
backend/requirements.txt — Chatterbox sub-dependencies
justfile / Makefile — --no-deps chatterbox-tts install

Modified (Frontend)

app/src/lib/api/types.ts — Engine union + ActiveDownloadTask fields
app/src/lib/constants/languages.ts — Hebrew
app/src/lib/hooks/useGenerationForm.ts — Chatterbox model mapping
app/src/components/Generation/FloatingGenerateBox.tsx — Dropdown
app/src/components/Generation/GenerationForm.tsx — Dropdown + description
app/src/components/ServerSettings/ModelManagement.tsx — Inline progress UI
app/src/components/ServerSettings/GpuAcceleration.tsx — Crash fix

Testing

Verified Chatterbox model loading + English generation (1.64s audio, ~13s on CPU)
Verified Hebrew generation (3.56s audio, ~21s on CPU)
TypeScript compilation clean (no new errors)

Summary by CodeRabbit

New Features
- Added Chatterbox TTS engine for multilingual voice generation with Hebrew language support.
- Redesigned model management interface with detailed metadata display, download progress tracking, status badges, and action controls.
- Enhanced download progress visibility with per-file tracking and improved error handling.
Bug Fixes
- Fixed model status polling behavior to improve query responsiveness.

coderabbitai · 2026-03-13T08:38:26Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9829d0c2-96f3-4a29-8e30-29c5d20276df

📥 Commits

Reviewing files that changed from the base of the PR and between 3576521 and c54ee14.

⛔ Files ignored due to path filters (2)

tauri/src-tauri/gen/Assets.car is excluded by !**/gen/**
tauri/src-tauri/gen/voicebox.icns is excluded by !**/gen/**

📒 Files selected for processing (19)

Makefile
app/src/components/Generation/FloatingGenerateBox.tsx
app/src/components/Generation/GenerationForm.tsx
app/src/components/ModelsTab/ModelsTab.tsx
app/src/components/ServerSettings/GpuAcceleration.tsx
app/src/components/ServerSettings/ModelManagement.tsx
app/src/components/StoriesTab/StoriesTab.tsx
app/src/lib/api/types.ts
app/src/lib/constants/languages.ts
app/src/lib/hooks/useGenerationForm.ts
backend/backends/__init__.py
backend/backends/chatterbox_backend.py
backend/backends/luxtts_backend.py
backend/main.py
backend/models.py
backend/requirements.txt
backend/utils/audio.py
justfile
scripts/test_download_progress.py

📝 Walkthrough

This PR introduces comprehensive support for a new multilingual TTS engine called Chatterbox, including a complete backend implementation, frontend UI integration, type system extensions, Hebrew language support, and enhanced model management with HuggingFace metadata and progress tracking.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Chatterbox Backend Implementation `backend/backends/chatterbox_backend.py`, `backend/backends/__init__.py` | Introduces new ChatterboxTTSBackend class with device selection, async model loading, HF cache checking, voice prompt handling, and per-language generation defaults. Registers chatterbox in TTS_ENGINES mapping and get_tts_backend_for_engine routing. |
| Backend API Integration `backend/main.py` | Extends `/generate`, `/generate/stream`, `/models/status`, `/models/download`, and `/models/delete` endpoints to support chatterbox engine with model caching, background downloads, post-processing via trim_tts_output, and progress tracking. |
| Type Definitions & Models `app/src/lib/api/types.ts`, `backend/models.py` | Adds chatterbox to engine enum in GenerationRequest, introduces HuggingFaceModelInfo interface, extends ActiveDownloadTask with progress/current/total/filename fields, adds hf_repo_id to ModelStatus, and extends language patterns to include Hebrew (he). |
| Frontend Engine Selection `app/src/components/Generation/FloatingGenerateBox.tsx`, `app/src/components/Generation/GenerationForm.tsx` | Adds chatterbox as a selectable engine option with conditional UI rendering, description text ("Multilingual, incl. Hebrew"), and engine mapping logic for model size selection. |
| Form & Hook Logic `app/src/lib/hooks/useGenerationForm.ts` | Extends generation schema to recognize chatterbox engine, adds modelName/displayName resolution for chatterbox-tts, and adjusts payload construction to conditionally include model_size and instruct only for Qwen. |
| Language & Utility `app/src/lib/constants/languages.ts`, `backend/utils/audio.py` | Adds Hebrew language support to SUPPORTED_LANGUAGES constant. Introduces trim_tts_output function for silence trimming and audio post-processing. |
| Model Management Refactor `app/src/components/ServerSettings/ModelManagement.tsx` | Replaces flat list UI with sectioned, modal-driven detail view, adds HuggingFace metadata fetching, progress bar display, model status badges, error console panel, and dynamic action buttons (Download, Cancel, Delete, Retry). |
| Backend Progress Tracking `backend/backends/luxtts_backend.py` | Wraps model loading with HFProgressTracker for structured progress reporting during HuggingFace downloads. |
| UI & Props `app/src/components/StoriesTab/StoriesTab.tsx`, `app/src/components/ModelsTab/ModelsTab.tsx` | Adds isPlayerOpen prop to FloatingGenerateBox and integrates audioUrl state from player store. Updates ModelsTab layout from spaced column to padded full-height column. |
| Server & GPU Status `app/src/components/ServerSettings/GpuAcceleration.tsx` | Changes refetchInterval logic from static loading state to dynamic function checking query status pending state. |
| Build & Setup `Makefile`, `justfile`, `backend/requirements.txt` | Adds chatterbox-tts package installation (with `--no-deps` flag), includes sub-dependencies (conformer, diffusers, omegaconf, pykakasi, resemble-perth, s3tokenizer, spacy-pkuseg, pyloudnorm) and auxiliary libraries (numpy, numba, httpx, python-multipart, Pillow). |
| Testing Utilities `scripts/test_download_progress.py` | New script for analyzing HuggingFace download progress via tqdm instrumentation; includes dispatch functions for Qwen, LuxTTS, and Chatterbox model downloads with progress event logging. |
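
The engine registration summarized in the first row of the table can be pictured as a name-to-class mapping with a small lookup function. This is a minimal sketch under assumptions: `TTS_ENGINES`, `get_tts_backend_for_engine`, and `ChatterboxTTSBackend` are names from the PR, while the placeholder base class, the `QwenTTSBackend` and `LuxTTSBackend` class names, the registry keys, and the per-engine instance cache are hypothetical.

```python
from typing import Dict, Type


class TTSBackend:
    """Placeholder base class; the real backends load and run models."""


class QwenTTSBackend(TTSBackend):       # hypothetical class name
    pass


class LuxTTSBackend(TTSBackend):        # hypothetical class name
    pass


class ChatterboxTTSBackend(TTSBackend):  # class name from the PR
    pass


# Registry name taken from the PR; keys and value types are assumptions.
TTS_ENGINES: Dict[str, Type[TTSBackend]] = {
    "qwen": QwenTTSBackend,
    "luxtts": LuxTTSBackend,
    "chatterbox": ChatterboxTTSBackend,
}

_instances: Dict[str, TTSBackend] = {}


def get_tts_backend_for_engine(engine: str) -> TTSBackend:
    """Return one lazily created backend instance per engine name."""
    if engine not in TTS_ENGINES:
        raise ValueError(f"Unknown TTS engine: {engine!r}")
    if engine not in _instances:
        _instances[engine] = TTS_ENGINES[engine]()
    return _instances[engine]


backend = get_tts_backend_for_engine("chatterbox")
```

With a mapping like this, adding an engine amounts to one new backend class and one new registry entry, and the endpoint code stays engine-agnostic.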

Sequence Diagram(s)

sequenceDiagram
    participant Client as React Client
    participant Server as FastAPI Server
    participant HF as HuggingFace
    participant Cache as Local Cache
    participant CB as Chatterbox<br/>Backend

    Client->>Server: POST /generate (engine: chatterbox)
    Server->>Cache: Check if model cached
    alt Model Not Cached
        Server->>Server: Return 202 Accepted<br/>(downloading: true)
        Server->>HF: Download model (background)
        HF->>Cache: Store model files
        Server->>Server: Signal download complete
    end
    
    Server->>CB: load_model()
    CB->>Cache: Verify model files exist
    CB->>CB: Initialize ChatterboxMultilingualTTS
    
    Client->>Server: POST /generate (after model ready)
    Server->>CB: create_voice_prompt(audio_path)
    CB->>CB: Load reference audio
    Server->>CB: generate(text, voice_prompt, language)
    CB->>CB: Run generation via model.generate()
    CB->>CB: trim_tts_output(audio)
    Server->>Client: Return generated audio
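
The cache check at the top of the diagram above can be sketched as a small dispatch function. This is an illustration only: `dispatch_generate`, its callback parameters, and the response body shape are assumptions, and only the 202 status with `downloading: true` comes from the diagram.

```python
from typing import Callable, Dict, List, Tuple


def dispatch_generate(
    engine: str,
    is_cached: Callable[[str], bool],
    start_background_download: Callable[[str], None],
) -> Tuple[int, Dict[str, object]]:
    """Decide whether a generate request can run now or must wait for a download."""
    if not is_cached(engine):
        # Kick off the download and tell the client to retry later.
        start_background_download(engine)
        return 202, {"downloading": True, "engine": engine}
    # Model is available locally; the caller can proceed to generation.
    return 200, {"downloading": False, "engine": engine}


started: List[str] = []
status_missing, body_missing = dispatch_generate(
    "chatterbox", lambda name: False, started.append
)
status_cached, body_cached = dispatch_generate(
    "chatterbox", lambda name: True, started.append
)
```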

sequenceDiagram
    participant User as User
    participant UI as ModelManagement UI
    participant Server as Backend API
    participant HF as HuggingFace
    participant Store as Model Store

    User->>UI: Click on model card
    UI->>UI: Set selectedModel, open detail modal
    UI->>Server: GET hfModelInfo (if hf_repo_id exists)
    Server->>HF: Fetch model metadata
    HF->>Server: Return metadata
    Server->>UI: Return HuggingFaceModelInfo
    UI->>UI: Display license, downloads,<br/>tags, modified date

    alt Model Not Downloaded
        User->>UI: Click Download
        UI->>Server: POST /models/download
        Server->>Store: Background download via HF
        Server->>UI: Poll /models/status
        UI->>UI: Show progress bar
    else Model Downloaded
        User->>UI: Click Delete
        UI->>Server: POST /models/delete
        Server->>Store: Remove from cache
        UI->>UI: Update model status
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat: LuxTTS integration — multi-engine TTS support #254: Multi-engine TTS support infrastructure; directly related as both implement engine registry, backend routing, frontend engine selection, and generation flow modifications that enable this chatterbox addition.
Added download cancel/clear UI, fixed model downloading #238: Model download/error handling and ModelManagement UI refactoring; related through shared changes to ActiveDownloadTask fields, error state management, and model card UI patterns.

Poem

🐰 Hoppy news, the chatterbox sings!
Hebrew whispers on multilingual wings,
Cache it, trim it, progress displays,
A fluffy new engine in all the right ways! ✨🎵

- New ChatterboxTTSBackend wrapping ChatterboxMultilingualTTS (ResembleAI/chatterbox)
- Supports 23 languages including Hebrew, forces CPU on macOS (MPS issue)
- Monkey-patches torch.load for CPU loading, forces eager attention for compatibility
- trim_tts_output utility cuts trailing silence/hallucination from Chatterbox output
- Full engine integration: /generate, /generate/stream, model status/download/delete
- Hebrew (he) added to supported languages in frontend and backend validation
- Single flat model dropdown extended with Chatterbox option in both generation UIs
- ModelManagement UI groups LuxTTS and Chatterbox under 'Other Voice Models' section

chatterbox-tts 0.1.6 pins numpy<1.26 and torch==2.6 which are incompatible with Python 3.12+. Install with --no-deps and list its sub-dependencies explicitly in requirements.txt. Also removes HFProgressTracker from chatterbox backend to avoid 'generator didn't stop after throw()' errors from tqdm patching.

- Add HFProgressTracker to LuxTTS and Chatterbox backends so tqdm-based file-level download progress reaches the frontend (previously only Qwen had this, LuxTTS/Chatterbox showed a static spinner)
- Add progress/current/total/filename fields to ActiveDownloadTask so the /tasks/active polling endpoint carries progress data
- Show inline progress bar + bytes in the model list and detail modal, poll at 1s during active downloads (5s otherwise)
- Fix GpuAcceleration crash: cudaStatusLoading was referenced before initialization in its own useQuery declaration

… loaded models, fix generate box overlapping player on stories route

- Reflects merged PRs: #254 (LuxTTS/multi-engine), #257 (Chatterbox), #252 (CUDA swap), #238 (download UI)
- Updated architecture diagram to show all 4 TTS engines
- Added TTS engine comparison table and multi-engine architecture section
- Marked resolved bottlenecks (singleton backend, frontend Qwen assumptions)
- Updated PR triage: marked #194 and #33 as superseded
- Added 'Adding a New Engine' guide (now ~1 day effort)
- Updated recommended priorities to reflect current state
- Added new API endpoints (CUDA, cancel, active tasks)

jamiepine added 4 commits March 13, 2026 02:09

fix: model loaded icon uses accent-colored CircleCheck, show size for…

c54ee14

… loaded models, fix generate box overlapping player on stories route

jamiepine changed the base branch from feat/luxtts to main March 13, 2026 09:10

jamiepine force-pushed the feat/chatterbox branch from bc7606d to c54ee14 Compare March 13, 2026 09:10

jamiepine merged commit 3e6513c into main Mar 13, 2026
1 check was pending

jamiepine mentioned this pull request Mar 13, 2026

feat: add Hebrew language support with Chatterbox TTS and ivrit-ai Whisper #194

Closed

7 tasks

This was referenced Mar 13, 2026

feat: paralinguistic tag autocomplete for Chatterbox Turbo #265

Merged

feat: chunked TTS generation for long text (engine-agnostic) #266

Merged

feat: async generation queue #269

Merged

jamiepine merged 4 commits into main from feat/chatterbox
