feat: Chatterbox TTS engine with multilingual voice cloning#257
Walkthrough
This PR adds a new multilingual TTS engine, Chatterbox: a complete backend implementation, frontend UI integration, type system extensions, Hebrew language support, and enhanced model management with HuggingFace metadata and progress tracking.
Sequence Diagram(s)
sequenceDiagram
participant Client as React Client
participant Server as FastAPI Server
participant HF as HuggingFace
participant Cache as Local Cache
participant CB as Chatterbox<br/>Backend
Client->>Server: POST /generate (engine: chatterbox)
Server->>Cache: Check if model cached
alt Model Not Cached
Server->>Server: Return 202 Accepted<br/>(downloading: true)
Server->>HF: Download model (background)
HF->>Cache: Store model files
Server->>Server: Signal download complete
end
Server->>CB: load_model()
CB->>Cache: Verify model files exist
CB->>CB: Initialize ChatterboxMultilingualTTS
Client->>Server: POST /generate (after model ready)
Server->>CB: create_voice_prompt(audio_path)
CB->>CB: Load reference audio
Server->>CB: generate(text, voice_prompt, language)
CB->>CB: Run generation via model.generate()
CB->>CB: trim_tts_output(audio)
Server->>Client: Return generated audio
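The "Model Not Cached" branch of the diagram above can be sketched as follows. This is a minimal illustration, not the PR's actual handler: `GenerateService`, `handle_generate`, and `finish_download` are hypothetical names, and the real server runs the download in the background through FastAPI rather than tracking it in memory like this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GenerateService:
    """Hypothetical stand-in for the /generate handler's cache check."""

    cached: set = field(default_factory=set)       # engines whose files are on disk
    downloading: set = field(default_factory=set)  # engines with a download in flight

    def handle_generate(self, engine: str, text: str) -> tuple[int, dict]:
        if engine not in self.cached:
            # Not cached: record the (assumed) background download and tell
            # the client to retry later instead of blocking the request.
            self.downloading.add(engine)
            return 202, {"downloading": True, "engine": engine}
        # Cached: the real server would load the model and synthesize here.
        return 200, {"engine": engine, "text": text}

    def finish_download(self, engine: str) -> None:
        # Called when the background download signals completion.
        self.downloading.discard(engine)
        self.cached.add(engine)
```

The client's second `POST /generate` in the diagram corresponds to calling `handle_generate` again after `finish_download` has run.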
sequenceDiagram
participant User as User
participant UI as ModelManagement UI
participant Server as Backend API
participant HF as HuggingFace
participant Store as Model Store
User->>UI: Click on model card
UI->>UI: Set selectedModel, open detail modal
UI->>Server: GET hfModelInfo (if hf_repo_id exists)
Server->>HF: Fetch model metadata
HF->>Server: Return metadata
Server->>UI: Return HuggingFaceModelInfo
UI->>UI: Display license, downloads,<br/>tags, modified date
alt Model Not Downloaded
User->>UI: Click Download
UI->>Server: POST /models/download
Server->>Store: Background download via HF
Server->>UI: Poll /models/status
UI->>UI: Show progress bar
else Model Downloaded
User->>UI: Click Delete
UI->>Server: POST /models/delete
Server->>Store: Remove from cache
UI->>UI: Update model status
end
Estimated code review effort: 4 (Complex), ~60 minutes
- New ChatterboxTTSBackend wrapping ChatterboxMultilingualTTS (ResembleAI/chatterbox)
- Supports 23 languages including Hebrew, forces CPU on macOS (MPS issue)
- Monkey-patches torch.load for CPU loading, forces eager attention for compatibility
- trim_tts_output utility cuts trailing silence/hallucination from Chatterbox output
- Full engine integration: /generate, /generate/stream, model status/download/delete
- Hebrew (he) added to supported languages in frontend and backend validation
- Single flat model dropdown extended with Chatterbox option in both generation UIs
- ModelManagement UI groups LuxTTS and Chatterbox under 'Other Voice Models' section
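The source does not show how trim_tts_output works internally. One plausible approach to cutting trailing silence is a frame-wise RMS threshold, sketched below under that assumption. The function name `trim_trailing_silence` and all parameters and defaults are hypothetical; a real implementation would more likely operate on NumPy arrays, and detecting hallucinated speech (as opposed to silence) would need more than an energy threshold.

```python
import math
from typing import List, Sequence


def trim_trailing_silence(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,          # assumed analysis window
    threshold_db: float = -40.0, # assumed silence threshold relative to full scale
    pad_ms: int = 100,           # assumed tail kept after the last loud frame
) -> List[float]:
    """Drop everything after the last frame whose RMS exceeds the threshold."""
    frame = max(1, sample_rate * frame_ms // 1000)
    threshold = 10.0 ** (threshold_db / 20.0)  # dBFS -> linear amplitude

    last_loud_end = 0
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / frame)
        if rms > threshold:
            last_loud_end = start + frame

    if last_loud_end == 0:
        # Nothing above the threshold: return the input unchanged.
        return list(samples)

    end = min(len(samples), last_loud_end + sample_rate * pad_ms // 1000)
    return list(samples[:end])
```

With a 16 kHz signal made of one second of tone followed by two seconds of zeros, this keeps the tone plus the 100 ms pad and drops the rest.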
chatterbox-tts 0.1.6 pins numpy<1.26 and torch==2.6, which are incompatible with Python 3.12+. Install it with --no-deps and list its sub-dependencies explicitly in requirements.txt. Also removes HFProgressTracker from the Chatterbox backend to avoid 'generator didn't stop after throw()' errors from tqdm patching.
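Both this commit and the previous one involve monkey-patching (torch.load, tqdm). The 'generator didn't stop after throw()' message is the RuntimeError that contextlib raises when a generator-based context manager yields again after an exception is thrown into it. The source does not say what triggered it here, so the following is only a sketch of a patch helper that avoids that failure mode by yielding exactly once and restoring in `finally`. `patched_attr`, `force_cpu`, and `fake_torch` are hypothetical names; `fake_torch` stands in for the real `torch` module so the example runs without it.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def patched_attr(obj, name, replacement):
    """Temporarily replace obj.<name>, restoring it even if the body raises."""
    original = getattr(obj, name)
    setattr(obj, name, replacement)
    try:
        # Exactly one yield, not inside a loop or an except handler, so the
        # generator always finishes after an exception is thrown into it.
        yield original
    finally:
        setattr(obj, name, original)


def force_cpu(load_fn):
    """Wrap a loader so it always receives map_location='cpu'.

    Assumes callers pass map_location by keyword, if at all.
    """
    def wrapper(*args, **kwargs):
        kwargs["map_location"] = "cpu"
        return load_fn(*args, **kwargs)
    return wrapper


# Stand-in for the torch module; the real torch.load also accepts map_location.
fake_torch = SimpleNamespace(
    load=lambda path, map_location=None: (path, map_location)
)
```

Inside `with patched_attr(fake_torch, "load", force_cpu(fake_torch.load)):` every call to `fake_torch.load` is redirected to CPU, and the original function is back in place once the block exits.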
- Add HFProgressTracker to LuxTTS and Chatterbox backends so tqdm-based file-level download progress reaches the frontend (previously only Qwen had this, LuxTTS/Chatterbox showed a static spinner)
- Add progress/current/total/filename fields to ActiveDownloadTask so the /tasks/active polling endpoint carries progress data
- Show inline progress bar + bytes in the model list and detail modal, poll at 1s during active downloads (5s otherwise)
- Fix GpuAcceleration crash: cudaStatusLoading was referenced before initialization in its own useQuery declaration
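As a rough illustration of the progress fields and polling cadence described above, here is a sketch of a task record carrying progress/current/total/filename. Only those four field names come from the commit message. `model_name`, `update_progress`, `poll_interval_seconds`, and the choice to express `progress` as a percentage are assumptions, and the polling itself lives in the frontend, not in Python.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ActiveDownloadTask:
    model_name: str                   # hypothetical identifier field
    progress: Optional[float] = None  # assumed to be a percentage, 0 to 100
    current: Optional[int] = None     # bytes downloaded so far
    total: Optional[int] = None       # total bytes expected
    filename: Optional[str] = None    # file currently being fetched


def update_progress(task: ActiveDownloadTask, current: int, total: int,
                    filename: str) -> None:
    """Record a progress update, e.g. from a tqdm-style callback."""
    task.current = current
    task.total = total
    task.filename = filename
    # Avoid dividing by zero when the total size is not known yet.
    task.progress = (current / total) * 100.0 if total > 0 else None


def poll_interval_seconds(active: List[ActiveDownloadTask]) -> float:
    """1s while any download is active, 5s otherwise (per the commit message)."""
    return 1.0 if active else 5.0
```

A task with no size information keeps `progress` as `None`, which a UI could render as the static spinner the commit message mentions.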
… loaded models, fix generate box overlapping player on stories route
bc7606d to c54ee14
- Reflects merged PRs: #254 (LuxTTS/multi-engine), #257 (Chatterbox), #252 (CUDA swap), #238 (download UI)
- Updated architecture diagram to show all 4 TTS engines
- Added TTS engine comparison table and multi-engine architecture section
- Marked resolved bottlenecks (singleton backend, frontend Qwen assumptions)
- Updated PR triage: marked #194 and #33 as superseded
- Added 'Adding a New Engine' guide (now ~1 day effort)
- Updated recommended priorities to reflect current state
- Added new API endpoints (CUDA, cancel, active tasks)
Summary
Adds Chatterbox TTS as the third voice engine in Voicebox, supporting 23 languages including Hebrew. Also fixes download progress tracking for all non-Qwen engines and the GpuAcceleration crash on the server settings screen.
Chatterbox Engine
- ChatterboxTTSBackend with CPU-forced loading on macOS (monkey-patched torch.load for CUDA-saved weights), eager attention fix for transformers >= 4.36, per-language generation defaults, and trim_tts_output() post-processing to cut trailing silence/hallucination
- … Chatterbox), Hebrew (he) language support, instruct/model_size suppressed for non-Qwen engines, model management under "Other Voice Models"
- chatterbox-tts installed with --no-deps (it pins numpy<1.26, torch==2.6.0, etc., which are incompatible with our stack); sub-dependencies listed explicitly in requirements.txt
- ResembleAI/chatterbox (~3.2GB total: t3_mtl23ls_v2.safetensors 2.14GB, s3gen.pt 1.06GB, ve.pt 5.7MB)
Download Progress Fix
- Added HFProgressTracker (tqdm interception) to both LuxTTS and Chatterbox backends; previously only Qwen piped file-level progress to the frontend
- Extended ActiveDownloadTask with progress/current/total/filename fields so the /tasks/active polling endpoint carries progress data
GpuAcceleration Fix
- Fixed the "ReferenceError: Cannot access uninitialized variable" crash: cudaStatusLoading was used inside its own useQuery declaration before being initialized
Files Changed
New
- backend/backends/chatterbox_backend.py: Core Chatterbox backend (315 lines)
- backend/utils/audio.py: trim_tts_output() utility
Modified (Backend)
- backend/backends/__init__.py: Engine registry
- backend/backends/luxtts_backend.py: Added HFProgressTracker
- backend/main.py: 5 endpoint dispatch points + progress in /tasks/active
- backend/models.py: Engine/language regexes + ActiveDownloadTask fields
- backend/requirements.txt: Chatterbox sub-dependencies
- justfile/Makefile: --no-deps chatterbox-tts install
Modified (Frontend)
- app/src/lib/api/types.ts: Engine union + ActiveDownloadTask fields
- app/src/lib/constants/languages.ts: Hebrew
- app/src/lib/hooks/useGenerationForm.ts: Chatterbox model mapping
- app/src/components/Generation/FloatingGenerateBox.tsx: Dropdown
- app/src/components/Generation/GenerationForm.tsx: Dropdown + description
- app/src/components/ServerSettings/ModelManagement.tsx: Inline progress UI
- app/src/components/ServerSettings/GpuAcceleration.tsx: Crash fix
Testing