feat: Chatterbox Turbo engine + per-engine language lists #258
Conversation
- New `ChatterboxTurboTTSBackend` wrapping `ChatterboxTurboTTS` (ResembleAI/chatterbox-turbo)
- English-only 350M model with paralinguistic tag support (`[laugh]`, `[cough]`, `[chuckle]`)
- Bypasses upstream `token=True` bug by calling `snapshot_download(token=None)` + `from_local()`
- Same CPU-on-macOS forcing and `torch.load` monkey-patching as multilingual backend
- Full engine integration: generate, stream, model status/download/delete endpoints
- Language dropdown now shows only languages supported by the selected engine
- Per-engine language maps: Qwen (10), LuxTTS (en), Chatterbox (23), Turbo (en)
- Auto-switches to English when selecting English-only engines
- Backend language regex expanded to accept all 23 Chatterbox languages
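The `token=True` workaround described above can be sketched roughly as follows. The repo id and file patterns come from this PR's text; the `from_local()` signature and `device` argument are assumptions about the `chatterbox-tts` package, so treat this as illustrative rather than the shipped implementation:

```python
def load_turbo_model(device: str = "cpu"):
    # Imports kept inside the function so the sketch stands alone;
    # both package APIs are assumptions drawn from the PR description.
    from huggingface_hub import snapshot_download
    from chatterbox.tts_turbo import ChatterboxTurboTTS

    # Upstream from_pretrained passes token=True, which demands a stored
    # HF token even though the repo is public. Downloading the snapshot
    # ourselves with token=None sidesteps that, then from_local() loads it.
    local_path = snapshot_download(
        repo_id="ResembleAI/chatterbox-turbo",
        token=None,
        allow_patterns=["*.safetensors", "*.json", "*.txt", "*.pt", "*.model"],
    )
    return ChatterboxTurboTTS.from_local(local_path, device=device)
```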
📝 Walkthrough

A new TTS engine variant "chatterbox_turbo" is added across frontend and backend. Frontend language selection now derives engine-specific options via a helper. The backend gains a Chatterbox Turbo TTS backend with model caching, download/load management, and generation endpoints updated to handle the new engine and download status flows.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Frontend as FloatingGenerateBox / GenerationForm
    participant Types as GenerationRequest / Types
    participant API as Backend API (main.py)
    participant Registry as Engine Registry (backends/__init__.py)
    participant Backend as ChatterboxTurbo Backend
    participant Cache as HF Hub Cache
    participant Model as TTS Model
    User->>Frontend: Select engine: chatterbox_turbo
    Frontend->>Frontend: Derive language options for engine
    User->>Frontend: Submit generation request
    Frontend->>Types: Validate form values
    Types->>API: POST /generate (engine=chatterbox_turbo)
    API->>Registry: get_tts_backend_for_engine('chatterbox_turbo')
    Registry->>Backend: Instantiate / request backend
    Backend->>Cache: Check model cached
    alt Model not cached
        Backend->>API: Return 202 (download task)
        API-->>User: 202 Download in progress
        Backend->>Cache: snapshot_download model
    end
    Backend->>Model: Load model
    Backend->>Backend: Create voice prompt (optional)
    Backend->>Model: Generate audio (text + tags + seed)
    Model-->>Backend: Audio buffer + sample rate
    Backend-->>API: Return audio response
    API-->>User: Audio file / stream
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

✅ Passed checks (2 passed)
- Reflects merged PRs: #254 (LuxTTS/multi-engine), #257 (Chatterbox), #252 (CUDA swap), #238 (download UI)
- Updated architecture diagram to show all 4 TTS engines
- Added TTS engine comparison table and multi-engine architecture section
- Marked resolved bottlenecks (singleton backend, frontend Qwen assumptions)
- Updated PR triage: marked #194 and #33 as superseded
- Added 'Adding a New Engine' guide (now ~1 day effort)
- Updated recommended priorities to reflect current state
- Added new API endpoints (CUDA, cancel, active tasks)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
app/src/components/Generation/GenerationForm.tsx (1)
116-129: ⚠️ Potential issue | 🟠 Major

**Reset `language` when the engine narrows its supported set.**

If a user selects a Chatterbox-only language like `ar` and then switches back to Qwen, this handler keeps the old `language` even though the new option list no longer contains it. The select falls out of sync, and submit sends an invalid engine/language pair. Recompute the allowed languages after every engine change and coerce the current value when it is no longer supported. The same selector logic is duplicated in `app/src/components/Generation/FloatingGenerateBox.tsx`.

Also applies to: 156-181
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/Generation/GenerationForm.tsx` around lines 116-129: The onValueChange handler in GenerationForm.tsx sets engine/modelSize but leaves language unchanged, causing invalid engine/language pairs (e.g., switching from a Chatterbox-only language to Qwen); update the handler to recompute the allowed language list after any engine change and, if the current form.getValues().language is not in that allowed set, coerce it with form.setValue('language', <firstAllowedLanguage>) or a safe default (e.g., 'en'); apply the same fix to the identical selector logic in FloatingGenerateBox.tsx so both handlers consistently reset language when the chosen engine cannot support the previously selected language.

backend/main.py (1)
55-60: ⚠️ Potential issue | 🟠 Major

**Use `_create_background_task()` for background downloads consistently.**

The Turbo and Chatterbox branches bypass the helper at lines 55-60 and drop the returned `Task`, making these fire-and-forget paths less reliable and skipping the bookkeeping that other background-download branches (Qwen, LuxTTS, Whisper) already use. Lines 690 and 713 should call `_create_background_task()` to ensure proper task lifecycle management.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/main.py` around lines 55 - 60, The Turbo and Chatterbox code paths create fire-and-forget tasks without using the helper _create_background_task, skipping the shared bookkeeping; locate the places in those branches where asyncio.create_task(...) is invoked (the Turbo and Chatterbox background-download calls) and replace those direct create_task calls (and any discarded Task results) with calls to _create_background_task(coro) so the returned Task is tracked in _background_tasks and the add_done_callback behavior is preserved.
🧹 Nitpick comments (1)
backend/backends/chatterbox_turbo_backend.py (1)
285-303: **The sample rate fallback to 24000 is correct; consider adding comments for generation parameters.**

The sample rate handling is accurate: ChatterboxTurboTTS outputs 24,000 Hz audio, making the fallback value of `24000` in the code correct.

The hardcoded generation parameters (`temperature=0.8`, `top_k=1000`, `top_p=0.95`, `repetition_penalty=1.2`) are reasonable defaults for the English-only Chatterbox Turbo model. Consider adding a brief inline comment explaining why these specific values were chosen to improve maintainability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/chatterbox_turbo_backend.py` around lines 285 - 303, The code correctly falls back to 24000Hz for sample_rate, but the hardcoded generation parameters lack context; add concise inline comments near the call to self.model.generate (and next to the temperature, top_k, top_p, repetition_penalty arguments) explaining why those defaults were chosen for the English-only Chatterbox Turbo model (e.g., balance of diversity vs stability, large top_k for sampling, slight repetition_penalty to reduce loops) and also add a short note by the sample_rate lookup (getattr(self.model, "sr"...) or sample_rate) clarifying that ChatterboxTurboTTS outputs 24kHz so the fallback is intentional.
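For reference, the parameter values named in the nitpick, with the kind of rationale comments it asks for. The values are taken from the review; the rationale text is illustrative, not from the codebase:

```python
# Generation defaults for the English-only Chatterbox Turbo model.
GENERATION_PARAMS = {
    "temperature": 0.8,         # mild randomness: varied but stable prosody
    "top_k": 1000,              # large candidate pool; top_p does the pruning
    "top_p": 0.95,              # nucleus sampling keeps only probable tokens
    "repetition_penalty": 1.2,  # discourage token loops in long utterances
}

# ChatterboxTurboTTS outputs 24 kHz audio, so this fallback is intentional.
DEFAULT_SAMPLE_RATE = 24_000
```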
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: de640645-d52b-480b-b3fa-0ce5df7f0bbe
📒 Files selected for processing (9)
- app/src/components/Generation/FloatingGenerateBox.tsx
- app/src/components/Generation/GenerationForm.tsx
- app/src/lib/api/types.ts
- app/src/lib/constants/languages.ts
- app/src/lib/hooks/useGenerationForm.ts
- backend/backends/__init__.py
- backend/backends/chatterbox_turbo_backend.py
- backend/main.py
- backend/models.py
```python
tracker_context = tracker.patch_download()
tracker_context.__enter__()

if not is_cached:
    task_manager.start_download(model_name)
    progress_manager.update_progress(
        model_name=model_name,
        current=0,
        total=0,
        filename="Connecting to HuggingFace...",
        status="downloading",
    )

try:
    device = self._get_device()
    self._device = device

    logger.info(f"Loading Chatterbox Turbo TTS on {device}...")

    import torch
    from huggingface_hub import snapshot_download
    from chatterbox.tts_turbo import ChatterboxTurboTTS

    # Download model files ourselves so we can pass token=None
    # (upstream from_pretrained passes token=True which requires
    # a stored HF token even though the repo is public).
    try:
        local_path = snapshot_download(
            repo_id=CHATTERBOX_TURBO_HF_REPO,
            token=None,
            allow_patterns=[
                "*.safetensors", "*.json", "*.txt", "*.pt", "*.model",
            ],
        )
    finally:
        tracker_context.__exit__(None, None, None)
```
Context manager cleanup gap on early exceptions.
If an exception occurs between tracker_context.__enter__() (line 118) and entering the try block (line 130), the context manager's __exit__ won't be called. Consider wrapping more tightly:
🛠️ Suggested fix

```diff
-    tracker_context = tracker.patch_download()
-    tracker_context.__enter__()
-
-    if not is_cached:
+    with tracker.patch_download():
+        if not is_cached:
+            task_manager.start_download(model_name)
+            progress_manager.update_progress(
+                model_name=model_name,
+                current=0,
+                total=0,
+                filename="Connecting to HuggingFace...",
+                status="downloading",
+            )
+
+        try:
+            device = self._get_device()
+            self._device = device
             # ... rest of loading logic ...
-    try:
-        # ...
-        try:
-            local_path = snapshot_download(...)
-        finally:
-            tracker_context.__exit__(None, None, None)
```

Alternatively, if the current structure is intentional (to exit the tracker before model instantiation), move `__enter__` inside the outer try block to ensure cleanup.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/backends/chatterbox_turbo_backend.py` around lines 117 - 152,
tracker_context.__enter__() is called before the try where exceptions could
occur, so tracker_context.__exit__() may never run; move the enter/exit into the
same try/finally scope or move tracker_context.__enter__() inside the try that
later calls snapshot_download/local_path and model initialization (e.g., around
the block that imports torch, calls snapshot_download, and constructs
ChatterboxTurboTTS) so that any early exception triggers the corresponding
tracker_context.__exit__(...) in a finally, ensuring tracker_context is always
cleaned up.
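The cleanup gap is easy to reproduce in isolation. This toy `Tracker` stands in for `tracker.patch_download()` and shows why the `with` form (or entering inside the try) is safer:

```python
class Tracker:
    """Stand-in for the download-progress context manager."""
    def __init__(self):
        self.exited = False
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc, tb):
        self.exited = True
        return False  # don't swallow exceptions

def buggy(tracker):
    ctx = tracker
    ctx.__enter__()
    # An exception here fires before any try/finally is reached,
    # so the __exit__ call further down never runs.
    raise RuntimeError("fails before the try/finally is reached")

def fixed(tracker):
    with tracker:  # __exit__ is guaranteed, even on exceptions
        raise RuntimeError("fails inside the managed block")

t1, t2 = Tracker(), Tracker()
try:
    buggy(t1)
except RuntimeError:
    pass
try:
    fixed(t2)
except RuntimeError:
    pass

assert t1.exited is False  # cleanup leaked
assert t2.exited is True   # cleanup ran
```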
```python
raise HTTPException(
    status_code=202,
    detail={
        "message": "Chatterbox Turbo model is being downloaded. Please wait and try again.",
        "model_name": model_name,
        "downloading": True,
    },
)
```
Let HTTPException escape generate_speech().
The Turbo cold-start branch now raises HTTPException(status_code=202, ...), but the blanket except Exception at the bottom of this handler catches it and turns it into a 500. First-time Turbo requests will therefore fail instead of returning “download in progress.” Add an except HTTPException pass-through before the generic handler and clear the generation task there as well.
Suggested fix

```diff
-    except ValueError as e:
+    except HTTPException:
+        task_manager.complete_generation(generation_id)
+        raise
+    except ValueError as e:
         task_manager.complete_generation(generation_id)
         raise HTTPException(status_code=400, detail=str(e))
```

Also applies to: 773-778
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/main.py` around lines 715 - 722, The handler's broad "except
Exception" is swallowing HTTPException raised in generate_speech() (the 202
"downloading" branch), so add an explicit "except HTTPException as e:" just
before the generic exception handler that re-raises/passes through the
HTTPException (i.e., raise e) and also invoke whatever cleanup function clears
the in-progress generation task (e.g., clear_generation_task(model_name) or the
handler's existing task-clearing logic) so the download state is reset; apply
the same change to the second occurrence of the same pattern around the other
Turbo cold-start block referenced in the comment.
```diff
 name: str = Field(..., min_length=1, max_length=100)
 description: Optional[str] = Field(None, max_length=500)
-language: str = Field(default="en", pattern="^(zh|en|ja|ko|de|fr|ru|pt|es|it|he)$")
+language: str = Field(default="en", pattern="^(zh|en|ja|ko|de|fr|ru|pt|es|it|he|ar|da|el|fi|hi|ms|nl|no|pl|sv|sw|tr)$")
```
Keep generation-language validation in sync with the new engine lists.
VoiceProfileCreate.language now accepts the expanded Chatterbox set, but GenerationRequest.language still only allows the old 11-code regex. The frontend now exposes ar, da, fi, sv, tr, etc. for chatterbox, so /generate and /generate/stream will reject those requests with a 422 before they ever reach the backend. Please validate language against the selected engine here instead of a single global regex.
Also applies to: 52-60
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/models.py` at line 14, GenerationRequest.language currently uses a
single static regex (old 11-code set) which rejects new chatterbox languages;
replace the static pattern with engine-aware validation: remove the global regex
on GenerationRequest.language and add a Pydantic validator (e.g.,
`@root_validator` or `@validator` on the GenerationRequest model) that checks the
provided language against the allowed language set for the selected engine (use
the same expanded Chatterbox set used by VoiceProfileCreate or the central
engine->languages mapping). Apply the same engine-aware validation approach to
the other affected models/fields referenced around lines 52-60 so language
acceptance is validated per selected engine rather than against the outdated
global regex.
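The engine-aware check the prompt describes could look roughly like this as a plain function. The mapping below is assembled from the language lists in this PR and may not match the project's central mapping exactly; in `models.py` this logic would live inside a Pydantic validator:

```python
# Engine -> supported language codes (illustrative; codes taken from the
# per-engine lists in this PR: Qwen 10, LuxTTS en, Chatterbox 23, Turbo en).
ENGINE_LANGUAGES = {
    "qwen": {"zh", "en", "ja", "ko", "de", "fr", "ru", "pt", "es", "it"},
    "luxtts": {"en"},
    "chatterbox": {
        "zh", "en", "ja", "ko", "de", "fr", "ru", "pt", "es", "it", "he",
        "ar", "da", "el", "fi", "hi", "ms", "nl", "no", "pl", "sv", "sw", "tr",
    },
    "chatterbox_turbo": {"en"},
}

def validate_language(engine: str, language: str) -> str:
    """Validate a language code against the selected engine's allowed set."""
    allowed = ENGINE_LANGUAGES.get(engine)
    if allowed is None:
        raise ValueError(f"unknown engine: {engine}")
    if language not in allowed:
        raise ValueError(f"{language!r} is not supported by engine {engine!r}")
    return language
```

This replaces the single global regex with a per-engine lookup, so newly added Chatterbox codes are accepted while English-only engines still reject them.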
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3d2ee332-ca79-41ff-a0b8-1cc4da2e7ef9
📒 Files selected for processing (1)
docs/plans/PROJECT_STATUS.md
| **#123** | added docker | Low | Minimal Docker PR. Overlaps with #161 and #124. |
| **#227** | fix: harden input validation & file safety | Medium | Follow-up to #225. Atomic writes, threading locks, input validation. Good hardening but coupled to the custom models feature. |
| **#225** | feat: custom HuggingFace model support | High | Arbitrary HF repo loading. May need rework given multi-engine arch is now shipped. |
| **#194** | feat: Hebrew + Chatterbox TTS | High | **Superseded** by PR #257 which shipped Chatterbox multilingual (23 langs incl. Hebrew). May be closeable. |
Use complete sentences in PR notes for clarity.
Line 205 (“May be closeable.”) and Line 224 (“Can likely be closed.”) read as sentence fragments in table notes. Please rewrite with an explicit subject to improve readability and avoid style-lint noise.
✍️ Suggested wording

```diff
-| **#194** | feat: Hebrew + Chatterbox TTS | High | **Superseded** by PR `#257` which shipped Chatterbox multilingual (23 langs incl. Hebrew). May be closeable. |
+| **#194** | feat: Hebrew + Chatterbox TTS | High | **Superseded** by PR `#257` which shipped Chatterbox multilingual (23 langs incl. Hebrew). This PR is likely closeable. |
...
-| **#194** (Hebrew + Chatterbox) | PR `#257` (merged) | `#257` ships Chatterbox multilingual with 23 languages including Hebrew. `#194` took a different approach (route by language). Can likely be closed. |
+| **#194** (Hebrew + Chatterbox) | PR `#257` (merged) | `#257` ships Chatterbox multilingual with 23 languages including Hebrew. `#194` took a different approach (route by language), and this PR can likely be closed. |
```

Also applies to: 224-224
🧰 Tools
🪛 LanguageTool
[style] ~205-~205: To form a complete sentence, be sure to include a subject.
Context: ...x multilingual (23 langs incl. Hebrew). May be closeable. | | #195 | feat: per-...
(MISSING_IT_THERE)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/plans/PROJECT_STATUS.md` at line 205, Two table notes are sentence
fragments—replace the fragment "May be closeable." in the row referencing PR
`#194` (and PR `#257`) and the fragment "Can likely be closed." elsewhere with
complete sentences that include an explicit subject; e.g., change to "This PR
may be closed because it was superseded by PR `#257`." and "This PR can likely be
closed." Update the table cells containing the exact phrases "May be closeable."
and "Can likely be closed." so they read as full sentences.
Summary
`[laugh]`, `[cough]`, `[chuckle]`, etc.), added as a standalone model in the flat dropdown

Details
Chatterbox Turbo backend
- New `ChatterboxTurboTTSBackend` wrapping `ChatterboxTurboTTS` from the `chatterbox-tts` package
- Downloads `ResembleAI/chatterbox-turbo` (separate from multilingual `ResembleAI/chatterbox`)
- Bypasses the upstream `token=True` bug by calling `snapshot_download(token=None)` ourselves, then `from_local()`
- Same CPU-on-macOS forcing and `torch.load` monkey-patching as the multilingual backend
- Wired into `/generate`, `/generate/stream`, and model status/download/delete endpoints

Per-engine language filtering
- `languages.ts` rewritten with an `ENGINE_LANGUAGES` map and a `getLanguageOptionsForEngine()` helper
- Both generation forms (`FloatingGenerateBox`, `GenerationForm`) now filter languages dynamically

Files changed
- `backend/backends/chatterbox_turbo_backend.py` - new (307 lines)
- `backend/backends/__init__.py` - engine registry
- `backend/main.py` - 5 endpoint dispatch points
- `backend/models.py` - engine + language regex
- `app/src/lib/constants/languages.ts` - per-engine language maps
- `app/src/lib/api/types.ts` - engine union type
- `app/src/lib/hooks/useGenerationForm.ts` - schema + model mapping
- `app/src/components/Generation/FloatingGenerateBox.tsx` - dropdown + dynamic languages
- `app/src/components/Generation/GenerationForm.tsx` - dropdown + dynamic languages

Summary by CodeRabbit
New Features
Documentation