
feat: LuxTTS integration — multi-engine TTS support#254

Merged
jamiepine merged 7 commits into main from feat/luxtts
Mar 13, 2026

Conversation


@jamiepine jamiepine commented Mar 12, 2026

Summary

  • Adds LuxTTS (ZipVoice) as a second TTS engine alongside Qwen TTS, introducing multi-engine support to Voicebox
  • Users can select between Qwen TTS (multi-language, two model sizes) and LuxTTS (fast, English-focused, 48kHz, ~1GB VRAM) via a new engine dropdown in the generation form
  • Fully backward-compatible — engine defaults to "qwen" so existing workflows are unchanged

Changes

Backend

  • backend/backends/luxtts_backend.py — New LuxTTSBackend wrapping zipvoice.luxvoice.LuxTTS (encode_prompt, generate_speech, model caching, device auto-detection)
  • backend/backends/__init__.py — Multi-engine registry with get_tts_backend_for_engine(engine) replacing the singleton pattern; TTS_ENGINES dict for supported engines
  • backend/models.py — engine field on GenerationRequest (optional, default "qwen", validated ^(qwen|luxtts)$)
  • backend/main.py — Engine dispatch in /generate and /generate/stream endpoints; LuxTTS added to model status, download trigger, and delete maps
  • backend/profiles.py — create_voice_prompt_for_profile accepts engine param, uses engine-specific backend
  • backend/requirements.txt — Added Zipvoice (git install from LuxTTS repo), onnxruntime, piper-phonemize, lhotse, pydub, inflect; pinned transformers<=4.57.6
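The engine field described above (optional, default "qwen", validated against ^(qwen|luxtts)$) can be sketched in plain Python. The PR implements this as a Pydantic field on GenerationRequest; this stdlib-only sketch just mirrors the validation rule, and the function name validate_engine is hypothetical:

```python
import re
from typing import Optional

ENGINE_PATTERN = re.compile(r"^(qwen|luxtts)$")

def validate_engine(engine: Optional[str]) -> str:
    """Mirror the GenerationRequest engine rule: optional, defaults to
    'qwen' (so existing clients are unchanged), must match ^(qwen|luxtts)$."""
    if engine is None:
        return "qwen"  # backward-compatible default
    if not ENGINE_PATTERN.match(engine):
        raise ValueError(f"Unsupported TTS engine: {engine!r}")
    return engine
```

Requests that omit the field keep the old Qwen behavior; an unknown engine fails fast at validation time rather than deep in the dispatch path.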

Frontend

  • GenerationForm.tsx — New TTS Engine selector dropdown; Model Size and Delivery Instructions fields hidden when LuxTTS is selected (not applicable)
  • useGenerationForm.ts — engine added to Zod schema with default 'qwen'; engine-aware model name resolution and API payload construction
  • types.ts — engine and instruct fields added to GenerationRequest TypeScript type
  • ModelManagement.tsx — New "LuxTTS Models" section (conditionally rendered when LuxTTS models exist in status)

Key design decisions

  • Engine-prefixed cache keys (luxtts_ prefix) prevent voice prompt collisions between Qwen and LuxTTS
  • LuxTTS has one model size — no size selector needed; the model auto-downloads from HuggingFace (YatharthS/LuxTTS) on first use
  • LuxTTS ignores instruct — delivery instructions are a Qwen-only feature, hidden in UI when LuxTTS is selected
  • 48kHz output — LuxTTS generates at 48kHz natively (vs Qwen's 24kHz)
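The engine-prefixed cache keys above can be illustrated with a minimal sketch. Only the luxtts_ prefix is stated in the PR; the hashing scheme and the function name voice_prompt_cache_key are hypothetical:

```python
import hashlib

def voice_prompt_cache_key(engine: str, audio_path: str, reference_text: str) -> str:
    """Build a voice-prompt cache key; prefixing LuxTTS keys keeps Qwen and
    LuxTTS prompts for the same reference audio from colliding."""
    digest = hashlib.sha256(
        f"{audio_path}|{reference_text}".encode()
    ).hexdigest()[:16]
    prefix = "luxtts_" if engine == "luxtts" else ""
    return f"{prefix}{digest}"
```

Without the prefix, both engines would hash the same (audio, text) pair to the same key and one engine could read back a prompt encoded by the other.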

Depends on

Summary by CodeRabbit

  • New Features

    • Added LuxTTS voice-cloning option alongside Qwen TTS; unified model selector and engine-aware UI (hides incompatible controls).
    • Engine-aware generation and streaming with background model download/management and progress handling.
  • Documentation

    • Switched Quick Start/Development to Just-based workflow; added Project Structure and expanded Performance platform details.
  • Chores

    • Added Just-based dev automation and updated dependencies to support the new TTS integration.


coderabbitai bot commented Mar 12, 2026

📝 Walkthrough


Adds multi-engine TTS support (LuxTTS + Qwen): new LuxTTS backend, engine-dispatching in backend, engine-aware API and frontend changes, form/types updates, dependency additions, and migration of developer docs/workflow from Makefile to a Just-based setup.

Changes

  • Documentation & Project Setup (CONTRIBUTING.md, README.md, justfile): Replaced primary developer workflow recommendations with a Just-based workflow; added a comprehensive Justfile with setup/dev/build/lint/db utilities; updated docs, prerequisites, and project structure guidance.
  • Dev Utilities (scripts/setup-dev-sidecar.js): Removed verbose user-facing logs and simplified the placeholder binary creation flow; formatting and import reordering only.
  • Frontend — Generation UI (app/src/components/Generation/FloatingGenerateBox.tsx, app/src/components/Generation/GenerationForm.tsx): Unified model selection into a single Select combining engine + modelSize (options: qwen:1.7B, qwen:0.6B, luxtts); conditionally hide auxiliary controls (including Delivery Instructions) for LuxTTS; adapt selection logic to set engine and modelSize appropriately.
  • Frontend — Model Management (app/src/components/ServerSettings/ModelManagement.tsx): Added a LuxTTS models section that lists models prefixed with luxtts and exposes download/delete actions analogous to existing model groups.
  • Frontend Types & Hooks (app/src/lib/api/types.ts, app/src/lib/hooks/useGenerationForm.ts): Added optional engine and instruct to request types; form schema now includes engine (default qwen); submission logic and model naming adjusted per engine; reset preserves engine-related fields.
  • Backend — Engine Dispatcher & Caching (backend/backends/__init__.py): Introduced a per-engine backend registry, thread-safe caching, a TTS_ENGINES map, and get_tts_backend_for_engine(engine) to resolve/memoize backends.
  • Backend — LuxTTS Implementation (backend/backends/luxtts_backend.py): New LuxTTSBackend class: device selection, lazy load/unload, model caching and download progress handling, voice-prompt encoding, combining multiple samples, and text→speech generation with seed/instruct support.
  • Backend — API, Profiles & Flow (backend/main.py, backend/profiles.py, backend/models.py): GenerationRequest gains engine (default qwen) with validation; create_voice_prompt_for_profile now accepts engine and use_cache; generation endpoints and model management updated to load/unload/download per selected engine and to propagate engine/use_cache through flows.
  • Backend — Dependencies (backend/requirements.txt): Bounded the transformers version range; added git-based dependencies for LinaCodec and LuxTTS (Zipvoice); added a --find-links entry for piper_phonemize and explanatory comments for the LuxTTS integration.
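The per-engine registry with thread-safe caching mentioned above can be sketched as follows. The names TTS_ENGINES and get_tts_backend_for_engine follow the PR; the backend classes here are empty placeholders, and the exact locking scheme is an assumption based on the "thread-safe caching" description:

```python
import threading

class QwenTTSBackend:
    """Placeholder standing in for the real Qwen backend."""

class LuxTTSBackend:
    """Placeholder standing in for the real LuxTTS backend."""

# Supported engines mapped to their backend classes.
TTS_ENGINES = {"qwen": QwenTTSBackend, "luxtts": LuxTTSBackend}

_tts_backends: dict = {}
_tts_backends_lock = threading.Lock()

def get_tts_backend_for_engine(engine: str):
    """Resolve and memoize exactly one backend instance per engine."""
    if engine not in TTS_ENGINES:
        raise ValueError(f"Unknown TTS engine: {engine}")
    with _tts_backends_lock:
        # Check inside the lock so two threads can't both instantiate
        # a backend for the same engine (avoiding duplicate model loads).
        if engine not in _tts_backends:
            _tts_backends[engine] = TTS_ENGINES[engine]()
        return _tts_backends[engine]
```

Memoizing per engine rather than keeping a single global backend is what replaces the old singleton pattern while still guaranteeing each model loads at most once.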

Sequence Diagram

sequenceDiagram
    participant User
    participant Frontend as Frontend UI
    participant API as Backend API / Dispatcher
    participant Qwen as Qwen Backend
    participant Lux as LuxTTS Backend
    participant Cache as Model Cache

    User->>Frontend: Select engine (qwen | luxtts) and submit
    Frontend->>API: POST generate_speech(text, engine)
    API->>API: get_tts_backend_for_engine(engine)

    alt engine == "qwen"
        API->>Qwen: load_model()
        Qwen->>Cache: check/download model
        Cache-->>Qwen: model ready
        API->>Qwen: create_voice_prompt(profile, use_cache)
        Qwen-->>API: voice_prompt
        API->>Qwen: generate(text, voice_prompt)
        Qwen-->>API: audio
    else engine == "luxtts"
        API->>Lux: load_model()
        Lux->>Cache: check/download model
        Cache-->>Lux: model ready
        API->>Lux: create_voice_prompt(profile, use_cache)
        Lux-->>API: voice_prompt
        API->>Lux: generate(text, voice_prompt, instruct)
        Lux-->>API: audio
    end

    API-->>Frontend: stream audio
    Frontend->>User: play audio
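The dispatch flow in the diagram can also be sketched as a small async handler. Everything here is a hypothetical stand-in (the real backends wrap the Qwen and ZipVoice models, and the real endpoint streams audio); the sketch only shows the resolve → load → prompt → generate sequence:

```python
import asyncio

class _FakeBackend:
    """Stand-in backend exposing the interface the diagram implies."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def load_model(self) -> None:
        pass  # real backends check/download the model here

    async def create_voice_prompt(self, profile: str, use_cache: bool = True) -> dict:
        return {"profile": profile, "engine": self.name}

    async def generate(self, text: str, voice_prompt: dict) -> bytes:
        return f"{self.name}:{text}".encode()

_BACKENDS = {"qwen": _FakeBackend("qwen"), "luxtts": _FakeBackend("luxtts")}

async def generate_speech(text: str, engine: str = "qwen", profile: str = "default") -> bytes:
    """Engine dispatch as in the sequence diagram: resolve the backend,
    ensure its model is loaded, build the voice prompt, then generate."""
    backend = _BACKENDS[engine]
    await backend.load_model()
    prompt = await backend.create_voice_prompt(profile, use_cache=True)
    return await backend.generate(text, prompt)
```

Both branches of the diagram's alt block collapse into the same code path because the backends share one interface; only the resolved instance differs.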

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰✨ Hopped in quick with code to bless,
Qwen and Lux now sing, no less.
Backends lined up, prompts combined,
Forms that choose which voice you'll find,
Justfile set — devs ready to press.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 54.29%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The pull request title accurately describes the main change: adding LuxTTS as a second TTS engine alongside Qwen TTS to enable multi-engine TTS support.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@jamiepine jamiepine force-pushed the feat/cuda-backend-swap branch from dccded4 to 2867421 on March 13, 2026 at 07:06
@jamiepine jamiepine force-pushed the feat/luxtts branch 2 times, most recently from 31f72ff to 1318293 on March 13, 2026 at 07:20

Introduce LuxTTS (ZipVoice) alongside Qwen TTS, enabling users to choose
between engines at generation time. LuxTTS offers fast, English-focused
voice cloning at 48kHz with ~1GB VRAM.

Backend:
- Add LuxTTSBackend with encode_prompt/generate_speech integration
- Multi-engine registry (get_tts_backend_for_engine) replacing singleton
- Engine-prefixed voice prompt cache keys to avoid collisions
- Engine field on GenerationRequest (default 'qwen' for backward compat)
- Engine dispatch in /generate and /generate/stream endpoints
- LuxTTS in model status, download, and delete maps

Frontend:
- TTS Engine selector dropdown in GenerationForm (Qwen TTS / LuxTTS)
- Conditionally hide Model Size and Delivery Instructions for LuxTTS
- Engine field added to TypeScript types and Zod schema
- LuxTTS section in Model Management page

Adds 'just' as the recommended dev tool: 'just setup' for one-time
install, 'just dev' to run backend + frontend in one terminal.
Updates CONTRIBUTING.md to document just as the primary setup method.

piper-phonemize has no PyPI wheels — needs custom find-links URL
from k2-fsa.github.io. Removed redundant transitive deps that
Zipvoice already declares.

- Combine engine + model size into one flat dropdown (Qwen3-TTS 1.7B,
  Qwen3-TTS 0.6B, LuxTTS) in both FloatingGenerateBox and GenerationForm
- Add linacodec git dep to requirements.txt (uv-only source, pip can't
  resolve it from Zipvoice's pyproject.toml)
- Remove redundant transitive deps from requirements.txt
- Quiet the sidecar setup script (was printing misleading instructions)
- Fix silent Zod validation failure when LuxTTS selected (modelSize was
  set to 'default' which failed enum validation, preventing form submit)
- Preserve engine, model size, and language after successful generation
  instead of resetting to defaults

@jamiepine jamiepine changed the base branch from feat/cuda-backend-swap to main on March 13, 2026 at 07:21

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (5)
justfile (1)

62-63: Consider a readiness check instead of fixed sleep.

The sleep 2 is a timing assumption that may be insufficient if the backend takes longer to initialize (e.g., first run with model downloads). A health check loop would be more robust:

♻️ Optional: Replace sleep with health check
     echo "Starting backend on http://localhost:17493 ..."
     {{ venv_bin }}/uvicorn backend.main:app --reload --port 17493 &
-    sleep 2
+    # Wait for backend to be ready (up to 30s)
+    for i in {1..30}; do
+        curl -sf http://localhost:17493/health >/dev/null && break
+        sleep 1
+    done

     echo "Starting Tauri desktop app..."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@justfile` around lines 62 - 63, The fixed 2-second sleep after starting the
server (the line running "{{ venv_bin }}/uvicorn backend.main:app --reload
--port 17493 &" followed by "sleep 2") is brittle; replace it with a
readiness/health-check loop that polls a known endpoint (e.g., /health or /docs)
on localhost:17493 until it returns a successful status or a timeout is reached,
retrying with short sleeps between attempts; ensure the loop runs after
launching uvicorn in the background and fails the script with a clear message if
the timeout is exceeded.
backend/backends/luxtts_backend.py (1)

174-192: Minor: cache_key computed twice.

The cache_key is computed on line 176 for the cache lookup, then again on line 191 for caching. Consider reusing the variable.

♻️ Proposed fix
     async def create_voice_prompt(
         self,
         audio_path: str,
         reference_text: str,
         use_cache: bool = True,
     ) -> Tuple[dict, bool]:
         # ...
         await self.load_model()
 
+        cache_key = "luxtts_" + get_cache_key(audio_path, reference_text) if use_cache else None
+
         if use_cache:
-            cache_key = "luxtts_" + get_cache_key(audio_path, reference_text)
             cached = get_cached_voice_prompt(cache_key)
             if cached is not None and isinstance(cached, dict):
                 return cached, True
 
         # ... encode ...
 
         if use_cache:
-            cache_key = "luxtts_" + get_cache_key(audio_path, reference_text)
             cache_voice_prompt(cache_key, encoded)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/luxtts_backend.py` around lines 174 - 192, The cache_key is
computed twice; compute it once before the cache lookup and reuse it for both
get_cached_voice_prompt and cache_voice_prompt when use_cache is true: move the
call to get_cache_key(audio_path, reference_text) into a single cache_key
variable (prefixed with "luxtts_") before calling get_cached_voice_prompt, keep
the same cache_key to decide early return, then after awaiting
asyncio.to_thread(_encode_sync) call cache_voice_prompt(cache_key, encoded);
adjust the block that defines _encode_sync and references to use_cache,
cache_key, get_cached_voice_prompt, cache_voice_prompt, and
self.model.encode_prompt accordingly.
app/src/components/Generation/FloatingGenerateBox.tsx (1)

405-439: Consider wrapping in FormField for consistency.

The model/engine selector uses form.watch() and form.setValue() directly rather than FormField with render prop like the language selector (lines 381-403). While this works, it's inconsistent with the rest of the form and won't display validation errors via FormMessage.

Since this is a composite field controlling two form values (engine and modelSize), the current approach is pragmatic. No immediate fix needed, but worth noting for future refactors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/Generation/FloatingGenerateBox.tsx` around lines 405 -
439, The Select block directly uses form.watch('engine') and form.setValue(...)
to manage engine and modelSize, which is inconsistent and bypasses validation
UI; wrap this composite selector inside a FormField (like the language selector)
by creating a FormField for a virtual/compound field that renders the Select via
its render prop, and inside that render use field.onChange/field.value or
continue calling form.setValue but expose FormMessage for validation; reference
the Select component and the form fields 'engine' and 'modelSize' and ensure the
Select's onValueChange still sets form.setValue('engine', ...) and
form.setValue('modelSize', ...) while the surrounding FormField provides
FormItem/FormControl/FormMessage for consistency.
backend/main.py (2)

618-634: Fire-and-forget task references may cause issues.

The background download tasks created on lines 625 and 649 are not stored. While unlikely in practice, these tasks could theoretically be garbage collected before completion.

♻️ Proposed fix - store task references
+# Module-level set to keep background tasks alive
+_background_tasks: set = set()
+
 # In generate_speech:
-                asyncio.create_task(download_model_background())
+                task = asyncio.create_task(download_model_background())
+                _background_tasks.add(task)
+                task.add_done_callback(_background_tasks.discard)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/main.py` around lines 618 - 634, The fire-and-forget
asyncio.create_task call for download_model_background risks GC before
completion; capture and retain the Task (e.g., assign the result of
asyncio.create_task(...) to a variable and add it to a long-lived collection or
let task_manager track it) so the background task is referenced until finished,
and ensure download_model_background handles exceptions and removes the task
from the tracking collection on completion; update locations referencing
download_model_background, tts_model.load_model_async,
task_manager.start_download, and the create_task invocation to implement this
task-tracking approach.

1360-1367: check_luxtts_loaded may instantiate backend unnecessarily.

Calling get_tts_backend_for_engine("luxtts") to check if it's loaded will create the LuxTTSBackend instance if it doesn't exist yet. This is fine since LuxTTSBackend.__init__() doesn't load the model, but it's worth noting that this differs from the Qwen check which uses the existing tts.get_tts_model() singleton.

Consider checking if the backend exists in the registry first:

♻️ Alternative approach
     def check_luxtts_loaded():
         try:
-            from .backends import get_tts_backend_for_engine
-            backend = get_tts_backend_for_engine("luxtts")
-            return backend.is_loaded()
+            from .backends import _tts_backends
+            backend = _tts_backends.get("luxtts")
+            return backend.is_loaded() if backend else False
         except Exception:
             return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/main.py` around lines 1360 - 1367, The current check_luxtts_loaded
calls get_tts_backend_for_engine("luxtts") which may instantiate a
LuxTTSBackend; change it to first inspect the TTS backend registry for an
existing "luxtts" entry and only call get_tts_backend_for_engine if an instance
is already registered. Concretely: import the backends module used by
get_tts_backend_for_engine (from .backends import get_tts_backend_for_engine,
<registry_name>), check the registry container for the key "luxtts" (or the
registry API that lists available/registered backends) and if present call
backend = get_tts_backend_for_engine("luxtts") and return backend.is_loaded(),
otherwise return False; update check_luxtts_loaded to use that registry check
instead of unconditionally calling get_tts_backend_for_engine.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/src/components/ServerSettings/ModelManagement.tsx`:
- Around line 306-320: The LuxTTS block renders <ModelItem> without required
props from <ModelItemProps>, causing type errors and missing cancel/dismiss
behavior; update the <ModelItem> invocation to pass the missing props: provide
onCancel (call the same cancel handler used elsewhere, e.g., the function used
for downloads/cancels), pass isCancelling (compare a cancellingModel state to
model.model_name) and isDismissed (use the dismissal state/lookup used for other
models), while keeping existing props like model, onDownload (handleDownload),
onDelete (setModelToDelete + setDeleteDialogOpen), isDownloading
(downloadingModel === model.model_name) and formatSize so LuxTTS items support
cancel/dismiss handling and satisfy the type checker.

In `@backend/backends/__init__.py`:
- Around line 145-165: The backend lookup/creation for _tts_backends using the
check-then-act pattern is prone to race conditions; wrap the creation/store
sequence in a module-level threading.Lock (e.g., _tts_backends_lock) so only one
thread can create and assign a backend for a given engine; acquire the lock
before re-checking "if engine in _tts_backends", instantiate the correct class
(MLXTTSBackend via get_backend_type(), PyTorchTTSBackend, or LuxTTSBackend),
store it into _tts_backends[engine], then release the lock to ensure a single
instance is created and avoid duplicate model loads.

In `@backend/backends/luxtts_backend.py`:
- Around line 206-217: In the loop that builds combined_audio, the sample rate
return from load_audio is assigned to sr but never used; change the unpacking to
use a throwaway variable (e.g., audio, _ = load_audio(path, sample_rate=24000))
so the unused value is clearly ignored; update the line inside the function that
calls load_audio (referenced in the code block using load_audio,
normalize_audio, combined_audio, mixed, combined_text) to prefix the unused
variable with an underscore.
- Around line 259-262: The returned tensor from self.model.generate_speech
(variable wav) may reside on GPU/MPS; change the conversion to ensure the tensor
is moved to CPU and detached before calling .numpy() (e.g., call .cpu() and
.detach() on wav) so audio = ... then .squeeze() works regardless of device;
update the conversion around the lines that convert wav to numpy in the function
that calls self.model.generate_speech to use wav.detach().cpu().numpy() (or
equivalent) instead of wav.numpy().

In `@backend/requirements.txt`:
- Around line 21-22: The requirements file currently references linacodec and
Zipvoice from git HEAD which is unstable; pin each git dependency to a specific
commit hash in backend/requirements.txt (replace the current git URLs for
"linacodec" and "Zipvoice" with the same repo URLs annotated with the chosen
commit hashes, e.g. include @<commit-hash> after the repo URL) so installations
are reproducible; after pinning, verify backend/backends/luxtts_backend.py (look
for LuxTTS constructor usage and generate_speech() calls) still match the pinned
commit API and update those callsites if the pinned version has different
parameters.

In `@scripts/setup-dev-sidecar.js`:
- Around line 214-236: The COFF Machine field and Optional Header Magic in the
byte array are hardcoded for AMD64/PE32+ which will produce invalid binaries for
32-bit Windows targets; update scripts/setup-dev-sidecar.js to detect the target
triple (e.g., check for "i686-pc-windows-msvc") and conditionally set the
Machine bytes (use 0x14,0x01 for IMAGE_FILE_MACHINE_I386) and the Optional
Header Magic (use 0x0b,0x01 for PE32) instead of the current 0x64,0x86 and
0x0b,0x02 values, or alternately generate matching PE headers for the detected
target so the produced binary format matches the target architecture.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 42548d0a-ba2d-4505-a3f0-e926aa815e54

📥 Commits

Reviewing files that changed from the base of the PR and between 6359dee and 1e5afc2.

📒 Files selected for processing (15)
  • CONTRIBUTING.md
  • README.md
  • app/src/components/Generation/FloatingGenerateBox.tsx
  • app/src/components/Generation/GenerationForm.tsx
  • app/src/components/ServerSettings/ModelManagement.tsx
  • app/src/lib/api/types.ts
  • app/src/lib/hooks/useGenerationForm.ts
  • backend/backends/__init__.py
  • backend/backends/luxtts_backend.py
  • backend/main.py
  • backend/models.py
  • backend/profiles.py
  • backend/requirements.txt
  • justfile
  • scripts/setup-dev-sidecar.js

Comment on lines +306 to +320
<ModelItem
  key={model.model_name}
  model={model}
  onDownload={() => handleDownload(model.model_name)}
  onDelete={() => {
    setModelToDelete({
      name: model.model_name,
      displayName: model.display_name,
      sizeMb: model.size_mb,
    });
    setDeleteDialogOpen(true);
  }}
  isDownloading={downloadingModel === model.model_name}
  formatSize={formatSize}
/>


⚠️ Potential issue | 🔴 Critical

Pass required ModelItem props in the LuxTTS block (type-check blocker).

At Line 306, ModelItem is rendered without onCancel, isCancelling, and isDismissed, but those props are required by ModelItemProps (Lines 478-481). This is a compile-time correctness issue and also drops cancel/error handling behavior for LuxTTS entries.

Suggested fix
                       <ModelItem
                         key={model.model_name}
                         model={model}
                         onDownload={() => handleDownload(model.model_name)}
                         onDelete={() => {
                           setModelToDelete({
                             name: model.model_name,
                             displayName: model.display_name,
                             sizeMb: model.size_mb,
                           });
                           setDeleteDialogOpen(true);
                         }}
+                        onCancel={() => handleCancel(model.model_name)}
                         isDownloading={downloadingModel === model.model_name}
+                        isCancelling={
+                          cancelMutation.isPending && cancelMutation.variables === model.model_name
+                        }
+                        isDismissed={dismissedErrors.has(model.model_name)}
+                        erroredDownload={erroredDownloads.get(model.model_name)}
                         formatSize={formatSize}
                       />

Comment on lines +21 to +22
linacodec @ git+https://github.com/ysharma3501/LinaCodec.git
Zipvoice @ git+https://github.com/ysharma3501/LuxTTS.git

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get latest commit hashes for pinning
echo "LinaCodec latest commit:"
curl -s https://api.github.com/repos/ysharma3501/LinaCodec/commits/main | jq -r '.sha'

echo "LuxTTS latest commit:"
curl -s https://api.github.com/repos/ysharma3501/LuxTTS/commits/main | jq -r '.sha'

Repository: jamiepine/voicebox

Length of output: 118


🏁 Script executed:

#!/bin/bash
# Check if repositories exist and their default branches
echo "=== LinaCodec Repository Info ==="
curl -s -I https://github.com/ysharma3501/LinaCodec
echo ""
echo "=== LuxTTS Repository Info ==="
curl -s -I https://github.com/ysharma3501/LuxTTS
echo ""
# Try with master branch instead of main
echo "=== Trying master branch for LinaCodec ==="
curl -s https://api.github.com/repos/ysharma3501/LinaCodec/commits/master | jq -r '.sha' 2>/dev/null || echo "master not found"
echo ""
echo "=== Trying master branch for LuxTTS ==="
curl -s https://api.github.com/repos/ysharma3501/LuxTTS/commits/master | jq -r '.sha' 2>/dev/null || echo "master not found"

Repository: jamiepine/voicebox

Length of output: 9827


🌐 Web query:

ysharma3501 LinaCodec GitHub repository

💡 Result:

GitHub repository: ysharma3501/LinaCodec — “A highly compressive and high-quality neural audio codec for speech models.” It provides a Python package (linacodec) with example usage for encoding audio into tokens + a global embedding and decoding back to 48 kHz audio, plus helpers like voice conversion. [1]

Install (from README):

pip install git+https://github.com/ysharma3501/LinaCodec.git

[1]

Associated model card (points back to the GitHub repo): YatharthS/LinaCodec on Hugging Face. [2]

Sources: [1] [2]


🌐 Web query:

ysharma3501 LuxTTS GitHub repository

💡 Result:

The GitHub repository is ysharma3501/LuxTTS: https://github.com/ysharma3501/LuxTTS (LuxTTS is a lightweight ZipVoice-based TTS model for high-quality voice cloning, advertised at 150× realtime). [1]

Related model page (points back to the same repo): https://huggingface.co/YatharthS/LuxTTS. [2]


Pin git dependencies to specific commits for reproducibility and stability.

The linacodec and Zipvoice packages are installed from git HEAD without version pins. If upstream changes the API (e.g., the LuxTTS constructor signature or the generate_speech() parameters used in backend/backends/luxtts_backend.py), fresh installs will silently pick up the breaking change and fail at runtime. This is especially critical for custom, fast-moving repositories like these.

🔒 Proposed fix to pin commits
-linacodec @ git+https://github.com/ysharma3501/LinaCodec.git
-Zipvoice @ git+https://github.com/ysharma3501/LuxTTS.git
+linacodec @ git+https://github.com/ysharma3501/LinaCodec.git@<commit-hash>
+Zipvoice @ git+https://github.com/ysharma3501/LuxTTS.git@<commit-hash>

Determine the appropriate commit hashes from each repository and replace <commit-hash> accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/requirements.txt` around lines 21 - 22, The requirements file
currently references linacodec and Zipvoice from git HEAD which is unstable; pin
each git dependency to a specific commit hash in backend/requirements.txt
(replace the current git URLs for "linacodec" and "Zipvoice" with the same repo
URLs annotated with the chosen commit hashes, e.g. include @<commit-hash> after
the repo URL) so installations are reproducible; after pinning, verify
backend/backends/luxtts_backend.py (look for LuxTTS constructor usage and
generate_speech() calls) still match the pinned commit API and update those
callsites if the pinned version has different parameters.
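The pin format the review asks for can also be checked mechanically in CI. A minimal sketch (the regex, function name, and sample lines are illustrative, not part of the PR):

```python
import re

# Matches PEP 508 direct git references, e.g.
# "linacodec @ git+https://github.com/ysharma3501/LinaCodec.git@<sha>"
GIT_REQ = re.compile(
    r"^(?P<name>\S+)\s*@\s*git\+(?P<url>\S+?)(?:@(?P<rev>[0-9a-f]{7,40}))?$"
)

def unpinned_git_requirements(lines):
    """Return the names of git requirements that lack a commit-hash pin."""
    unpinned = []
    for line in lines:
        m = GIT_REQ.match(line.strip())
        if m and m.group("rev") is None:
            unpinned.append(m.group("name"))
    return unpinned

# Example requirements lines: one unpinned, one pinned to a (made-up) sha.
reqs = [
    "linacodec @ git+https://github.com/ysharma3501/LinaCodec.git",
    "Zipvoice @ git+https://github.com/ysharma3501/LuxTTS.git"
    "@0123456789abcdef0123456789abcdef01234567",
]
print(unpinned_git_requirements(reqs))  # → ['linacodec']
```

A check like this could run against backend/requirements.txt in CI and fail the build when a git dependency drifts back to HEAD.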

Comment on lines +214 to +236
0x64, 0x86, // Machine: AMD64
0x01, 0x00, // NumberOfSections: 1
0x00, 0x00, 0x00, 0x00, // TimeDateStamp
0x00, 0x00, 0x00, 0x00, // PointerToSymbolTable
0x00, 0x00, 0x00, 0x00, // NumberOfSymbols
0xf0, 0x00, // SizeOfOptionalHeader
0x22, 0x00, // Characteristics: EXECUTABLE_IMAGE | LARGE_ADDRESS_AWARE
// Optional Header (PE32+)
0x0B, 0x02, // Magic: PE32+
0x00, 0x00, // Linker version
0x00, 0x00, 0x00, 0x00, // SizeOfCode
0x00, 0x00, 0x00, 0x00, // SizeOfInitializedData
0x00, 0x00, 0x00, 0x00, // SizeOfUninitializedData
0x00, 0x10, 0x00, 0x00, // AddressOfEntryPoint
0x00, 0x00, 0x00, 0x00, // BaseOfCode
0x00, 0x00, 0x00, 0x40, 0x01, 0x00, 0x00, 0x00, // ImageBase
0x00, 0x10, 0x00, 0x00, // SectionAlignment
0x00, 0x02, 0x00, 0x00, // FileAlignment
0x06, 0x00, 0x00, 0x00, // OS version
0x00, 0x00, 0x00, 0x00, // Image version
0x06, 0x00, 0x00, 0x00, // Subsystem version
0x00, 0x00, 0x00, 0x00, // Win32VersionValue
0x00, 0x20, 0x00, 0x00, // SizeOfImage
0x00, 0x02, 0x00, 0x00, // SizeOfHeaders
0x00, 0x00, 0x00, 0x00, // CheckSum
0x03, 0x00, // Subsystem: CONSOLE
0x60, 0x01, // DllCharacteristics

⚠️ Potential issue | 🟠 Major

Guard non-x64 Windows targets or generate matching PE headers.

The COFF Machine field (lines 214-215) is hardcoded to AMD64, and the Optional Header (lines 235-236) uses the PE32+ magic. If target detection returns i686-pc-windows-msvc, this produces a target-named binary with an incompatible executable format.

Proposed defensive fix
   if (isWindows) {
+    if (!targetTriple.startsWith('x86_64-')) {
+      throw new Error(
+        `Unsupported Windows target for placeholder PE: ${targetTriple}. ` +
+          'Only x86_64 Windows placeholder is currently implemented.',
+      );
+    }
+
     // Create a minimal valid Windows PE executable that exits with code 1
     // This is the smallest valid PE that Windows will accept
     const minimalPE = Buffer.from([
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/setup-dev-sidecar.js` around lines 214 - 236, The COFF Machine field
and Optional Header Magic in the byte array are hardcoded for AMD64/PE32+ which
will produce invalid binaries for 32-bit Windows targets; update
scripts/setup-dev-sidecar.js to detect the target triple (e.g., check for
"i686-pc-windows-msvc") and conditionally set the Machine bytes (use 0x4c,0x01
for IMAGE_FILE_MACHINE_I386) and the Optional Header Magic (use 0x0b,0x01 for
PE32) instead of the current 0x64,0x86 and 0x0b,0x02 values, or alternately
generate matching PE headers for the detected target so the produced binary
format matches the target architecture.
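The triple-to-header mapping the fix calls for is small enough to table-drive. A sketch under the review's assumptions (the mapping dict and function name are illustrative; the byte values are the little-endian encodings of the PE constants IMAGE_FILE_MACHINE_AMD64 = 0x8664, IMAGE_FILE_MACHINE_I386 = 0x014c, and the PE32+/PE32 magics 0x020B/0x010B):

```python
# Hypothetical mapping from Windows target-triple prefixes to the two
# PE header fields the review flags: COFF Machine and Optional Header Magic.
PE_FIELDS = {
    # prefix: (Machine bytes LE, Magic bytes LE)
    "x86_64-": (bytes([0x64, 0x86]), bytes([0x0B, 0x02])),  # AMD64, PE32+
    "i686-":   (bytes([0x4C, 0x01]), bytes([0x0B, 0x01])),  # I386, PE32
}

def pe_fields_for(target_triple: str):
    """Return (machine, magic) for a Windows target, raising for
    architectures the placeholder PE does not cover (the proposed guard)."""
    for prefix, fields in PE_FIELDS.items():
        if target_triple.startswith(prefix):
            return fields
    raise ValueError(
        f"Unsupported Windows target for placeholder PE: {target_triple}"
    )

machine, magic = pe_fields_for("i686-pc-windows-msvc")
print(machine.hex(), magic.hex())  # → 4c01 0b01
```

The same shape ports directly to the JavaScript in scripts/setup-dev-sidecar.js; unlisted triples (e.g. aarch64) fall through to the error, matching the defensive fix above.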

- Add threading lock to get_tts_backend_for_engine() to prevent race
  condition where concurrent requests could create duplicate backend
  instances (double-checked locking pattern)
- Fix LuxTTS generate: call .detach().cpu() before .numpy() so it
  works on GPU/MPS devices, not just CPU
- Store background download tasks in a module-level set to prevent
  garbage collection before completion (asyncio.create_task fire-and-
  forget pattern)
- Deduplicate cache_key computation in LuxTTS create_voice_prompt
- Prefix unused sr variable with underscore
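The double-checked locking described in the first bullet can be sketched as follows (the function name mirrors the PR's get_tts_backend_for_engine, but the body and _DummyBackend are illustrative stand-ins, not the actual implementation):

```python
import threading

_tts_backends: dict = {}
_tts_backends_lock = threading.Lock()

class _DummyBackend:
    """Stand-in for a real TTS backend instance."""
    def __init__(self, engine: str):
        self.engine = engine

def get_tts_backend_for_engine(engine: str):
    """Return a cached backend, creating it at most once per engine."""
    backend = _tts_backends.get(engine)  # fast path: no lock on cache hit
    if backend is None:
        with _tts_backends_lock:
            backend = _tts_backends.get(engine)  # re-check under the lock
            if backend is None:
                backend = _DummyBackend(engine)
                _tts_backends[engine] = backend
    return backend

a = get_tts_backend_for_engine("luxtts")
b = get_tts_backend_for_engine("luxtts")
print(a is b)  # → True
```

The unlocked first read keeps the common cache-hit path cheap; the second read under the lock is what prevents two concurrent misses from each constructing a backend.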

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
backend/backends/__init__.py (1)

199-204: Consider acquiring lock in reset_backends() for thread safety.

reset_backends() calls _tts_backends.clear() without holding _tts_backends_lock. While dict operations are atomic in CPython, a concurrent get_tts_backend_for_engine() call could see an inconsistent state or re-create a backend immediately after the clear. Since this function is primarily for testing, the risk is low but worth noting.

🔧 Proposed fix
 def reset_backends():
     """Reset backend instances (useful for testing)."""
-    global _tts_backend, _tts_backends, _stt_backend
-    _tts_backend = None
-    _tts_backends.clear()
-    _stt_backend = None
+    global _tts_backend, _tts_backends, _stt_backend
+    with _tts_backends_lock:
+        _tts_backend = None
+        _tts_backends.clear()
+        _stt_backend = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/__init__.py` around lines 199 - 204, reset_backends()
mutates shared state without synchronizing with _tts_backends_lock; acquire
_tts_backends_lock before clearing or reassigning related globals to avoid race
conditions. Update reset_backends() to acquire _tts_backends_lock, perform
_tts_backends.clear() and set _tts_backend/_stt_backend under the lock, then
release it; ensure the lock used matches the one in get_tts_backend_for_engine()
and other backend-accessing functions.
backend/backends/luxtts_backend.py (1)

88-93: Model loading lacks protection against concurrent load_model calls.

If two coroutines call load_model() concurrently, both may pass the self.model is not None check before either completes loading. This could result in redundant model loading or resource contention.

A similar race condition exists in PyTorchTTSBackend and is tracked as a future follow-up, so this is a pre-existing pattern in the codebase.

🔒 Proposed fix using asyncio.Lock
+import asyncio
+
 class LuxTTSBackend:
     """LuxTTS backend for zero-shot voice cloning."""
 
     def __init__(self):
         self.model = None
         self.model_size = "default"
         self._device = None
+        self._load_lock = asyncio.Lock()
 
     # ...
 
     async def load_model(self, model_size: str = "default") -> None:
         """Load the LuxTTS model."""
-        if self.model is not None:
-            return
-
-        await asyncio.to_thread(self._load_model_sync)
+        async with self._load_lock:
+            if self.model is not None:
+                return
+            await asyncio.to_thread(self._load_model_sync)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/luxtts_backend.py` around lines 88 - 93, The load_model
coroutine suffers a race where multiple callers can pass the "if self.model is
not None" check concurrently; protect it with an asyncio.Lock: add an
asyncio.Lock instance on the backend (e.g., self._load_lock created in __init__
or lazily), then wrap the check-and-load sequence inside "async with
self._load_lock" in load_model, re-check self.model after acquiring the lock,
and only then call await asyncio.to_thread(self._load_model_sync); reference the
methods/attributes load_model, _load_model_sync, and self.model when applying
the change.
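The effect of the proposed asyncio.Lock can be demonstrated in isolation. In this sketch, LoaderSketch is an illustrative stand-in for LuxTTSBackend (the counter replaces the expensive model load; only the lock-and-recheck shape matches the proposed fix):

```python
import asyncio

class LoaderSketch:
    """Stand-in for LuxTTSBackend.load_model with the proposed lock."""
    def __init__(self):
        self.model = None
        self.load_count = 0  # counts how many real loads happened
        self._load_lock = asyncio.Lock()

    def _load_model_sync(self):
        self.load_count += 1  # stands in for the expensive model load
        self.model = object()

    async def load_model(self):
        async with self._load_lock:
            if self.model is not None:  # re-check after acquiring the lock
                return
            await asyncio.to_thread(self._load_model_sync)

async def main():
    backend = LoaderSketch()
    # Ten concurrent callers race into load_model; the lock serializes them
    # and the re-check makes all but the first a no-op.
    await asyncio.gather(*(backend.load_model() for _ in range(10)))
    return backend.load_count

print(asyncio.run(main()))  # → 1
```

Without the lock, all ten coroutines could pass the `self.model is not None` check before any to_thread call completes, yielding up to ten loads.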

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6fddb006-2901-421a-b297-d941800d80b2

📥 Commits

Reviewing files that changed from the base of the PR and between 1e5afc2 and 753158c.

📒 Files selected for processing (3)
  • backend/backends/__init__.py
  • backend/backends/luxtts_backend.py
  • backend/main.py

@jamiepine jamiepine merged commit 3576521 into main Mar 13, 2026
1 check passed
jamiepine added a commit that referenced this pull request Mar 13, 2026
- Reflects merged PRs: #254 (LuxTTS/multi-engine), #257 (Chatterbox), #252 (CUDA swap), #238 (download UI)
- Updated architecture diagram to show all 4 TTS engines
- Added TTS engine comparison table and multi-engine architecture section
- Marked resolved bottlenecks (singleton backend, frontend Qwen assumptions)
- Updated PR triage: marked #194 and #33 as superseded
- Added 'Adding a New Engine' guide (now ~1 day effort)
- Updated recommended priorities to reflect current state
- Added new API endpoints (CUDA, cancel, active tasks)