
feat: LuxTTS integration — multi-engine TTS support#254

Merged
jamiepine merged 7 commits into main from feat/luxtts
Mar 13, 2026

Conversation


@jamiepine jamiepine commented Mar 12, 2026

Summary

  • Adds LuxTTS (ZipVoice) as a second TTS engine alongside Qwen TTS, introducing multi-engine support to Voicebox
  • Users can select between Qwen TTS (multi-language, two model sizes) and LuxTTS (fast, English-focused, 48kHz, ~1GB VRAM) via a new engine dropdown in the generation form
  • Fully backward-compatible — engine defaults to "qwen" so existing workflows are unchanged

Changes

Backend

  • backend/backends/luxtts_backend.py — New LuxTTSBackend wrapping zipvoice.luxvoice.LuxTTS (encode_prompt, generate_speech, model caching, device auto-detection)
  • backend/backends/__init__.py — Multi-engine registry with get_tts_backend_for_engine(engine) replacing the singleton pattern; TTS_ENGINES dict for supported engines
  • backend/models.py — engine field on GenerationRequest (optional, default "qwen", validated ^(qwen|luxtts)$)
  • backend/main.py — Engine dispatch in /generate and /generate/stream endpoints; LuxTTS added to model status, download trigger, and delete maps
  • backend/profiles.py — create_voice_prompt_for_profile accepts engine param, uses engine-specific backend
  • backend/requirements.txt — Added Zipvoice (git install from LuxTTS repo), onnxruntime, piper-phonemize, lhotse, pydub, inflect; pinned transformers<=4.57.6
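The engine field described above (optional, default "qwen", validated against ^(qwen|luxtts)$) can be sketched in plain Python. The PR implements this as a Pydantic field on GenerationRequest; this stdlib-only sketch just mirrors the validation rule, and the function name validate_engine is hypothetical:

```python
import re
from typing import Optional

ENGINE_PATTERN = re.compile(r"^(qwen|luxtts)$")

def validate_engine(engine: Optional[str]) -> str:
    """Mirror the GenerationRequest engine rule: optional, defaults to
    'qwen' (so existing clients are unchanged), must match ^(qwen|luxtts)$."""
    if engine is None:
        return "qwen"  # backward-compatible default
    if not ENGINE_PATTERN.match(engine):
        raise ValueError(f"Unsupported TTS engine: {engine!r}")
    return engine
```

Requests that omit the field keep the old Qwen behavior; an unknown engine fails fast at validation time rather than deep in the dispatch path.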

Frontend

  • GenerationForm.tsx — New TTS Engine selector dropdown; Model Size and Delivery Instructions fields hidden when LuxTTS is selected (not applicable)
  • useGenerationForm.ts — engine added to Zod schema with default 'qwen'; engine-aware model name resolution and API payload construction
  • types.ts — engine and instruct fields added to GenerationRequest TypeScript type
  • ModelManagement.tsx — New "LuxTTS Models" section (conditionally rendered when LuxTTS models exist in status)

Key design decisions

  • Engine-prefixed cache keys (luxtts_ prefix) prevent voice prompt collisions between Qwen and LuxTTS
  • LuxTTS has one model size — no size selector needed; the model auto-downloads from HuggingFace (YatharthS/LuxTTS) on first use
  • LuxTTS ignores instruct — delivery instructions are a Qwen-only feature, hidden in UI when LuxTTS is selected
  • 48kHz output — LuxTTS generates at 48kHz natively (vs Qwen's 24kHz)
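The engine-prefixed cache keys above can be illustrated with a minimal sketch. Only the luxtts_ prefix is stated in the PR; the hashing scheme and the function name voice_prompt_cache_key are hypothetical:

```python
import hashlib

def voice_prompt_cache_key(engine: str, audio_path: str, reference_text: str) -> str:
    """Build a voice-prompt cache key; prefixing LuxTTS keys keeps Qwen and
    LuxTTS prompts for the same reference audio from colliding."""
    digest = hashlib.sha256(
        f"{audio_path}|{reference_text}".encode()
    ).hexdigest()[:16]
    prefix = "luxtts_" if engine == "luxtts" else ""
    return f"{prefix}{digest}"
```

Without the prefix, both engines would hash the same (audio, text) pair to the same key and one engine could read back a prompt encoded by the other.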

Depends on

Summary by CodeRabbit

  • New Features

    • Added LuxTTS voice-cloning option alongside Qwen TTS; unified model selector and engine-aware UI (hides incompatible controls).
    • Engine-aware generation and streaming with background model download/management and progress handling.
  • Documentation

    • Switched Quick Start/Development to Just-based workflow; added Project Structure and expanded Performance platform details.
  • Chores

    • Added Just-based dev automation and updated dependencies to support the new TTS integration.


coderabbitai bot commented Mar 12, 2026

📝 Walkthrough


Adds multi-engine TTS support (LuxTTS + Qwen): new LuxTTS backend, engine-dispatching in backend, engine-aware API and frontend changes, form/types updates, dependency additions, and migration of developer docs/workflow from Makefile to a Just-based setup.

Changes

  • Documentation & Project Setup (CONTRIBUTING.md, README.md, justfile): Replaced primary developer workflow recommendations with a Just-based workflow; added a comprehensive Justfile with setup/dev/build/lint/db utilities; updated docs, prerequisites, and project structure guidance.
  • Dev Utilities (scripts/setup-dev-sidecar.js): Removed verbose user-facing logs and simplified the placeholder binary creation flow; formatting and import reordering only.
  • Frontend — Generation UI (app/src/components/Generation/FloatingGenerateBox.tsx, app/src/components/Generation/GenerationForm.tsx): Unified model selection into a single Select combining engine + modelSize (options: qwen:1.7B, qwen:0.6B, luxtts); conditionally hide auxiliary controls (including Delivery Instructions) for LuxTTS; adapt selection logic to set engine and modelSize appropriately.
  • Frontend — Model Management (app/src/components/ServerSettings/ModelManagement.tsx): Added a LuxTTS models section that lists models prefixed with luxtts and exposes download/delete actions analogous to existing model groups.
  • Frontend Types & Hooks (app/src/lib/api/types.ts, app/src/lib/hooks/useGenerationForm.ts): Added optional engine and instruct to request types; form schema now includes engine (default qwen); submission logic and model naming adjusted per engine; reset preserves engine-related fields.
  • Backend — Engine Dispatcher & Caching (backend/backends/__init__.py): Introduced a per-engine backend registry, thread-safe caching, a TTS_ENGINES map, and get_tts_backend_for_engine(engine) to resolve/memoize backends.
  • Backend — LuxTTS Implementation (backend/backends/luxtts_backend.py): New LuxTTSBackend class: device selection, lazy load/unload, model caching and download progress handling, voice-prompt encoding, combining multiple samples, and text→speech generation with seed/instruct support.
  • Backend — API, Profiles & Flow (backend/main.py, backend/profiles.py, backend/models.py): GenerationRequest gains engine (default qwen) with validation; create_voice_prompt_for_profile now accepts engine and use_cache; generation endpoints and model management updated to load/unload/download per selected engine and to propagate engine/use_cache through flows.
  • Backend — Dependencies (backend/requirements.txt): Bounded the transformers version range; added git-based dependencies for LinaCodec and LuxTTS (Zipvoice); added a --find-links entry for piper_phonemize and explanatory comments for the LuxTTS integration.
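The per-engine registry with thread-safe caching mentioned above can be sketched as follows. The names TTS_ENGINES and get_tts_backend_for_engine follow the PR; the backend classes here are empty placeholders, and the exact locking scheme is an assumption based on the "thread-safe caching" description:

```python
import threading

class QwenTTSBackend:
    """Placeholder standing in for the real Qwen backend."""

class LuxTTSBackend:
    """Placeholder standing in for the real LuxTTS backend."""

# Supported engines mapped to their backend classes.
TTS_ENGINES = {"qwen": QwenTTSBackend, "luxtts": LuxTTSBackend}

_tts_backends: dict = {}
_tts_backends_lock = threading.Lock()

def get_tts_backend_for_engine(engine: str):
    """Resolve and memoize exactly one backend instance per engine."""
    if engine not in TTS_ENGINES:
        raise ValueError(f"Unknown TTS engine: {engine}")
    with _tts_backends_lock:
        # Check inside the lock so two threads can't both instantiate
        # a backend for the same engine (avoiding duplicate model loads).
        if engine not in _tts_backends:
            _tts_backends[engine] = TTS_ENGINES[engine]()
        return _tts_backends[engine]
```

Memoizing per engine rather than keeping a single global backend is what replaces the old singleton pattern while still guaranteeing each model loads at most once.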

Sequence Diagram

sequenceDiagram
    participant User
    participant Frontend as Frontend UI
    participant API as Backend API / Dispatcher
    participant Qwen as Qwen Backend
    participant Lux as LuxTTS Backend
    participant Cache as Model Cache

    User->>Frontend: Select engine (qwen | luxtts) and submit
    Frontend->>API: POST generate_speech(text, engine)
    API->>API: get_tts_backend_for_engine(engine)

    alt engine == "qwen"
        API->>Qwen: load_model()
        Qwen->>Cache: check/download model
        Cache-->>Qwen: model ready
        API->>Qwen: create_voice_prompt(profile, use_cache)
        Qwen-->>API: voice_prompt
        API->>Qwen: generate(text, voice_prompt)
        Qwen-->>API: audio
    else engine == "luxtts"
        API->>Lux: load_model()
        Lux->>Cache: check/download model
        Cache-->>Lux: model ready
        API->>Lux: create_voice_prompt(profile, use_cache)
        Lux-->>API: voice_prompt
        API->>Lux: generate(text, voice_prompt, instruct)
        Lux-->>API: audio
    end

    API-->>Frontend: stream audio
    Frontend->>User: play audio
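The dispatch flow in the diagram can also be sketched as a small async handler. Everything here is a hypothetical stand-in (the real backends wrap the Qwen and ZipVoice models, and the real endpoint streams audio); the sketch only shows the resolve → load → prompt → generate sequence:

```python
import asyncio

class _FakeBackend:
    """Stand-in backend exposing the interface the diagram implies."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def load_model(self) -> None:
        pass  # real backends check/download the model here

    async def create_voice_prompt(self, profile: str, use_cache: bool = True) -> dict:
        return {"profile": profile, "engine": self.name}

    async def generate(self, text: str, voice_prompt: dict) -> bytes:
        return f"{self.name}:{text}".encode()

_BACKENDS = {"qwen": _FakeBackend("qwen"), "luxtts": _FakeBackend("luxtts")}

async def generate_speech(text: str, engine: str = "qwen", profile: str = "default") -> bytes:
    """Engine dispatch as in the sequence diagram: resolve the backend,
    ensure its model is loaded, build the voice prompt, then generate."""
    backend = _BACKENDS[engine]
    await backend.load_model()
    prompt = await backend.create_voice_prompt(profile, use_cache=True)
    return await backend.generate(text, prompt)
```

Both branches of the diagram's alt block collapse into the same code path because the backends share one interface; only the resolved instance differs.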

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰✨ Hopped in quick with code to bless,
Qwen and Lux now sing, no less.
Backends lined up, prompts combined,
Forms that choose which voice you'll find,
Justfile set — devs ready to press.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 54.29%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The pull request title accurately describes the main change: adding LuxTTS as a second TTS engine alongside Qwen TTS to enable multi-engine TTS support.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@jamiepine jamiepine force-pushed the feat/cuda-backend-swap branch from dccded4 to 2867421 on March 13, 2026 at 07:06
@jamiepine jamiepine force-pushed the feat/luxtts branch 2 times, most recently from 31f72ff to 1318293 on March 13, 2026 at 07:20

Introduce LuxTTS (ZipVoice) alongside Qwen TTS, enabling users to choose
between engines at generation time. LuxTTS offers fast, English-focused
voice cloning at 48kHz with ~1GB VRAM.

Backend:
- Add LuxTTSBackend with encode_prompt/generate_speech integration
- Multi-engine registry (get_tts_backend_for_engine) replacing singleton
- Engine-prefixed voice prompt cache keys to avoid collisions
- Engine field on GenerationRequest (default 'qwen' for backward compat)
- Engine dispatch in /generate and /generate/stream endpoints
- LuxTTS in model status, download, and delete maps

Frontend:
- TTS Engine selector dropdown in GenerationForm (Qwen TTS / LuxTTS)
- Conditionally hide Model Size and Delivery Instructions for LuxTTS
- Engine field added to TypeScript types and Zod schema
- LuxTTS section in Model Management page

Adds 'just' as the recommended dev tool: 'just setup' for one-time
install, 'just dev' to run backend + frontend in one terminal.
Updates CONTRIBUTING.md to document just as the primary setup method.

piper-phonemize has no PyPI wheels — needs custom find-links URL
from k2-fsa.github.io. Removed redundant transitive deps that
Zipvoice already declares.

- Combine engine + model size into one flat dropdown (Qwen3-TTS 1.7B,
  Qwen3-TTS 0.6B, LuxTTS) in both FloatingGenerateBox and GenerationForm
- Add linacodec git dep to requirements.txt (uv-only source, pip can't
  resolve it from Zipvoice's pyproject.toml)
- Remove redundant transitive deps from requirements.txt
- Quiet the sidecar setup script (was printing misleading instructions)
- Fix silent Zod validation failure when LuxTTS selected (modelSize was
  set to 'default' which failed enum validation, preventing form submit)
- Preserve engine, model size, and language after successful generation
  instead of resetting to defaults

@jamiepine jamiepine changed the base branch from feat/cuda-backend-swap to main on March 13, 2026 at 07:21

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (5)
justfile (1)

62-63: Consider a readiness check instead of fixed sleep.

The sleep 2 is a timing assumption that may be insufficient if the backend takes longer to initialize (e.g., first run with model downloads). A health check loop would be more robust:

♻️ Optional: Replace sleep with health check
     echo "Starting backend on http://localhost:17493 ..."
     {{ venv_bin }}/uvicorn backend.main:app --reload --port 17493 &
-    sleep 2
+    # Wait for backend to be ready (up to 30s)
+    for i in {1..30}; do
+        curl -sf http://localhost:17493/health >/dev/null && break
+        sleep 1
+    done

     echo "Starting Tauri desktop app..."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@justfile` around lines 62 - 63, The fixed 2-second sleep after starting the
server (the line running "{{ venv_bin }}/uvicorn backend.main:app --reload
--port 17493 &" followed by "sleep 2") is brittle; replace it with a
readiness/health-check loop that polls a known endpoint (e.g., /health or /docs)
on localhost:17493 until it returns a successful status or a timeout is reached,
retrying with short sleeps between attempts; ensure the loop runs after
launching uvicorn in the background and fails the script with a clear message if
the timeout is exceeded.
backend/backends/luxtts_backend.py (1)

174-192: Minor: cache_key computed twice.

The cache_key is computed on line 176 for the cache lookup, then again on line 191 for caching. Consider reusing the variable.

♻️ Proposed fix
     async def create_voice_prompt(
         self,
         audio_path: str,
         reference_text: str,
         use_cache: bool = True,
     ) -> Tuple[dict, bool]:
         # ...
         await self.load_model()
 
+        cache_key = "luxtts_" + get_cache_key(audio_path, reference_text) if use_cache else None
+
         if use_cache:
-            cache_key = "luxtts_" + get_cache_key(audio_path, reference_text)
             cached = get_cached_voice_prompt(cache_key)
             if cached is not None and isinstance(cached, dict):
                 return cached, True
 
         # ... encode ...
 
         if use_cache:
-            cache_key = "luxtts_" + get_cache_key(audio_path, reference_text)
             cache_voice_prompt(cache_key, encoded)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/luxtts_backend.py` around lines 174 - 192, The cache_key is
computed twice; compute it once before the cache lookup and reuse it for both
get_cached_voice_prompt and cache_voice_prompt when use_cache is true: move the
call to get_cache_key(audio_path, reference_text) into a single cache_key
variable (prefixed with "luxtts_") before calling get_cached_voice_prompt, keep
the same cache_key to decide early return, then after awaiting
asyncio.to_thread(_encode_sync) call cache_voice_prompt(cache_key, encoded);
adjust the block that defines _encode_sync and references to use_cache,
cache_key, get_cached_voice_prompt, cache_voice_prompt, and
self.model.encode_prompt accordingly.
app/src/components/Generation/FloatingGenerateBox.tsx (1)

405-439: Consider wrapping in FormField for consistency.

The model/engine selector uses form.watch() and form.setValue() directly rather than FormField with render prop like the language selector (lines 381-403). While this works, it's inconsistent with the rest of the form and won't display validation errors via FormMessage.

Since this is a composite field controlling two form values (engine and modelSize), the current approach is pragmatic. No immediate fix needed, but worth noting for future refactors.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/Generation/FloatingGenerateBox.tsx` around lines 405 -
439, The Select block directly uses form.watch('engine') and form.setValue(...)
to manage engine and modelSize, which is inconsistent and bypasses validation
UI; wrap this composite selector inside a FormField (like the language selector)
by creating a FormField for a virtual/compound field that renders the Select via
its render prop, and inside that render use field.onChange/field.value or
continue calling form.setValue but expose FormMessage for validation; reference
the Select component and the form fields 'engine' and 'modelSize' and ensure the
Select's onValueChange still sets form.setValue('engine', ...) and
form.setValue('modelSize', ...) while the surrounding FormField provides
FormItem/FormControl/FormMessage for consistency.
backend/main.py (2)

618-634: Fire-and-forget task references may cause issues.

The background download tasks created on lines 625 and 649 are not stored. While unlikely in practice, these tasks could theoretically be garbage collected before completion.

♻️ Proposed fix - store task references
+# Module-level set to keep background tasks alive
+_background_tasks: set = set()
+
 # In generate_speech:
-                asyncio.create_task(download_model_background())
+                task = asyncio.create_task(download_model_background())
+                _background_tasks.add(task)
+                task.add_done_callback(_background_tasks.discard)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/main.py` around lines 618 - 634, The fire-and-forget
asyncio.create_task call for download_model_background risks GC before
completion; capture and retain the Task (e.g., assign the result of
asyncio.create_task(...) to a variable and add it to a long-lived collection or
let task_manager track it) so the background task is referenced until finished,
and ensure download_model_background handles exceptions and removes the task
from the tracking collection on completion; update locations referencing
download_model_background, tts_model.load_model_async,
task_manager.start_download, and the create_task invocation to implement this
task-tracking approach.

1360-1367: check_luxtts_loaded may instantiate backend unnecessarily.

Calling get_tts_backend_for_engine("luxtts") to check if it's loaded will create the LuxTTSBackend instance if it doesn't exist yet. This is fine since LuxTTSBackend.__init__() doesn't load the model, but it's worth noting that this differs from the Qwen check which uses the existing tts.get_tts_model() singleton.

Consider checking if the backend exists in the registry first:

♻️ Alternative approach
     def check_luxtts_loaded():
         try:
-            from .backends import get_tts_backend_for_engine
-            backend = get_tts_backend_for_engine("luxtts")
-            return backend.is_loaded()
+            from .backends import _tts_backends
+            backend = _tts_backends.get("luxtts")
+            return backend.is_loaded() if backend else False
         except Exception:
             return False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/main.py` around lines 1360 - 1367, The current check_luxtts_loaded
calls get_tts_backend_for_engine("luxtts") which may instantiate a
LuxTTSBackend; change it to first inspect the TTS backend registry for an
existing "luxtts" entry and only call get_tts_backend_for_engine if an instance
is already registered. Concretely: import the backends module used by
get_tts_backend_for_engine (from .backends import get_tts_backend_for_engine,
<registry_name>), check the registry container for the key "luxtts" (or the
registry API that lists available/registered backends) and if present call
backend = get_tts_backend_for_engine("luxtts") and return backend.is_loaded(),
otherwise return False; update check_luxtts_loaded to use that registry check
instead of unconditionally calling get_tts_backend_for_engine.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/src/components/ServerSettings/ModelManagement.tsx`:
- Around line 306-320: The LuxTTS block renders <ModelItem> without required
props from <ModelItemProps>, causing type errors and missing cancel/dismiss
behavior; update the <ModelItem> invocation to pass the missing props: provide
onCancel (call the same cancel handler used elsewhere, e.g., the function used
for downloads/cancels), pass isCancelling (compare a cancellingModel state to
model.model_name) and isDismissed (use the dismissal state/lookup used for other
models), while keeping existing props like model, onDownload (handleDownload),
onDelete (setModelToDelete + setDeleteDialogOpen), isDownloading
(downloadingModel === model.model_name) and formatSize so LuxTTS items support
cancel/dismiss handling and satisfy the type checker.

In `@backend/backends/__init__.py`:
- Around line 145-165: The backend lookup/creation for _tts_backends using the
check-then-act pattern is prone to race conditions; wrap the creation/store
sequence in a module-level threading.Lock (e.g., _tts_backends_lock) so only one
thread can create and assign a backend for a given engine; acquire the lock
before re-checking "if engine in _tts_backends", instantiate the correct class
(MLXTTSBackend via get_backend_type(), PyTorchTTSBackend, or LuxTTSBackend),
store it into _tts_backends[engine], then release the lock to ensure a single
instance is created and avoid duplicate model loads.

In `@backend/backends/luxtts_backend.py`:
- Around line 206-217: In the loop that builds combined_audio, the sample rate
return from load_audio is assigned to sr but never used; change the unpacking to
use a throwaway variable (e.g., audio, _ = load_audio(path, sample_rate=24000))
so the unused value is clearly ignored; update the line inside the function that
calls load_audio (referenced in the code block using load_audio,
normalize_audio, combined_audio, mixed, combined_text) to prefix the unused
variable with an underscore.
- Around line 259-262: The returned tensor from self.model.generate_speech
(variable wav) may reside on GPU/MPS; change the conversion to ensure the tensor
is moved to CPU and detached before calling .numpy() (e.g., call .cpu() and
.detach() on wav) so audio = ... then .squeeze() works regardless of device;
update the conversion around the lines that convert wav to numpy in the function
that calls self.model.generate_speech to use wav.detach().cpu().numpy() (or
equivalent) instead of wav.numpy().

In `@backend/requirements.txt`:
- Around line 21-22: The requirements file currently references linacodec and
Zipvoice from git HEAD which is unstable; pin each git dependency to a specific
commit hash in backend/requirements.txt (replace the current git URLs for
"linacodec" and "Zipvoice" with the same repo URLs annotated with the chosen
commit hashes, e.g. include @<commit-hash> after the repo URL) so installations
are reproducible; after pinning, verify backend/backends/luxtts_backend.py (look
for LuxTTS constructor usage and generate_speech() calls) still match the pinned
commit API and update those callsites if the pinned version has different
parameters.

In `@scripts/setup-dev-sidecar.js`:
- Around line 214-236: The COFF Machine field and Optional Header Magic in the
byte array are hardcoded for AMD64/PE32+ which will produce invalid binaries for
32-bit Windows targets; update scripts/setup-dev-sidecar.js to detect the target
triple (e.g., check for "i686-pc-windows-msvc") and conditionally set the
Machine bytes (use 0x14,0x01 for IMAGE_FILE_MACHINE_I386) and the Optional
Header Magic (use 0x0b,0x01 for PE32) instead of the current 0x64,0x86 and
0x0b,0x02 values, or alternately generate matching PE headers for the detected
target so the produced binary format matches the target architecture.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 42548d0a-ba2d-4505-a3f0-e926aa815e54

📥 Commits

Reviewing files that changed from the base of the PR and between 6359dee and 1e5afc2.

📒 Files selected for processing (15)
  • CONTRIBUTING.md
  • README.md
  • app/src/components/Generation/FloatingGenerateBox.tsx
  • app/src/components/Generation/GenerationForm.tsx
  • app/src/components/ServerSettings/ModelManagement.tsx
  • app/src/lib/api/types.ts
  • app/src/lib/hooks/useGenerationForm.ts
  • backend/backends/__init__.py
  • backend/backends/luxtts_backend.py
  • backend/main.py
  • backend/models.py
  • backend/profiles.py
  • backend/requirements.txt
  • justfile
  • scripts/setup-dev-sidecar.js

Comment on lines +306 to +320
<ModelItem
  key={model.model_name}
  model={model}
  onDownload={() => handleDownload(model.model_name)}
  onDelete={() => {
    setModelToDelete({
      name: model.model_name,
      displayName: model.display_name,
      sizeMb: model.size_mb,
    });
    setDeleteDialogOpen(true);
  }}
  isDownloading={downloadingModel === model.model_name}
  formatSize={formatSize}
/>


⚠️ Potential issue | 🔴 Critical

Pass required ModelItem props in the LuxTTS block (type-check blocker).

At Line 306, ModelItem is rendered without onCancel, isCancelling, and isDismissed, but those props are required by ModelItemProps (Lines 478-481). This is a compile-time correctness issue and also drops cancel/error handling behavior for LuxTTS entries.

Suggested fix
                       <ModelItem
                         key={model.model_name}
                         model={model}
                         onDownload={() => handleDownload(model.model_name)}
                         onDelete={() => {
                           setModelToDelete({
                             name: model.model_name,
                             displayName: model.display_name,
                             sizeMb: model.size_mb,
                           });
                           setDeleteDialogOpen(true);
                         }}
+                        onCancel={() => handleCancel(model.model_name)}
                         isDownloading={downloadingModel === model.model_name}
+                        isCancelling={
+                          cancelMutation.isPending && cancelMutation.variables === model.model_name
+                        }
+                        isDismissed={dismissedErrors.has(model.model_name)}
+                        erroredDownload={erroredDownloads.get(model.model_name)}
                         formatSize={formatSize}
                       />

Comment on lines +21 to +22
linacodec @ git+https://github.com/ysharma3501/LinaCodec.git
Zipvoice @ git+https://github.com/ysharma3501/LuxTTS.git

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get latest commit hashes for pinning
echo "LinaCodec latest commit:"
curl -s https://api.github.com/repos/ysharma3501/LinaCodec/commits/main | jq -r '.sha'

echo "LuxTTS latest commit:"
curl -s https://api.github.com/repos/ysharma3501/LuxTTS/commits/main | jq -r '.sha'

Repository: jamiepine/voicebox

Length of output: 118


🏁 Script executed:

#!/bin/bash
# Check if repositories exist and their default branches
echo "=== LinaCodec Repository Info ==="
curl -s -I https://github.com/ysharma3501/LinaCodec
echo ""
echo "=== LuxTTS Repository Info ==="
curl -s -I https://github.com/ysharma3501/LuxTTS
echo ""
# Try with master branch instead of main
echo "=== Trying master branch for LinaCodec ==="
curl -s https://api.github.com/repos/ysharma3501/LinaCodec/commits/master | jq -r '.sha' 2>/dev/null || echo "master not found"
echo ""
echo "=== Trying master branch for LuxTTS ==="
curl -s https://api.github.com/repos/ysharma3501/LuxTTS/commits/master | jq -r '.sha' 2>/dev/null || echo "master not found"

Repository: jamiepine/voicebox

Length of output: 9827


🌐 Web query:

ysharma3501 LinaCodec GitHub repository

💡 Result:

GitHub repository: ysharma3501/LinaCodec — “A highly compressive and high-quality neural audio codec for speech models.” It provides a Python package (linacodec) with example usage for encoding audio into tokens + a global embedding and decoding back to 48 kHz audio, plus helpers like voice conversion. [1]

Install (from README):

pip install git+https://github.com/ysharma3501/LinaCodec.git

[1]

Associated model card (points back to the GitHub repo): YatharthS/LinaCodec on Hugging Face. [2]

Sources: [1] [2]


🌐 Web query:

ysharma3501 LuxTTS GitHub repository

💡 Result:

The GitHub repository is ysharma3501/LuxTTS: https://github.com/ysharma3501/LuxTTS (LuxTTS is a lightweight ZipVoice-based TTS model for high-quality voice cloning, advertised at 150× realtime). [1]

Related model page (points back to the same repo): https://huggingface.co/YatharthS/LuxTTS. [2]


Pin git dependencies to specific commits for reproducibility and stability.

The linacodec and Zipvoice packages are installed from git HEAD without version pins. If upstream changes the API (e.g., the LuxTTS constructor signature or the generate_speech() parameters used in backend/backends/luxtts_backend.py), fresh installs will silently pick up the breaking change and fail at runtime. This is especially critical for custom, fast-moving repositories like these.

🔒 Proposed fix to pin commits
-linacodec @ git+https://github.com/ysharma3501/LinaCodec.git
-Zipvoice @ git+https://github.com/ysharma3501/LuxTTS.git
+linacodec @ git+https://github.com/ysharma3501/LinaCodec.git@<commit-hash>
+Zipvoice @ git+https://github.com/ysharma3501/LuxTTS.git@<commit-hash>

Determine the appropriate commit hashes from each repository and replace <commit-hash> accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/requirements.txt` around lines 21 - 22, The requirements file
currently references linacodec and Zipvoice from git HEAD which is unstable; pin
each git dependency to a specific commit hash in backend/requirements.txt
(replace the current git URLs for "linacodec" and "Zipvoice" with the same repo
URLs annotated with the chosen commit hashes, e.g. include @<commit-hash> after
the repo URL) so installations are reproducible; after pinning, verify
backend/backends/luxtts_backend.py (look for LuxTTS constructor usage and
generate_speech() calls) still match the pinned commit API and update those
callsites if the pinned version has different parameters.
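The pin format the review asks for can also be checked mechanically in CI. A minimal sketch (the regex, function name, and sample lines are illustrative, not part of the PR):

```python
import re

# Matches PEP 508 direct git references, e.g.
# "linacodec @ git+https://github.com/ysharma3501/LinaCodec.git@<sha>"
GIT_REQ = re.compile(
    r"^(?P<name>\S+)\s*@\s*git\+(?P<url>\S+?)(?:@(?P<rev>[0-9a-f]{7,40}))?$"
)

def unpinned_git_requirements(lines):
    """Return the names of git requirements that lack a commit-hash pin."""
    unpinned = []
    for line in lines:
        m = GIT_REQ.match(line.strip())
        if m and m.group("rev") is None:
            unpinned.append(m.group("name"))
    return unpinned

# Example requirements lines: one unpinned, one pinned to a (made-up) sha.
reqs = [
    "linacodec @ git+https://github.com/ysharma3501/LinaCodec.git",
    "Zipvoice @ git+https://github.com/ysharma3501/LuxTTS.git"
    "@0123456789abcdef0123456789abcdef01234567",
]
print(unpinned_git_requirements(reqs))  # → ['linacodec']
```

A check like this could run against backend/requirements.txt in CI and fail the build when a git dependency drifts back to HEAD.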

Comment on lines +214 to +236
0x64, 0x86, // Machine: AMD64
0x01, 0x00, // NumberOfSections: 1
0x00, 0x00, 0x00, 0x00, // TimeDateStamp
0x00, 0x00, 0x00, 0x00, // PointerToSymbolTable
0x00, 0x00, 0x00, 0x00, // NumberOfSymbols
0xf0, 0x00, // SizeOfOptionalHeader
0x22, 0x00, // Characteristics: EXECUTABLE_IMAGE | LARGE_ADDRESS_AWARE
// Optional Header (PE32+)
0x0B, 0x02, // Magic: PE32+
0x00, 0x00, // Linker version
0x00, 0x00, 0x00, 0x00, // SizeOfCode
0x00, 0x00, 0x00, 0x00, // SizeOfInitializedData
0x00, 0x00, 0x00, 0x00, // SizeOfUninitializedData
0x00, 0x10, 0x00, 0x00, // AddressOfEntryPoint
0x00, 0x00, 0x00, 0x00, // BaseOfCode
0x00, 0x00, 0x00, 0x40, 0x01, 0x00, 0x00, 0x00, // ImageBase
0x00, 0x10, 0x00, 0x00, // SectionAlignment
0x00, 0x02, 0x00, 0x00, // FileAlignment
0x06, 0x00, 0x00, 0x00, // OS version
0x00, 0x00, 0x00, 0x00, // Image version
0x06, 0x00, 0x00, 0x00, // Subsystem version
0x00, 0x00, 0x00, 0x00, // Win32VersionValue
0x00, 0x20, 0x00, 0x00, // SizeOfImage
0x00, 0x02, 0x00, 0x00, // SizeOfHeaders
0x00, 0x00, 0x00, 0x00, // CheckSum
0x03, 0x00, // Subsystem: CONSOLE
0x60, 0x01, // DllCharacteristics

⚠️ Potential issue | 🟠 Major

Guard non-x64 Windows targets or generate matching PE headers.

The COFF Machine field (lines 214-215) is hardcoded to AMD64, and the Optional Header (lines 235-236) uses the PE32+ magic. If target detection returns i686-pc-windows-msvc, this produces a target-named binary with an incompatible executable format.

Proposed defensive fix
   if (isWindows) {
+    if (!targetTriple.startsWith('x86_64-')) {
+      throw new Error(
+        `Unsupported Windows target for placeholder PE: ${targetTriple}. ` +
+          'Only x86_64 Windows placeholder is currently implemented.',
+      );
+    }
+
     // Create a minimal valid Windows PE executable that exits with code 1
     // This is the smallest valid PE that Windows will accept
     const minimalPE = Buffer.from([
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/setup-dev-sidecar.js` around lines 214 - 236, The COFF Machine field
and Optional Header Magic in the byte array are hardcoded for AMD64/PE32+ which
will produce invalid binaries for 32-bit Windows targets; update
scripts/setup-dev-sidecar.js to detect the target triple (e.g., check for
"i686-pc-windows-msvc") and conditionally set the Machine bytes (use 0x4c,0x01
for IMAGE_FILE_MACHINE_I386) and the Optional Header Magic (use 0x0b,0x01 for
PE32) instead of the current 0x64,0x86 and 0x0b,0x02 values, or alternately
generate matching PE headers for the detected target so the produced binary
format matches the target architecture.
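The triple-to-header mapping the fix calls for is small enough to table-drive. A sketch under the review's assumptions (the mapping dict and function name are illustrative; the byte values are the little-endian encodings of the PE constants IMAGE_FILE_MACHINE_AMD64 = 0x8664, IMAGE_FILE_MACHINE_I386 = 0x014c, and the PE32+/PE32 magics 0x020B/0x010B):

```python
# Hypothetical mapping from Windows target-triple prefixes to the two
# PE header fields the review flags: COFF Machine and Optional Header Magic.
PE_FIELDS = {
    # prefix: (Machine bytes LE, Magic bytes LE)
    "x86_64-": (bytes([0x64, 0x86]), bytes([0x0B, 0x02])),  # AMD64, PE32+
    "i686-":   (bytes([0x4C, 0x01]), bytes([0x0B, 0x01])),  # I386, PE32
}

def pe_fields_for(target_triple: str):
    """Return (machine, magic) for a Windows target, raising for
    architectures the placeholder PE does not cover (the proposed guard)."""
    for prefix, fields in PE_FIELDS.items():
        if target_triple.startswith(prefix):
            return fields
    raise ValueError(
        f"Unsupported Windows target for placeholder PE: {target_triple}"
    )

machine, magic = pe_fields_for("i686-pc-windows-msvc")
print(machine.hex(), magic.hex())  # → 4c01 0b01
```

The same shape ports directly to the JavaScript in scripts/setup-dev-sidecar.js; unlisted triples (e.g. aarch64) fall through to the error, matching the defensive fix above.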

- Add threading lock to get_tts_backend_for_engine() to prevent race
  condition where concurrent requests could create duplicate backend
  instances (double-checked locking pattern)
- Fix LuxTTS generate: call .detach().cpu() before .numpy() so it
  works on GPU/MPS devices, not just CPU
- Store background download tasks in a module-level set to prevent
  garbage collection before completion (asyncio.create_task fire-and-
  forget pattern)
- Deduplicate cache_key computation in LuxTTS create_voice_prompt
- Prefix unused sr variable with underscore
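The double-checked locking described in the first bullet can be sketched as follows (the function name mirrors the PR's get_tts_backend_for_engine, but the body and _DummyBackend are illustrative stand-ins, not the actual implementation):

```python
import threading

_tts_backends: dict = {}
_tts_backends_lock = threading.Lock()

class _DummyBackend:
    """Stand-in for a real TTS backend instance."""
    def __init__(self, engine: str):
        self.engine = engine

def get_tts_backend_for_engine(engine: str):
    """Return a cached backend, creating it at most once per engine."""
    backend = _tts_backends.get(engine)  # fast path: no lock on cache hit
    if backend is None:
        with _tts_backends_lock:
            backend = _tts_backends.get(engine)  # re-check under the lock
            if backend is None:
                backend = _DummyBackend(engine)
                _tts_backends[engine] = backend
    return backend

a = get_tts_backend_for_engine("luxtts")
b = get_tts_backend_for_engine("luxtts")
print(a is b)  # → True
```

The unlocked first read keeps the common cache-hit path cheap; the second read under the lock is what prevents two concurrent misses from each constructing a backend.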

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
backend/backends/__init__.py (1)

199-204: Consider acquiring lock in reset_backends() for thread safety.

reset_backends() calls _tts_backends.clear() without holding _tts_backends_lock. While dict operations are atomic in CPython, a concurrent get_tts_backend_for_engine() call could see an inconsistent state or re-create a backend immediately after the clear. Since this function is primarily for testing, the risk is low but worth noting.

🔧 Proposed fix
 def reset_backends():
     """Reset backend instances (useful for testing)."""
-    global _tts_backend, _tts_backends, _stt_backend
-    _tts_backend = None
-    _tts_backends.clear()
-    _stt_backend = None
+    global _tts_backend, _tts_backends, _stt_backend
+    with _tts_backends_lock:
+        _tts_backend = None
+        _tts_backends.clear()
+        _stt_backend = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/__init__.py` around lines 199 - 204, reset_backends()
mutates shared state without synchronizing with _tts_backends_lock; acquire
_tts_backends_lock before clearing or reassigning related globals to avoid race
conditions. Update reset_backends() to acquire _tts_backends_lock, perform
_tts_backends.clear() and set _tts_backend/_stt_backend under the lock, then
release it; ensure the lock used matches the one in get_tts_backend_for_engine()
and other backend-accessing functions.
backend/backends/luxtts_backend.py (1)

88-93: Model loading lacks protection against concurrent load_model calls.

If two coroutines call load_model() concurrently, both may pass the self.model is not None check before either completes loading. This could result in redundant model loading or resource contention.

A similar race condition exists in PyTorchTTSBackend and is tracked as a future follow-up, so this is a pre-existing pattern in the codebase.

🔒 Proposed fix using asyncio.Lock
+import asyncio
+
 class LuxTTSBackend:
     """LuxTTS backend for zero-shot voice cloning."""
 
     def __init__(self):
         self.model = None
         self.model_size = "default"
         self._device = None
+        self._load_lock = asyncio.Lock()
 
     # ...
 
     async def load_model(self, model_size: str = "default") -> None:
         """Load the LuxTTS model."""
-        if self.model is not None:
-            return
-
-        await asyncio.to_thread(self._load_model_sync)
+        async with self._load_lock:
+            if self.model is not None:
+                return
+            await asyncio.to_thread(self._load_model_sync)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/luxtts_backend.py` around lines 88 - 93, The load_model
coroutine suffers a race where multiple callers can pass the "if self.model is
not None" check concurrently; protect it with an asyncio.Lock: add an
asyncio.Lock instance on the backend (e.g., self._load_lock created in __init__
or lazily), then wrap the check-and-load sequence inside "async with
self._load_lock" in load_model, re-check self.model after acquiring the lock,
and only then call await asyncio.to_thread(self._load_model_sync); reference the
methods/attributes load_model, _load_model_sync, and self.model when applying
the change.
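The effect of the proposed asyncio.Lock can be demonstrated in isolation. In this sketch, LoaderSketch is an illustrative stand-in for LuxTTSBackend (the counter replaces the expensive model load; only the lock-and-recheck shape matches the proposed fix):

```python
import asyncio

class LoaderSketch:
    """Stand-in for LuxTTSBackend.load_model with the proposed lock."""
    def __init__(self):
        self.model = None
        self.load_count = 0  # counts how many real loads happened
        self._load_lock = asyncio.Lock()

    def _load_model_sync(self):
        self.load_count += 1  # stands in for the expensive model load
        self.model = object()

    async def load_model(self):
        async with self._load_lock:
            if self.model is not None:  # re-check after acquiring the lock
                return
            await asyncio.to_thread(self._load_model_sync)

async def main():
    backend = LoaderSketch()
    # Ten concurrent callers race into load_model; the lock serializes them
    # and the re-check makes all but the first a no-op.
    await asyncio.gather(*(backend.load_model() for _ in range(10)))
    return backend.load_count

print(asyncio.run(main()))  # → 1
```

Without the lock, all ten coroutines could pass the `self.model is not None` check before any to_thread call completes, yielding up to ten loads.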

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6fddb006-2901-421a-b297-d941800d80b2

📥 Commits

Reviewing files that changed from the base of the PR and between 1e5afc2 and 753158c.

📒 Files selected for processing (3)
  • backend/backends/__init__.py
  • backend/backends/luxtts_backend.py
  • backend/main.py

@jamiepine jamiepine merged commit 3576521 into main Mar 13, 2026
1 check passed
jamiepine added a commit that referenced this pull request Mar 13, 2026
- Reflects merged PRs: #254 (LuxTTS/multi-engine), #257 (Chatterbox), #252 (CUDA swap), #238 (download UI)
- Updated architecture diagram to show all 4 TTS engines
- Added TTS engine comparison table and multi-engine architecture section
- Marked resolved bottlenecks (singleton backend, frontend Qwen assumptions)
- Updated PR triage: marked #194 and #33 as superseded
- Added 'Adding a New Engine' guide (now ~1 day effort)
- Updated recommended priorities to reflect current state
- Added new API endpoints (CUDA, cancel, active tasks)