feat: paralinguistic tag autocomplete for Chatterbox Turbo #265
Conversation
The previous approach of patching `librosa.load` didn't work because `melspectrogram` itself performs float64 math (numpy dot, `signal.lfilter`) regardless of input dtype. The actual mismatch happens when `pack()` creates a float64 tensor from the mel arrays and passes it into the float32 LSTM weights in `VoiceEncoder.forward()`. Fix by monkey-patching `VoiceEncoder.forward()` to call `mels.float()` before the LSTM, ensuring the input always matches the model dtype.
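The instance-level monkey-patch pattern described above can be sketched without torch installed. `DummyEncoder` and the dict-based `"dtype"` field are stand-ins for the real `VoiceEncoder` and tensor; the rebinding mechanics (`__func__` plus `types.MethodType`) mirror the actual fix:

```python
import types

# Stand-in for VoiceEncoder: its "LSTM" only accepts float32 input,
# mimicking the dtype mismatch the real patch works around.
class DummyEncoder:
    def forward(self, mels):
        assert mels["dtype"] == "float32", "dtype mismatch"
        return "embedding"

encoder = DummyEncoder()
_orig_forward = encoder.forward.__func__  # the unbound original function

def _f32_forward(self, mels):
    # cast before delegating, mirroring mels.float() in the real patch
    mels = {**mels, "dtype": "float32"}
    return _orig_forward(self, mels)

# Rebind as an instance attribute so only this instance is patched,
# leaving the class untouched.
encoder.forward = types.MethodType(_f32_forward, encoder)

result = encoder.forward({"dtype": "float64", "data": [0.1, 0.2]})
```

Patching the bound method on the instance (rather than the class) keeps the change scoped to the loaded model object and avoids affecting other code that imports the upstream class.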
- POST /models/{model_name}/unload — unloads a specific model from
memory without deleting from disk, supports all engine types
- Frontend: Unload button in model detail dialog when model is loaded
- Delete button remains disabled while loaded (unload first)
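The unload endpoint's dispatch logic can be sketched as plain Python (this is a hedged illustration, not the actual `backend/main.py` code; `FakeBackend` and `backends` are hypothetical stand-ins for the engine backends):

```python
# Minimal sketch: resolve the model name to a backend, check loaded state,
# then unload or report "not loaded" without touching files on disk.
class FakeBackend:
    def __init__(self, loaded):
        self._loaded = loaded

    def is_loaded(self):
        return self._loaded

    def unload_model(self):
        # frees in-memory weights; the on-disk checkpoint is untouched
        self._loaded = False

def unload_model_endpoint(model_name, backends):
    backend = backends.get(model_name)
    if backend is None:
        raise KeyError(f"Unknown model: {model_name}")
    if not backend.is_loaded():
        return {"message": f"Model {model_name} is not loaded"}
    backend.unload_model()
    return {"message": f"Model {model_name} unloaded"}

backends = {"chatterbox_turbo": FakeBackend(loaded=True)}
first = unload_model_endpoint("chatterbox_turbo", backends)   # unloads
second = unload_model_endpoint("chatterbox_turbo", backends)  # already unloaded
```

A second call on an already-unloaded model returns the "not loaded" message rather than raising, matching the endpoint behavior shown in the sequence diagram below.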
The actual dtype mismatch was in `S3Tokenizer.log_mel_spectrogram`, not `VoiceEncoder.forward`. `librosa.load` returns float64 numpy, which `torch.from_numpy` preserves as double. The STFT output (double) then hits `_mel_filters` (float32) in a matmul at s3tokenizer.py:163. Now patching both entry points after model load:

1. `S3Tokenizer.log_mel_spectrogram` — cast audio to float32 before STFT
2. `VoiceEncoder.forward` — cast mels to float32 before LSTM

Remove debug traceback logging (no longer needed).
Type `/` in the text input when using Chatterbox Turbo to open an autocomplete dropdown with 9 supported paralinguistic tags ([laugh], [chuckle], [gasp], [cough], [sigh], [groan], [sniff], [shush], [clear throat]).

- contentEditable div replaces textarea for Turbo engine only
- Tags render as inline styled badges
- Pasting text with [tag] patterns auto-converts to badges
- Badges serialize back to plain [tag] text for the API
- Dropdown portalled to body, opens above caret to avoid overflow
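The badge round-trip (paste-parsing into segments, then serializing back to plain `[tag]` text for the API) can be sketched as follows. The real component is TypeScript; this Python sketch only illustrates the parsing logic, and `parse_segments`/`serialize` are hypothetical names:

```python
import re

# The 9 tags listed above; unknown [brackets] stay as plain text.
SUPPORTED_TAGS = {
    "laugh", "chuckle", "gasp", "cough", "sigh",
    "groan", "sniff", "shush", "clear throat",
}
TAG_RE = re.compile(r"\[([^\[\]]+)\]")

def parse_segments(text):
    """Split pasted text into ('text', ...) and ('badge', ...) segments."""
    segments, pos = [], 0
    for m in TAG_RE.finditer(text):
        name = m.group(1)
        if name not in SUPPORTED_TAGS:
            continue  # unsupported tags are left inside the plain text
        if m.start() > pos:
            segments.append(("text", text[pos:m.start()]))
        segments.append(("badge", name))
        pos = m.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments

def serialize(segments):
    """Join segments back into the plain [tag] form sent to the API."""
    return "".join(s if kind == "text" else f"[{s}]" for kind, s in segments)

segs = parse_segments("Hello [laugh] world [unknown]")
```

Because serialization is the exact inverse of parsing for supported tags, pasted text round-trips losslessly through the badge representation.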
⚠️ Review failed: pull request was closed or merged during review.

📝 Walkthrough

Adds a ParalinguisticInput rich-text editor used when engine === "chatterbox_turbo", a model unload API + client/UI integration, and runtime dtype-casting monkey-patches in the chatterbox and chatterbox_turbo backends to coerce audio tensors to float32.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Frontend as Frontend<br/>(ModelManagement UI)
    participant ApiClient
    participant Backend as Backend<br/>(main.py)
    participant Model as ModelManager
    User->>Frontend: Click "Unload" for model
    Frontend->>ApiClient: call unloadModel(modelName)
    ApiClient->>Backend: POST /models/{model_name}/unload
    Backend->>Backend: resolve model_name -> (type, size)
    Backend->>Model: query is_loaded / backend-specific state
    Model-->>Backend: loaded status
    alt model is loaded
        Backend->>Model: perform backend-specific unload
        Model-->>Backend: unload success
        Backend-->>ApiClient: { message: "unloaded" }
    else not loaded
        Backend-->>ApiClient: { message: "not loaded" }
    end
    ApiClient-->>Frontend: response
    Frontend->>Frontend: show toast, invalidate queries, update UI
    Frontend-->>User: display result
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ Passed (3 checks)
Actionable comments posted: 5
🧹 Nitpick comments (2)
backend/backends/chatterbox_turbo_backend.py (1)
181-213: LGTM — consistent with the chatterbox_backend.py implementation.

The dtype patching is identical to chatterbox_backend.py, ensuring consistent behavior across both Chatterbox variants.

💡 Optional: Consider extracting shared patching logic

Since both backends apply identical patches, you could extract this to a shared utility function in a common module (e.g., backend/backends/chatterbox_utils.py). This would reduce duplication and ensure both backends stay in sync if the upstream library changes.

```python
# backend/backends/chatterbox_utils.py

def apply_dtype_patches(model):
    """Patch float64 → float32 dtype mismatches in upstream chatterbox."""
    import types

    _tokzr = model.s3gen.tokenizer
    _orig_log_mel = _tokzr.log_mel_spectrogram.__func__

    def _f32_log_mel(self_tokzr, audio, padding=0):
        import torch as _torch
        if _torch.is_tensor(audio):
            audio = audio.float()
        return _orig_log_mel(self_tokzr, audio, padding)

    _tokzr.log_mel_spectrogram = types.MethodType(_f32_log_mel, _tokzr)

    _ve = model.ve
    _orig_ve_forward = _ve.forward.__func__

    def _f32_ve_forward(self_ve, mels):
        return _orig_ve_forward(self_ve, mels.float())

    _ve.forward = types.MethodType(_f32_ve_forward, _ve)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/chatterbox_turbo_backend.py` around lines 181 - 213: Extract the duplicated dtype-patching code into a shared function (e.g., apply_dtype_patches(model)) and call it from both chatterbox_turbo_backend and chatterbox_backend; specifically move the logic that accesses model.s3gen.tokenizer and its log_mel_spectrogram original (__func__), and model.ve and its forward original (__func__), into the new utility, preserve the MethodType wrapping for _f32_log_mel and _f32_ve_forward, and then replace the inline patching in both backends with a single call to apply_dtype_patches(self.model) so updates remain in one place.

backend/main.py (1)
1519-1539: Consider moving the repeated import outside the conditionals.

The `get_tts_backend_for_engine` import is repeated in each branch. Moving it to the top of the try block reduces duplication.

♻️ Proposed refactor

```diff
     try:
+        from .backends import get_tts_backend_for_engine
+
         if model_type == "tts":
             tts_model = tts.get_tts_model()
             if tts_model.is_loaded() and tts_model.model_size == model_size:
                 tts.unload_tts_model()
             else:
                 return {"message": f"Model {model_name} is not loaded"}
         elif model_type == "luxtts":
-            from .backends import get_tts_backend_for_engine
             backend = get_tts_backend_for_engine("luxtts")
             ...
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/main.py` around lines 1519 - 1539, Move the repeated import of get_tts_backend_for_engine out of each model_type branch: import get_tts_backend_for_engine once at the start of the try block, then inside the branches call get_tts_backend_for_engine with the appropriate engine string (e.g., "luxtts", "chatterbox", "chatterbox_turbo"); keep the existing logic that checks backend.is_loaded() and calls backend.unload_model() or returns the not-loaded message, but remove the duplicate from each elif branch so only the single import remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/src/components/Generation/ParalinguisticInput.tsx`:
- Around line 357-406: The portalled autocomplete can remain open after focus
leaves the editor; add an outside-blur handler to close it by setting showMenu
to false: create refs for the editor input (e.g., editorRef) and the menu
container (menuRef used around the motion.div), then in a useEffect attach a
document 'mousedown' or 'focusin' listener that checks if the event target is
not inside editorRef.current nor menuRef.current and if so calls
setShowMenu(false); make sure to cleanup the listener on unmount and preserve
existing behavior for isComposingRef, handleInput, insertTag, and menuIndex when
closing the menu.
- Around line 139-160: lastSerializedRef is initialized to value which causes
the initial useEffect sync to mistakenly believe the editor is already hydrated;
change initialization and update logic so the editor always hydrates on mount:
initialize lastSerializedRef with an empty string (useRef<string>('')) instead
of value, and in the useEffect that writes el.innerHTML, after setting
el.innerHTML assign lastSerializedRef.current = value ?? '' (so future updates
still compare correctly). This touches the lastSerializedRef declaration and the
useEffect block that reads/writes editorRef.current.innerHTML.
- Around line 220-227: The ArrowUp/ArrowDown handlers in ParalinguisticInput.tsx
use modulo with filteredTags.length which becomes 0 and yields NaN; guard these
branches by checking filteredTags.length > 0 before calling setMenuIndex (or
early-return from the key handler when showMenu is true but filteredTags.length
=== 0) so menuIndex is only updated when there are results; update the
ArrowDown/ArrowUp blocks that call setMenuIndex to run only when
filteredTags.length > 0.
- Around line 333-356: The div retains interactive semantics when disabled and
still receives clicks/focus; update the JSX to make it non-focusable and inert
when disabled by: keep contentEditable={!disabled} and aria-disabled, but set
tabIndex={disabled ? -1 : 0} (or omit tabIndex when you prefer default), and
only attach handlers (onInput, onKeyDown, onPaste, onClick, onFocus) when
!disabled (e.g. onInput={!disabled ? handleInput : undefined}, etc.); this
ensures editorRef-backed element, handlers (handleInput, handleKeyDown,
handlePaste, onClick, onFocus) and keyboard interactions are disabled while
preserving accessible ARIA state.
In `@backend/main.py`:
- Around line 1548-1549: The except block currently re-raises an HTTPException
without chaining the original exception; modify the exception raise in the
except handler so the HTTPException is raised from the caught exception (use
"raise HTTPException(status_code=500, detail=str(e)) from e") to preserve the
original traceback—update the except block where HTTPException is raised in
backend/main.py (the handler catching "Exception as e") to use exception
chaining.
---
Nitpick comments:
In `@backend/backends/chatterbox_turbo_backend.py`:
- Around line 181-213: Extract the duplicated dtype-patching code into a shared
function (e.g., apply_dtype_patches(model)) and call it from both
chatterbox_turbo_backend and chatterbox_backend; specifically move the logic
that accesses model.s3gen.tokenizer and its log_mel_spectrogram original
(__func__), and model.ve and its forward original (__func__), into the new
utility, preserve the MethodType wrapping for _f32_log_mel and _f32_ve_forward,
and then replace the inline patching in both backends with a single call to
apply_dtype_patches(self.model) so updates remain in one place.
In `@backend/main.py`:
- Around line 1519-1539: Move the repeated import of get_tts_backend_for_engine
out of each model_type branch: import get_tts_backend_for_engine once at the
start of the try block, then inside the branches call get_tts_backend_for_engine
with the appropriate engine string (e.g., "luxtts", "chatterbox",
"chatterbox_turbo"); keep the existing logic that checks backend.is_loaded() and
calls backend.unload_model() or returns the not-loaded message, but remove the
duplicate from each elif branch so only the single import remains.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 238e008c-5f68-4f6a-9527-09f4761ed161
📒 Files selected for processing (8)
- app/src/components/Generation/FloatingGenerateBox.tsx
- app/src/components/Generation/GenerationForm.tsx
- app/src/components/Generation/ParalinguisticInput.tsx
- app/src/components/ServerSettings/ModelManagement.tsx
- app/src/lib/api/client.ts
- backend/backends/chatterbox_backend.py
- backend/backends/chatterbox_turbo_backend.py
- backend/main.py
```tsx
  const lastSerializedRef = useRef<string>(value ?? '');
  const isComposingRef = useRef(false);

  useImperativeHandle(ref, () => ({
    focus: () => editorRef.current?.focus(),
    element: editorRef.current,
  }));

  // Filtered tag list for the autocomplete menu
  const filteredTags = PARALINGUISTIC_TAGS.filter((t) =>
    t.label.toLowerCase().includes(menuFilter.toLowerCase()),
  );

  // ── Sync external value → editor ──────────────────────────────
  useEffect(() => {
    const el = editorRef.current;
    if (!el) return;
    // Only update DOM if the external value differs from what we last emitted
    if (value !== undefined && value !== lastSerializedRef.current) {
      lastSerializedRef.current = value;
      el.innerHTML = value ? textToHTML(value) : '';
    }
```
Initial value can fail to render in the editor on first mount.

`lastSerializedRef` starts with `value`, so the first sync can skip `innerHTML` hydration when `value` is already non-empty.
💡 Proposed fix

```diff
-  const lastSerializedRef = useRef<string>(value ?? '');
+  const lastSerializedRef = useRef<string>('');

   // ── Sync external value → editor ──────────────────────────────
   useEffect(() => {
     const el = editorRef.current;
     if (!el) return;
-    // Only update DOM if the external value differs from what we last emitted
-    if (value !== undefined && value !== lastSerializedRef.current) {
-      lastSerializedRef.current = value;
-      el.innerHTML = value ? textToHTML(value) : '';
-    }
+    const next = value ?? '';
+    if (htmlToText(el) !== next) {
+      el.innerHTML = next ? textToHTML(next) : '';
+    }
+    lastSerializedRef.current = next;
   }, [value]);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/src/components/Generation/ParalinguisticInput.tsx` around lines 139 -
160, lastSerializedRef is initialized to value which causes the initial
useEffect sync to mistakenly believe the editor is already hydrated; change
initialization and update logic so the editor always hydrates on mount:
initialize lastSerializedRef with an empty string (useRef<string>('')) instead
of value, and in the useEffect that writes el.innerHTML, after setting
el.innerHTML assign lastSerializedRef.current = value ?? '' (so future updates
still compare correctly). This touches the lastSerializedRef declaration and the
useEffect block that reads/writes editorRef.current.innerHTML.
- Initialize lastSerializedRef to empty string so first-mount hydration always runs (fixes initial value not rendering)
- Guard arrow-key menu nav against empty filteredTags (avoids NaN index)
- Disable ARIA role/multiline and detach event handlers when disabled
- Add onBlur to close autocomplete dropdown when editor loses focus
- Chain exception with 'from e' in unload endpoint for better tracebacks
Summary
Adds inline tag autocomplete for Chatterbox Turbo's 9 paralinguistic sound effects. Only appears when the engine is set to Chatterbox Turbo.
How it works
- Type `/` in the text input to open a dropdown with all 9 tags
- Typing filters the list (`/la` shows laugh)
- Pasting text with `[laugh]`, `[sigh]` etc. auto-converts to badges
- Badges serialize back to plain `[tag]` text for the API

Supported tags

`[laugh]` `[chuckle]` `[gasp]` `[cough]` `[sigh]` `[groan]` `[sniff]` `[shush]` `[clear throat]`

Implementation

- New `ParalinguisticInput` component using a `contentEditable` div
- Replaces the `Textarea` only when engine is `chatterbox_turbo`
- Dropdown is portalled to `document.body` and positioned above the caret (since the generate box sits at the bottom of the screen)
- Wired into `FloatingGenerateBox` and `GenerationForm`