
feat: paralinguistic tag autocomplete for Chatterbox Turbo #265

Merged

jamiepine merged 5 commits into main from feat/paralinguistic-tags on Mar 13, 2026

Conversation

jamiepine (Owner) commented Mar 13, 2026

Summary

Adds inline tag autocomplete for Chatterbox Turbo's 9 paralinguistic sound effects. Only appears when the engine is set to Chatterbox Turbo.

How it works

  • Type / in the text input to open a dropdown with all 9 tags
  • Filter by typing (e.g. /la narrows the list to [laugh])
  • Arrow keys + Enter/Tab to select, Escape to dismiss
  • Selected tags render as inline styled badges in the text
  • Pasting text containing [laugh], [sigh] etc. auto-converts to badges
  • On submit, badges serialize back to plain [tag] text for the API
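The paste-conversion and submit-time serialization steps above form a round trip that can be sketched as a pure tokenize/serialize pair. This is an illustrative sketch, not the component's actual code; the names (TAGS, Token, tokenize, serialize) are assumptions.

```typescript
// Hypothetical sketch: split pasted plain text into text runs and known [tag]
// tokens; unknown bracketed spans stay as ordinary text.
const TAGS: readonly string[] = [
  'laugh', 'chuckle', 'gasp', 'cough', 'sigh',
  'groan', 'sniff', 'shush', 'clear throat',
];

type Token = { kind: 'text' | 'tag'; value: string };

function tokenize(pasted: string): Token[] {
  const tokens: Token[] = [];
  let last = 0;
  for (const m of pasted.matchAll(/\[([^\]]+)\]/g)) {
    const idx = m.index ?? 0;
    const name = m[1].toLowerCase();
    if (!TAGS.includes(name)) continue; // not one of the 9 supported tags
    if (idx > last) tokens.push({ kind: 'text', value: pasted.slice(last, idx) });
    tokens.push({ kind: 'tag', value: name });
    last = idx + m[0].length;
  }
  if (last < pasted.length) tokens.push({ kind: 'text', value: pasted.slice(last) });
  return tokens;
}

// On submit, badges serialize back to plain [tag] text for the API.
function serialize(tokens: Token[]): string {
  return tokens.map((t) => (t.kind === 'tag' ? `[${t.value}]` : t.value)).join('');
}
```

Because serialization just re-wraps tag tokens in brackets, text that contained valid tags survives the paste-then-submit round trip unchanged.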

Supported tags

[laugh] [chuckle] [gasp] [cough] [sigh] [groan] [sniff] [shush] [clear throat]

Implementation

  • New ParalinguisticInput component using a contentEditable div
  • Replaces Textarea only when engine is chatterbox_turbo
  • Dropdown portalled to document.body and positioned above the caret (since the generate box sits at the bottom of the screen)
  • Integrated in both FloatingGenerateBox and GenerationForm
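The slash-menu behavior described above reduces to two small pure helpers: a filter over the 9 tags and a wrap-around index step for arrow keys. A minimal sketch follows; identifiers are assumptions, and the empty-list guard in step prevents wrap-around math from dividing by zero when no tag matches.

```typescript
const TAGS = ['laugh', 'chuckle', 'gasp', 'cough', 'sigh',
              'groan', 'sniff', 'shush', 'clear throat'];

// "/la" yields the query "la", which matches only "laugh";
// an empty query (bare "/") shows all 9 tags.
function filterTags(query: string): string[] {
  const q = query.toLowerCase();
  return TAGS.filter((t) => t.includes(q));
}

// ArrowDown (+1) and ArrowUp (-1) wrap around the menu; guard the empty list
// so the modulo never produces a NaN index.
function step(index: number, delta: 1 | -1, length: number): number {
  if (length === 0) return 0;
  return (index + delta + length) % length;
}
```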

Summary by CodeRabbit

  • New Features

    • New paralinguistic text input for chatterbox_turbo with inline tag badges, slash-triggered autocomplete, keyboard navigation, paste/tag insertion, and improved placeholder/expansion behavior.
    • UI: option to unload loaded models from memory.
  • Bug Fixes

    • Resolved data type mismatches in chatterbox model processing to improve inference stability.

The previous approach of patching librosa.load didn't work because
melspectrogram itself performs float64 math (numpy dot, signal.lfilter)
regardless of input dtype. The actual mismatch happens when pack()
creates a float64 tensor from the mel arrays and passes it into the
float32 LSTM weights in VoiceEncoder.forward().

Fix by monkey-patching VoiceEncoder.forward() to call mels.float()
before the LSTM, ensuring the input always matches the model dtype.

- POST /models/{model_name}/unload — unloads a specific model from
  memory without deleting from disk, supports all engine types
- Frontend: Unload button in model detail dialog when model is loaded
- Delete button remains disabled while loaded (unload first)
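The button-state rule in the last two bullets reduces to a single predicate on load state; a sketch with assumed names (ModelActions, modelActions):

```typescript
// Unload is offered only while a model is loaded, and Delete stays disabled
// until the model is unloaded first.
interface ModelActions {
  showUnload: boolean;
  deleteDisabled: boolean;
}

function modelActions(isLoaded: boolean): ModelActions {
  return { showUnload: isLoaded, deleteDisabled: isLoaded };
}
```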

The actual dtype mismatch was in S3Tokenizer.log_mel_spectrogram, not
VoiceEncoder.forward. librosa.load returns float64 numpy, which
torch.from_numpy preserves as double. The STFT output (double) then
hits _mel_filters (float32) in a matmul at s3tokenizer.py:163.

Now patching both entry points after model load:
1. S3Tokenizer.log_mel_spectrogram — cast audio to float32 before STFT
2. VoiceEncoder.forward — cast mels to float32 before LSTM

Remove debug traceback logging (no longer needed).

Type / in the text input when using Chatterbox Turbo to open an
autocomplete dropdown with 9 supported paralinguistic tags ([laugh],
[chuckle], [gasp], [cough], [sigh], [groan], [sniff], [shush],
[clear throat]).

- contentEditable div replaces textarea for Turbo engine only
- Tags render as inline styled badges
- Pasting text with [tag] patterns auto-converts to badges
- Badges serialize back to plain [tag] text for the API
- Dropdown portalled to body, opens above caret to avoid overflow
coderabbitai bot (Contributor) commented Mar 13, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Adds a ParalinguisticInput rich-text editor used when engine === "chatterbox_turbo", a model unload API + client/UI integration, and runtime dtype-casting monkey-patches in chatterbox and chatterbox_turbo backends to coerce audio tensors to float32.

Changes

Paralinguistic Input
  Files: app/src/components/Generation/ParalinguisticInput.tsx, app/src/components/Generation/FloatingGenerateBox.tsx, app/src/components/Generation/GenerationForm.tsx
  Adds a forwardRef contentEditable ParalinguisticInput with tag badges, slash-autocomplete, paste/keyboard handling and conversions; wired into GenerationForm/FloatingGenerateBox when engine === 'chatterbox_turbo', preserving fallback Textarea behavior.

Model Unload API & UI
  Files: backend/main.py, app/src/lib/api/client.ts, app/src/components/ServerSettings/ModelManagement.tsx
  New backend route POST /models/{model_name}/unload, new ApiClient.unloadModel(), and ModelManagement UI: Unload button, loading state, toasts, and cache invalidation; added per-backend unload dispatch.

Backend dtype patches
  Files: backend/backends/chatterbox_backend.py, backend/backends/chatterbox_turbo_backend.py
  Apply runtime monkey-patches to S3Tokenizer.log_mel_spectrogram and VoiceEncoder.forward to cast incoming audio/mel tensors to float32 and avoid dtype mismatches during inference.

Minor UI/formatting
  Files: app/src/components/ServerSettings/ModelManagement.tsx (formatSize change)
  Adjusted formatSize display logic to show MB for small sizes and GB for larger sizes; minor UI text/placement changes around model actions.
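The formatSize adjustment could look like the following sketch. The 1 GiB cutoff and the rounding are assumptions; the summary only says MB for small sizes and GB for larger ones.

```typescript
// Hypothetical sketch of the adjusted formatSize logic; threshold and
// precision are not taken from the actual diff.
function formatSize(bytes: number): string {
  const mb = bytes / (1024 * 1024);
  return mb < 1024
    ? `${mb.toFixed(0)} MB`           // small models read better in MB
    : `${(mb / 1024).toFixed(1)} GB`; // switch to GB at 1 GiB and above
}
```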

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Frontend as Frontend<br/>(ModelManagement UI)
    participant ApiClient
    participant Backend as Backend<br/>(main.py)
    participant Model as ModelManager

    User->>Frontend: Click "Unload" for model
    Frontend->>ApiClient: call unloadModel(modelName)
    ApiClient->>Backend: POST /models/{model_name}/unload
    Backend->>Backend: resolve model_name -> (type,size)
    Backend->>Model: query is_loaded / backend-specific state
    Model-->>Backend: loaded status
    alt model is loaded
      Backend->>Model: perform backend-specific unload
      Model-->>Backend: unload success
      Backend-->>ApiClient: { message: "unloaded" }
    else not loaded
      Backend-->>ApiClient: { message: "not loaded" }
    end
    ApiClient-->>Frontend: response
    Frontend->>Frontend: show toast, invalidate queries, update UI
    Frontend-->>User: display result
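The client leg of the diagram can be sketched as below. Only the POST route comes from the PR; the wrapper shape and the names unloadUrl/unloadModel are assumptions standing in for ApiClient.unloadModel().

```typescript
// Assumed names; only the route POST /models/{model_name}/unload is from the PR.
function unloadUrl(baseUrl: string, modelName: string): string {
  return `${baseUrl}/models/${encodeURIComponent(modelName)}/unload`;
}

async function unloadModel(
  baseUrl: string,
  modelName: string,
): Promise<{ message: string }> {
  const res = await fetch(unloadUrl(baseUrl, modelName), { method: 'POST' });
  if (!res.ok) throw new Error(`Unload failed: HTTP ${res.status}`);
  // Per the diagram, the body is a message object for both the loaded and
  // not-loaded branches.
  return res.json();
}
```

Encoding the model name guards against names containing spaces or slashes breaking the path.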

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰🌿
I nibble tags and badges bright,
Slash to summon sounds by night,
Models unplug, the bytes grow light,
I hop and hum — the inputs right.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: Passed. The title accurately reflects the main feature being added: paralinguistic tag autocomplete functionality for the Chatterbox Turbo engine, which is the primary focus of the changeset.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (2)
backend/backends/chatterbox_turbo_backend.py (1)

181-213: LGTM — consistent with chatterbox_backend.py implementation.

The dtype patching is identical to chatterbox_backend.py, ensuring consistent behavior across both Chatterbox variants.

💡 Optional: Consider extracting shared patching logic

Since both backends apply identical patches, you could extract this to a shared utility function in a common module (e.g., backend/backends/chatterbox_utils.py). This would reduce duplication and ensure both backends stay in sync if the upstream library changes.

# backend/backends/chatterbox_utils.py
def apply_dtype_patches(model):
    """Patch float64 → float32 dtype mismatches in upstream chatterbox."""
    import types
    
    _tokzr = model.s3gen.tokenizer
    _orig_log_mel = _tokzr.log_mel_spectrogram.__func__

    def _f32_log_mel(self_tokzr, audio, padding=0):
        import torch as _torch
        if _torch.is_tensor(audio):
            audio = audio.float()
        return _orig_log_mel(self_tokzr, audio, padding)

    _tokzr.log_mel_spectrogram = types.MethodType(_f32_log_mel, _tokzr)

    _ve = model.ve
    _orig_ve_forward = _ve.forward.__func__

    def _f32_ve_forward(self_ve, mels):
        return _orig_ve_forward(self_ve, mels.float())

    _ve.forward = types.MethodType(_f32_ve_forward, _ve)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/backends/chatterbox_turbo_backend.py` around lines 181 - 213, Extract
the duplicated dtype-patching code into a shared function (e.g.,
apply_dtype_patches(model)) and call it from both chatterbox_turbo_backend and
chatterbox_backend; specifically move the logic that accesses
model.s3gen.tokenizer and its log_mel_spectrogram original (__func__), and
model.ve and its forward original (__func__), into the new utility, preserve the
MethodType wrapping for _f32_log_mel and _f32_ve_forward, and then replace the
inline patching in both backends with a single call to
apply_dtype_patches(self.model) so updates remain in one place.
backend/main.py (1)

1519-1539: Consider moving repeated import outside the conditionals.

The get_tts_backend_for_engine import is repeated in each branch. Moving it to the top of the try block reduces duplication.

♻️ Proposed refactor
     try:
+        from .backends import get_tts_backend_for_engine
+
         if model_type == "tts":
             tts_model = tts.get_tts_model()
             if tts_model.is_loaded() and tts_model.model_size == model_size:
                 tts.unload_tts_model()
             else:
                 return {"message": f"Model {model_name} is not loaded"}
         elif model_type == "luxtts":
-            from .backends import get_tts_backend_for_engine
             backend = get_tts_backend_for_engine("luxtts")
             ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/main.py` around lines 1519 - 1539, Move the repeated import of
get_tts_backend_for_engine out of each model_type branch: import
get_tts_backend_for_engine once at the start of the try block, then inside the
branches call get_tts_backend_for_engine with the appropriate engine string
(e.g., "luxtts", "chatterbox", "chatterbox_turbo"); keep the existing logic that
checks backend.is_loaded() and calls backend.unload_model() or returns the
not-loaded message, but remove the duplicate from each elif branch so only the
single import remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/src/components/Generation/ParalinguisticInput.tsx`:
- Around line 357-406: The portalled autocomplete can remain open after focus
leaves the editor; add an outside-blur handler to close it by setting showMenu
to false: create refs for the editor input (e.g., editorRef) and the menu
container (menuRef used around the motion.div), then in a useEffect attach a
document 'mousedown' or 'focusin' listener that checks if the event target is
not inside editorRef.current nor menuRef.current and if so calls
setShowMenu(false); make sure to cleanup the listener on unmount and preserve
existing behavior for isComposingRef, handleInput, insertTag, and menuIndex when
closing the menu.
- Around line 139-160: lastSerializedRef is initialized to value which causes
the initial useEffect sync to mistakenly believe the editor is already hydrated;
change initialization and update logic so the editor always hydrates on mount:
initialize lastSerializedRef with an empty string (useRef<string>('')) instead
of value, and in the useEffect that writes el.innerHTML, after setting
el.innerHTML assign lastSerializedRef.current = value ?? '' (so future updates
still compare correctly). This touches the lastSerializedRef declaration and the
useEffect block that reads/writes editorRef.current.innerHTML.
- Around line 220-227: The ArrowUp/ArrowDown handlers in ParalinguisticInput.tsx
use modulo with filteredTags.length which becomes 0 and yields NaN; guard these
branches by checking filteredTags.length > 0 before calling setMenuIndex (or
early-return from the key handler when showMenu is true but filteredTags.length
=== 0) so menuIndex is only updated when there are results; update the
ArrowDown/ArrowUp blocks that call setMenuIndex to run only when
filteredTags.length > 0.
- Around line 333-356: The div retains interactive semantics when disabled and
still receives clicks/focus; update the JSX to make it non-focusable and inert
when disabled by: keep contentEditable={!disabled} and aria-disabled, but set
tabIndex={disabled ? -1 : 0} (or omit tabIndex when you prefer default), and
only attach handlers (onInput, onKeyDown, onPaste, onClick, onFocus) when
!disabled (e.g. onInput={!disabled ? handleInput : undefined}, etc.); this
ensures editorRef-backed element, handlers (handleInput, handleKeyDown,
handlePaste, onClick, onFocus) and keyboard interactions are disabled while
preserving accessible ARIA state.

In `@backend/main.py`:
- Around line 1548-1549: The except block currently re-raises an HTTPException
without chaining the original exception; modify the exception raise in the
except handler so the HTTPException is raised from the caught exception (use
"raise HTTPException(status_code=500, detail=str(e)) from e") to preserve the
original traceback—update the except block where HTTPException is raised in
backend/main.py (the handler catching "Exception as e") to use exception
chaining.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 238e008c-5f68-4f6a-9527-09f4761ed161

📥 Commits

Reviewing files that changed from the base of the PR and between bfe912e and b420637.

📒 Files selected for processing (8)
  • app/src/components/Generation/FloatingGenerateBox.tsx
  • app/src/components/Generation/GenerationForm.tsx
  • app/src/components/Generation/ParalinguisticInput.tsx
  • app/src/components/ServerSettings/ModelManagement.tsx
  • app/src/lib/api/client.ts
  • backend/backends/chatterbox_backend.py
  • backend/backends/chatterbox_turbo_backend.py
  • backend/main.py

Comment on lines +139 to +160
    const lastSerializedRef = useRef<string>(value ?? '');
    const isComposingRef = useRef(false);

    useImperativeHandle(ref, () => ({
      focus: () => editorRef.current?.focus(),
      element: editorRef.current,
    }));

    // Filtered tag list for the autocomplete menu
    const filteredTags = PARALINGUISTIC_TAGS.filter((t) =>
      t.label.toLowerCase().includes(menuFilter.toLowerCase()),
    );

    // ── Sync external value → editor ──────────────────────────────
    useEffect(() => {
      const el = editorRef.current;
      if (!el) return;
      // Only update DOM if the external value differs from what we last emitted
      if (value !== undefined && value !== lastSerializedRef.current) {
        lastSerializedRef.current = value;
        el.innerHTML = value ? textToHTML(value) : '';
      }

⚠️ Potential issue | 🟠 Major

Initial value can fail to render in the editor on first mount.

lastSerializedRef starts with value, so the first sync can skip innerHTML hydration when value is already non-empty.

💡 Proposed fix
-    const lastSerializedRef = useRef<string>(value ?? '');
+    const lastSerializedRef = useRef<string>('');

     // ── Sync external value → editor ──────────────────────────────
     useEffect(() => {
       const el = editorRef.current;
       if (!el) return;
-      // Only update DOM if the external value differs from what we last emitted
-      if (value !== undefined && value !== lastSerializedRef.current) {
-        lastSerializedRef.current = value;
-        el.innerHTML = value ? textToHTML(value) : '';
-      }
+      const next = value ?? '';
+      if (htmlToText(el) !== next) {
+        el.innerHTML = next ? textToHTML(next) : '';
+      }
+      lastSerializedRef.current = next;
     }, [value]);

- Initialize lastSerializedRef to empty string so first-mount hydration
  always runs (fixes initial value not rendering)
- Guard arrow-key menu nav against empty filteredTags (avoids NaN index)
- Disable ARIA role/multiline and detach event handlers when disabled
- Add onBlur to close autocomplete dropdown when editor loses focus
- Chain exception with 'from e' in unload endpoint for better tracebacks
@jamiepine jamiepine merged commit c12b5d6 into main Mar 13, 2026
1 check was pending
