
Feat/rr 411 add text to audio node #583

Closed

dsapandora wants to merge 35 commits into develop from feat/RR-411-add-text-to-audio-node

Conversation

@dsapandora
Collaborator

@dsapandora dsapandora commented Mar 31, 2026

Summary

New Text-to-Speech (TTS) pipeline node (audio_tts) that converts text into speech audio. It supports 5 engines — 3 local and 2 cloud — selectable via preconfig profiles, and follows the established IGlobal / IInstance node pattern.

Input: text lane · Output: audio lane (WAV or MP3 container bytes)


Supported Engines

| Profile | Engine | Type | Notes |
| --- | --- | --- | --- |
| `piper` | Piper | Local | Neural ONNX voices from HF Hub; downloaded and cached on first use |
| `kokoro` | Kokoro | Local | Kokoro-82M via KPipeline; language auto-detected from voice prefix |
| `bark` | Bark | Local | HF text-to-audio pipeline; GPU when available |
| `openai` | OpenAI | Cloud | OpenAI TTS API (`gpt-4o-mini-tts`); requires `api_key` or `OPENAI_API_KEY` |
| `elevenlabs` | ElevenLabs | Cloud | ElevenLabs API; returns MP3 directly; requires `api_key` or `ELEVENLABS_API_KEY` |

Default profile: piper


Architecture

```mermaid
flowchart TB
    subgraph Pipeline["Pipeline Runtime"]
        direction LR
        UP["Upstream node<br/>(text producer)"] -->|text lane| TTS["audio_tts<br/>filter node"]
        TTS -->|audio lane<br/>WAV / MP3| DOWN["Downstream node<br/>(audio consumer)"]
    end

    subgraph Node["Node Internals"]
        direction TB
        IG["IGlobal<br/>• beginGlobal: install deps, create TTSEngine<br/>• synthesize: hot-swap engine on config change<br/>• validateConfig: fail-fast on missing keys<br/>• cleanup stale temp files (≤ every 5 min)"]
        II["IInstance<br/>• writeText → synthesize → writeAudio<br/>• BEGIN / WRITE / END streaming<br/>• temp file deleted in finally block"]
        IG --- II
    end

    subgraph Engine["TTSEngine (tts_engine.py)"]
        direction TB
        PIPER["Piper<br/>PiperVoice ONNX"]
        KOKORO["Kokoro<br/>KPipeline + misaki G2P"]
        BARK["Bark<br/>HF transformers pipeline"]
        OAI["OpenAI<br/>REST API (direct)"]
        EL["ElevenLabs<br/>REST API (direct)"]
    end

    subgraph Shared["packages/ai — Shared Loaders"]
        direction TB
        PL["PiperLoader"]
        KL["KokoroLoader"]
    end

    II -->|"synthesize(text)"| IG
    IG -->|"config dict"| Engine
    Engine -.->|"--modelserver"| Shared
```

Configuration Flow

```mermaid
sequenceDiagram
    participant UI as Builder UI (RJSF)
    participant SVC as services.json
    participant CC as connConfig
    participant GNC as Config.getNodeConfig
    participant IG as IGlobal
    participant ENG as TTSEngine

    UI->>SVC: read preconfig profiles + shape
    UI->>CC: user selects profile + options
    CC->>GNC: merge preconfig defaults + overrides
    GNC->>IG: merged config dict
    IG->>IG: _normalize_engine_id()
    IG->>IG: _resolve_tts_model() / _resolve_tts_voice()
    IG->>ENG: _build_tts_config_dict()
    Note over IG,ENG: Engine is hot-swapped if<br/>identity signature changes
```

Key Improvements

Reliability & Resource Management

  • Atomic temp files — replaced timestamp-based naming with tempfile.mkstemp(prefix='tts_') to prevent race conditions
  • Deterministic cleanup — temp files deleted in finally blocks on both success and failure paths
  • Stale file sweeper — _cleanup_stale_outputs() runs at most once every 5 minutes, removing tts_*.wav/mp3 files older than 1 hour
  • Correct dispose semantics — fixed self._engine.dispose() → inst.dispose() so the captured reference is always disposed
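A minimal sketch of the atomic temp-file plus deterministic-cleanup pattern described above (function names here are illustrative, not the node's actual API):

```python
import os
import tempfile

def reserve_output_path(ext: str = 'wav') -> str:
    """Atomically reserve a unique temp file, avoiding timestamp races."""
    fd, path = tempfile.mkstemp(prefix='tts_', suffix=f'.{ext}')
    os.close(fd)  # the engine reopens the path itself for writing
    return path

def synthesize_to_temp(render, ext: str = 'wav') -> bytes:
    """Run render(path), read the result, and always clean up."""
    path = reserve_output_path(ext)
    try:
        render(path)
        with open(path, 'rb') as f:
            return f.read()
    finally:
        try:
            os.remove(path)  # runs on both success and failure paths
        except OSError:
            pass
```

Because `mkstemp` both creates and claims the file, two syntheses landing in the same millisecond can no longer clobber each other.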

Engine Hot-Swap

  • _tts_identity_signature() detects config changes (engine, voice, model, model-server flag)
  • TTSEngine is recreated only when the identity actually changes — no unnecessary teardown
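The hot-swap decision can be sketched as a tuple comparison (illustrative only; the real signature also folds in credentials, per later review feedback):

```python
def identity_signature(cfg: dict) -> tuple:
    """Fields whose change requires tearing down and rebuilding the engine."""
    return (
        cfg.get('engine', 'piper'),
        cfg.get('voice', ''),
        cfg.get('model', ''),
        bool(cfg.get('use_model_server', False)),
    )

class EngineHolder:
    """Recreate the engine only when the identity actually changes."""
    def __init__(self):
        self._sig = None
        self._engine = None

    def get(self, cfg: dict, factory):
        sig = identity_signature(cfg)
        if sig != self._sig:
            if self._engine is not None:
                self._engine.dispose()  # dispose the old engine first
            self._engine = factory(cfg)
            self._sig = sig
        return self._engine
```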

Configuration Robustness

  • _normalize_engine_id() handles compound UI ids (kokoro-default, openai-tts) by extracting the canonical prefix
  • validateConfig() called before depends() so missing API keys surface immediately in the node panel
  • Per-engine model/voice resolvers with legacy key fallbacks (model → openai_model, voice → openai_voice)
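The prefix extraction that `_normalize_engine_id()` performs could look roughly like this (a sketch of the described behavior, not the node's exact code):

```python
_ENGINES = ('piper', 'kokoro', 'bark', 'openai', 'elevenlabs')

def normalize_engine_id(raw: str) -> str:
    """Map compound UI ids such as 'kokoro-default' or 'openai-tts'
    to their canonical engine prefix."""
    value = (raw or '').strip().lower()
    for engine in _ENGINES:
        if value == engine or value.startswith(engine + '-'):
            return engine
    # Failing fast here surfaces typos at config time, not mid-pipeline.
    raise ValueError(f'Unknown TTS engine id: {raw!r}')
```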

Shared Model Loaders (packages/ai)

  • PiperLoader — ONNX voice inference via piper-tts, runs on model server CPU worker
  • KokoroLoader — Kokoro-82M via kokoro.KPipeline, model server GPU/CPU
  • Cloud engines (OpenAI, ElevenLabs) call their REST APIs directly from tts_engine.py — no loader needed
  • piper_hub — HF Hub download + cache for Piper ONNX voice files
  • spacy_en_model — auto-installs matching en_core_web_sm wheel for the installed spaCy version (fixes wasabi / Exception: 1 issue)
  • wav_to_mp3 — in-process MP3 encoding via lameenc (no ffmpeg dependency)
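Any in-process WAV→MP3 converter like wav_to_mp3 first has to pull PCM frames and stream parameters out of the WAV container before handing them to the encoder. A hedged sketch of that front half using only the stdlib (the module's actual internals may differ, and the lameenc encode call itself is elided here):

```python
import wave

def read_wav_pcm(path: str):
    """Return (pcm_bytes, sample_rate, channels) for an MP3 encoder.

    Only mono/stereo are encodable, so anything else is rejected up front.
    """
    with wave.open(path, 'rb') as w:
        channels = w.getnchannels()
        if channels not in (1, 2):
            # plain hyphen-minus in the message keeps it greppable (RUF001)
            raise ValueError(f'unsupported channel count: {channels}')
        return w.readframes(w.getnframes()), w.getframerate(), channels
```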

Test Suite

  • test_iglobal — config building, engine normalization, validation, hot-swap, stale cleanup
  • test_iinstance — writeText lifecycle, empty input handling, synthesis failure propagation
  • test_normalize — edge cases for _normalize_engine_id() (canonical, compound, invalid)
  • test_tts_engine — per-engine synthesis paths with mocked backends, dispose lifecycle
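The per-engine tests run against mocked backends; the kind of stand-in they use can be sketched with unittest.mock (names are illustrative, not the suite's actual fixtures):

```python
from unittest.mock import MagicMock

def make_mock_engine(mime: str = 'audio/wav') -> MagicMock:
    """Build a fake TTS engine exposing the synthesize/dispose surface."""
    engine = MagicMock()
    engine.synthesize.return_value = {
        'path': '/tmp/tts_mock.wav',  # hypothetical output path
        'mime_type': mime,
    }
    engine.dispose.return_value = None
    return engine
```

This lets the lifecycle tests assert call counts (e.g. dispose happening exactly once on hot-swap) without loading any real model.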

Files Changed (27 files, +3,616 / −58)

| Area | Files |
| --- | --- |
| Node | `nodes/src/nodes/audio_tts/IGlobal.py`, `IInstance.py`, `tts_engine.py`, `piper_catalog.py`, `services.json`, `README.md`, `requirements.txt`, `__init__.py` |
| Shared loaders | `packages/ai/src/ai/common/models/audio/piper_loader.py`, `kokoro_loader.py`, `piper_hub.py`, `piper_native.py`, `spacy_en_model.py`, `wav_to_mp3.py` |
| Tests | `nodes/test/audio_tts/test_iglobal.py`, `test_iinstance.py`, `test_normalize.py`, `test_tts_engine.py`, `conftest.py` |

Type

Feature

Testing

  • Tests added (4 test modules with shared conftest)
  • Tested locally (all 5 engines verified)
  • ./builder test passes

Checklist

  • No secrets or credentials included
  • MIT license headers on all new files
  • README with configuration, troubleshooting, and cross-platform notes

Linked Issue

Fixes #411

@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a new Audio TTS node with IGlobal/IInstance, a multi-backend TTSEngine, config resolution and cloud API key handling, model-server loaders and Piper hub helpers, requirements and service registration, and per-utterance temp-file lifecycle and cleanup.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Node Core**<br>`nodes/src/nodes/audio_tts/IGlobal.py`, `nodes/src/nodes/audio_tts/IInstance.py`, `nodes/src/nodes/audio_tts/__init__.py` | New node package: global lifecycle (beginGlobal/endGlobal), config validation, per-call synth with identity signature and engine recreation, audio/text lane routing, and package exports. |
| **TTS Engine & Runtime**<br>`nodes/src/nodes/audio_tts/tts_engine.py`, `nodes/src/nodes/audio_tts/requirements.txt` | TTSEngine dispatcher implementing piper/kokoro/bark/openai/elevenlabs backends, local vs model-server modes, WAV writing and MP3 transcoding with lameenc/ffmpeg fallback, plus node-specific runtime deps. |
| **Config Resolution**<br>`nodes/src/nodes/audio_tts/config_resolver.py` | Merged-config normalization, profile→engine binding, multi-shape api_key extraction, and cloud API key resolution (env fallback). |
| **Service Definition & Docs**<br>`nodes/src/nodes/audio_tts/services.json`, `nodes/src/nodes/audio_tts/README.md` | New service registration audio_tts://, UI/preconfig and fields, tests, and README describing wiring semantics, engines, model-server rules, and troubleshooting. |
| **Instance Behavior**<br>`nodes/src/nodes/audio_tts/IInstance.py` | Instance-level writeText flow: infer output format, call IGlobal.synthesize, emit audio lane (BEGIN/WRITE/END) and/or text lane (JSON with base64), and cleanup temp files. |
| **Model Server Loaders (cloud)**<br>`packages/ai/src/ai/common/models/audio/cloud_tts_loader.py` | New OpenAITTSLoader and ElevenLabsTTSLoader: model-server/HTTP proxy loaders producing base64 audio, per-bundle locks, CPU allocation metadata, and error handling. |
| **Model Server Loaders (kokoro/piper)**<br>`packages/ai/src/ai/common/models/audio/kokoro_loader.py`, `packages/ai/src/ai/common/models/audio/piper_loader.py` | Kokoro and Piper BaseLoader implementations: dependency handling, device/allocation logic, synchronized inference, and return of base64 WAV outputs. |
| **Piper Hub & Native Helpers**<br>`packages/ai/src/ai/common/models/audio/piper_hub.py`, `packages/ai/src/ai/common/models/audio/piper_native.py`, `nodes/src/nodes/audio_tts/piper_catalog.py` | HF Hub downloader for Piper voices, in-process Piper load/write helpers, and small catalog re-export. |
| **Audio Utilities & Requirements**<br>`packages/ai/src/ai/common/models/audio/wav_to_mp3.py`, `packages/ai/src/ai/common/models/audio/requirements_piper.txt`, `packages/ai/src/ai/common/models/audio/requirements_kokoro.txt` | WAV→MP3 conversion via lameenc with ffmpeg fallback; per-backend requirements files. |
| **Package Exports & Base Update**<br>`packages/ai/src/ai/common/models/__init__.py`, `packages/ai/src/ai/common/models/audio/__init__.py`, `packages/ai/src/ai/common/models/base.py` | Re-exported new TTS loader symbols (ElevenLabs/OpenAI/Kokoro/Piper) and added piper to ModelClient.load_model docstring/types. |
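The config_resolver's cloud API key resolution with env fallback can be sketched as follows (illustrative; the real resolver also handles multiple config shapes):

```python
import os

# Env var names as documented in the engine table above.
_ENV_KEYS = {'openai': 'OPENAI_API_KEY', 'elevenlabs': 'ELEVENLABS_API_KEY'}

def resolve_cloud_api_key(cfg: dict, engine: str) -> str:
    """Prefer an explicit api_key in config, then fall back to the env."""
    key = str(cfg.get('api_key', '') or '').strip()
    if not key:
        key = os.environ.get(_ENV_KEYS.get(engine, ''), '').strip()
    if not key:
        raise ValueError(f'Missing API key for cloud engine {engine!r}')
    return key
```

Raising here rather than inside the engine is what lets validateConfig() surface missing keys in the node panel before the pipeline runs.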

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant IInstance as IInstance
    participant IGlobal as IGlobal
    participant Config as ConfigResolver
    participant TTSEngine as TTSEngine
    participant Local as LocalBackend
    participant Cloud as CloudAPI

    Client->>IInstance: writeText(text)
    activate IInstance

    IInstance->>IGlobal: _build_tts_config / synthesize(text, out_fmt)
    activate IGlobal

    IGlobal->>Config: resolve_merged_config()
    Config-->>IGlobal: merged config

    IGlobal->>IGlobal: compute identity signature
    alt signature or engine changed
        IGlobal->>TTSEngine: dispose existing engine
        IGlobal->>TTSEngine: instantiate new engine(config)
    end

    IGlobal->>TTSEngine: synthesize(text)
    activate TTSEngine

    alt local backend
        TTSEngine->>Local: load pipeline/voice & synth
        Local-->>TTSEngine: audio bytes
    else cloud backend
        TTSEngine->>Cloud: HTTP POST (api_key, model, voice)
        Cloud-->>TTSEngine: audio bytes
    end

    TTSEngine-->>IGlobal: {path, mime_type}
    deactivate TTSEngine
    IGlobal-->>IInstance: {path, mime_type}
    deactivate IGlobal

    IInstance->>IInstance: read temp file bytes
    opt audio lane connected
        IInstance->>Client: stream audio (BEGIN/WRITE/END)
    end
    opt text lane connected
        IInstance->>Client: emit JSON {mime_type, base64}
    end
    IInstance->>IInstance: delete temp file
    deactivate IInstance
```
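The "stream audio (BEGIN/WRITE/END)" step above can be sketched with a chunked reader, so the audio lane never has to buffer the whole artifact (writeAudio/AVI_ACTION are the node framework's API; the helper name is illustrative):

```python
from typing import Iterator

def iter_audio_chunks(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the synthesized file in fixed-size chunks for streaming."""
    with open(path, 'rb') as f:
        while chunk := f.read(chunk_size):
            yield chunk

# Hedged usage against the framework calls named in the diagram:
#   instance.writeAudio(AVI_ACTION.BEGIN, mime)
#   for chunk in iter_audio_chunks(temp_path):
#       instance.writeAudio(AVI_ACTION.WRITE, mime, chunk)
#   instance.writeAudio(AVI_ACTION.END, mime)
```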

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested reviewers

  • jmaionchi
  • stepmikhaylov

Poem

🐰 I hopped through configs, voices bright,
Merged profiles hummed into the night,
From Piper burrows to cloud-lit skies,
WAVs and MP3s in tiny sighs,
A rabbit cheers — the TTS takes flight!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'Feat/rr 411 add text to audio node' accurately describes the main change: introducing a new text-to-speech audio node. It is clear, concise, and directly related to the core functionality added. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/RR-411-add-text-to-audio-node

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added docs Documentation module:nodes Python pipeline nodes module:ai AI/ML modules labels Mar 31, 2026
@dsapandora dsapandora marked this pull request as ready for review April 2, 2026 12:37
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 16-22: The profile/engine mapping and resolution must canonicalize
UI-style ids (e.g., "kokoro-default", "openai-tts", "elevenlabs-default") to
their base engine names before lookup; update the logic that uses
_PROFILE_TO_ENGINE and the function _engine_from_merged_cfg to normalize the
value returned by resolve_merged_config() (strip known suffixes like "-default",
"-tts" and/or map alias forms) so lookups always use bare engine ids
("kokoro","bark","openai","elevenlabs","piper"); modify both the profile mapping
declaration and the _engine_from_merged_cfg path that reads
merged_cfg["profile"] or merged_cfg["engine"] to run the canonicalization helper
first and then return the mapped engine name to TTSEngine.
- Around line 237-239: The current filename generation in IGlobal.py (variables
filename/out_path) races under burst traffic; reserve the path using an
exclusive temp-file API before returning it to TTSEngine. Replace the time-based
filename with a call to tempfile.mkstemp or
tempfile.NamedTemporaryFile(delete=False) to atomically create/claim the file
(use suffix='.'+ext and a suitable prefix like 'tts_'), close the returned file
descriptor if using mkstemp, and pass the resulting secure path to TTSEngine;
also ensure the chosen extension logic (ext = output_format if output_format in
('wav','mp3') else 'wav') is preserved so the temp file has the correct suffix.
Ensure IInstance.py can still delete the file later.
- Around line 194-207: The engine identity tuples for OpenAI/ElevenLabs must
include credential changes so cached clients are invalidated: modify the return
values in the block handling e == 'openai' and e == 'elevenlabs' to also include
the API key (e.g., str(cfg.get('api_key','') or '').strip()) as part of the
tuple alongside model, voice and the use_model_server flag so TTSEngine's client
cache sees credential rotations and rebuilds the remote client.
- Around line 218-246: synthesize currently mutates shared state (_config,
_engine, _engine.config) without synchronization; wrap the engine lifecycle and
config mutation in a lock (e.g., add a threading.RLock on the instance and
acquire it while deciding need_engine, disposing/creating engines, and assigning
self._config/self._engine), capture the current engine in a local variable
(inst) and call dispose on that captured instance instead of self._engine, and
avoid overwriting shared engine.config during synthesis by passing a runtime_cfg
to the engine's synthesize call or using a cloned/per-request config object so
the engine can perform synthesize using the per-request runtime_cfg without
mutating shared state (referencing synthesize, _tts_identity_signature, _engine,
_config, TTSEngine, and _cleanup_stale_outputs).

In `@nodes/src/nodes/audio_tts/IInstance.py`:
- Around line 55-72: The current code always reads the entire temp_path into raw
even when only want_audio is True; change it so you only read-and-buffer the
full file when want_text is True (needed for base64), and for want_audio (when
text is not connected) open temp_path and stream it in chunks calling
self.instance.writeAudio(AVI_ACTION.BEGIN, mime), then multiple
self.instance.writeAudio(AVI_ACTION.WRITE, mime, chunk) per chunk, and finally
self.instance.writeAudio(AVI_ACTION.END, mime); keep using
payload.get('mime_type') for mime and only build text_payload/base64 when
want_text is True so you avoid double-buffering and enable true streaming via
the existing AVI_ACTION and self.instance.writeAudio/writeText calls.

In `@nodes/src/nodes/audio_tts/README.md`:
- Line 65: Remove the user-facing mention of kokoro_lang_code from the README
section and instead document only audio_tts.kokoro_voice as the user-settable
field; note that kokoro_lang_code is derived server-side from kokoro_voice (do
not present it as configurable). Update the line referencing "**Kokoro:**
**`kokoro_lang_code`**, **`kokoro_voice`**" to only mention "**Kokoro:**
**`kokoro_voice`** — must match Kokoro’s voice table" and add a short
parenthetical indicating that kokoro_lang_code is computed from kokoro_voice in
the Python backend.

In `@nodes/src/nodes/audio_tts/services.json`:
- Around line 280-295: Update the "test" matrix to include the default "piper"
profile and add a smoke assertion for the promised payload shape: modify the
"profiles" array in the "test" object to include "piper" and extend the "cases"
-> "expect" for the response to verify the presence of "mime_type" and a
base64-encoded audio payload (e.g., check that "audio" has keys "mime_type" and
"base64" and that "base64" is not empty) instead of only asserting non-empty
text so regressions in the Piper backend payload shape are caught; locate the
"test" entry, the "profiles" array, and the "cases" expectation objects to make
these changes.

In `@nodes/src/nodes/audio_tts/tts_engine.py`:
- Around line 119-132: The temporary WAV created with tempfile.mkstemp() in TTS
engine must be unconditionally removed even if writing or transcoding fails;
wrap the creation/use/removal of wav_path in a try/finally (or ensure finally
cleanup) so write_piper_wav(...) and _transcode_wav_to_mp3(...) errors do not
leak tmp*.wav files—apply the same pattern inside _piper() around
write_piper_wav(...) + _transcode_wav_to_mp3(...), ensuring the temporary file
created by mkstemp() (wav_path) is removed in the finally block and any OSError
on removal is swallowed/logged.

In `@packages/ai/src/ai/common/models/audio/cloud_tts_loader.py`:
- Around line 163-165: The postprocess() implementations currently return a
fabricated single-item batch by using "batch_size or 1" when raw_output has no
items; change that to preserve actual batch cardinality by using "batch_size or
0" (i.e., return [{} for _ in range(batch_size or 0)]) so an empty input batch
yields an empty list. Apply this exact change to both postprocess() occurrences
in this file (the two postprocess methods in cloud_tts_loader.py).

In `@packages/ai/src/ai/common/models/audio/requirements_kokoro.txt`:
- Around line 4-5: The hardcoded wheel URL for en_core_web_sm-3.8.0 in
requirements_kokoro.txt creates a maintenance risk; update the file to either
(a) document the coupling to spaCy and the reason for pinning this exact wheel
URL (referencing the en_core_web_sm wheel URL and the en_core_web_sm model name)
or (b) add a fallback mechanism: attempt to install the pinned wheel first and
if that fails, fall back to using spaCy's model install (spacy.cli.download)
wrapped in safe error handling so it doesn't sys.exit, ensuring code paths
referencing en_core_web_sm handle both cases; include a short comment explaining
why the fallback is needed and how to update the wheel version when upgrading
spaCy.

In `@packages/ai/src/ai/common/models/audio/wav_to_mp3.py`:
- Around line 29-30: The error message raised when invalid channel count is
detected uses an en dash character; update the string in the ValueError raised
alongside the nchannels check to use a regular hyphen-minus ("-") instead of the
en dash so searches/grepping and linters (RUF001) work correctly; locate the
conditional that checks nchannels and change the message text in the raise
ValueError call accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7790af72-2a98-4747-ab45-48f9a7e0825a

📥 Commits

Reviewing files that changed from the base of the PR and between 6f1c58c and fc94265.

📒 Files selected for processing (20)
  • nodes/src/nodes/audio_tts/IGlobal.py
  • nodes/src/nodes/audio_tts/IInstance.py
  • nodes/src/nodes/audio_tts/README.md
  • nodes/src/nodes/audio_tts/__init__.py
  • nodes/src/nodes/audio_tts/config_resolver.py
  • nodes/src/nodes/audio_tts/piper_catalog.py
  • nodes/src/nodes/audio_tts/requirements.txt
  • nodes/src/nodes/audio_tts/services.json
  • nodes/src/nodes/audio_tts/tts_engine.py
  • packages/ai/src/ai/common/models/__init__.py
  • packages/ai/src/ai/common/models/audio/__init__.py
  • packages/ai/src/ai/common/models/audio/cloud_tts_loader.py
  • packages/ai/src/ai/common/models/audio/kokoro_loader.py
  • packages/ai/src/ai/common/models/audio/piper_hub.py
  • packages/ai/src/ai/common/models/audio/piper_loader.py
  • packages/ai/src/ai/common/models/audio/piper_native.py
  • packages/ai/src/ai/common/models/audio/requirements_kokoro.txt
  • packages/ai/src/ai/common/models/audio/requirements_piper.txt
  • packages/ai/src/ai/common/models/audio/wav_to_mp3.py
  • packages/ai/src/ai/common/models/base.py

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
nodes/src/nodes/audio_tts/IGlobal.py (1)

255-257: ⚠️ Potential issue | 🟠 Major

Reserve the output file atomically.

The millisecond-based name is not unique across workers/processes, and TTSEngine writes straight to whatever output_path it receives. Two syntheses landing in the same millisecond can clobber each other; mkstemp() also gives you the restrictive temp-file permissions that manual path construction skips.

🗂️ Possible fix

```diff
-        filename = f'tts_{int(time.time() * 1000)}.{ext}'
-        out_path = os.path.join(tempfile.gettempdir(), filename)
+        fd, out_path = tempfile.mkstemp(prefix='tts_', suffix=f'.{ext}')
+        os.close(fd)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 255 - 257, Replace the
millisecond-based filename/tempdir construction with an atomic temp-file
reservation using tempfile.mkstemp so concurrent syntheses cannot clobber each
other: determine ext as you already do (ext from output_format), call
tempfile.mkstemp(suffix=f'.{ext}') to get (fd, path), close the fd
(os.close(fd)) to allow TTSEngine to open/write the path, and use that path in
place of out_path; this preserves mkstemp's restrictive permissions and
guarantees a unique, reserved file for the TTS output.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 149-176: _cleanup_stale_outputs currently does an O(n) walk of the
system temp dir on every call and synthesize() invokes it per-utterance; move
this off the per-request path by either (A) running _cleanup_stale_outputs in a
single background/daemon thread started once (e.g., in the node/class
initializer) that sleeps or schedules runs at a configured interval, or (B) add
a simple rate-limit guard in synthesize() using a timestamp property like
self._last_cleanup and only call _cleanup_stale_outputs when now -
self._last_cleanup > cleanup_interval_sec; update references to
_cleanup_stale_outputs and synthesize accordingly so cleanup is no longer
performed synchronously per request and ensure thread-safety when accessing any
shared state.
- Around line 177-191: The validateConfig method currently uses
_engine_from_merged_cfg() but doesn't reject unknown engine ids; update
validateConfig to check that the resolved engine is one of the supported engines
(e.g. 'elevenlabs','openai','piper','kokoro' or a central list if available) and
raise a clear Exception if not supported (message like `Unknown TTS engine
"<engine>"`); reference validateConfig and _engine_from_merged_cfg (or
TTSEngine.synthesize where failure currently surfaces) and perform the
membership check before proceeding with api_key/voice-specific validations.

---

Duplicate comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 255-257: Replace the millisecond-based filename/tempdir
construction with an atomic temp-file reservation using tempfile.mkstemp so
concurrent syntheses cannot clobber each other: determine ext as you already do
(ext from output_format), call tempfile.mkstemp(suffix=f'.{ext}') to get (fd,
path), close the fd (os.close(fd)) to allow TTSEngine to open/write the path,
and use that path in place of out_path; this preserves mkstemp's restrictive
permissions and guarantees a unique, reserved file for the TTS output.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 58cc5610-4a9e-4073-8e58-a3b6b7d02e34

📥 Commits

Reviewing files that changed from the base of the PR and between fc94265 and d450fc7.

📒 Files selected for processing (1)
  • nodes/src/nodes/audio_tts/IGlobal.py

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
nodes/src/nodes/audio_tts/IInstance.py (1)

73-90: 🧹 Nitpick | 🔵 Trivial

Stream audio-only output instead of buffering the whole file.

This still reads the full artifact into raw even when only the audio lane is connected. For long utterances that doubles peak memory and bypasses the existing BEGIN/WRITE/END streaming contract; only buffer when the text lane needs base64.

♻️ Possible refactor

```diff
-            raw: bytes | None = None
-            if want_audio or want_text:
-                with open(temp_path, 'rb') as fin:
-                    raw = fin.read()
+            raw: bytes | None = None
+            if want_text:
+                with open(temp_path, 'rb') as fin:
+                    raw = fin.read()

-            if want_audio and raw is not None:
+            if want_audio and not want_text:
+                mime = payload.get('mime_type', 'audio/wav')
+                self.instance.writeAudio(AVI_ACTION.BEGIN, mime)
+                with open(temp_path, 'rb') as fin:
+                    while chunk := fin.read(64 * 1024):
+                        self.instance.writeAudio(AVI_ACTION.WRITE, mime, chunk)
+                self.instance.writeAudio(AVI_ACTION.END, mime)
+            elif want_audio and raw is not None:
                 mime = payload.get('mime_type', 'audio/wav')
                 self.instance.writeAudio(AVI_ACTION.BEGIN, mime)
                 self.instance.writeAudio(AVI_ACTION.WRITE, mime, raw)
                 self.instance.writeAudio(AVI_ACTION.END, mime)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/audio_tts/IInstance.py` around lines 73 - 90, The code
currently reads the entire temp_path into raw even when only audio output is
requested; change the logic so that when want_audio is true you open temp_path
and stream it in chunks to self.instance.writeAudio between AVI_ACTION.BEGIN and
AVI_ACTION.END (calling writeAudio with each chunk and the mime from
payload.get('mime_type')), and only read and base64-encode the full file when
want_text is true (i.e., buffer into base64 only if want_text is set so you can
build text_payload and call self.instance.writeText); ensure you still call
BEGIN before the first WRITE and END after the last chunk and avoid allocating
raw when want_text is false.
nodes/src/nodes/audio_tts/IGlobal.py (2)

149-175: ⚠️ Potential issue | 🟠 Major

Keep stale-temp cleanup off the per-utterance hot path.

This still walks the entire system temp directory on every synthesis call. On a busy or shared temp dir that adds blocking O(n) filesystem work to each utterance, even when nothing is old enough to delete.

♻️ Possible fix

```diff
     def _cleanup_stale_outputs(self):
+        now = time.time()
+        last_cleanup = float(getattr(self, '_last_temp_cleanup_at', 0.0) or 0.0)
+        if now - last_cleanup < 300:
+            return
+        self._last_temp_cleanup_at = now
+
         # Keep current output behavior but prevent unbounded temp directory growth.
         if not self._read_cfg(self._config, 'cleanup_temp_outputs', True):
             return

         max_age_sec = int(self._read_cfg(self._config, 'temp_output_max_age_sec', 3600))
         if max_age_sec <= 0:
             return

-        now = time.time()
         temp_dir = tempfile.gettempdir()
```

Also applies to: 254-254

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 149 - 175, The
_cleanup_stale_outputs method is being called on the per-utterance hot path and
scans the whole system temp dir each time; change it to run only infrequently by
adding a rate-limit check (e.g., store a timestamp in an instance attribute like
self._last_cleanup_ts or a small marker file in tempfile.gettempdir()) and
return immediately if the last run was within a configured interval (e.g.,
temp_output_cleanup_interval_sec); keep the existing logic for deletion
unchanged but only execute it when the rate-limit allows, and expose the
interval via the same config keys read by _cleanup_stale_outputs.

177-183: ⚠️ Potential issue | 🟡 Minor

Reject unsupported engine ids during validation.

A typo in engine currently passes validateConfig() and only fails later inside TTSEngine.synthesize(). Failing fast here gives users a config-time error instead of a runtime pipeline failure.

🛡️ Possible fix

```diff
     def validateConfig(self):
         raw, cfg = self._resolve_merged_config()
         engine = self._engine_from_merged_cfg(cfg)
+        if engine not in {'piper', 'kokoro', 'bark', 'openai', 'elevenlabs'}:
+            raise Exception(f'Unsupported TTS engine "{engine}"')
         api_key = self._resolve_cloud_api_key(cfg, raw, engine)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 177 - 183, validateConfig
currently accepts unknown engine ids and lets TTSEngine.synthesize fail later;
update validateConfig to immediately reject unsupported engine values by
checking the resolved engine (from _engine_from_merged_cfg(cfg)) against the
canonical supported list (use the same source of truth as TTSEngine, e.g., the
supported engines constant/enum or a helper like TTSEngine.SUPPORTED_ENGINES or
a shared _supported_engines list). If the engine is not in that set, raise an
Exception with a clear message mentioning the invalid engine and the allowed
engine ids; keep the existing api_key checks for 'elevenlabs' and 'openai'
afterwards.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 256-264: mkstemp reserves out_path but if
self._engine.synthesize(text) raises, the reserved tts_* file is left behind;
wrap the synthesize call in a try/except (or try/finally) around the block that
creates runtime_cfg/out_path and call os.remove(out_path) when synthesis fails
(catch Exception as e) to ensure the temporary file is deleted on error while
leaving the normal success path intact (IInstance.py still removes temp_path
after successful synthesize); reference mkstemp, out_path, runtime_cfg, and
self._engine.synthesize in your change.

---

Duplicate comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 149-175: The _cleanup_stale_outputs method is being called on the
per-utterance hot path and scans the whole system temp dir each time; change it
to run only infrequently by adding a rate-limit check (e.g., store a timestamp
in an instance attribute like self._last_cleanup_ts or a small marker file in
tempfile.gettempdir()) and return immediately if the last run was within a
configured interval (e.g., temp_output_cleanup_interval_sec); keep the existing
logic for deletion unchanged but only execute it when the rate-limit allows, and
expose the interval via the same config keys read by _cleanup_stale_outputs.
- Around line 177-183: validateConfig currently accepts unknown engine ids and
lets TTSEngine.synthesize fail later; update validateConfig to immediately
reject unsupported engine values by checking the resolved engine (from
_engine_from_merged_cfg(cfg)) against the canonical supported list (use the same
source of truth as TTSEngine, e.g., the supported engines constant/enum or a
helper like TTSEngine.SUPPORTED_ENGINES or a shared _supported_engines list). If
the engine is not in that set, raise an Exception with a clear message
mentioning the invalid engine and the allowed engine ids; keep the existing
api_key checks for 'elevenlabs' and 'openai' afterwards.

In `@nodes/src/nodes/audio_tts/IInstance.py`:
- Around line 73-90: The code currently reads the entire temp_path into raw even
when only audio output is requested; change the logic so that when want_audio is
true you open temp_path and stream it in chunks to self.instance.writeAudio
between AVI_ACTION.BEGIN and AVI_ACTION.END (calling writeAudio with each chunk
and the mime from payload.get('mime_type')), and only read and base64-encode the
full file when want_text is true (i.e., buffer into base64 only if want_text is
set so you can build text_payload and call self.instance.writeText); ensure you
still call BEGIN before the first WRITE and END after the last chunk and avoid
allocating raw when want_text is false.
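The streaming fix described in that comment amounts to reading the file in fixed-size chunks instead of buffering it whole. A minimal sketch — the `writer` callback and the string `'BEGIN'`/`'WRITE'`/`'END'` markers stand in for the project's `writeAudio` / `AVI_ACTION` API, which is assumed here:

```python
def stream_file(path, writer, chunk_size=64 * 1024):
    """Stream a file to `writer(action, data)` without loading it into memory."""
    writer('BEGIN', None)
    with open(path, 'rb') as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            writer('WRITE', chunk)
    writer('END', None)
```

Peak memory stays at one chunk regardless of file size, and the full base64 buffer is only needed on the `want_text` path.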

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4494d525-b495-4f60-a997-fd0f1e68d293

📥 Commits

Reviewing files that changed from the base of the PR and between d450fc7 and 812bc02.

📒 Files selected for processing (3)
  • nodes/src/nodes/audio_tts/IGlobal.py
  • nodes/src/nodes/audio_tts/IInstance.py
  • nodes/src/nodes/audio_tts/tts_engine.py

Comment thread nodes/src/nodes/audio_tts/IGlobal.py Outdated
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (2)
nodes/src/nodes/audio_tts/IGlobal.py (2)

271-279: ⚠️ Potential issue | 🟠 Major

Delete the reserved temp file on synthesis failures.

mkstemp() claims out_path before engine execution. If self._engine.synthesize(text) raises, the reserved tts_* file is left behind until the periodic stale-file sweep runs.

🧹 Minimal fix
         runtime_cfg = dict(self._config)
         runtime_cfg['output_path'] = out_path
         runtime_cfg['output_format'] = ext
         self._engine.config = runtime_cfg

-        result = self._engine.synthesize(text)
+        try:
+            result = self._engine.synthesize(text)
+        except Exception:
+            try:
+                os.remove(out_path)
+            except OSError:
+                pass
+            raise
         path = result['path']
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 271 - 279, The code
currently reserves a temp file with tempfile.mkstemp (fd, out_path) then closes
fd and calls self._engine.synthesize(text), but if synthesize raises the
reserved tts_* file is leaked; wrap the synthesize call in a try/except (or
try/finally) around self._engine.synthesize(text) and on exception
unlink(out_path) to remove the reserved file before re-raising the error; keep
the existing tempfile creation and fd close, and reference the symbols
tempfile.mkstemp, out_path, and self._engine.synthesize so you update that block
to delete the reserved file on synthesis failures.

189-196: ⚠️ Potential issue | 🟡 Minor

Reject unsupported engine ids in validateConfig().

_engine_from_merged_cfg() returns any non-empty engine verbatim, so a typo passes config validation and only fails later in TTSEngine.synthesize(). Fail fast here instead of deferring the error to runtime.

🛡️ Minimal fix
     raw, cfg = self._resolve_merged_config()
     engine = self._engine_from_merged_cfg(cfg)
+    if engine not in {'piper', 'kokoro', 'bark', 'openai', 'elevenlabs'}:
+        raise Exception(f'Unsupported TTS engine "{engine}"')
     api_key = self._resolve_cloud_api_key(cfg, raw, engine)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 189 - 196, validateConfig
currently accepts any non-empty engine from _engine_from_merged_cfg and defers
typos to TTSEngine.synthesize; update validateConfig so after calling
_engine_from_merged_cfg(cfg) it checks that engine is one of the supported
engine ids (e.g. the set used by your runtime like 'elevenlabs' and 'openai' —
replace with the canonical list if different) and immediately raise an Exception
listing supported engines when engine is present but invalid; keep the existing
api_key logic (_resolve_cloud_api_key, env_hint) but perform the engine
membership check first to fail fast on bad engine ids.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/audio_tts/config_resolver.py`:
- Around line 49-69: _pick_api_key currently returns a top-level api_key before
checking the active profile, which promotes stale credentials; modify
_pick_api_key so it first checks profile-specific entries (use
raw.get('profile') and then pick(raw.get(profile)) and
pick(raw.get(profile.split('-',1)[0])) if profile contains '-') and then the
fallback vendor keys (loop over 'openai' and 'elevenlabs'), only using pick(raw)
or the legacy top-level pick(raw.get('api_key')) after those profile-specific
checks so resolve_merged_config will prefer the active profile's credential.
- Around line 107-129: The helper currently bails out when no profile is
provided, allowing payloads to override the engine; instead, after obtaining
sdef via getServiceDefinition(logical_type) and pre = sdef.get('preconfig') use
pre.get('default') as the effective profile when profile is missing/empty
(falling back to the existing profile resolution from raw/cfg), then look up
prof = (pre.get('profiles') or {}).get(profile) and, if found, set merged =
_as_dict(cfg) and merged['engine'] = prof.get('engine') before returning merged
so the engine is locked to the resolved default profile.
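Under the assumptions in this comment (a service definition with a `preconfig` block holding `default` and `profiles`), the suggested fallback is small. Sketched here with the definition passed in explicitly rather than fetched via `getServiceDefinition`:

```python
def lock_engine_to_profile(cfg, sdef, profile=None):
    """Resolve the effective profile (falling back to the preconfig default)
    and pin cfg['engine'] to that profile's engine."""
    pre = sdef.get('preconfig') or {}
    profile = profile or pre.get('default') or ''
    prof = (pre.get('profiles') or {}).get(profile)
    merged = dict(cfg or {})
    if prof and prof.get('engine'):
        # Lock the engine to the resolved profile so a payload cannot override it.
        merged['engine'] = prof['engine']
    return merged
```

With no explicit profile, the engine comes from the preconfig default rather than from whatever the payload carried.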

In `@nodes/src/nodes/audio_tts/IInstance.py`:
- Around line 32-44: The audio lane is currently forwarding container file bytes
(WAV/MP3) instead of raw PCM samples; update the flow so when
_infer_output_format(...) returns 'wav' or 'mp3' and want_audio is True, the
bytes read in writeText() are decoded into raw PCM samples before calling
writeAudio(). Specifically, in the code paths around writeText() and
writeAudio() (and the related logic at lines ~74-83), detect the format from
_infer_output_format(engine, ...), decode WAV by parsing the RIFF headers to
extract PCM frames (or use a WAV reader) and decode MP3 via an MP3
decoder/ffmpeg/pydub to PCM, then forward the PCM buffer (and correct sample
rate/channel metadata) to writeAudio() instead of the serialized file bytes.
Ensure metadata (sample rate, channels, sample width) is preserved/translated
for downstream consumers.

In `@nodes/src/nodes/audio_tts/tts_engine.py`:
- Around line 453-497: The ElevenLabs HTTP branch (requests.post in
tts_engine.py) and the model-server branch
(self._elevenlabs_remote_client.send_command) ignore the configured
output_format: include the output_format value (from output_format =
str(self.config.get('output_format', 'mp3')).lower()) in both requests so the
server returns the requested format and the MIME matches; specifically add an
"output_format" field to the payload passed to requests.post (instead of
hardcoding Accept: 'audio/mpeg') and add the same "output_format" into the body
passed to _elevenlabs_remote_client.send_command, and ensure headers/Accept are
set appropriately based on output_format so _mime_from_format(output_format)
remains correct when returning the response.

In `@packages/ai/src/ai/common/models/audio/cloud_tts_loader.py`:
- Around line 64-69: The bundle dict in cloud_tts_loader.py currently includes a
shared '_lock' and the loader code wraps the per-row HTTP request loops with
that lock, which serializes all syntheses; remove the '_lock' entry from the
bundle construction (the dict with 'api_key', 'voice', 'model') and delete any
uses of bundle['_lock'].replace the locked sections in the request loops (the
loops that build a urllib.request.Request per row) with no lock — instead
capture bundle values into local variables (api_key, voice, model_name) at
function start so each request uses fresh local state and runs concurrently; do
not add a global/shared lock back into the bundle.

In `@packages/ai/src/ai/common/models/audio/kokoro_loader.py`:
- Around line 210-213: The code currently fabricates one output row for empty
input by returning [{} for _ in range(batch_size or 1)], which breaks
BaseLoader.postprocess() cardinality; change the branch so when items is falsy
it returns an empty list if batch_size == 0, otherwise return a list of empty
dicts of length batch_size (use batch_size if provided, fallback to 1 only when
you truly need a single placeholder). Update the logic around raw_output/items
in kokoro_loader.py to: if not items: return [] if batch_size == 0 else [{} for
_ in range(batch_size or 1)] so downstream batching and BaseLoader.postprocess()
keep correct cardinality.
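The suggested guard is essentially a one-liner; a sketch using the reviewer's names (`items`, `batch_size`):

```python
def pad_output(items, batch_size):
    """Return per-row outputs, preserving batch cardinality for empty input."""
    if not items:
        # An empty batch must yield an empty list, not a fabricated row;
        # a missing/None batch_size falls back to a single placeholder.
        return [] if batch_size == 0 else [{} for _ in range(batch_size or 1)]
    return items
```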

---

Duplicate comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 271-279: The code currently reserves a temp file with
tempfile.mkstemp (fd, out_path) then closes fd and calls
self._engine.synthesize(text), but if synthesize raises the reserved tts_* file
is leaked; wrap the synthesize call in a try/except (or try/finally) around
self._engine.synthesize(text) and on exception unlink(out_path) to remove the
reserved file before re-raising the error; keep the existing tempfile creation
and fd close, and reference the symbols tempfile.mkstemp, out_path, and
self._engine.synthesize so you update that block to delete the reserved file on
synthesis failures.
- Around line 189-196: validateConfig currently accepts any non-empty engine
from _engine_from_merged_cfg and defers typos to TTSEngine.synthesize; update
validateConfig so after calling _engine_from_merged_cfg(cfg) it checks that
engine is one of the supported engine ids (e.g. the set used by your runtime
like 'elevenlabs' and 'openai' — replace with the canonical list if different)
and immediately raise an Exception listing supported engines when engine is
present but invalid; keep the existing api_key logic (_resolve_cloud_api_key,
env_hint) but perform the engine membership check first to fail fast on bad
engine ids.


📥 Commits

Reviewing files that changed from the base of the PR and between 812bc02 and 30083ea.

📒 Files selected for processing (11)
  • nodes/src/nodes/audio_tts/IGlobal.py
  • nodes/src/nodes/audio_tts/IInstance.py
  • nodes/src/nodes/audio_tts/README.md
  • nodes/src/nodes/audio_tts/__init__.py
  • nodes/src/nodes/audio_tts/config_resolver.py
  • nodes/src/nodes/audio_tts/piper_catalog.py
  • nodes/src/nodes/audio_tts/tts_engine.py
  • packages/ai/src/ai/common/models/audio/cloud_tts_loader.py
  • packages/ai/src/ai/common/models/audio/kokoro_loader.py
  • packages/ai/src/ai/common/models/audio/piper_hub.py
  • packages/ai/src/ai/common/models/audio/piper_loader.py

Comment thread nodes/src/nodes/audio_tts/config_resolver.py Outdated
Comment thread nodes/src/nodes/audio_tts/config_resolver.py Outdated
Comment thread nodes/src/nodes/audio_tts/IInstance.py Outdated
Comment thread nodes/src/nodes/audio_tts/tts_engine.py Outdated
Comment thread packages/ai/src/ai/common/models/audio/cloud_tts_loader.py Outdated
Comment thread packages/ai/src/ai/common/models/audio/kokoro_loader.py
Collaborator

@asclearuc asclearuc left a comment


There are a few comments

Comment thread nodes/src/nodes/audio_tts/services.json
Comment thread packages/ai/src/ai/common/models/base.py
Comment thread nodes/src/nodes/audio_tts/IGlobal.py Outdated
Comment thread packages/ai/src/ai/common/models/audio/requirements_kokoro.txt Outdated
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
nodes/src/nodes/audio_tts/IGlobal.py (1)

189-204: ⚠️ Potential issue | 🟡 Minor

Add validation for unsupported engine ids.

The validateConfig method validates API keys and voice presets but doesn't reject unknown engine ids. A typo like 'opnai' passes validation and only fails later inside TTSEngine.synthesize(). Failing early with a clear error message improves UX.

🛡️ Proposed fix
     def validateConfig(self):
         """Validate the node configuration, raising an exception on missing required values."""
         raw, cfg = self._resolve_merged_config()
         engine = self._engine_from_merged_cfg(cfg)
+        supported = {'piper', 'kokoro', 'bark', 'openai', 'elevenlabs'}
+        if engine not in supported:
+            raise Exception(f'Unsupported TTS engine "{engine}". Supported: {", ".join(sorted(supported))}')
         api_key = self._resolve_cloud_api_key(cfg, raw, engine)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 189 - 204, The
validateConfig method should reject unknown engine ids early: after computing
engine = self._engine_from_merged_cfg(cfg) in validateConfig, check that engine
is one of the supported values (e.g. 'elevenlabs', 'openai', 'piper', 'kokoro');
if not, raise an Exception with a clear message like 'Unsupported TTS engine
"<engine>"' so typos (e.g. "opnai") fail here instead of later in
TTSEngine.synthesize(); update the check near _engine_from_merged_cfg and
include the engine value in the error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 157-187: In _cleanup_stale_outputs(), remove the redundant second
assignment to now (the initial now = time.time() used to check _last_cleanup_ts
is still valid) and replace the two separate suffix checks
(name.endswith('.wav') or name.endswith('.mp3')) with a single tuple-based call
like name.endswith(('.wav', '.mp3')) to simplify and slightly optimize the temp
file filtering; adjust only those lines inside the _cleanup_stale_outputs
function (references: _cleanup_stale_outputs, _last_cleanup_ts, _read_cfg,
temp_dir, name.endswith).

---

Duplicate comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 189-204: The validateConfig method should reject unknown engine
ids early: after computing engine = self._engine_from_merged_cfg(cfg) in
validateConfig, check that engine is one of the supported values (e.g.
'elevenlabs', 'openai', 'piper', 'kokoro'); if not, raise an Exception with a
clear message like 'Unsupported TTS engine "<engine>"' so typos (e.g. "opnai")
fail here instead of later in TTSEngine.synthesize(); update the check near
_engine_from_merged_cfg and include the engine value in the error.


📥 Commits

Reviewing files that changed from the base of the PR and between 30083ea and aaa5957.

📒 Files selected for processing (1)
  • nodes/src/nodes/audio_tts/IGlobal.py

Comment thread nodes/src/nodes/audio_tts/IGlobal.py
asclearuc previously approved these changes Apr 2, 2026
stepmikhaylov previously approved these changes Apr 2, 2026
@dsapandora dsapandora enabled auto-merge (squash) April 2, 2026 21:49
Collaborator

@Rod-Christensen Rod-Christensen left a comment


This just seems more complicated than it needs to be...

Comment thread nodes/src/nodes/audio_tts/README.md Outdated
Comment thread nodes/src/nodes/audio_tts/config_resolver.py Outdated
Comment thread nodes/src/nodes/audio_tts/config_resolver.py Outdated
Comment thread nodes/src/nodes/audio_tts/IInstance.py Outdated
Comment thread nodes/src/nodes/audio_tts/tts_engine.py Outdated
@dsapandora dsapandora dismissed stale reviews from stepmikhaylov and asclearuc via b0edaf2 April 2, 2026 23:09
asclearuc previously approved these changes Apr 6, 2026
Collaborator

@kwit75 kwit75 left a comment


@dsapandora thanks for putting this together. This is a substantial change (+2458/−9, 20 files, 5 TTS engines) and there's enough feedback from CodeRabbit and from Rod that I'd like you to address two things before doing another round of reviewer ping-pong.

1. Sync with @Rod-Christensen on the architecture first. Rod's comment — "This just seems more complicated than it needs to be..." — is a leadership signal that the 5-engine abstraction layer may be over-engineered. Before fixing the dozens of CodeRabbit nits, please have a quick sync with Rod on the right shape:

  • Do we actually need 5 engines (piper, kokoro, bark, openai, elevenlabs) in v1, or can we ship with 1-2 and add more in follow-up PRs?
  • Is the _PROFILE_TO_ENGINE indirection necessary, or can it collapse into something simpler?
  • Should the cloud loaders (cloud_tts_loader.py, 285 lines) be a separate node from the local ones?

Reworking the architecture after addressing 21 inline nits is wasted effort — better to align on the shape first.

2. Three 🟠 Major bugs that need to be fixed regardless of architecture decisions. These are real production issues, not stylistic nits:

  • Output filename collision (IGlobal.py ~255-257): millisecond-based output paths are not unique across workers/processes. Two syntheses landing in the same millisecond will clobber each other. Fix: use tempfile.mkstemp() to atomically reserve the path. This also gives you restrictive temp-file permissions for free.

  • Temp file leak on synthesis failure (IGlobal.py ~271-279): once you switch to mkstemp(), you also need to delete the reserved file in the exception path — if self._engine.synthesize(text) raises, the tts_* file stays on disk until the next stale-file sweep.

  • Profile/engine ID normalization (IGlobal.py ~16-22): _PROFILE_TO_ENGINE lookup doesn't canonicalize UI-style ids like kokoro-default, openai-tts, elevenlabs-default. The form panel shows these but _engine_from_merged_cfg() will fail to resolve them. Normalize the profile string (lowercase + strip suffix) before lookup, or expand _PROFILE_TO_ENGINE with all UI variants.

There are also several 🟡 Minor / 🔵 Nit items from CodeRabbit (engine-id validation in validateConfig, audio buffering in IInstance.py bypassing the streaming contract, others) — those can wait until after the architecture sync.

TL;DR: Rod sync first → fix the 3 🟠 Majors → then we'll do another review pass on the rest. Please don't address the nits one-by-one before the architecture is settled.
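For the record, the `mkstemp()` fix for the first two majors combines into one small pattern (a sketch with assumed names — the node's actual synthesize call and filename conventions differ):

```python
import os
import tempfile


def synthesize_to_temp(synthesize, ext='wav'):
    """Atomically reserve a unique output path, removing it if synthesis fails."""
    # mkstemp reserves the path atomically (no millisecond-name collisions)
    # and creates it with restrictive permissions.
    fd, out_path = tempfile.mkstemp(prefix='tts_', suffix=f'.{ext}')
    os.close(fd)  # the engine writes to the path itself
    try:
        synthesize(out_path)
    except Exception:
        # Don't leak the reserved file when the engine raises.
        try:
            os.remove(out_path)
        except OSError:
            pass
        raise
    return out_path
```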

@dsapandora
Collaborator Author

dsapandora commented Apr 6, 2026

@kwit75
Thanks for the clear sequencing — agreed on Rod sync first, then majors, then nits.

Architecture (5 engines): We’ll sync with Rod as requested. From our side, the five backends are already wired and behaving well in practice: a single TTSEngine dispatches on engine, and services.json preconfig keeps the UI aligned with one node. We’re not blocked on correctness here.

Three majors:
1–2. Output path collision and temp leak on synthesis failure — already addressed in the current branch: tempfile.mkstemp() reserves a unique path with safe permissions, and the except path removes that file if synthesize raises.
3. Engine/profile id normalization — there is no _PROFILE_TO_ENGINE map in this revision; engine is read from merged config (canonical values from preconfig). I’ve added _normalize_engine_id() so compound/UI-style ids like kokoro-default / openai-tts resolve to the canonical engine name before validation and dispatch.

I’ll hold CodeRabbit nits until after the architecture check-in with @Rod-Christensen .
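A minimal sketch of what such a normalization can look like (illustrative only — the suffix handling in the branch's actual `_normalize_engine_id()` may differ):

```python
_CANONICAL_ENGINES = {'piper', 'kokoro', 'bark', 'openai', 'elevenlabs'}


def normalize_engine_id(raw):
    """Map UI-style ids like 'kokoro-default' or 'openai-tts' to canonical names."""
    engine = str(raw or '').strip().lower()
    if engine in _CANONICAL_ENGINES:
        return engine
    # Compound ids keep the vendor as their first dash-separated token.
    head = engine.split('-', 1)[0]
    if head in _CANONICAL_ENGINES:
        return head
    raise ValueError(f'Unsupported TTS engine "{raw}"')
```

This also gives validateConfig() its fail-fast check for free: typos like "opnai" raise immediately instead of surfacing at synthesis time.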

@nihalnihalani
Contributor

Senior Review: feat/RR-411 — Text-to-Audio Node

What works well

  • Solid TTS abstraction supporting 5 engines (Piper, Kokoro, Bark, OpenAI, ElevenLabs) with a clean profile-based selection pattern.
  • Thorough PR description with architecture diagram, profile table, and flow explanation — makes the design intent very clear.
  • Follows the existing filter registration pattern (audio_tts:// protocol) and integrates cleanly with IGlobal/IInstance.

Blockers (must fix before merge)

  1. No tests for a 2458-line addition. This is the single biggest issue. At minimum, unit tests are needed for:

    • Config resolution logic (Config.getNodeConfig for audio_tts)
    • Engine selection from profile mapping (especially edge cases like unknown profiles)
    • TTSEngine initialization with different engine configs
    • Error paths (missing API keys, synthesis failures)
  2. PR checklist items are all unchecked — tests, local testing, and ./builder test are marked incomplete. These should be verified before merge.

Should fix

  1. Profile-to-engine mapping does not canonicalize UI-style IDs. The profile IDs in the RJSF form use compound names like kokoro-default, bark-default, openai-tts, elevenlabs-default, but the runtime engine expects canonical values (kokoro, bark, openai, elevenlabs). Confirm this mapping is handled cleanly and doesn't silently fall through on unrecognized profiles.

  2. Temp file cleanup on exception paths. Verify that temporary audio files (WAV/MP3) are cleaned up on all code paths, including exceptions during synthesis. A try/finally or context manager pattern would be safer than manual cleanup.

  3. Cloud API key validation timing. For cloud engines (OpenAI, ElevenLabs), API key validation should fail at beginGlobal() time, not at first synthesis. Failing early gives users a clear error in the node panel instead of a runtime surprise mid-pipeline.

Nice-to-have

  1. The mermaid diagram in the PR body is great documentation — consider including a condensed version in the node's README so future contributors understand the config flow without reading the PR.

Overall this is a well-designed feature with a clear architecture. The main gap is test coverage — once tests are added, this should be in good shape.

@nihalnihalani
Contributor

🚀 Merge Request

Solid TTS implementation supporting 5 engines. CI is green, no merge conflicts.

Before merge, please address:

  • Add unit tests (2458 lines with no test coverage)
  • Canonicalize profile-to-engine mapping (e.g., "kokoro-default" → "kokoro")
  • Ensure temp file cleanup on all code paths (including exceptions)
  • Fail cloud API key validation at beginGlobal(), not at synthesis time

Once these are addressed, this is ready to merge. Great work @dsapandora!

@dsapandora dsapandora force-pushed the feat/RR-411-add-text-to-audio-node branch from 3d68114 to 15fe82e Compare April 8, 2026 14:29
@github-actions

github-actions bot commented Apr 8, 2026

No description provided.

@dsapandora
Collaborator Author

🚀 Merge Request

Solid TTS implementation supporting 5 engines. CI is green, no merge conflicts.

Before merge, please address:

  • Add unit tests (2458 lines with no test coverage)
  • Canonicalize profile-to-engine mapping (e.g., "kokoro-default" → "kokoro")
  • Ensure temp file cleanup on all code paths (including exceptions)
  • Fail cloud API key validation at beginGlobal(), not at synthesis time

Once these are addressed, this is ready to merge. Great work @dsapandora!

Thanks @nihalnihalani for the comments. I followed that and also included some tests, but because this node downloads models that can take a while, I added it to the skip_node list.

@dsapandora
Collaborator Author

Changes requested — PR #583: Text-to-Speech Node Review

This is a genuinely useful addition to the pipeline ecosystem. TTS is a natural complement to the existing audio_transcribe node, and supporting 5 engines (3 local, 2 cloud) gives users real flexibility. The architecture is well thought out. That said, I have some concerns worth addressing before this merges.

What I Like

Engine abstraction is clean. The TTSEngine in tts_engine.py as a unified interface over Piper/Kokoro/Bark/OpenAI/ElevenLabs is the right call. Hot-swapping engines on config change via IGlobal.synthesize() is a nice touch — avoids tearing down and rebuilding the whole node when a user tweaks the profile.

Temp file cleanup is handled properly. The finally block in IInstance deleting temp files is exactly what you want here. The periodic stale-file cleanup in IGlobal (≤ every 5 min) is a good safety net for crash scenarios. This is the kind of operational detail that often gets missed in node PRs.

Local-first default (piper) is the right call. Not requiring an API key out of the box lowers the barrier to entry significantly.

Concerns

Dependency weight is significant. This node pulls in piper-tts, kokoro, bark (which drags in transformers, torch, etc.), openai, and elevenlabs. That's a heavy requirements.txt. The existing pattern for heavy-dependency nodes (e.g., llm_openai) uses depends() for lazy loading — I need to verify IGlobal.beginGlobal() is actually deferring these imports and not loading all 5 engines at startup regardless of which profile is selected. If Bark's transformers/torch stack loads even when the user picked piper, that's a serious startup time regression for the whole node runtime.

No test coverage for cloud engines. The test files (test_iglobal.py, test_iinstance.py, test_tts_engine.py) are present, which is good. But cloud engine tests (OpenAI, ElevenLabs) are almost certainly mocked or skipped. That's acceptable for CI, but I'd want to see at least contract-level tests verifying the request shape sent to each API — similar to how rerank_cohere (PR #499) did contract tests without live API calls.

piper_catalog.py and HF Hub downloads. Piper voices are downloaded from HF Hub on first use. What's the caching strategy? Is this going into a deterministic path that survives container restarts, or is it re-downloading on every cold start? The PR description says "downloaded and cached on first use" but I don't see explicit cache directory configuration in services.json. This needs to be configurable or at minimum documented.

Audio output lane typing. The output is described as "WAV or MP3 container bytes" depending on engine. ElevenLabs returns MP3 directly, others return WAV. Downstream nodes consuming the audio lane need to know which format they're getting. Is there MIME type metadata attached to the audio output? The audio_player node presumably needs to handle both — has that been verified?

kokoro_loader.py and spacy_en_model.py in packages/ai/src/ai/common/models/audio/ — these are shared package additions, not just node-local files. Changes to packages/ai have broader blast radius than changes under nodes/src/nodes/. The spacy_en_model.py in particular suggests a spaCy model download is happening somewhere in the Kokoro path. This needs explicit documentation and should probably be gated behind the Kokoro profile check.

wav_to_mp3.py — this implies ffmpeg or pydub/lameenc as a dependency. What's the actual dependency here? If it's a system binary (ffmpeg), that needs to be documented as a system requirement, not just a pip dependency.

Minor

Verdict

The feature is valuable and the architecture is sound, but I want answers on: (1) lazy loading of heavy deps, (2) audio format metadata on output, (3) HF Hub cache configurability, and (4) the wav_to_mp3 system dependency situation. These aren't blockers if they're already handled correctly in the code — I just can't confirm from the PR description alone. Requesting changes until those are clarified.

@dsapandora
Collaborator Author

I am going to recreate this PR splitting the engines to make it easy to review

@dsapandora dsapandora closed this Apr 13, 2026
auto-merge was automatically disabled April 13, 2026 12:35

Pull request was closed


Labels

docs Documentation module:ai AI/ML modules module:nodes Python pipeline nodes

Development

Successfully merging this pull request may close these issues.

Add text-to-speech node to pipeline library

6 participants