Note: Reviews paused
It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior can be configured in the review settings.
📝 Walkthrough
Adds a new Audio TTS node with IGlobal/IInstance, a multi-backend TTSEngine, config resolution and cloud API key handling, model-server loaders and Piper hub helpers, requirements and service registration, and per-utterance temp-file lifecycle and cleanup.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant IInstance as IInstance
    participant IGlobal as IGlobal
    participant Config as ConfigResolver
    participant TTSEngine as TTSEngine
    participant Local as LocalBackend
    participant Cloud as CloudAPI
    Client->>IInstance: writeText(text)
    activate IInstance
    IInstance->>IGlobal: _build_tts_config / synthesize(text, out_fmt)
    activate IGlobal
    IGlobal->>Config: resolve_merged_config()
    Config-->>IGlobal: merged config
    IGlobal->>IGlobal: compute identity signature
    alt signature or engine changed
        IGlobal->>TTSEngine: dispose existing engine
        IGlobal->>TTSEngine: instantiate new engine(config)
    end
    IGlobal->>TTSEngine: synthesize(text)
    activate TTSEngine
    alt local backend
        TTSEngine->>Local: load pipeline/voice & synth
        Local-->>TTSEngine: audio bytes
    else cloud backend
        TTSEngine->>Cloud: HTTP POST (api_key, model, voice)
        Cloud-->>TTSEngine: audio bytes
    end
    TTSEngine-->>IGlobal: {path, mime_type}
    deactivate TTSEngine
    IGlobal-->>IInstance: {path, mime_type}
    deactivate IGlobal
    IInstance->>IInstance: read temp file bytes
    opt audio lane connected
        IInstance->>Client: stream audio (BEGIN/WRITE/END)
    end
    opt text lane connected
        IInstance->>Client: emit JSON {mime_type, base64}
    end
    IInstance->>IInstance: delete temp file
    deactivate IInstance
```
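As an editorial aside, the signature-gated engine swap shown in the diagram ("alt signature or engine changed") could be sketched like this. `EngineCache`, `factory`, and `dispose()` are hypothetical stand-ins for the real IGlobal/TTSEngine lifecycle, not the actual API:

```python
import threading


class EngineCache:
    """Sketch of signature-gated engine reuse (names hypothetical).

    `factory` builds a new backend from a config dict; `dispose()` is
    assumed to tear one down.
    """

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.RLock()
        self._signature = None
        self._engine = None

    def get(self, signature, config):
        with self._lock:
            if self._engine is None or signature != self._signature:
                old = self._engine
                if old is not None:
                    old.dispose()  # drop the stale backend before swapping
                self._engine = self._factory(config)
                self._signature = signature
            return self._engine
```

The same signature returns the cached engine; a changed signature disposes the old one and rebuilds.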
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks | ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 11
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 16-22: The profile/engine mapping and resolution must canonicalize
UI-style ids (e.g., "kokoro-default", "openai-tts", "elevenlabs-default") to
their base engine names before lookup; update the logic that uses
_PROFILE_TO_ENGINE and the function _engine_from_merged_cfg to normalize the
value returned by resolve_merged_config() (strip known suffixes like "-default",
"-tts" and/or map alias forms) so lookups always use bare engine ids
("kokoro","bark","openai","elevenlabs","piper"); modify both the profile mapping
declaration and the _engine_from_merged_cfg path that reads
merged_cfg["profile"] or merged_cfg["engine"] to run the canonicalization helper
first and then return the mapped engine name to TTSEngine.
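A minimal sketch of the canonicalization helper this comment asks for. The suffix list and engine set are assumptions drawn from the comment itself:

```python
_KNOWN_ENGINES = {'kokoro', 'bark', 'openai', 'elevenlabs', 'piper'}


def canonical_engine(value: str) -> str:
    """Map UI-style profile ids like 'kokoro-default' or 'openai-tts'
    to bare engine ids (suffix list is illustrative)."""
    v = (value or '').strip().lower()
    for suffix in ('-default', '-tts'):
        if v.endswith(suffix):
            v = v[: -len(suffix)]
    # Fall back to the prefix before the first dash for alias forms.
    return v if v in _KNOWN_ENGINES else v.split('-', 1)[0]
```

Both `_PROFILE_TO_ENGINE` lookups and `_engine_from_merged_cfg` would run their input through this before resolving.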
- Around line 237-239: The current filename generation in IGlobal.py (variables
filename/out_path) races under burst traffic; reserve the path using an
exclusive temp-file API before returning it to TTSEngine. Replace the time-based
filename with a call to tempfile.mkstemp or
tempfile.NamedTemporaryFile(delete=False) to atomically create/claim the file
(use suffix='.'+ext and a suitable prefix like 'tts_'), close the returned file
descriptor if using mkstemp, and pass the resulting secure path to TTSEngine;
also ensure the chosen extension logic (ext = output_format if output_format in
('wav','mp3') else 'wav') is preserved so the temp file has the correct suffix.
Ensure IInstance.py can still delete the file later.
- Around line 194-207: The engine identity tuples for OpenAI/ElevenLabs must
include credential changes so cached clients are invalidated: modify the return
values in the block handling e == 'openai' and e == 'elevenlabs' to also include
the API key (e.g., str(cfg.get('api_key','') or '').strip()) as part of the
tuple alongside model, voice and the use_model_server flag so TTSEngine's client
cache sees credential rotations and rebuilds the remote client.
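For illustration, a credential-aware identity tuple might look like this. The field names (`model`, `voice`, `api_key`, `use_model_server`) are taken from the comment; the function shape is hypothetical:

```python
def tts_identity_signature(engine: str, cfg: dict) -> tuple:
    """Identity tuple for cloud engines; including the API key means a
    key rotation changes the signature and forces a client rebuild."""
    if engine in ('openai', 'elevenlabs'):
        return (
            engine,
            str(cfg.get('model', '') or '').strip(),
            str(cfg.get('voice', '') or '').strip(),
            str(cfg.get('api_key', '') or '').strip(),
            bool(cfg.get('use_model_server', False)),
        )
    return (engine,)
```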
- Around line 218-246: synthesize currently mutates shared state (_config,
_engine, _engine.config) without synchronization; wrap the engine lifecycle and
config mutation in a lock (e.g., add a threading.RLock on the instance and
acquire it while deciding need_engine, disposing/creating engines, and assigning
self._config/self._engine), capture the current engine in a local variable
(inst) and call dispose on that captured instance instead of self._engine, and
avoid overwriting shared engine.config during synthesis by passing a runtime_cfg
to the engine's synthesize call or using a cloned/per-request config object so
the engine can perform synthesize using the per-request runtime_cfg without
mutating shared state (referencing synthesize, _tts_identity_signature, _engine,
_config, TTSEngine, and _cleanup_stale_outputs).
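The per-request config clone suggested above, in isolation. `engine.synthesize(text, cfg)` is an assumed signature; the real TTSEngine API may differ:

```python
def synthesize_with_runtime_cfg(engine, shared_cfg: dict, text: str, out_path: str):
    """Clone the config per request instead of mutating engine.config,
    so concurrent syntheses never see each other's output_path."""
    runtime_cfg = dict(shared_cfg)  # shallow copy; top-level keys only
    runtime_cfg['output_path'] = out_path
    return engine.synthesize(text, runtime_cfg)
```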
In `@nodes/src/nodes/audio_tts/IInstance.py`:
- Around line 55-72: The current code always reads the entire temp_path into raw
even when only want_audio is True; change it so you only read-and-buffer the
full file when want_text is True (needed for base64), and for want_audio (when
text is not connected) open temp_path and stream it in chunks calling
self.instance.writeAudio(AVI_ACTION.BEGIN, mime), then multiple
self.instance.writeAudio(AVI_ACTION.WRITE, mime, chunk) per chunk, and finally
self.instance.writeAudio(AVI_ACTION.END, mime); keep using
payload.get('mime_type') for mime and only build text_payload/base64 when
want_text is True so you avoid double-buffering and enable true streaming via
the existing AVI_ACTION and self.instance.writeAudio/writeText calls.
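A generic sketch of the chunked BEGIN/WRITE/END streaming described here. The three callbacks stand in for `self.instance.writeAudio` with the matching `AVI_ACTION`; their exact shape is an assumption:

```python
CHUNK_SIZE = 64 * 1024


def stream_file(path, begin, write, end, mime):
    """Stream a file through BEGIN/WRITE/END callbacks without ever
    buffering the whole artifact in memory."""
    sent = 0
    begin(mime)
    try:
        with open(path, 'rb') as fin:
            while chunk := fin.read(CHUNK_SIZE):
                write(mime, chunk)
                sent += len(chunk)
    finally:
        end(mime)  # always close the lane, even on a read error
    return sent
```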
In `@nodes/src/nodes/audio_tts/README.md`:
- Line 65: Remove the user-facing mention of kokoro_lang_code from the README
section and instead document only audio_tts.kokoro_voice as the user-settable
field; note that kokoro_lang_code is derived server-side from kokoro_voice (do
not present it as configurable). Update the line referencing "**Kokoro:**
**`kokoro_lang_code`**, **`kokoro_voice`**" to only mention "**Kokoro:**
**`kokoro_voice`** — must match Kokoro’s voice table" and add a short
parenthetical indicating that kokoro_lang_code is computed from kokoro_voice in
the Python backend.
In `@nodes/src/nodes/audio_tts/services.json`:
- Around line 280-295: Update the "test" matrix to include the default "piper"
profile and add a smoke assertion for the promised payload shape: modify the
"profiles" array in the "test" object to include "piper" and extend the "cases"
-> "expect" for the response to verify the presence of "mime_type" and a
base64-encoded audio payload (e.g., check that "audio" has keys "mime_type" and
"base64" and that "base64" is not empty) instead of only asserting non-empty
text so regressions in the Piper backend payload shape are caught; locate the
"test" entry, the "profiles" array, and the "cases" expectation objects to make
these changes.
In `@nodes/src/nodes/audio_tts/tts_engine.py`:
- Around line 119-132: The temporary WAV created with tempfile.mkstemp() in TTS
engine must be unconditionally removed even if writing or transcoding fails;
wrap the creation/use/removal of wav_path in a try/finally (or ensure finally
cleanup) so write_piper_wav(...) and _transcode_wav_to_mp3(...) errors do not
leak tmp*.wav files—apply the same pattern inside _piper() around
write_piper_wav(...) + _transcode_wav_to_mp3(...), ensuring the temporary file
created by mkstemp() (wav_path) is removed in the finally block and any OSError
on removal is swallowed/logged.
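The try/finally pattern being requested, in isolation. `render` and `transcode` are hypothetical stand-ins for `write_piper_wav(...)` and `_transcode_wav_to_mp3(...)`:

```python
import os
import tempfile


def with_temp_wav(render, transcode):
    """Guarantee the intermediate WAV is removed even when rendering or
    transcoding raises."""
    fd, wav_path = tempfile.mkstemp(prefix='tts_', suffix='.wav')
    os.close(fd)
    try:
        render(wav_path)
        return transcode(wav_path)
    finally:
        try:
            os.remove(wav_path)
        except OSError:
            pass  # already gone or locked; never mask the real error
```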
In `@packages/ai/src/ai/common/models/audio/cloud_tts_loader.py`:
- Around line 163-165: The postprocess() implementations currently return a
fabricated single-item batch by using "batch_size or 1" when raw_output has no
items; change that to preserve actual batch cardinality by using "batch_size or
0" (i.e., return [{} for _ in range(batch_size or 0)]) so an empty input batch
yields an empty list. Apply this exact change to both postprocess() occurrences
in this file (the two postprocess methods in cloud_tts_loader.py).
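The cardinality fix in miniature (a simplified `postprocess`; the real method carries more state):

```python
def postprocess(raw_output, batch_size: int):
    """Preserve batch cardinality: an empty batch yields an empty list
    instead of a fabricated single row."""
    items = list(raw_output or [])
    if not items:
        return [{} for _ in range(batch_size or 0)]
    return items
```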
In `@packages/ai/src/ai/common/models/audio/requirements_kokoro.txt`:
- Around line 4-5: The hardcoded wheel URL for en_core_web_sm-3.8.0 in
requirements_kokoro.txt creates a maintenance risk; update the file to either
(a) document the coupling to spaCy and the reason for pinning this exact wheel
URL (referencing the en_core_web_sm wheel URL and the en_core_web_sm model name)
or (b) add a fallback mechanism: attempt to install the pinned wheel first and
if that fails, fall back to using spaCy's model install (spacy.cli.download)
wrapped in safe error handling so it doesn't sys.exit, ensuring code paths
referencing en_core_web_sm handle both cases; include a short comment explaining
why the fallback is needed and how to update the wheel version when upgrading
spaCy.
In `@packages/ai/src/ai/common/models/audio/wav_to_mp3.py`:
- Around line 29-30: The error message raised when invalid channel count is
detected uses an en dash character; update the string in the ValueError raised
alongside the nchannels check to use a regular hyphen-minus ("-") instead of the
en dash so searches/grepping and linters (RUF001) work correctly; locate the
conditional that checks nchannels and change the message text in the raise
ValueError call accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7790af72-2a98-4747-ab45-48f9a7e0825a
📒 Files selected for processing (20)
- nodes/src/nodes/audio_tts/IGlobal.py
- nodes/src/nodes/audio_tts/IInstance.py
- nodes/src/nodes/audio_tts/README.md
- nodes/src/nodes/audio_tts/__init__.py
- nodes/src/nodes/audio_tts/config_resolver.py
- nodes/src/nodes/audio_tts/piper_catalog.py
- nodes/src/nodes/audio_tts/requirements.txt
- nodes/src/nodes/audio_tts/services.json
- nodes/src/nodes/audio_tts/tts_engine.py
- packages/ai/src/ai/common/models/__init__.py
- packages/ai/src/ai/common/models/audio/__init__.py
- packages/ai/src/ai/common/models/audio/cloud_tts_loader.py
- packages/ai/src/ai/common/models/audio/kokoro_loader.py
- packages/ai/src/ai/common/models/audio/piper_hub.py
- packages/ai/src/ai/common/models/audio/piper_loader.py
- packages/ai/src/ai/common/models/audio/piper_native.py
- packages/ai/src/ai/common/models/audio/requirements_kokoro.txt
- packages/ai/src/ai/common/models/audio/requirements_piper.txt
- packages/ai/src/ai/common/models/audio/wav_to_mp3.py
- packages/ai/src/ai/common/models/base.py
Actionable comments posted: 2
♻️ Duplicate comments (1)
nodes/src/nodes/audio_tts/IGlobal.py (1)
255-257: ⚠️ Potential issue | 🟠 Major
Reserve the output file atomically.
The millisecond-based name is not unique across workers/processes, and `TTSEngine` writes straight to whatever `output_path` it receives. Two syntheses landing in the same millisecond can clobber each other; `mkstemp()` also gives you the restrictive temp-file permissions that manual path construction skips.

🗂️ Possible fix

```diff
- filename = f'tts_{int(time.time() * 1000)}.{ext}'
- out_path = os.path.join(tempfile.gettempdir(), filename)
+ fd, out_path = tempfile.mkstemp(prefix='tts_', suffix=f'.{ext}')
+ os.close(fd)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 255 - 257, Replace the millisecond-based filename/tempdir construction with an atomic temp-file reservation using tempfile.mkstemp so concurrent syntheses cannot clobber each other: determine ext as you already do (ext from output_format), call tempfile.mkstemp(suffix=f'.{ext}') to get (fd, path), close the fd (os.close(fd)) to allow TTSEngine to open/write the path, and use that path in place of out_path; this preserves mkstemp's restrictive permissions and guarantees a unique, reserved file for the TTS output.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 149-176: _cleanup_stale_outputs currently does an O(n) walk of the
system temp dir on every call and synthesize() invokes it per-utterance; move
this off the per-request path by either (A) running _cleanup_stale_outputs in a
single background/daemon thread started once (e.g., in the node/class
initializer) that sleeps or schedules runs at a configured interval, or (B) add
a simple rate-limit guard in synthesize() using a timestamp property like
self._last_cleanup and only call _cleanup_stale_outputs when now -
self._last_cleanup > cleanup_interval_sec; update references to
_cleanup_stale_outputs and synthesize accordingly so cleanup is no longer
performed synchronously per request and ensure thread-safety when accessing any
shared state.
- Around line 177-191: The validateConfig method currently uses
_engine_from_merged_cfg() but doesn't reject unknown engine ids; update
validateConfig to check that the resolved engine is one of the supported engines
(e.g. 'elevenlabs','openai','piper','kokoro' or a central list if available) and
raise a clear Exception if not supported (message like `Unknown TTS engine
"<engine>"`); reference validateConfig and _engine_from_merged_cfg (or
TTSEngine.synthesize where failure currently surfaces) and perform the
membership check before proceeding with api_key/voice-specific validations.
---
Duplicate comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 255-257: Replace the millisecond-based filename/tempdir
construction with an atomic temp-file reservation using tempfile.mkstemp so
concurrent syntheses cannot clobber each other: determine ext as you already do
(ext from output_format), call tempfile.mkstemp(suffix=f'.{ext}') to get (fd,
path), close the fd (os.close(fd)) to allow TTSEngine to open/write the path,
and use that path in place of out_path; this preserves mkstemp's restrictive
permissions and guarantees a unique, reserved file for the TTS output.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 58cc5610-4a9e-4073-8e58-a3b6b7d02e34
📒 Files selected for processing (1)
nodes/src/nodes/audio_tts/IGlobal.py
Actionable comments posted: 1
♻️ Duplicate comments (3)
nodes/src/nodes/audio_tts/IInstance.py (1)
73-90: 🧹 Nitpick | 🔵 Trivial
Stream audio-only output instead of buffering the whole file.
This still reads the full artifact into `raw` even when only the `audio` lane is connected. For long utterances that doubles peak memory and bypasses the existing BEGIN/WRITE/END streaming contract; only buffer when the `text` lane needs base64.

♻️ Possible refactor

```diff
- raw: bytes | None = None
- if want_audio or want_text:
-     with open(temp_path, 'rb') as fin:
-         raw = fin.read()
+ raw: bytes | None = None
+ if want_text:
+     with open(temp_path, 'rb') as fin:
+         raw = fin.read()

- if want_audio and raw is not None:
+ if want_audio and not want_text:
+     mime = payload.get('mime_type', 'audio/wav')
+     self.instance.writeAudio(AVI_ACTION.BEGIN, mime)
+     with open(temp_path, 'rb') as fin:
+         while chunk := fin.read(64 * 1024):
+             self.instance.writeAudio(AVI_ACTION.WRITE, mime, chunk)
+     self.instance.writeAudio(AVI_ACTION.END, mime)
+ elif want_audio and raw is not None:
      mime = payload.get('mime_type', 'audio/wav')
      self.instance.writeAudio(AVI_ACTION.BEGIN, mime)
      self.instance.writeAudio(AVI_ACTION.WRITE, mime, raw)
      self.instance.writeAudio(AVI_ACTION.END, mime)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/audio_tts/IInstance.py` around lines 73-90, the code currently reads the entire temp_path into raw even when only audio output is requested; change the logic so that when want_audio is true you open temp_path and stream it in chunks to self.instance.writeAudio between AVI_ACTION.BEGIN and AVI_ACTION.END (calling writeAudio with each chunk and the mime from payload.get('mime_type')), and only read and base64-encode the full file when want_text is true (i.e., buffer into base64 only if want_text is set so you can build text_payload and call self.instance.writeText); ensure you still call BEGIN before the first WRITE and END after the last chunk and avoid allocating raw when want_text is false.

nodes/src/nodes/audio_tts/IGlobal.py (2)
149-175: ⚠️ Potential issue | 🟠 Major
Keep stale-temp cleanup off the per-utterance hot path.
This still walks the entire system temp directory on every synthesis call. On a busy or shared temp dir that adds blocking O(n) filesystem work to each utterance, even when nothing is old enough to delete.
♻️ Possible fix
```diff
 def _cleanup_stale_outputs(self):
+    now = time.time()
+    last_cleanup = float(getattr(self, '_last_temp_cleanup_at', 0.0) or 0.0)
+    if now - last_cleanup < 300:
+        return
+    self._last_temp_cleanup_at = now
+
     # Keep current output behavior but prevent unbounded temp directory growth.
     if not self._read_cfg(self._config, 'cleanup_temp_outputs', True):
         return
     max_age_sec = int(self._read_cfg(self._config, 'temp_output_max_age_sec', 3600))
     if max_age_sec <= 0:
         return
-    now = time.time()
     temp_dir = tempfile.gettempdir()
```

Also applies to: 254-254
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 149 - 175, The _cleanup_stale_outputs method is being called on the per-utterance hot path and scans the whole system temp dir each time; change it to run only infrequently by adding a rate-limit check (e.g., store a timestamp in an instance attribute like self._last_cleanup_ts or a small marker file in tempfile.gettempdir()) and return immediately if the last run was within a configured interval (e.g., temp_output_cleanup_interval_sec); keep the existing logic for deletion unchanged but only execute it when the rate-limit allows, and expose the interval via the same config keys read by _cleanup_stale_outputs.
177-183: ⚠️ Potential issue | 🟡 Minor
Reject unsupported engine ids during validation.
A typo in `engine` currently passes `validateConfig()` and only fails later inside `TTSEngine.synthesize()`. Failing fast here gives users a config-time error instead of a runtime pipeline failure.

🛡️ Possible fix

```diff
 def validateConfig(self):
     raw, cfg = self._resolve_merged_config()
     engine = self._engine_from_merged_cfg(cfg)
+    if engine not in {'piper', 'kokoro', 'bark', 'bak', 'openai', 'elevenlabs'}:
+        raise Exception(f'Unsupported TTS engine "{engine}"')
     api_key = self._resolve_cloud_api_key(cfg, raw, engine)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 177 - 183, validateConfig currently accepts unknown engine ids and lets TTSEngine.synthesize fail later; update validateConfig to immediately reject unsupported engine values by checking the resolved engine (from _engine_from_merged_cfg(cfg)) against the canonical supported list (use the same source of truth as TTSEngine, e.g., the supported engines constant/enum or a helper like TTSEngine.SUPPORTED_ENGINES or a shared _supported_engines list). If the engine is not in that set, raise an Exception with a clear message mentioning the invalid engine and the allowed engine ids; keep the existing api_key checks for 'elevenlabs' and 'openai' afterwards.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 256-264: mkstemp reserves out_path but if
self._engine.synthesize(text) raises, the reserved tts_* file is left behind;
wrap the synthesize call in a try/except (or try/finally) around the block that
creates runtime_cfg/out_path and call os.remove(out_path) when synthesis fails
(catch Exception as e) to ensure the temporary file is deleted on error while
leaving the normal success path intact (IInstance.py still removes temp_path
after successful synthesize); reference mkstemp, out_path, runtime_cfg, and
self._engine.synthesize in your change.
---
Duplicate comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 149-175: The _cleanup_stale_outputs method is being called on the
per-utterance hot path and scans the whole system temp dir each time; change it
to run only infrequently by adding a rate-limit check (e.g., store a timestamp
in an instance attribute like self._last_cleanup_ts or a small marker file in
tempfile.gettempdir()) and return immediately if the last run was within a
configured interval (e.g., temp_output_cleanup_interval_sec); keep the existing
logic for deletion unchanged but only execute it when the rate-limit allows, and
expose the interval via the same config keys read by _cleanup_stale_outputs.
- Around line 177-183: validateConfig currently accepts unknown engine ids and
lets TTSEngine.synthesize fail later; update validateConfig to immediately
reject unsupported engine values by checking the resolved engine (from
_engine_from_merged_cfg(cfg)) against the canonical supported list (use the same
source of truth as TTSEngine, e.g., the supported engines constant/enum or a
helper like TTSEngine.SUPPORTED_ENGINES or a shared _supported_engines list). If
the engine is not in that set, raise an Exception with a clear message
mentioning the invalid engine and the allowed engine ids; keep the existing
api_key checks for 'elevenlabs' and 'openai' afterwards.
In `@nodes/src/nodes/audio_tts/IInstance.py`:
- Around line 73-90: The code currently reads the entire temp_path into raw even
when only audio output is requested; change the logic so that when want_audio is
true you open temp_path and stream it in chunks to self.instance.writeAudio
between AVI_ACTION.BEGIN and AVI_ACTION.END (calling writeAudio with each chunk
and the mime from payload.get('mime_type')), and only read and base64-encode the
full file when want_text is true (i.e., buffer into base64 only if want_text is
set so you can build text_payload and call self.instance.writeText); ensure you
still call BEGIN before the first WRITE and END after the last chunk and avoid
allocating raw when want_text is false.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4494d525-b495-4f60-a997-fd0f1e68d293
📒 Files selected for processing (3)
- nodes/src/nodes/audio_tts/IGlobal.py
- nodes/src/nodes/audio_tts/IInstance.py
- nodes/src/nodes/audio_tts/tts_engine.py
Actionable comments posted: 6
♻️ Duplicate comments (2)
nodes/src/nodes/audio_tts/IGlobal.py (2)
271-279: ⚠️ Potential issue | 🟠 Major
Delete the reserved temp file on synthesis failures.
`mkstemp()` claims `out_path` before engine execution. If `self._engine.synthesize(text)` raises, the reserved `tts_*` file is left behind until the periodic stale-file sweep runs.

🧹 Minimal fix

```diff
 runtime_cfg = dict(self._config)
 runtime_cfg['output_path'] = out_path
 runtime_cfg['output_format'] = ext
 self._engine.config = runtime_cfg
-result = self._engine.synthesize(text)
+try:
+    result = self._engine.synthesize(text)
+except Exception:
+    try:
+        os.remove(out_path)
+    except OSError:
+        pass
+    raise
 path = result['path']
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 271 - 279, The code currently reserves a temp file with tempfile.mkstemp (fd, out_path) then closes fd and calls self._engine.synthesize(text), but if synthesize raises the reserved tts_* file is leaked; wrap the synthesize call in a try/except (or try/finally) around self._engine.synthesize(text) and on exception unlink(out_path) to remove the reserved file before re-raising the error; keep the existing tempfile creation and fd close, and reference the symbols tempfile.mkstemp, out_path, and self._engine.synthesize so you update that block to delete the reserved file on synthesis failures.
189-196: ⚠️ Potential issue | 🟡 Minor
Reject unsupported engine ids in `validateConfig()`.
`_engine_from_merged_cfg()` returns any non-empty `engine` verbatim, so a typo passes config validation and only fails later in `TTSEngine.synthesize()`. Fail fast here instead of deferring the error to runtime.

🛡️ Minimal fix

```diff
 raw, cfg = self._resolve_merged_config()
 engine = self._engine_from_merged_cfg(cfg)
+if engine not in {'piper', 'kokoro', 'bark', 'bak', 'openai', 'elevenlabs'}:
+    raise Exception(f'Unsupported TTS engine "{engine}"')
 api_key = self._resolve_cloud_api_key(cfg, raw, engine)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 189 - 196, validateConfig currently accepts any non-empty engine from _engine_from_merged_cfg and defers typos to TTSEngine.synthesize; update validateConfig so after calling _engine_from_merged_cfg(cfg) it checks that engine is one of the supported engine ids (e.g. the set used by your runtime like 'elevenlabs' and 'openai' — replace with the canonical list if different) and immediately raise an Exception listing supported engines when engine is present but invalid; keep the existing api_key logic (_resolve_cloud_api_key, env_hint) but perform the engine membership check first to fail fast on bad engine ids.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/audio_tts/config_resolver.py`:
- Around line 49-69: _pick_api_key currently returns a top-level api_key before
checking the active profile, which promotes stale credentials; modify
_pick_api_key so it first checks profile-specific entries (use
raw.get('profile') and then pick(raw.get(profile)) and
pick(raw.get(profile.split('-',1)[0])) if profile contains '-') and then the
fallback vendor keys (loop over 'openai' and 'elevenlabs'), only using pick(raw)
or the legacy top-level pick(raw.get('api_key')) after those profile-specific
checks so resolve_merged_config will prefer the active profile's credential.
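A sketch of the profile-first lookup order described above. The raw config layout (per-vendor sections holding an `api_key`) is an assumption inferred from the comment:

```python
def pick_api_key(raw: dict) -> str:
    """Prefer the active profile's credential; fall back to vendor
    sections, then the legacy top-level api_key."""
    def pick(section):
        if isinstance(section, dict):
            key = str(section.get('api_key', '') or '').strip()
            if key:
                return key
        return ''

    profile = str(raw.get('profile', '') or '').strip()
    candidates = [profile]
    if '-' in profile:
        candidates.append(profile.split('-', 1)[0])  # e.g. 'openai-tts' -> 'openai'
    for name in candidates:
        if name:
            key = pick(raw.get(name))
            if key:
                return key
    for vendor in ('openai', 'elevenlabs'):
        key = pick(raw.get(vendor))
        if key:
            return key
    return pick(raw)  # legacy top-level fallback, checked last
```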
- Around line 107-129: The helper currently bails out when no profile is
provided, allowing payloads to override the engine; instead, after obtaining
sdef via getServiceDefinition(logical_type) and pre = sdef.get('preconfig') use
pre.get('default') as the effective profile when profile is missing/empty
(falling back to the existing profile resolution from raw/cfg), then look up
prof = (pre.get('profiles') or {}).get(profile) and, if found, set merged =
_as_dict(cfg) and merged['engine'] = prof.get('engine') before returning merged
so the engine is locked to the resolved default profile.
In `@nodes/src/nodes/audio_tts/IInstance.py`:
- Around line 32-44: The audio lane is currently forwarding container file bytes
(WAV/MP3) instead of raw PCM samples; update the flow so when
_infer_output_format(...) returns 'wav' or 'mp3' and want_audio is True, the
bytes read in writeText() are decoded into raw PCM samples before calling
writeAudio(). Specifically, in the code paths around writeText() and
writeAudio() (and the related logic at lines ~74-83), detect the format from
_infer_output_format(engine, ...), decode WAV by parsing the RIFF headers to
extract PCM frames (or use a WAV reader) and decode MP3 via an MP3
decoder/ffmpeg/pydub to PCM, then forward the PCM buffer (and correct sample
rate/channel metadata) to writeAudio() instead of the serialized file bytes.
Ensure metadata (sample rate, channels, sample width) is preserved/translated
for downstream consumers.
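For the WAV half of this, the standard-library `wave` module already does the container parsing; a minimal sketch (MP3 would still need an external decoder such as ffmpeg or pydub):

```python
import io
import wave


def wav_to_pcm(data: bytes):
    """Strip the RIFF container and return
    (pcm_bytes, sample_rate, channels, sample_width) so the audio lane
    can carry raw samples plus the metadata downstream consumers need."""
    with wave.open(io.BytesIO(data), 'rb') as wf:
        frames = wf.readframes(wf.getnframes())
        return frames, wf.getframerate(), wf.getnchannels(), wf.getsampwidth()
```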
In `@nodes/src/nodes/audio_tts/tts_engine.py`:
- Around line 453-497: The ElevenLabs HTTP branch (requests.post in
tts_engine.py) and the model-server branch
(self._elevenlabs_remote_client.send_command) ignore the configured
output_format: include the output_format value (from output_format =
str(self.config.get('output_format', 'mp3')).lower()) in both requests so the
server returns the requested format and the MIME matches; specifically add an
"output_format" field to the payload passed to requests.post (instead of
hardcoding Accept: 'audio/mpeg') and add the same "output_format" into the body
passed to _elevenlabs_remote_client.send_command, and ensure headers/Accept are
set appropriately based on output_format so _mime_from_format(output_format)
remains correct when returning the response.
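A sketch of threading `output_format` through both the payload and the Accept header, as the comment asks. Endpoint field names such as `model_id` are assumptions, not the verified ElevenLabs schema:

```python
def build_elevenlabs_request(cfg: dict, text: str):
    """Build payload/headers that agree on the configured output format,
    instead of hardcoding Accept: audio/mpeg."""
    output_format = str(cfg.get('output_format', 'mp3')).lower()
    mime = {'mp3': 'audio/mpeg', 'wav': 'audio/wav'}.get(output_format, 'audio/mpeg')
    payload = {
        'text': text,
        'model_id': cfg.get('model', ''),
        'output_format': output_format,  # sent to the server, not assumed
    }
    headers = {'Accept': mime, 'Content-Type': 'application/json'}
    return payload, headers, mime
```

The same `output_format` field would go into the body passed to `_elevenlabs_remote_client.send_command` so both branches agree.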
In `@packages/ai/src/ai/common/models/audio/cloud_tts_loader.py`:
- Around line 64-69: The bundle dict in cloud_tts_loader.py currently includes a
shared '_lock' and the loader code wraps the per-row HTTP request loops with
that lock, which serializes all syntheses; remove the '_lock' entry from the
bundle construction (the dict with 'api_key', 'voice', 'model') and delete any
uses of bundle['_lock']. Replace the locked sections in the request loops (the
loops that build a urllib.request.Request per row) with no lock — instead
capture bundle values into local variables (api_key, voice, model_name) at
function start so each request uses fresh local state and runs concurrently; do
not add a global/shared lock back into the bundle.
In `@packages/ai/src/ai/common/models/audio/kokoro_loader.py`:
- Around line 210-213: The code currently fabricates one output row for empty
input by returning [{} for _ in range(batch_size or 1)], which breaks
BaseLoader.postprocess() cardinality; change the branch so when items is falsy
it returns an empty list if batch_size == 0, otherwise return a list of empty
dicts of length batch_size (use batch_size if provided, fallback to 1 only when
you truly need a single placeholder). Update the logic around raw_output/items
in kokoro_loader.py to: if not items: return [] if batch_size == 0 else [{} for
_ in range(batch_size or 1)] so downstream batching and BaseLoader.postprocess()
keep correct cardinality.
---
Duplicate comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 271-279: The code currently reserves a temp file with
tempfile.mkstemp (fd, out_path) then closes fd and calls
self._engine.synthesize(text), but if synthesize raises the reserved tts_* file
is leaked; wrap the synthesize call in a try/except (or try/finally) around
self._engine.synthesize(text) and on exception unlink(out_path) to remove the
reserved file before re-raising the error; keep the existing tempfile creation
and fd close, and reference the symbols tempfile.mkstemp, out_path, and
self._engine.synthesize so you update that block to delete the reserved file on
synthesis failures.
- Around lines 189-196: `validateConfig` currently accepts any non-empty engine from `_engine_from_merged_cfg` and defers typos to `TTSEngine.synthesize`. After calling `_engine_from_merged_cfg(cfg)`, check that the engine is one of the supported engine ids (e.g. the set used by your runtime, such as 'elevenlabs' and 'openai'; substitute the canonical list if different) and immediately raise an Exception listing the supported engines when the engine is present but invalid. Keep the existing api_key logic (`_resolve_cloud_api_key`, `env_hint`) but perform the engine membership check first to fail fast on bad engine ids.
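The first duplicate comment (the leaked `tts_*` file) can be sketched as follows; the `synthesize_to_temp` helper and the callback shape are hypothetical, since the real code calls `self._engine.synthesize(text)` on an engine instance:

```python
import os
import tempfile

def synthesize_to_temp(synthesize, suffix='.wav'):
    # Atomically reserve a unique output path, then make sure the
    # reserved tts_* file is removed if synthesis fails.
    fd, out_path = tempfile.mkstemp(prefix='tts_', suffix=suffix)
    os.close(fd)  # the engine writes by path, not by descriptor
    try:
        synthesize(out_path)
    except Exception:
        # Delete the reserved file so a failed synthesis leaks nothing,
        # then re-raise so pipeline error handling is unaffected.
        if os.path.exists(out_path):
            os.unlink(out_path)
        raise
    return out_path
```

On success the caller owns the file and deletes it after streaming; on failure nothing is left behind for the stale-file sweep to collect.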
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a692325b-d417-4679-98d9-f60c7be4d45f
📒 Files selected for processing (11)
- nodes/src/nodes/audio_tts/IGlobal.py
- nodes/src/nodes/audio_tts/IInstance.py
- nodes/src/nodes/audio_tts/README.md
- nodes/src/nodes/audio_tts/__init__.py
- nodes/src/nodes/audio_tts/config_resolver.py
- nodes/src/nodes/audio_tts/piper_catalog.py
- nodes/src/nodes/audio_tts/tts_engine.py
- packages/ai/src/ai/common/models/audio/cloud_tts_loader.py
- packages/ai/src/ai/common/models/audio/kokoro_loader.py
- packages/ai/src/ai/common/models/audio/piper_hub.py
- packages/ai/src/ai/common/models/audio/piper_loader.py
asclearuc
left a comment
There are a few comments.
Actionable comments posted: 1
♻️ Duplicate comments (1)
nodes/src/nodes/audio_tts/IGlobal.py (1)
189-204: ⚠️ Potential issue | 🟡 Minor — Add validation for unsupported engine ids.

The `validateConfig` method validates API keys and voice presets but doesn't reject unknown engine ids. A typo like `'opnai'` passes validation and only fails later inside `TTSEngine.synthesize()`. Failing early with a clear error message improves UX.

🛡️ Proposed fix

```diff
 def validateConfig(self):
     """Validate the node configuration, raising an exception on missing required values."""
     raw, cfg = self._resolve_merged_config()
     engine = self._engine_from_merged_cfg(cfg)
+    supported = {'piper', 'kokoro', 'bark', 'openai', 'elevenlabs'}
+    if engine not in supported:
+        raise Exception(f'Unsupported TTS engine "{engine}". Supported: {", ".join(sorted(supported))}')
     api_key = self._resolve_cloud_api_key(cfg, raw, engine)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 189 - 204, The validateConfig method should reject unknown engine ids early: after computing engine = self._engine_from_merged_cfg(cfg) in validateConfig, check that engine is one of the supported values (e.g. 'elevenlabs', 'openai', 'piper', 'kokoro'); if not, raise an Exception with a clear message like 'Unsupported TTS engine "<engine>"' so typos (e.g. "opnai") fail here instead of later in TTSEngine.synthesize(); update the check near _engine_from_merged_cfg and include the engine value in the error.
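The same guard, pulled out as a runnable sketch (the `check_engine` helper and `ValueError` type are assumptions; the proposed diff raises a bare `Exception` inside `validateConfig`):

```python
SUPPORTED_ENGINES = {'piper', 'kokoro', 'bark', 'openai', 'elevenlabs'}

def check_engine(engine):
    # Fail fast on unknown engine ids instead of deferring the error
    # to TTSEngine.synthesize(); the message lists the valid choices.
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f'Unsupported TTS engine "{engine}". '
            f'Supported: {", ".join(sorted(SUPPORTED_ENGINES))}'
        )
    return engine
```

A typo such as `'opnai'` now surfaces in the node panel with the full list of supported ids rather than as a late synthesis failure.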
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around lines 157-187: In `_cleanup_stale_outputs()`, remove the redundant second assignment to `now` (the initial `now = time.time()` used to check `_last_cleanup_ts` is still valid) and replace the two separate suffix checks (`name.endswith('.wav') or name.endswith('.mp3')`) with a single tuple-based call, `name.endswith(('.wav', '.mp3'))`, to simplify and slightly optimize the temp-file filtering. Adjust only those lines inside the `_cleanup_stale_outputs` function (references: `_cleanup_stale_outputs`, `_last_cleanup_ts`, `_read_cfg`, `temp_dir`, `name.endswith`).
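A condensed sketch of the sweep after both changes; the constants, free-function shape, and `last_cleanup_ts` parameter are assumptions, since the real method lives on `IGlobal` and reads its config:

```python
import os
import time

_CLEANUP_INTERVAL_SEC = 300   # throttle: walk the dir at most every 5 min
_MAX_AGE_SEC = 3600           # remove tts_* files older than 1 hour

def cleanup_stale_outputs(temp_dir, last_cleanup_ts):
    # Single `now` reused for both the throttle and the age check.
    now = time.time()
    if now - last_cleanup_ts < _CLEANUP_INTERVAL_SEC:
        return last_cleanup_ts  # skip the walk entirely
    for name in os.listdir(temp_dir):
        # Tuple-based suffix check replaces the two endswith() calls.
        if not (name.startswith('tts_') and name.endswith(('.wav', '.mp3'))):
            continue
        path = os.path.join(temp_dir, name)
        try:
            if now - os.path.getmtime(path) > _MAX_AGE_SEC:
                os.unlink(path)
        except OSError:
            pass  # another worker may have removed it first
    return now
```

The per-request cost when throttled is a single `time.time()` subtraction, matching the behavior described later in the thread.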
---
Duplicate comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 189-204: The validateConfig method should reject unknown engine
ids early: after computing engine = self._engine_from_merged_cfg(cfg) in
validateConfig, check that engine is one of the supported values (e.g.
'elevenlabs', 'openai', 'piper', 'kokoro'); if not, raise an Exception with a
clear message like 'Unsupported TTS engine "<engine>"' so typos (e.g. "opnai")
fail here instead of later in TTSEngine.synthesize(); update the check near
_engine_from_merged_cfg and include the engine value in the error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 67ff56c4-d83d-434a-9e6d-b3d2bcca7634
📒 Files selected for processing (1)
nodes/src/nodes/audio_tts/IGlobal.py
Rod-Christensen
left a comment
This just seems more complicated than it needs to be...
kwit75
left a comment
@dsapandora thanks for putting this together. This is a substantial change (+2458/−9, 20 files, 5 TTS engines) and there's enough feedback from CodeRabbit and from Rod that I'd like you to address two things before doing another round of reviewer ping-pong.
1. Sync with @Rod-Christensen on the architecture first. Rod's comment — "This just seems more complicated than it needs to be..." — is a leadership signal that the 5-engine abstraction layer may be over-engineered. Before fixing the dozens of CodeRabbit nits, please have a quick sync with Rod on the right shape:
- Do we actually need 5 engines (piper, kokoro, bark, openai, elevenlabs) in v1, or can we ship with 1-2 and add more in follow-up PRs?
- Is the `_PROFILE_TO_ENGINE` indirection necessary, or can it collapse into something simpler?
- Should the cloud loaders (`cloud_tts_loader.py`, 285 lines) be a separate node from the local ones?
Reworking the architecture after addressing 21 inline nits is wasted effort — better to align on the shape first.
2. Three 🟠 Major bugs that need to be fixed regardless of architecture decisions. These are real production issues, not stylistic nits:
- Output filename collision (`IGlobal.py` ~255-257): millisecond-based output paths are not unique across workers/processes. Two syntheses landing in the same millisecond will clobber each other. Fix: use `tempfile.mkstemp()` to atomically reserve the path. This also gives you restrictive temp-file permissions for free.
- Temp file leak on synthesis failure (`IGlobal.py` ~271-279): once you switch to `mkstemp()`, you also need to delete the reserved file in the exception path — if `self._engine.synthesize(text)` raises, the `tts_*` file stays on disk until the next stale-file sweep.
- Profile/engine ID normalization (`IGlobal.py` ~16-22): the `_PROFILE_TO_ENGINE` lookup doesn't canonicalize UI-style ids like `kokoro-default`, `openai-tts`, `elevenlabs-default`. The form panel shows these, but `_engine_from_merged_cfg()` will fail to resolve them. Normalize the profile string (lowercase + strip suffix) before lookup, or expand `_PROFILE_TO_ENGINE` with all UI variants.
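The third item (normalize-before-lookup) could look roughly like this; the function name, the fallback to `None`, and the exact supported set are assumptions, not the PR's actual `_normalize_engine_id`:

```python
_SUPPORTED_ENGINES = {'piper', 'kokoro', 'bark', 'openai', 'elevenlabs'}

def normalize_engine_id(profile):
    # Map UI-style ids like 'kokoro-default' or 'openai-tts' to their
    # canonical engine prefix; return None for unknown ids.
    if not profile:
        return None
    candidate = profile.strip().lower()
    if candidate in _SUPPORTED_ENGINES:
        return candidate
    prefix = candidate.split('-', 1)[0]
    return prefix if prefix in _SUPPORTED_ENGINES else None
```

Stripping the suffix only after an exact-match check keeps plain ids (`piper`) cheap while still resolving every compound variant the form panel emits.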
There are also several 🟡 Minor / 🔵 Nit items from CodeRabbit (engine-id validation in `validateConfig`, audio buffering in `IInstance.py` bypassing the streaming contract, and others) — those can wait until after the architecture sync.
TL;DR: Rod sync first → fix the 3 🟠 Majors → then we'll do another review pass on the rest. Please don't address the nits one-by-one before the architecture is settled.
…f _cleanup_stale_outputs: the temp-dir walk now runs at most once every 5 minutes (_CLEANUP_INTERVAL_SEC = 300) regardless of synthesis throughput. The per-request cost is a single time.time() subtraction.
…lf._engine.synthesize(). The exception is re-raised after cleanup so the pipeline error handling is unaffected.
@kwit75 Architecture (5 engines): We’ll sync with Rod as requested. From our side, the five backends are already wired and behaving well in practice: a single TTSEngine dispatches on engine, and services.json preconfig keeps the UI aligned with one node. We’re not blocked on correctness here. Three majors: I’ll hold CodeRabbit nits until after the architecture check-in with @Rod-Christensen.
Senior Review: feat/RR-411 — Text-to-Audio Node

What works well
Blockers (must fix before merge)
Should fix
Nice-to-have
Overall this is a well-designed feature with a clear architecture. The main gap is test coverage — once tests are added, this should be in good shape.
🚀 Merge Request

Solid TTS implementation supporting 5 engines. CI is green, no merge conflicts. Before merge, please address:

Once these are addressed, this is ready to merge. Great work @dsapandora!
3d68114 to 15fe82e
No description provided.
Thanks @nihalnihalani for the comments. I followed them and also included some tests, but because this node downloads models that can take a long time to download, I included it inside the skip_node list.
Changes requested —

## PR #583 — Text-to-Speech Node Review

This is a genuinely useful addition to the pipeline ecosystem. TTS is a natural complement to the existing

### What I Like

**Engine abstraction is clean.** The
**Temp file cleanup is handled properly.** The
**Local-first default** (

### Concerns

**Dependency weight is significant.** This node pulls in
**No test coverage for cloud engines.** The test files (
**Audio output lane typing.** The output is described as "WAV or MP3 container bytes" depending on engine. ElevenLabs returns MP3 directly, others return WAV. Downstream nodes consuming the

### Minor

### Verdict

The feature is valuable and the architecture is sound, but I want answers on: (1) lazy loading of heavy deps, (2) audio format metadata on output, (3) HF Hub cache configurability, and (4) the
I am going to recreate this PR, splitting the engines to make it easier to review.
Summary
New Text-to-Speech (TTS) pipeline node (`audio_tts`) that converts text into speech audio. It supports 5 engines — 3 local and 2 cloud — selectable via preconfig profiles, and follows the established `IGlobal`/`IInstance` node pattern.

Input: `text` lane · Output: `audio` lane (WAV or MP3 container bytes)

Supported Engines

| Engine | Notes |
| --- | --- |
| `piper` | local |
| `kokoro` | local; `KPipeline`; language auto-detected from voice prefix |
| `bark` | local; `text-to-audio` pipeline; GPU when available |
| `openai` | cloud (`gpt-4o-mini-tts`); requires `api_key` or `OPENAI_API_KEY` |
| `elevenlabs` | cloud; requires `api_key` or `ELEVENLABS_API_KEY` |

Default profile:
`piper`

Architecture
```mermaid
flowchart TB
  subgraph Pipeline["Pipeline Runtime"]
    direction LR
    UP["Upstream node<br/>(text producer)"] -->|text lane| TTS["audio_tts<br/>filter node"]
    TTS -->|audio lane<br/>WAV / MP3| DOWN["Downstream node<br/>(audio consumer)"]
  end
  subgraph Node["Node Internals"]
    direction TB
    IG["IGlobal<br/>• beginGlobal: install deps, create TTSEngine<br/>• synthesize: hot-swap engine on config change<br/>• validateConfig: fail-fast on missing keys<br/>• cleanup stale temp files (≤ every 5 min)"]
    II["IInstance<br/>• writeText → synthesize → writeAudio<br/>• BEGIN / WRITE / END streaming<br/>• temp file deleted in finally block"]
    IG --- II
  end
  subgraph Engine["TTSEngine (tts_engine.py)"]
    direction TB
    PIPER["Piper<br/>PiperVoice ONNX"]
    KOKORO["Kokoro<br/>KPipeline + misaki G2P"]
    BARK["Bark<br/>HF transformers pipeline"]
    OAI["OpenAI<br/>REST API (direct)"]
    EL["ElevenLabs<br/>REST API (direct)"]
  end
  subgraph Shared["packages/ai — Shared Loaders"]
    direction TB
    PL["PiperLoader"]
    KL["KokoroLoader"]
  end
  II -->|"synthesize(text)"| IG
  IG -->|"config dict"| Engine
  Engine -.->|"--modelserver"| Shared
```

Configuration Flow
```mermaid
sequenceDiagram
  participant UI as Builder UI (RJSF)
  participant SVC as services.json
  participant CC as connConfig
  participant GNC as Config.getNodeConfig
  participant IG as IGlobal
  participant ENG as TTSEngine
  UI->>SVC: read preconfig profiles + shape
  UI->>CC: user selects profile + options
  CC->>GNC: merge preconfig defaults + overrides
  GNC->>IG: merged config dict
  IG->>IG: _normalize_engine_id()
  IG->>IG: _resolve_tts_model() / _resolve_tts_voice()
  IG->>ENG: _build_tts_config_dict()
  Note over IG,ENG: Engine is hot-swapped if<br/>identity signature changes
```

Key Improvements
Reliability & Resource Management
- `tempfile.mkstemp(prefix='tts_')` to prevent race conditions
- `finally` blocks on both success and failure paths
- `_cleanup_stale_outputs()` runs at most once every 5 minutes, removing `tts_*.wav/mp3` files older than 1 hour
- `self._engine.dispose()` → `inst.dispose()` so the captured reference is always disposed

Engine Hot-Swap
- `_tts_identity_signature()` detects config changes (engine, voice, model, model-server flag)

Configuration Robustness
- `_normalize_engine_id()` handles compound UI ids (`kokoro-default`, `openai-tts`) by extracting the canonical prefix
- `validateConfig()` called before `depends()` so missing API keys surface immediately in the node panel
- `model` → `openai_model`, `voice` → `openai_voice`

Shared Model Loaders (`packages/ai`)

- `piper-tts`, runs on model server CPU worker
- `kokoro.KPipeline`, model server GPU/CPU
- `tts_engine.py` — no loader needed
- `en_core_web_sm` wheel for the installed spaCy version (fixes the `wasabi`/`Exception: 1` issue)
- `lameenc` (no ffmpeg dependency)

Test Suite
- `_normalize_engine_id()` (canonical, compound, invalid)

Files Changed (27 files, +3,616 / −58)
- `nodes/src/nodes/audio_tts/` — `IGlobal.py`, `IInstance.py`, `tts_engine.py`, `piper_catalog.py`, `services.json`, `README.md`, `requirements.txt`, `__init__.py`
- `packages/ai/src/ai/common/models/audio/` — `piper_loader.py`, `kokoro_loader.py`, `piper_hub.py`, `piper_native.py`, `spacy_en_model.py`, `wav_to_mp3.py`
- `nodes/test/audio_tts/` — `test_iglobal.py`, `test_iinstance.py`, `test_normalize.py`, `test_tts_engine.py`, `conftest.py`

Type
Feature
Testing
`./builder test` passes
Linked Issue
Fixes #411