feat(audio_tts): add Kokoro-only Text To Speech node (RR-411)#662
feat(audio_tts): add Kokoro-only Text To Speech node (RR-411)#662dsapandora wants to merge 2 commits intodevelopfrom
Conversation
|
No description provided. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a Kokoro-based text-to-speech node with global/instance lifecycle handlers, local and remote synthesis paths, a Kokoro model loader and spaCy helper, service metadata and requirements, example pipeline, and minor test and export updates. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Node as TTS Node\n(IGlobal / IInstance)
participant Config as Config\nSystem
participant ModelServer as ModelClient\n(Remote)
participant LocalPipeline as KPipeline\n(Local)
participant spaCy as spaCy\n(en_core_web_sm)
participant FS as FileSystem
Note over Node: beginGlobal() initialization
Node->>Config: getNodeConfig() -> kokoro_voice, profile, model server addr
alt Model server address present
Node->>ModelServer: create client & load Kokoro model (lang_code)
ModelServer-->>Node: model bundle (_remote_client)
else No model server
Node->>spaCy: ensure_spacy_en_model()
spaCy-->>Node: model available or installed
Node->>LocalPipeline: construct KPipeline(lang_code)
LocalPipeline-->>Node: pipeline ready (_pipeline)
end
Note over Node: synthesize(text)
alt Remote path
Node->>ModelServer: inference request (text, voice, speed)
ModelServer->>ModelServer: synthesize -> wav_base64
ModelServer-->>Node: response with wav_base64
Node->>FS: decode base64 -> write temp WAV
else Local path
Node->>LocalPipeline: pipeline(text, voice, speed)
LocalPipeline-->>Node: stream audio chunks (ndarrays)
Node->>Node: concat/clip -> write mono 24kHz 16-bit WAV
end
Node-->>IInstance: return {path, mime_type}
IInstance->>FS: read temp WAV bytes
IInstance->>Node: writeAudio BEGIN/WRITE/END (audio/wav)
IInstance->>FS: delete temp WAV
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
3a7c119 to
244ce4e
Compare
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 89-113: The _cleanup_stale_outputs method calls time.time() twice
and reads a config key ('temp_output_max_age_sec') that isn't populated by
_build_tts_config_dict; remove the redundant now = time.time() (keep the first
one used for both interval check and file mtime comparisons) and ensure the
temp_output_max_age_sec config is actually provided by adding it to
_build_tts_config_dict (or move its default to a shared constant and reference
that constant here), so _cleanup_stale_outputs uses a single timestamp and a
real/ documented config value instead of silently falling back to 3600.
In `@nodes/src/nodes/audio_tts/IInstance.py`:
- Around line 47-49: The calls to writeAudio in IInstance
(writeAudio(AVI_ACTION.BEGIN, mime) and writeAudio(AVI_ACTION.END, mime)) are
missing the required third parameter (buffer); update those calls to pass an
empty bytes object as the buffer (e.g., b'') so they match the signature
writeAudio(action, mimeType, buffer) and avoid a TypeError at runtime when using
AVI_ACTION.BEGIN and AVI_ACTION.END.
In `@nodes/src/nodes/audio_tts/tts_engine.py`:
- Around line 88-109: The _kokoro method reads out_path from config without
validation which can lead to confusing errors when opening/writing the file; add
an early guard that verifies out_path is a non-empty string and raises a clear
ValueError (or similar) if missing/invalid before any file operations
(references: method _kokoro, variable out_path, and downstream use in
_save_mono_float_audio and callers like IGlobal.synthesize) so consumers get an
explicit message like "output_path must be set in config" instead of a cryptic
file I/O error.
- Line 74: The RuntimeError string in tts_engine.py should use single quotes per
project style; update the raise in the branch that checks model server mode (the
raise RuntimeError(...) call in function/method handling Kokoro model server
mode) so the exception message is a single-quoted literal, e.g., replace any
double-quoted message with a single-quoted one to conform to nodes/**/*.py
coding guidelines.
In `@nodes/test/audio_tts/test_iglobal.py`:
- Around line 92-100: The test currently only checks exception propagation but
must also assert the temp file was deleted: after the pytest.raises block,
inspect mock_engine.synthesize.call_args to extract the filename/path that
ig.synthesize created (e.g. from call_args[0] or call_args[1] depending on how
synthesize was called), then assert that os.path.exists(extracted_path) is False
(or use pathlib.Path(extracted_path).exists() is False) to verify cleanup;
ensure you import os/pathlib if needed and keep the RuntimeError assertion
intact.
In `@packages/ai/src/ai/common/models/audio/spacy_en_model.py`:
- Around line 67-72: The subprocess.run call that installs the wheel in
spacy_en_model.py should include a timeout to avoid indefinite hangs; update the
call in the function where [sys.executable, '-m', 'pip', 'install', url] is
invoked (look for the subprocess.run block) to pass a reasonable timeout (e.g.
timeout=300) and add handling for subprocess.TimeoutExpired (and optionally
subprocess.CalledProcessError) to log/raise a clear error so initialization
fails fast instead of blocking indefinitely.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3216c990-14c1-4ea3-ab4f-d185cba50fe5
📒 Files selected for processing (21)
nodes/src/nodes/audio_tts/IGlobal.pynodes/src/nodes/audio_tts/IInstance.pynodes/src/nodes/audio_tts/README.mdnodes/src/nodes/audio_tts/__init__.pynodes/src/nodes/audio_tts/requirements.txtnodes/src/nodes/audio_tts/services.jsonnodes/src/nodes/audio_tts/tts_engine.pynodes/test/audio_tts/__init__.pynodes/test/audio_tts/conftest.pynodes/test/audio_tts/test_iglobal.pynodes/test/audio_tts/test_iinstance.pynodes/test/audio_tts/test_normalize.pynodes/test/audio_tts/test_tts_engine.pynodes/test/conftest.pypackages/ai/src/ai/common/models/__init__.pypackages/ai/src/ai/common/models/audio/__init__.pypackages/ai/src/ai/common/models/audio/kokoro_loader.pypackages/ai/src/ai/common/models/audio/requirements_kokoro.txtpackages/ai/src/ai/common/models/audio/spacy_en_model.pypackages/ai/src/ai/common/models/base.pypipelines/text_file_to_audio.pipe
| self.instance.writeAudio(AVI_ACTION.BEGIN, mime) | ||
| self.instance.writeAudio(AVI_ACTION.WRITE, mime, raw) | ||
| self.instance.writeAudio(AVI_ACTION.END, mime) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python - <<'PY'
import ast, pathlib
base = pathlib.Path('packages/server/engine-lib/rocketlib-python/lib/rocketlib/filters.py')
tree = ast.parse(base.read_text())
for node in ast.walk(tree):
if isinstance(node, ast.FunctionDef) and node.name == 'writeAudio':
args = [a.arg for a in node.args.args]
print('IInstanceBase.writeAudio signature args:', args)
print('defaults count:', len(node.args.defaults))
break
target = pathlib.Path('nodes/src/nodes/audio_tts/IInstance.py')
t = ast.parse(target.read_text())
for node in ast.walk(t):
if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute) and node.func.attr == 'writeAudio':
print(f'Line {node.lineno}: positional args passed = {len(node.args)}')
PYRepository: rocketride-org/rocketride-server
Length of output: 283
🏁 Script executed:
cat -n nodes/src/nodes/audio_tts/IInstance.py | sed -n '40,55p'Repository: rocketride-org/rocketride-server
Length of output: 772
🏁 Script executed:
cat -n packages/server/engine-lib/rocketlib-python/lib/rocketlib/filters.py | grep -A 15 'def writeAudio'Repository: rocketride-org/rocketride-server
Length of output: 1512
Fix writeAudio calls on BEGIN and END actions to include empty buffer.
Lines 47 and 49 call writeAudio(action, mimeType) with 2 arguments, but the method signature requires 3: (action, mimeType, buffer). This will fail at runtime with a TypeError for missing the required buffer parameter. BEGIN and END are action markers without a data payload, so pass an empty bytes object.
Fix
- self.instance.writeAudio(AVI_ACTION.BEGIN, mime)
+ self.instance.writeAudio(AVI_ACTION.BEGIN, mime, b'')
self.instance.writeAudio(AVI_ACTION.WRITE, mime, raw)
- self.instance.writeAudio(AVI_ACTION.END, mime)
+ self.instance.writeAudio(AVI_ACTION.END, mime, b'')🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/audio_tts/IInstance.py` around lines 47 - 49, The calls to
writeAudio in IInstance (writeAudio(AVI_ACTION.BEGIN, mime) and
writeAudio(AVI_ACTION.END, mime)) are missing the required third parameter
(buffer); update those calls to pass an empty bytes object as the buffer (e.g.,
b'') so they match the signature writeAudio(action, mimeType, buffer) and avoid
a TypeError at runtime when using AVI_ACTION.BEGIN and AVI_ACTION.END.
There was a problem hiding this comment.
Actionable comments posted: 6
♻️ Duplicate comments (4)
packages/ai/src/ai/common/models/audio/spacy_en_model.py (1)
67-72:⚠️ Potential issue | 🟠 MajorAdd timeout for the wheel install subprocess (still unresolved).
Line 67 runs a networked
pip installwith no timeout, so initialization can hang indefinitely on stalled networks. This was already raised earlier and still applies in the current revision.Suggested fix
- subprocess.run( - [sys.executable, '-m', 'pip', 'install', url], - check=True, - stdout=subprocess.DEVNULL, - stderr=subprocess.DEVNULL, - ) + try: + subprocess.run( + [sys.executable, '-m', 'pip', 'install', url], + check=True, + timeout=300, + stdout=subprocess.DEVNULL, + stderr=subprocess.DEVNULL, + ) + except subprocess.TimeoutExpired as exc: + raise RuntimeError('Timed out installing spaCy model wheel') from exc#!/bin/bash set -euo pipefail file="$(fd -a '^spacy_en_model\.py$' | head -n1)" rg -n -A8 -B2 'subprocess\.run\(' "$file" python - <<'PY' from pathlib import Path p = next(Path('.').rglob('spacy_en_model.py')) text = p.read_text() start = text.find('subprocess.run(') end = text.find(')\n', start) snippet = text[start:end] print('timeout_present=', 'timeout=' in snippet) PY🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/ai/src/ai/common/models/audio/spacy_en_model.py` around lines 67 - 72, The subprocess.run call in spacy_en_model.py that executes [sys.executable, '-m', 'pip', 'install', url] should include a timeout parameter to avoid hanging on stalled networks; update the subprocess.run invocation to pass a reasonable timeout (e.g., timeout=60) and handle subprocess.TimeoutExpired by catching it (try/except around subprocess.run) and logging or raising a clear error so initialization fails fast; locate the call to subprocess.run in the module (the pip install block) and add timeout and appropriate exception handling there.nodes/src/nodes/audio_tts/IInstance.py (1)
47-49:⚠️ Potential issue | 🔴 Critical
writeAudioBEGIN/END calls are missing the requiredbufferargument.
writeAudiois called with 2 args on BEGIN/END and 3 args on WRITE. BEGIN/END will raiseTypeErrorat runtime.Proposed fix
- self.instance.writeAudio(AVI_ACTION.BEGIN, mime) + self.instance.writeAudio(AVI_ACTION.BEGIN, mime, b'') self.instance.writeAudio(AVI_ACTION.WRITE, mime, raw) - self.instance.writeAudio(AVI_ACTION.END, mime) + self.instance.writeAudio(AVI_ACTION.END, mime, b'')#!/bin/bash set -euo pipefail python - <<'PY' import ast from pathlib import Path sig_path = Path('packages/server/engine-lib/rocketlib-python/lib/rocketlib/filters.py') call_path = Path('nodes/src/nodes/audio_tts/IInstance.py') sig_tree = ast.parse(sig_path.read_text()) for node in ast.walk(sig_tree): if isinstance(node, ast.FunctionDef) and node.name == 'writeAudio': print('writeAudio signature args:', [a.arg for a in node.args.args]) call_tree = ast.parse(call_path.read_text()) for node in ast.walk(call_tree): if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute) and node.func.attr == 'writeAudio': print(f'Line {node.lineno}: positional args passed = {len(node.args)}') PY🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/audio_tts/IInstance.py` around lines 47 - 49, The calls to self.instance.writeAudio in IInstance.py use two arguments for AVI_ACTION.BEGIN and AVI_ACTION.END but the writeAudio signature requires a buffer argument; update the BEGIN and END invocations to pass an explicit buffer (e.g., an empty bytes object) as the third parameter so the calls match the writeAudio(signature) used by writeAudio in rocketlib (use self.instance.writeAudio(AVI_ACTION.BEGIN, mime, b'') and similarly for END), ensuring the buffer is of type bytes to avoid the TypeError at runtime.nodes/test/audio_tts/test_iglobal.py (1)
89-98:⚠️ Potential issue | 🟡 MinorAssert temp-file cleanup in the failure path.
This test name says it verifies temp-file removal, but it currently only checks that the exception bubbles out. After the
pytest.raises(...)block, assert thatmock_engine.config['output_path']no longer exists.Suggested fix
with pytest.raises(RuntimeError, match='boom'): ig.synthesize('hello') + assert not os.path.exists(mock_engine.config['output_path'])🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/test/audio_tts/test_iglobal.py` around lines 89 - 98, The test test_temp_file_removed_on_engine_exception currently only asserts the RuntimeError is raised; after the with pytest.raises(...) block, add an assertion that the temporary file was removed by checking that mock_engine.config['output_path'] does not exist (use os.path.exists or Path.exists on mock_engine.config['output_path']) to ensure ig.synthesize cleans up temp files on engine exceptions.nodes/src/nodes/audio_tts/IGlobal.py (1)
58-67:⚠️ Potential issue | 🟡 Minor
temp_output_max_age_secis ignored after config rebuilds.
_cleanup_stale_outputs()reads this value fromself._config, but_build_tts_config_dict()drops it wheneverbeginGlobal()orsynthesize()refreshes_config. Any override silently falls back to 3600 seconds.Suggested fix
return { 'engine': engine, 'kokoro_voice': kokoro_voice, 'kokoro_lang_code': kokoro_voice[0], + 'temp_output_max_age_sec': cfg.get('temp_output_max_age_sec', 3600), }Also applies to: 91-92
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 58 - 67, The TTS config builder _build_tts_config_dict currently drops temp_output_max_age_sec causing _cleanup_stale_outputs (which reads self._config) to always fall back to the default; update _build_tts_config_dict to include temp_output_max_age_sec (parse/validate as int with the existing default, e.g., 3600) in the returned dict so config refreshes preserve any override, and apply the same fix to the other config-building site referenced (the similar dict return around lines 91-92) to ensure both paths keep temp_output_max_age_sec.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 33-35: The cleanup routine _cleanup_stale_outputs() and
synthesizer synthesize() are using a global temp filename prefix which risks
deleting unrelated files; introduce a node-specific temp namespace (e.g., add a
constant like _TTS_TEMP_DIR_NAME or _TTS_TEMP_PREFIX) and change synthesize() to
create output files inside that dedicated subdirectory (mkdir it under
tempfile.gettempdir() if missing) or to use the namespaced prefix when naming
files, then update _cleanup_stale_outputs() to only remove files matching that
exact namespace inside that subdirectory (or matching the namespaced prefix)
instead of globbing tts_*.wav across the system temp; adjust the code paths
referenced around the existing _CLEANUP_INTERVAL_SEC, _CANONICAL_TTS_ENGINES,
lines where files are created (synthesize()) and where cleanup occurs (the
_cleanup_stale_outputs() logic at ~97-107 and ~138-139) to use the new constant
and ensure directory creation and safe deletion.
In `@nodes/src/nodes/audio_tts/services.json`:
- Line 50: The public service config in nodes/src/nodes/audio_tts/services.json
does not expose output_format, so users cannot choose WAV vs MP3; update the
JSON schema to add an "output_format" option (enum e.g., "wav","mp3") alongside
"profile" and "voice" and default to the current value. Then update the runtime
wiring to read that new config key: ensure IGlobal.py (the global config
accessor), tts_engine.py (where output MIME/extension is selected), and
kokoro_loader.py (which currently hardcodes .wav / audio/wav) consume the
config.output_format value and map it to the correct file extension and MIME
type. Also add validation/defaulting in the service loader so missing
output_format falls back to the original WAV behavior.
In `@nodes/test/audio_tts/conftest.py`:
- Around line 101-109: The hook currently skips the entire suite; change
pytest_collection_modifyitems so it only skips tests explicitly marked as heavy
instead of every test: keep computing include_skip, then when iterating items
only add the skip marker (the marker variable in this diff) to items that have a
specific marker (e.g., check item.get_closest_marker('heavy') or a similar
marker name) and only when 'audio_tts' is NOT present in include_skip (so
setting ROCKETRIDE_INCLUDE_SKIP=audio_tts opt‑in will run those heavy tests);
remove the blanket add_marker for all items and target only the heavy-marked
tests in pytest_collection_modifyitems.
In `@nodes/test/audio_tts/test_iglobal.py`:
- Around line 99-116: The test currently injects a fake_path from tempfile into
mock_engine.synthesize which doesn't match the temp output_path that
IGlobal.synthesize creates; change the mock so mock_engine.synthesize returns
the actual output_path passed by IGlobal.synthesize (e.g., set
mock_engine.synthesize.side_effect to a function that accepts (text, *args,
**kwargs) and returns {'path': kwargs.get('output_path') or args[...] as
appropriate, 'mime_type': 'audio/wav'}), ensuring the test verifies the engine
contract and that the created temp file path is returned; update references in
the test to use mock_engine.synthesize.side_effect and keep ig._engine =
mock_engine.
In `@nodes/test/audio_tts/test_iinstance.py`:
- Around line 42-46: The test currently only checks the order of actions from wa
= inst.instance.writeAudio but not the full call signatures; update the
assertions to inspect wa.call_args_list for each call and assert both positional
args and keyword args: verify that each call's first positional arg equals the
action string ('BEGIN','WRITE','END'), that the 'WRITE' call includes the
expected payload (e.g., audio bytes or dict) as its subsequent
positional/keyword arg, and that 'BEGIN' and 'END' calls explicitly include the
expected payload/metadata (or None) rather than missing the argument; reference
wa and wa.call_args_list in nodes/test/audio_tts/test_iinstance.py to locate and
update the assertions accordingly.
In `@packages/ai/src/ai/common/models/audio/kokoro_loader.py`:
- Around line 191-205: The code currently writes audio_arr to a temp WAV file
(using tempfile.mkstemp, wave.open on wav_path, then reopening and reading it
into raw) causing unnecessary disk I/O; replace this round-trip by importing io
and writing the WAV into an in-memory BytesIO: open a io.BytesIO(), use
wave.open on that buffer in 'wb' mode to setnchannels/setsampwidth/setframerate
and writeframes((audio_arr * 32767).astype(np.int16).tobytes()), then call
buffer.getvalue() to get raw — remove tempfile.mkstemp, os.close, os.remove and
the wav_path/wavf file operations accordingly so raw comes directly from memory.
---
Duplicate comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 58-67: The TTS config builder _build_tts_config_dict currently
drops temp_output_max_age_sec causing _cleanup_stale_outputs (which reads
self._config) to always fall back to the default; update _build_tts_config_dict
to include temp_output_max_age_sec (parse/validate as int with the existing
default, e.g., 3600) in the returned dict so config refreshes preserve any
override, and apply the same fix to the other config-building site referenced
(the similar dict return around lines 91-92) to ensure both paths keep
temp_output_max_age_sec.
In `@nodes/src/nodes/audio_tts/IInstance.py`:
- Around line 47-49: The calls to self.instance.writeAudio in IInstance.py use
two arguments for AVI_ACTION.BEGIN and AVI_ACTION.END but the writeAudio
signature requires a buffer argument; update the BEGIN and END invocations to
pass an explicit buffer (e.g., an empty bytes object) as the third parameter so
the calls match the writeAudio(signature) used by writeAudio in rocketlib (use
self.instance.writeAudio(AVI_ACTION.BEGIN, mime, b'') and similarly for END),
ensuring the buffer is of type bytes to avoid the TypeError at runtime.
In `@nodes/test/audio_tts/test_iglobal.py`:
- Around line 89-98: The test test_temp_file_removed_on_engine_exception
currently only asserts the RuntimeError is raised; after the with
pytest.raises(...) block, add an assertion that the temporary file was removed
by checking that mock_engine.config['output_path'] does not exist (use
os.path.exists or Path.exists on mock_engine.config['output_path']) to ensure
ig.synthesize cleans up temp files on engine exceptions.
In `@packages/ai/src/ai/common/models/audio/spacy_en_model.py`:
- Around line 67-72: The subprocess.run call in spacy_en_model.py that executes
[sys.executable, '-m', 'pip', 'install', url] should include a timeout parameter
to avoid hanging on stalled networks; update the subprocess.run invocation to
pass a reasonable timeout (e.g., timeout=60) and handle
subprocess.TimeoutExpired by catching it (try/except around subprocess.run) and
logging or raising a clear error so initialization fails fast; locate the call
to subprocess.run in the module (the pip install block) and add timeout and
appropriate exception handling there.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1d5161b7-2f28-4813-9b62-73013ce54099
📒 Files selected for processing (21)
nodes/src/nodes/audio_tts/IGlobal.pynodes/src/nodes/audio_tts/IInstance.pynodes/src/nodes/audio_tts/README.mdnodes/src/nodes/audio_tts/__init__.pynodes/src/nodes/audio_tts/requirements.txtnodes/src/nodes/audio_tts/services.jsonnodes/src/nodes/audio_tts/tts_engine.pynodes/test/audio_tts/__init__.pynodes/test/audio_tts/conftest.pynodes/test/audio_tts/test_iglobal.pynodes/test/audio_tts/test_iinstance.pynodes/test/audio_tts/test_normalize.pynodes/test/audio_tts/test_tts_engine.pynodes/test/conftest.pypackages/ai/src/ai/common/models/__init__.pypackages/ai/src/ai/common/models/audio/__init__.pypackages/ai/src/ai/common/models/audio/kokoro_loader.pypackages/ai/src/ai/common/models/audio/requirements_kokoro.txtpackages/ai/src/ai/common/models/audio/spacy_en_model.pypackages/ai/src/ai/common/models/base.pypipelines/text_file_to_audio.pipe
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (1)
nodes/src/nodes/audio_tts/IInstance.py (1)
28-30:⚠️ Potential issue | 🔴 CriticalPass
b''on the BEGIN/END marker writes.Lines 28 and 30 still call
writeAudiowith only two arguments. The currentwriteAudio(action, mimeType, data)implementations require the thirddataparameter, so this path throwsTypeErrorbefore any audio is emitted.Match the `writeAudio` signature
- self.instance.writeAudio(AVI_ACTION.BEGIN, mime) + self.instance.writeAudio(AVI_ACTION.BEGIN, mime, b'') self.instance.writeAudio(AVI_ACTION.WRITE, mime, raw) - self.instance.writeAudio(AVI_ACTION.END, mime) + self.instance.writeAudio(AVI_ACTION.END, mime, b'')🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nodes/src/nodes/audio_tts/IInstance.py` around lines 28 - 30, The calls to writeAudio in IInstance.py are missing the third data argument and should pass an empty bytes marker; update the BEGIN and END invocations (the calls using AVI_ACTION.BEGIN and AVI_ACTION.END) to call writeAudio(AVI_ACTION.BEGIN, mime, b'') and writeAudio(AVI_ACTION.END, mime, b'') so the signature writeAudio(action, mimeType, data) is satisfied (leave the WRITE call with actual raw data unchanged).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 29-30: The code only reads kokoro_voice and always builds/requests
WAV data and returns audio/wav; update the logic to wire the configured output
format from Config.getNodeConfig(self.glb.logicalType, self.glb.connConfig) into
the synthesize() call and downstream handling: read the configured format (e.g.
"mp3" or "wav") alongside kokoro_voice, pass that format option into
synthesize() (or trigger wav_to_mp3 conversion when format == "mp3"), choose the
correct base64 response field (wav_base64 vs mp3_base64) from the API response,
and set the returned Content-Type accordingly; modify the code block that
currently always builds a .wav (the logic in synthesize()/lines 57-92) to branch
on the requested format and use wav_to_mp3 conversion path when required.
- Around line 19-25: Add PEP 257 docstrings to the IGlobal class and its
lifecycle methods (beginGlobal, synthesize, endGlobal) describing: the
CONFIG-mode no-op behavior, the split between remote and local setup (how
_remote_client and _pipeline are used), and the synthesize return contract as a
dict with keys 'path' and 'mime_type' (including expected types/semantics);
ensure docstrings use single-quoted strings, follow PEP 257 style for classes
and methods, and run ruff/formatting for Python 3.10+ compliance across IGlobal,
beginGlobal, synthesize, and endGlobal locations referenced in the diff.
- Around line 36-40: The depends(...) installation is being called before
checking model-server mode, causing unnecessary installs for remote-only
deployments; change the flow in IGlobal.py so get_model_server_address() is
evaluated first and only if it indicates a local mode do you call depends(...).
Concretely, move the depends(os.path.join(..., 'requirements.txt')) call inside
the branch that runs when get_model_server_address() indicates no remote server
(or local mode), and apply the same reorder/fix for the similar block around the
other depends usage (the block covering lines 49-54) so depends() is never
invoked when operating in remote/model-server mode.
---
Duplicate comments:
In `@nodes/src/nodes/audio_tts/IInstance.py`:
- Around line 28-30: The calls to writeAudio in IInstance.py are missing the
third data argument and should pass an empty bytes marker; update the BEGIN and
END invocations (the calls using AVI_ACTION.BEGIN and AVI_ACTION.END) to call
writeAudio(AVI_ACTION.BEGIN, mime, b'') and writeAudio(AVI_ACTION.END, mime,
b'') so the signature writeAudio(action, mimeType, data) is satisfied (leave the
WRITE call with actual raw data unchanged).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9227144b-8117-4bf4-ab10-898eba989d9a
📒 Files selected for processing (2)
nodes/src/nodes/audio_tts/IGlobal.pynodes/src/nodes/audio_tts/IInstance.py
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 100-105: Replace the try/except/pass used to remove a temporary
file inside the except Exception handler with contextlib.suppress for clarity:
import contextlib at top and change the inner block that currently attempts to
os.remove(out_path) inside the except Exception in IGlobal.py to use with
contextlib.suppress(OSError): os.remove(out_path), then re-raise the original
exception as before.
- Around line 83-89: The loop over self._pipeline may receive None or
torch.Tensor audio; update the iteration in IGlobal (the block using for _gs,
_ps, audio in self._pipeline(...)) to skip when audio is None, to detect if
audio is a torch.Tensor and call .detach().cpu().numpy() before converting to a
numpy array, then convert to np.float32, check for zero-size and flatten
(arr.reshape(-1)) as currently done, and append to chunks; ensure you reference
the variables audio, _gs, _ps, self._pipeline, and chunks when making these
changes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6e1741c9-2c58-467d-b637-0ae093281709
📒 Files selected for processing (1)
nodes/src/nodes/audio_tts/IGlobal.py
| except Exception: | ||
| try: | ||
| os.remove(out_path) | ||
| except OSError: | ||
| pass | ||
| raise |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider contextlib.suppress for cleaner exception handling.
The try-except-pass pattern can be simplified using contextlib.suppress(OSError).
Proposed fix
+import contextlib
...
except Exception:
- try:
- os.remove(out_path)
- except OSError:
- pass
+ with contextlib.suppress(OSError):
+ os.remove(out_path)
raise🧰 Tools
🪛 Ruff (0.15.9)
[warning] 101-104: Use contextlib.suppress(OSError) instead of try-except-pass
Replace try-except-pass with with contextlib.suppress(OSError): ...
(SIM105)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 100 - 105, Replace the
try/except/pass used to remove a temporary file inside the except Exception
handler with contextlib.suppress for clarity: import contextlib at top and
change the inner block that currently attempts to os.remove(out_path) inside the
except Exception in IGlobal.py to use with contextlib.suppress(OSError):
os.remove(out_path), then re-raise the original exception as before.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
packages/ai/src/ai/common/models/audio/kokoro_loader.py (1)
187-201:⚠️ Potential issue | 🟠 MajorKeep WAV serialization in memory.
This path already has the PCM buffer in
audio_arr; writing a temp file for every item adds avoidable filesystem latency and temp-file churn on the model-server hot path.♻️ Proposed fix
- fd, wav_path = tempfile.mkstemp(suffix='.wav') - os.close(fd) - try: - with wave.open(wav_path, 'wb') as wavf: - wavf.setnchannels(1) - wavf.setsampwidth(2) - wavf.setframerate(24000) - wavf.writeframes((audio_arr * 32767).astype(np.int16).tobytes()) - with open(wav_path, 'rb') as f: - raw = f.read() - finally: - try: - os.remove(wav_path) - except OSError: - pass + buffer = io.BytesIO() + with wave.open(buffer, 'wb') as wavf: + wavf.setnchannels(1) + wavf.setsampwidth(2) + wavf.setframerate(24000) + wavf.writeframes((audio_arr * 32767).astype(np.int16).tobytes()) + raw = buffer.getvalue()Also add
import ionear the other imports and drop the now-unusedtempfileimport.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/ai/src/ai/common/models/audio/kokoro_loader.py` around lines 187 - 201, The code currently writes audio_arr to a temporary WAV file using tempfile.mkstemp and reads it back into raw, causing unnecessary filesystem I/O; change this to serialize the WAV in-memory by creating an io.BytesIO() buffer, opening it with wave.open(buffer, 'wb'), setnchannels/setsampwidth/setframerate and writeframes((audio_arr * 32767).astype(np.int16).tobytes()), then obtain raw = buffer.getvalue() and close the buffer; also add import io to the module and remove the now-unused tempfile.mkstemp/os.close/os.remove usage and imports to eliminate temp-file churn (refer to audio_arr, raw, wave.open and tempfile.mkstemp in the existing block).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/ai/src/ai/common/models/audio/kokoro_loader.py`:
- Around line 221-223: In postprocess, the branch that returns a list of empty
dicts uses "batch_size or 1" which forces a single phantom item when batch_size
is 0; change it to respect a zero-length batch by returning [{} for _ in
range(batch_size if batch_size is not None else 0)] (i.e., remove the "or 1"
fallback) so that when batch_size == 0 the function returns an empty list and
preserves input/output cardinality; update the branch where items is falsy to
use that expression.
---
Duplicate comments:
In `@packages/ai/src/ai/common/models/audio/kokoro_loader.py`:
- Around line 187-201: The code currently writes audio_arr to a temporary WAV
file using tempfile.mkstemp and reads it back into raw, causing unnecessary
filesystem I/O; change this to serialize the WAV in-memory by creating an
io.BytesIO() buffer, opening it with wave.open(buffer, 'wb'),
setnchannels/setsampwidth/setframerate and writeframes((audio_arr *
32767).astype(np.int16).tobytes()), then obtain raw = buffer.getvalue() and
close the buffer; also add import io to the module and remove the now-unused
tempfile.mkstemp/os.close/os.remove usage and imports to eliminate temp-file
churn (refer to audio_arr, raw, wave.open and tempfile.mkstemp in the existing
block).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c38d177b-cc92-499c-bdb2-b9e6942905ef
📒 Files selected for processing (1)
packages/ai/src/ai/common/models/audio/kokoro_loader.py
9ee9bd4 to
b00c098
Compare
- Node: IGlobal/IInstance/TTSEngine aligned with reference for engine=kokoro - services.json: single kokoro profile and voice catalog - packages/ai: KokoroLoader, spacy_en_model, wav_to_mp3, requirements_kokoro - depends() installs slim requirements (no piper/transformers/cloud) - Tests under nodes/test/audio_tts; audio_tts in dynamic skip_nodes
b00c098 to
a8b7da6
Compare
Summary
Type
Feature
Testing
./builder testpassesChecklist
Linked Issue
#583
Closes #123
Fixes #411
Summary by CodeRabbit
New Features
Documentation
Tests
Chores