Skip to content

feat(audio_tts): add Kokoro-only Text To Speech node (RR-411)#662

Open
dsapandora wants to merge 2 commits intodevelopfrom
feat/RR-411-kokoro
Open

feat(audio_tts): add Kokoro-only Text To Speech node (RR-411)#662
dsapandora wants to merge 2 commits intodevelopfrom
feat/RR-411-kokoro

Conversation

@dsapandora
Copy link
Copy Markdown
Collaborator

@dsapandora dsapandora commented Apr 13, 2026

Summary

  • Node: IGlobal/IInstance/TTSEngine aligned with reference for engine=kokoro
  • services.json: single kokoro profile and voice catalog
  • packages/ai: KokoroLoader, spacy_en_model, requirements_kokoro
  • depends() installs slim requirements (no piper/transformers/cloud)
  • Tests under nodes/test/audio_tts; audio_tts in dynamic skip_nodes
Captura de pantalla 2026-04-13 a las 15 41 41 Captura de pantalla 2026-04-13 a las 15 43 43

Type

Feature

Testing

  • Tests added or updated
  • Tested locally
  • ./builder test passes

Checklist

  • Commit messages follow conventional commits
  • No secrets or credentials included
  • Wiki updated (if applicable)
  • Breaking changes documented (if applicable)

Linked Issue

#583

Closes #123

Fixes #411

Summary by CodeRabbit

  • New Features

    • Added a Text-to-Speech node with selectable Kokoro voices, local or model-server synthesis, and WAV output; includes a Kokoro model loader and spaCy auto-install support.
    • Added a sample "Text File → Audio" pipeline.
  • Documentation

    • Added node README documenting lanes, usage, profiles, and configuration.
  • Tests

    • Test suite updated to skip the new audio_tts node by default (opt-in available).
  • Chores

    • Added runtime dependency files and exported Kokoro loader in model packages.

@github-actions github-actions bot added docs Documentation module:nodes Python pipeline nodes module:ai AI/ML modules labels Apr 13, 2026
@github-actions
Copy link
Copy Markdown

No description provided.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a Kokoro-based text-to-speech node with global/instance lifecycle handlers, local and remote synthesis paths, a Kokoro model loader and spaCy helper, service metadata and requirements, example pipeline, and minor test and export updates.

Changes

Cohort / File(s) Summary
Node implementation
nodes/src/nodes/audio_tts/IGlobal.py, nodes/src/nodes/audio_tts/IInstance.py, nodes/src/nodes/audio_tts/__init__.py
New TTS node: IGlobal initializes remote ModelClient or local KPipeline, installs runtime deps, synthesizes WAV (remote base64 or local chunk concat), and cleans up; IInstance.writeText streams WAV bytes via writeAudio (BEGIN/WRITE/END).
Node config & docs
nodes/src/nodes/audio_tts/services.json, nodes/src/nodes/audio_tts/requirements.txt, nodes/src/nodes/audio_tts/README.md
Adds service descriptor, Kokoro profile/voice configuration, IO contract (text→audio), runtime requirements, and README describing local vs remote execution and expected behavior.
Model loader & deps
packages/ai/src/ai/common/models/audio/kokoro_loader.py, packages/ai/src/ai/common/models/audio/requirements_kokoro.txt, packages/ai/src/ai/common/models/audio/spacy_en_model.py
New KokoroLoader for inference (dependency ensure, device allocation, locking, preprocess/inference/postprocess producing base64 WAV); spacy helper to ensure en_core_web_sm; Kokoro requirements file.
Package exports
packages/ai/src/ai/common/models/__init__.py, packages/ai/src/ai/common/models/audio/__init__.py
Expose KokoroLoader in model package exports and update audio module docstring.
Tests & example pipeline
nodes/test/conftest.py, pipelines/text_file_to_audio.pipe
Adds audio_tts to dynamic-test skip list (opt‑in preserved); adds example pipeline wiring text→audio using kokoro profile and voice.
Base loader notes
packages/ai/src/ai/common/models/base.py
Minor doc/comment tweaks to note kokoro support and mark ai.common.torch import as intentional side effect.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Node as TTS Node\n(IGlobal / IInstance)
    participant Config as Config\nSystem
    participant ModelServer as ModelClient\n(Remote)
    participant LocalPipeline as KPipeline\n(Local)
    participant spaCy as spaCy\n(en_core_web_sm)
    participant FS as FileSystem

    Note over Node: beginGlobal() initialization
    Node->>Config: getNodeConfig() -> kokoro_voice, profile, model server addr
    alt Model server address present
        Node->>ModelServer: create client & load Kokoro model (lang_code)
        ModelServer-->>Node: model bundle (_remote_client)
    else No model server
        Node->>spaCy: ensure_spacy_en_model()
        spaCy-->>Node: model available or installed
        Node->>LocalPipeline: construct KPipeline(lang_code)
        LocalPipeline-->>Node: pipeline ready (_pipeline)
    end

    Note over Node: synthesize(text)
    alt Remote path
        Node->>ModelServer: inference request (text, voice, speed)
        ModelServer->>ModelServer: synthesize -> wav_base64
        ModelServer-->>Node: response with wav_base64
        Node->>FS: decode base64 -> write temp WAV
    else Local path
        Node->>LocalPipeline: pipeline(text, voice, speed)
        LocalPipeline-->>Node: stream audio chunks (ndarrays)
        Node->>Node: concat/clip -> write mono 24kHz 16-bit WAV
    end

    Node-->>IInstance: return {path, mime_type}
    IInstance->>FS: read temp WAV bytes
    IInstance->>Node: writeAudio BEGIN/WRITE/END (audio/wav)
    IInstance->>FS: delete temp WAV
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • jmaionchi
  • stepmikhaylov
  • Rod-Christensen

Poem

🐰
I nibble on config, hum a tiny tune,
Kokoro wakes and sings by moon.
From text to WAV, I bounce and play —
Hoppity hop, hooray for TTS day! 🎶

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Linked Issues check ⚠️ Warning The PR primarily addresses issue #411 with a Kokoro-only TTS implementation. However, the implementation is incomplete relative to the full acceptance criteria, which required multiple engines (Coqui, Kokoro, Piper, Bark) and cloud providers (ElevenLabs, OpenAI), and issue #123 documentation updates are not addressed. Complete the remaining engine/provider implementations and add comprehensive documentation updates to ROCKETRIDE_COMPONENT_REFERENCE.md and ROCKETRIDE_PIPELINE_RULES.md per issue #123.
Docstring Coverage ⚠️ Warning Docstring coverage is 26.79% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and specifically describes the primary change: adding a Text To Speech (TTS) node for the Kokoro engine with clear reference to the issue (RR-411).
Out of Scope Changes check ✅ Passed All changes are directly related to implementing the TTS node and its dependencies. The modifications to test configuration (conftest.py) appropriately skip the new audio_tts node from general tests, and updates to base.py and model exports support the KokoroLoader integration.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/RR-411-kokoro

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 89-113: The _cleanup_stale_outputs method calls time.time() twice
and reads a config key ('temp_output_max_age_sec') that isn't populated by
_build_tts_config_dict; remove the redundant now = time.time() (keep the first
one used for both interval check and file mtime comparisons) and ensure the
temp_output_max_age_sec config is actually provided by adding it to
_build_tts_config_dict (or move its default to a shared constant and reference
that constant here), so _cleanup_stale_outputs uses a single timestamp and a
real/ documented config value instead of silently falling back to 3600.

In `@nodes/src/nodes/audio_tts/IInstance.py`:
- Around line 47-49: The calls to writeAudio in IInstance
(writeAudio(AVI_ACTION.BEGIN, mime) and writeAudio(AVI_ACTION.END, mime)) are
missing the required third parameter (buffer); update those calls to pass an
empty bytes object as the buffer (e.g., b'') so they match the signature
writeAudio(action, mimeType, buffer) and avoid a TypeError at runtime when using
AVI_ACTION.BEGIN and AVI_ACTION.END.

In `@nodes/src/nodes/audio_tts/tts_engine.py`:
- Around line 88-109: The _kokoro method reads out_path from config without
validation which can lead to confusing errors when opening/writing the file; add
an early guard that verifies out_path is a non-empty string and raises a clear
ValueError (or similar) if missing/invalid before any file operations
(references: method _kokoro, variable out_path, and downstream use in
_save_mono_float_audio and callers like IGlobal.synthesize) so consumers get an
explicit message like "output_path must be set in config" instead of a cryptic
file I/O error.
- Line 74: The RuntimeError string in tts_engine.py should use single quotes per
project style; update the raise in the branch that checks model server mode (the
raise RuntimeError(...) call in function/method handling Kokoro model server
mode) so the exception message is a single-quoted literal, e.g., replace any
double-quoted message with a single-quoted one to conform to nodes/**/*.py
coding guidelines.

In `@nodes/test/audio_tts/test_iglobal.py`:
- Around line 92-100: The test currently only checks exception propagation but
must also assert the temp file was deleted: after the pytest.raises block,
inspect mock_engine.synthesize.call_args to extract the filename/path that
ig.synthesize created (e.g. from call_args[0] or call_args[1] depending on how
synthesize was called), then assert that os.path.exists(extracted_path) is False
(or use pathlib.Path(extracted_path).exists() is False) to verify cleanup;
ensure you import os/pathlib if needed and keep the RuntimeError assertion
intact.

In `@packages/ai/src/ai/common/models/audio/spacy_en_model.py`:
- Around line 67-72: The subprocess.run call that installs the wheel in
spacy_en_model.py should include a timeout to avoid indefinite hangs; update the
call in the function where [sys.executable, '-m', 'pip', 'install', url] is
invoked (look for the subprocess.run block) to pass a reasonable timeout (e.g.
timeout=300) and add handling for subprocess.TimeoutExpired (and optionally
subprocess.CalledProcessError) to log/raise a clear error so initialization
fails fast instead of blocking indefinitely.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3216c990-14c1-4ea3-ab4f-d185cba50fe5

📥 Commits

Reviewing files that changed from the base of the PR and between c04dbeb and 3a7c119.

📒 Files selected for processing (21)
  • nodes/src/nodes/audio_tts/IGlobal.py
  • nodes/src/nodes/audio_tts/IInstance.py
  • nodes/src/nodes/audio_tts/README.md
  • nodes/src/nodes/audio_tts/__init__.py
  • nodes/src/nodes/audio_tts/requirements.txt
  • nodes/src/nodes/audio_tts/services.json
  • nodes/src/nodes/audio_tts/tts_engine.py
  • nodes/test/audio_tts/__init__.py
  • nodes/test/audio_tts/conftest.py
  • nodes/test/audio_tts/test_iglobal.py
  • nodes/test/audio_tts/test_iinstance.py
  • nodes/test/audio_tts/test_normalize.py
  • nodes/test/audio_tts/test_tts_engine.py
  • nodes/test/conftest.py
  • packages/ai/src/ai/common/models/__init__.py
  • packages/ai/src/ai/common/models/audio/__init__.py
  • packages/ai/src/ai/common/models/audio/kokoro_loader.py
  • packages/ai/src/ai/common/models/audio/requirements_kokoro.txt
  • packages/ai/src/ai/common/models/audio/spacy_en_model.py
  • packages/ai/src/ai/common/models/base.py
  • pipelines/text_file_to_audio.pipe

Comment on lines +47 to +49
self.instance.writeAudio(AVI_ACTION.BEGIN, mime)
self.instance.writeAudio(AVI_ACTION.WRITE, mime, raw)
self.instance.writeAudio(AVI_ACTION.END, mime)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

python - <<'PY'
import ast, pathlib

base = pathlib.Path('packages/server/engine-lib/rocketlib-python/lib/rocketlib/filters.py')
tree = ast.parse(base.read_text())
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef) and node.name == 'writeAudio':
        args = [a.arg for a in node.args.args]
        print('IInstanceBase.writeAudio signature args:', args)
        print('defaults count:', len(node.args.defaults))
        break

target = pathlib.Path('nodes/src/nodes/audio_tts/IInstance.py')
t = ast.parse(target.read_text())
for node in ast.walk(t):
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute) and node.func.attr == 'writeAudio':
        print(f'Line {node.lineno}: positional args passed = {len(node.args)}')
PY

Repository: rocketride-org/rocketride-server

Length of output: 283


🏁 Script executed:

cat -n nodes/src/nodes/audio_tts/IInstance.py | sed -n '40,55p'

Repository: rocketride-org/rocketride-server

Length of output: 772


🏁 Script executed:

cat -n packages/server/engine-lib/rocketlib-python/lib/rocketlib/filters.py | grep -A 15 'def writeAudio'

Repository: rocketride-org/rocketride-server

Length of output: 1512


Fix writeAudio calls on BEGIN and END actions to include empty buffer.

Lines 47 and 49 call writeAudio(action, mimeType) with 2 arguments, but the method signature requires 3: (action, mimeType, buffer). This will fail at runtime with a TypeError for missing the required buffer parameter. BEGIN and END are action markers without a data payload, so pass an empty bytes object.

Fix
-            self.instance.writeAudio(AVI_ACTION.BEGIN, mime)
+            self.instance.writeAudio(AVI_ACTION.BEGIN, mime, b'')
             self.instance.writeAudio(AVI_ACTION.WRITE, mime, raw)
-            self.instance.writeAudio(AVI_ACTION.END, mime)
+            self.instance.writeAudio(AVI_ACTION.END, mime, b'')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/audio_tts/IInstance.py` around lines 47 - 49, The calls to
writeAudio in IInstance (writeAudio(AVI_ACTION.BEGIN, mime) and
writeAudio(AVI_ACTION.END, mime)) are missing the required third parameter
(buffer); update those calls to pass an empty bytes object as the buffer (e.g.,
b'') so they match the signature writeAudio(action, mimeType, buffer) and avoid
a TypeError at runtime when using AVI_ACTION.BEGIN and AVI_ACTION.END.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (4)
packages/ai/src/ai/common/models/audio/spacy_en_model.py (1)

67-72: ⚠️ Potential issue | 🟠 Major

Add timeout for the wheel install subprocess (still unresolved).

Line 67 runs a networked pip install with no timeout, so initialization can hang indefinitely on stalled networks. This was already raised earlier and still applies in the current revision.

Suggested fix
-    subprocess.run(
-        [sys.executable, '-m', 'pip', 'install', url],
-        check=True,
-        stdout=subprocess.DEVNULL,
-        stderr=subprocess.DEVNULL,
-    )
+    try:
+        subprocess.run(
+            [sys.executable, '-m', 'pip', 'install', url],
+            check=True,
+            timeout=300,
+            stdout=subprocess.DEVNULL,
+            stderr=subprocess.DEVNULL,
+        )
+    except subprocess.TimeoutExpired as exc:
+        raise RuntimeError('Timed out installing spaCy model wheel') from exc
#!/bin/bash
set -euo pipefail

file="$(fd -a '^spacy_en_model\.py$' | head -n1)"
rg -n -A8 -B2 'subprocess\.run\(' "$file"

python - <<'PY'
from pathlib import Path
p = next(Path('.').rglob('spacy_en_model.py'))
text = p.read_text()
start = text.find('subprocess.run(')
end = text.find(')\n', start)
snippet = text[start:end]
print('timeout_present=', 'timeout=' in snippet)
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ai/src/ai/common/models/audio/spacy_en_model.py` around lines 67 -
72, The subprocess.run call in spacy_en_model.py that executes [sys.executable,
'-m', 'pip', 'install', url] should include a timeout parameter to avoid hanging
on stalled networks; update the subprocess.run invocation to pass a reasonable
timeout (e.g., timeout=60) and handle subprocess.TimeoutExpired by catching it
(try/except around subprocess.run) and logging or raising a clear error so
initialization fails fast; locate the call to subprocess.run in the module (the
pip install block) and add timeout and appropriate exception handling there.
nodes/src/nodes/audio_tts/IInstance.py (1)

47-49: ⚠️ Potential issue | 🔴 Critical

writeAudio BEGIN/END calls are missing the required buffer argument.

writeAudio is called with 2 args on BEGIN/END and 3 args on WRITE. BEGIN/END will raise TypeError at runtime.

Proposed fix
-            self.instance.writeAudio(AVI_ACTION.BEGIN, mime)
+            self.instance.writeAudio(AVI_ACTION.BEGIN, mime, b'')
             self.instance.writeAudio(AVI_ACTION.WRITE, mime, raw)
-            self.instance.writeAudio(AVI_ACTION.END, mime)
+            self.instance.writeAudio(AVI_ACTION.END, mime, b'')
#!/bin/bash
set -euo pipefail

python - <<'PY'
import ast
from pathlib import Path

sig_path = Path('packages/server/engine-lib/rocketlib-python/lib/rocketlib/filters.py')
call_path = Path('nodes/src/nodes/audio_tts/IInstance.py')

sig_tree = ast.parse(sig_path.read_text())
for node in ast.walk(sig_tree):
    if isinstance(node, ast.FunctionDef) and node.name == 'writeAudio':
        print('writeAudio signature args:', [a.arg for a in node.args.args])

call_tree = ast.parse(call_path.read_text())
for node in ast.walk(call_tree):
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute) and node.func.attr == 'writeAudio':
        print(f'Line {node.lineno}: positional args passed = {len(node.args)}')
PY
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/audio_tts/IInstance.py` around lines 47 - 49, The calls to
self.instance.writeAudio in IInstance.py use two arguments for AVI_ACTION.BEGIN
and AVI_ACTION.END but the writeAudio signature requires a buffer argument;
update the BEGIN and END invocations to pass an explicit buffer (e.g., an empty
bytes object) as the third parameter so the calls match the
writeAudio(signature) used by writeAudio in rocketlib (use
self.instance.writeAudio(AVI_ACTION.BEGIN, mime, b'') and similarly for END),
ensuring the buffer is of type bytes to avoid the TypeError at runtime.
nodes/test/audio_tts/test_iglobal.py (1)

89-98: ⚠️ Potential issue | 🟡 Minor

Assert temp-file cleanup in the failure path.

This test name says it verifies temp-file removal, but it currently only checks that the exception bubbles out. After the pytest.raises(...) block, assert that mock_engine.config['output_path'] no longer exists.

Suggested fix
         with pytest.raises(RuntimeError, match='boom'):
             ig.synthesize('hello')
+        assert not os.path.exists(mock_engine.config['output_path'])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/test/audio_tts/test_iglobal.py` around lines 89 - 98, The test
test_temp_file_removed_on_engine_exception currently only asserts the
RuntimeError is raised; after the with pytest.raises(...) block, add an
assertion that the temporary file was removed by checking that
mock_engine.config['output_path'] does not exist (use os.path.exists or
Path.exists on mock_engine.config['output_path']) to ensure ig.synthesize cleans
up temp files on engine exceptions.
nodes/src/nodes/audio_tts/IGlobal.py (1)

58-67: ⚠️ Potential issue | 🟡 Minor

temp_output_max_age_sec is ignored after config rebuilds.

_cleanup_stale_outputs() reads this value from self._config, but _build_tts_config_dict() drops it whenever beginGlobal() or synthesize() refreshes _config. Any override silently falls back to 3600 seconds.

Suggested fix
         return {
             'engine': engine,
             'kokoro_voice': kokoro_voice,
             'kokoro_lang_code': kokoro_voice[0],
+            'temp_output_max_age_sec': cfg.get('temp_output_max_age_sec', 3600),
         }

Also applies to: 91-92

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 58 - 67, The TTS config
builder _build_tts_config_dict currently drops temp_output_max_age_sec causing
_cleanup_stale_outputs (which reads self._config) to always fall back to the
default; update _build_tts_config_dict to include temp_output_max_age_sec
(parse/validate as int with the existing default, e.g., 3600) in the returned
dict so config refreshes preserve any override, and apply the same fix to the
other config-building site referenced (the similar dict return around lines
91-92) to ensure both paths keep temp_output_max_age_sec.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 33-35: The cleanup routine _cleanup_stale_outputs() and
synthesizer synthesize() are using a global temp filename prefix which risks
deleting unrelated files; introduce a node-specific temp namespace (e.g., add a
constant like _TTS_TEMP_DIR_NAME or _TTS_TEMP_PREFIX) and change synthesize() to
create output files inside that dedicated subdirectory (mkdir it under
tempfile.gettempdir() if missing) or to use the namespaced prefix when naming
files, then update _cleanup_stale_outputs() to only remove files matching that
exact namespace inside that subdirectory (or matching the namespaced prefix)
instead of globbing tts_*.wav across the system temp; adjust the code paths
referenced around the existing _CLEANUP_INTERVAL_SEC, _CANONICAL_TTS_ENGINES,
lines where files are created (synthesize()) and where cleanup occurs (the
_cleanup_stale_outputs() logic at ~97-107 and ~138-139) to use the new constant
and ensure directory creation and safe deletion.

In `@nodes/src/nodes/audio_tts/services.json`:
- Line 50: The public service config in nodes/src/nodes/audio_tts/services.json
does not expose output_format, so users cannot choose WAV vs MP3; update the
JSON schema to add an "output_format" option (enum e.g., "wav","mp3") alongside
"profile" and "voice" and default to the current value. Then update the runtime
wiring to read that new config key: ensure IGlobal.py (the global config
accessor), tts_engine.py (where output MIME/extension is selected), and
kokoro_loader.py (which currently hardcodes .wav / audio/wav) consume the
config.output_format value and map it to the correct file extension and MIME
type. Also add validation/defaulting in the service loader so missing
output_format falls back to the original WAV behavior.

In `@nodes/test/audio_tts/conftest.py`:
- Around line 101-109: The hook currently skips the entire suite; change
pytest_collection_modifyitems so it only skips tests explicitly marked as heavy
instead of every test: keep computing include_skip, then when iterating items
only add the skip marker (the marker variable in this diff) to items that have a
specific marker (e.g., check item.get_closest_marker('heavy') or a similar
marker name) and only when 'audio_tts' is NOT present in include_skip (so
setting ROCKETRIDE_INCLUDE_SKIP=audio_tts opt‑in will run those heavy tests);
remove the blanket add_marker for all items and target only the heavy-marked
tests in pytest_collection_modifyitems.

In `@nodes/test/audio_tts/test_iglobal.py`:
- Around line 99-116: The test currently injects a fake_path from tempfile into
mock_engine.synthesize which doesn't match the temp output_path that
IGlobal.synthesize creates; change the mock so mock_engine.synthesize returns
the actual output_path passed by IGlobal.synthesize (e.g., set
mock_engine.synthesize.side_effect to a function that accepts (text, *args,
**kwargs) and returns {'path': kwargs.get('output_path') or args[...] as
appropriate, 'mime_type': 'audio/wav'}), ensuring the test verifies the engine
contract and that the created temp file path is returned; update references in
the test to use mock_engine.synthesize.side_effect and keep ig._engine =
mock_engine.

In `@nodes/test/audio_tts/test_iinstance.py`:
- Around line 42-46: The test currently only checks the order of actions from wa
= inst.instance.writeAudio but not the full call signatures; update the
assertions to inspect wa.call_args_list for each call and assert both positional
args and keyword args: verify that each call's first positional arg equals the
action string ('BEGIN','WRITE','END'), that the 'WRITE' call includes the
expected payload (e.g., audio bytes or dict) as its subsequent
positional/keyword arg, and that 'BEGIN' and 'END' calls explicitly include the
expected payload/metadata (or None) rather than missing the argument; reference
wa and wa.call_args_list in nodes/test/audio_tts/test_iinstance.py to locate and
update the assertions accordingly.

In `@packages/ai/src/ai/common/models/audio/kokoro_loader.py`:
- Around line 191-205: The code currently writes audio_arr to a temp WAV file
(using tempfile.mkstemp, wave.open on wav_path, then reopening and reading it
into raw) causing unnecessary disk I/O; replace this round-trip by importing io
and writing the WAV into an in-memory BytesIO: open a io.BytesIO(), use
wave.open on that buffer in 'wb' mode to setnchannels/setsampwidth/setframerate
and writeframes((audio_arr * 32767).astype(np.int16).tobytes()), then call
buffer.getvalue() to get raw — remove tempfile.mkstemp, os.close, os.remove and
the wav_path/wavf file operations accordingly so raw comes directly from memory.

---

Duplicate comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 58-67: The TTS config builder _build_tts_config_dict currently
drops temp_output_max_age_sec causing _cleanup_stale_outputs (which reads
self._config) to always fall back to the default; update _build_tts_config_dict
to include temp_output_max_age_sec (parse/validate as int with the existing
default, e.g., 3600) in the returned dict so config refreshes preserve any
override, and apply the same fix to the other config-building site referenced
(the similar dict return around lines 91-92) to ensure both paths keep
temp_output_max_age_sec.

In `@nodes/src/nodes/audio_tts/IInstance.py`:
- Around line 47-49: The calls to self.instance.writeAudio in IInstance.py use
two arguments for AVI_ACTION.BEGIN and AVI_ACTION.END but the writeAudio
signature requires a buffer argument; update the BEGIN and END invocations to
pass an explicit buffer (e.g., an empty bytes object) as the third parameter so
the calls match the writeAudio(signature) used by writeAudio in rocketlib (use
self.instance.writeAudio(AVI_ACTION.BEGIN, mime, b'') and similarly for END),
ensuring the buffer is of type bytes to avoid the TypeError at runtime.

In `@nodes/test/audio_tts/test_iglobal.py`:
- Around line 89-98: The test test_temp_file_removed_on_engine_exception
currently only asserts the RuntimeError is raised; after the with
pytest.raises(...) block, add an assertion that the temporary file was removed
by checking that mock_engine.config['output_path'] does not exist (use
os.path.exists or Path.exists on mock_engine.config['output_path']) to ensure
ig.synthesize cleans up temp files on engine exceptions.

In `@packages/ai/src/ai/common/models/audio/spacy_en_model.py`:
- Around line 67-72: The subprocess.run call in spacy_en_model.py that executes
[sys.executable, '-m', 'pip', 'install', url] should include a timeout parameter
to avoid hanging on stalled networks; update the subprocess.run invocation to
pass a reasonable timeout (e.g., timeout=60) and handle
subprocess.TimeoutExpired by catching it (try/except around subprocess.run) and
logging or raising a clear error so initialization fails fast; locate the call
to subprocess.run in the module (the pip install block) and add timeout and
appropriate exception handling there.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1d5161b7-2f28-4813-9b62-73013ce54099

📥 Commits

Reviewing files that changed from the base of the PR and between 3a7c119 and 244ce4e.

📒 Files selected for processing (21)
  • nodes/src/nodes/audio_tts/IGlobal.py
  • nodes/src/nodes/audio_tts/IInstance.py
  • nodes/src/nodes/audio_tts/README.md
  • nodes/src/nodes/audio_tts/__init__.py
  • nodes/src/nodes/audio_tts/requirements.txt
  • nodes/src/nodes/audio_tts/services.json
  • nodes/src/nodes/audio_tts/tts_engine.py
  • nodes/test/audio_tts/__init__.py
  • nodes/test/audio_tts/conftest.py
  • nodes/test/audio_tts/test_iglobal.py
  • nodes/test/audio_tts/test_iinstance.py
  • nodes/test/audio_tts/test_normalize.py
  • nodes/test/audio_tts/test_tts_engine.py
  • nodes/test/conftest.py
  • packages/ai/src/ai/common/models/__init__.py
  • packages/ai/src/ai/common/models/audio/__init__.py
  • packages/ai/src/ai/common/models/audio/kokoro_loader.py
  • packages/ai/src/ai/common/models/audio/requirements_kokoro.txt
  • packages/ai/src/ai/common/models/audio/spacy_en_model.py
  • packages/ai/src/ai/common/models/base.py
  • pipelines/text_file_to_audio.pipe

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
nodes/src/nodes/audio_tts/IInstance.py (1)

28-30: ⚠️ Potential issue | 🔴 Critical

Pass b'' on the BEGIN/END marker writes.

Lines 28 and 30 still call writeAudio with only two arguments. The current writeAudio(action, mimeType, data) implementations require the third data parameter, so this path throws TypeError before any audio is emitted.

Match the `writeAudio` signature
-            self.instance.writeAudio(AVI_ACTION.BEGIN, mime)
+            self.instance.writeAudio(AVI_ACTION.BEGIN, mime, b'')
             self.instance.writeAudio(AVI_ACTION.WRITE, mime, raw)
-            self.instance.writeAudio(AVI_ACTION.END, mime)
+            self.instance.writeAudio(AVI_ACTION.END, mime, b'')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/audio_tts/IInstance.py` around lines 28 - 30, The calls to
writeAudio in IInstance.py are missing the third data argument and should pass
an empty bytes marker; update the BEGIN and END invocations (the calls using
AVI_ACTION.BEGIN and AVI_ACTION.END) to call writeAudio(AVI_ACTION.BEGIN, mime,
b'') and writeAudio(AVI_ACTION.END, mime, b'') so the signature
writeAudio(action, mimeType, data) is satisfied (leave the WRITE call with
actual raw data unchanged).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 29-30: The code only reads kokoro_voice and always builds/requests
WAV data and returns audio/wav; update the logic to wire the configured output
format from Config.getNodeConfig(self.glb.logicalType, self.glb.connConfig) into
the synthesize() call and downstream handling: read the configured format (e.g.
"mp3" or "wav") alongside kokoro_voice, pass that format option into
synthesize() (or trigger wav_to_mp3 conversion when format == "mp3"), choose the
correct base64 response field (wav_base64 vs mp3_base64) from the API response,
and set the returned Content-Type accordingly; modify the code block that
currently always builds a .wav (the logic in synthesize()/lines 57-92) to branch
on the requested format and use wav_to_mp3 conversion path when required.
- Around line 19-25: Add PEP 257 docstrings to the IGlobal class and its
lifecycle methods (beginGlobal, synthesize, endGlobal) describing: the
CONFIG-mode no-op behavior, the split between remote and local setup (how
_remote_client and _pipeline are used), and the synthesize return contract as a
dict with keys 'path' and 'mime_type' (including expected types/semantics);
ensure docstrings use single-quoted strings, follow PEP 257 style for classes
and methods, and run ruff/formatting for Python 3.10+ compliance across IGlobal,
beginGlobal, synthesize, and endGlobal locations referenced in the diff.
- Around line 36-40: The depends(...) installation is being called before
checking model-server mode, causing unnecessary installs for remote-only
deployments; change the flow in IGlobal.py so get_model_server_address() is
evaluated first and only if it indicates a local mode do you call depends(...).
Concretely, move the depends(os.path.join(..., 'requirements.txt')) call inside
the branch that runs when get_model_server_address() indicates no remote server
(or local mode), and apply the same reorder/fix for the similar block around the
other depends usage (the block covering lines 49-54) so depends() is never
invoked when operating in remote/model-server mode.

---

Duplicate comments:
In `@nodes/src/nodes/audio_tts/IInstance.py`:
- Around line 28-30: The calls to writeAudio in IInstance.py are missing the
third data argument and should pass an empty bytes marker; update the BEGIN and
END invocations (the calls using AVI_ACTION.BEGIN and AVI_ACTION.END) to call
writeAudio(AVI_ACTION.BEGIN, mime, b'') and writeAudio(AVI_ACTION.END, mime,
b'') so the signature writeAudio(action, mimeType, data) is satisfied (leave the
WRITE call with actual raw data unchanged).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9227144b-8117-4bf4-ab10-898eba989d9a

📥 Commits

Reviewing files that changed from the base of the PR and between 244ce4e and 857b9b4.

📒 Files selected for processing (2)
  • nodes/src/nodes/audio_tts/IGlobal.py
  • nodes/src/nodes/audio_tts/IInstance.py

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nodes/src/nodes/audio_tts/IGlobal.py`:
- Around line 100-105: Replace the try/except/pass used to remove a temporary
file inside the except Exception handler with contextlib.suppress for clarity:
import contextlib at top and change the inner block that currently attempts to
os.remove(out_path) inside the except Exception in IGlobal.py to use with
contextlib.suppress(OSError): os.remove(out_path), then re-raise the original
exception as before.
- Around line 83-89: The loop over self._pipeline may receive None or
torch.Tensor audio; update the iteration in IGlobal (the block using for _gs,
_ps, audio in self._pipeline(...)) to skip when audio is None, to detect if
audio is a torch.Tensor and call .detach().cpu().numpy() before converting to a
numpy array, then convert to np.float32, check for zero-size and flatten
(arr.reshape(-1)) as currently done, and append to chunks; ensure you reference
the variables audio, _gs, _ps, self._pipeline, and chunks when making these
changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6e1741c9-2c58-467d-b637-0ae093281709

📥 Commits

Reviewing files that changed from the base of the PR and between 857b9b4 and 360cc48.

📒 Files selected for processing (1)
  • nodes/src/nodes/audio_tts/IGlobal.py

Comment on lines +100 to +105
except Exception:
try:
os.remove(out_path)
except OSError:
pass
raise
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider contextlib.suppress for cleaner exception handling.

The try-except-pass pattern can be simplified using contextlib.suppress(OSError).

Proposed fix
+import contextlib
...
         except Exception:
-            try:
-                os.remove(out_path)
-            except OSError:
-                pass
+            with contextlib.suppress(OSError):
+                os.remove(out_path)
             raise
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 101-104: Use contextlib.suppress(OSError) instead of try-except-pass

Replace try-except-pass with with contextlib.suppress(OSError): ...

(SIM105)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nodes/src/nodes/audio_tts/IGlobal.py` around lines 100 - 105, Replace the
try/except/pass used to remove a temporary file inside the except Exception
handler with contextlib.suppress for clarity: import contextlib at top and
change the inner block that currently attempts to os.remove(out_path) inside the
except Exception in IGlobal.py to use with contextlib.suppress(OSError):
os.remove(out_path), then re-raise the original exception as before.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/ai/src/ai/common/models/audio/kokoro_loader.py (1)

187-201: ⚠️ Potential issue | 🟠 Major

Keep WAV serialization in memory.

This path already has the PCM buffer in audio_arr; writing a temp file for every item adds avoidable filesystem latency and temp-file churn on the model-server hot path.

♻️ Proposed fix
-                fd, wav_path = tempfile.mkstemp(suffix='.wav')
-                os.close(fd)
-                try:
-                    with wave.open(wav_path, 'wb') as wavf:
-                        wavf.setnchannels(1)
-                        wavf.setsampwidth(2)
-                        wavf.setframerate(24000)
-                        wavf.writeframes((audio_arr * 32767).astype(np.int16).tobytes())
-                    with open(wav_path, 'rb') as f:
-                        raw = f.read()
-                finally:
-                    try:
-                        os.remove(wav_path)
-                    except OSError:
-                        pass
+                buffer = io.BytesIO()
+                with wave.open(buffer, 'wb') as wavf:
+                    wavf.setnchannels(1)
+                    wavf.setsampwidth(2)
+                    wavf.setframerate(24000)
+                    wavf.writeframes((audio_arr * 32767).astype(np.int16).tobytes())
+                raw = buffer.getvalue()

Also add import io near the other imports and drop the now-unused tempfile import.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ai/src/ai/common/models/audio/kokoro_loader.py` around lines 187 -
201, The code currently writes audio_arr to a temporary WAV file using
tempfile.mkstemp and reads it back into raw, causing unnecessary filesystem I/O;
change this to serialize the WAV in-memory by creating an io.BytesIO() buffer,
opening it with wave.open(buffer, 'wb'), setnchannels/setsampwidth/setframerate
and writeframes((audio_arr * 32767).astype(np.int16).tobytes()), then obtain raw
= buffer.getvalue() and close the buffer; also add import io to the module and
remove the now-unused tempfile.mkstemp/os.close/os.remove usage and imports to
eliminate temp-file churn (refer to audio_arr, raw, wave.open and
tempfile.mkstemp in the existing block).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/ai/src/ai/common/models/audio/kokoro_loader.py`:
- Around line 221-223: In postprocess, the branch that returns a list of empty
dicts uses "batch_size or 1" which forces a single phantom item when batch_size
is 0; change it to respect a zero-length batch by returning [{} for _ in
range(batch_size if batch_size is not None else 0)] (i.e., remove the "or 1"
fallback) so that when batch_size == 0 the function returns an empty list and
preserves input/output cardinality; update the branch where items is falsy to
use that expression.

---

Duplicate comments:
In `@packages/ai/src/ai/common/models/audio/kokoro_loader.py`:
- Around line 187-201: The code currently writes audio_arr to a temporary WAV
file using tempfile.mkstemp and reads it back into raw, causing unnecessary
filesystem I/O; change this to serialize the WAV in-memory by creating an
io.BytesIO() buffer, opening it with wave.open(buffer, 'wb'),
setnchannels/setsampwidth/setframerate and writeframes((audio_arr *
32767).astype(np.int16).tobytes()), then obtain raw = buffer.getvalue() and
close the buffer; also add import io to the module and remove the now-unused
tempfile.mkstemp/os.close/os.remove usage and imports to eliminate temp-file
churn (refer to audio_arr, raw, wave.open and tempfile.mkstemp in the existing
block).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c38d177b-cc92-499c-bdb2-b9e6942905ef

📥 Commits

Reviewing files that changed from the base of the PR and between 360cc48 and 9ee9bd4.

📒 Files selected for processing (1)
  • packages/ai/src/ai/common/models/audio/kokoro_loader.py

- Node: IGlobal/IInstance/TTSEngine aligned with reference for engine=kokoro
- services.json: single kokoro profile and voice catalog
- packages/ai: KokoroLoader, spacy_en_model, wav_to_mp3, requirements_kokoro
- depends() installs slim requirements (no piper/transformers/cloud)
- Tests under nodes/test/audio_tts; audio_tts in dynamic skip_nodes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs Documentation module:ai AI/ML modules module:nodes Python pipeline nodes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add text-to-speech node to pipeline library [docs-drift] Documentation may need updates for new Qwen LLM pipeline node

1 participant