feat: text-to-speech x LLM integration #936

Open
IgorSwat wants to merge 5 commits into main from @is/llm-to-speech

Conversation

@IgorSwat
Contributor

@IgorSwat IgorSwat commented Mar 4, 2026

Description

This pull request introduces a few changes to the Text-to-Speech module:

  • Improved streaming mode by allowing the input text to be extended incrementally while synthesis is running. This change focuses on integrating T2S with text generation models (e.g. Llama 3.2).
  • Added simple test cases for the T2S module.
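
As an illustration of the incremental-input idea, here is a minimal sketch of how LLM tokens might be grouped into complete sentences before being handed to a synthesizer. `SentenceBuffer`, `push`, and `flush` are hypothetical names introduced for this example; the PR's actual buffering lives in the native module and may work differently.

```typescript
// Hypothetical helper, not part of the PR: accumulates LLM tokens and
// releases only complete sentences, so a synthesizer never receives
// half a sentence.
class SentenceBuffer {
  private pending = '';

  // Append a token and return every sentence completed so far.
  push(token: string): string[] {
    this.pending += token;
    const sentences: string[] = [];
    // Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    const boundary = /[.!?]\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + 1; // include the punctuation mark
      sentences.push(this.pending.slice(start, end).trim());
      start = match.index + match[0].length;
    }
    // Keep the unfinished remainder for the next token.
    this.pending = this.pending.slice(start);
    return sentences;
  }

  // Return whatever is left once the text generator has finished.
  flush(): string {
    const rest = this.pending.trim();
    this.pending = '';
    return rest;
  }
}

const buffer = new SentenceBuffer();
const tokens = ['Hel', 'lo the', 're. How', ' are', ' you? Fi', 'ne'];
const spoken: string[] = [];
for (const token of tokens) {
  spoken.push(...buffer.push(token));
}
const tail = buffer.flush();
if (tail) spoken.push(tail);
// spoken is ['Hello there.', 'How are you?', 'Fine']
```

Each completed sentence could then be passed to a call such as `streamInsert(text)`, with the flushed tail sent once the model stops generating.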

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

To test the Text-to-Speech module, run the test suite for this module.
To test the new streaming mode and its integration with text generation models, use the 'text-to-speech-llm' demo app.

Screenshots

Related issues

#773
#897

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

@IgorSwat IgorSwat changed the title from "@is/llm to speech" to "feat: text-to-speech x LLM integration & text-to-speech tests" Mar 4, 2026
@IgorSwat IgorSwat changed the title from "feat: text-to-speech x LLM integration & text-to-speech tests" to "feat: text-to-speech x LLM integration" Mar 4, 2026
@IgorSwat IgorSwat added the test (Issue and PR related to tests or testing infrastructure) and feature (PRs that implement a new feature) labels Mar 5, 2026
Comment on lines +130 to +135
await tts.stream({
  text: '',
  speed: 0.9,
  stopAutomatically: false,
  onNext,
});
Collaborator

I'm not sure why we need this, can you clarify?

Contributor Author

What exactly do you mean? The 'text' field?

:::

- 2. [**`stream({ text, speed })`**](../../06-api-reference/classes/TextToSpeechModule.md#stream): An async generator that yields chunks of audio as they are computed. This is ideal for reducing the "time to first audio" for long sentences.
+ 2. [**`stream({ speed, stopAutomatically })`**](../../06-api-reference/classes/TextToSpeechModule.md#stream): An async generator that yields chunks of audio as they are computed. This is ideal for reducing the "time to first audio" for long sentences. In contrast to `forward`, it enables inserting text chunks dynamically into processing buffer with [**`streamInsert(text)`**](../../06-api-reference/classes/TextToSpeechModule.md#streaminsert) and allows stopping generation early with [**`streamStop(instant)`**](../../06-api-reference/classes/TextToSpeechModule.md#streamstop).
Collaborator

Any reason why we're ditching the speed param?

#include <rnexecutorch/data_processing/Sequential.h>
#include <thread>

#include <rnexecutorch/Log.h>
Collaborator

redundant include

Collaborator

just a note: it should be a factory from now on, see #937

await this.nativeModule.stream(
  speed,
  stopAutomatically,
  (audio: number[]) => {
Collaborator

I think we should return JSTensorViewOut from the native side, so there's less copying going on
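
To make the copying concern concrete, here is a minimal sketch of the general difference between a typed-array view over a buffer and a plain `number[]` copy. It does not model `JSTensorViewOut` itself, whose exact shape is not shown in this thread; the buffer below is a stand-in for native memory, and all variable names are made up for the example.

```typescript
// Assumption: the native side can expose its samples through an ArrayBuffer.
// Here a plain ArrayBuffer stands in for that native memory.
const nativeBuffer = new ArrayBuffer(4 * Float32Array.BYTES_PER_ELEMENT);

// A typed-array view wraps the buffer without copying its contents.
const view = new Float32Array(nativeBuffer);
view.set([0.1, 0.2, 0.3, 0.4]);

// Converting to number[] allocates a new array and copies every sample.
const copied: number[] = Array.from(view);

// A second view over the same memory is again copy-free.
const firstHalf = new Float32Array(nativeBuffer, 0, 2);

// Writes through one view are visible through the other,
// while the copied array keeps the old value.
view[0] = 0.5;
const sharesMemory = firstHalf[0] === 0.5;
const copyIsDetached = copied[0] !== 0.5;
```

For audio chunks of many thousands of samples, avoiding the per-chunk `number[]` conversion is where the savings would come from.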

Comment on lines +162 to +171
  while (!this.streamFinished) {
    if (queue.length > 0) {
      yield queue.shift()!;
-     if (finished && queue.length === 0) {
+     if (this.streamFinished && queue.length === 0) {
        return;
      }
      continue;
    }
    if (error) throw error;
-   if (finished) return;
+   if (this.streamFinished) return;
Collaborator

why are we checking this.streamFinished twice?
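
For comparison, here is a minimal sketch of one way to bridge a callback-based producer to an async generator with a single terminal check per pass: drain the queue fully, then look at the error and finished flags once. `ChunkStream`, `push`, `finish`, `fail`, and `drain` are hypothetical names for this example, not the module's API, and the sketch ignores the native side entirely.

```typescript
// Hypothetical names throughout; this mirrors the shape of the loop under
// review, not the module's actual implementation.
class ChunkStream<T> {
  private queue: T[] = [];
  private finished = false;
  private error: Error | null = null;
  private wake: (() => void) | null = null;

  // Producer side: called from a callback whenever a chunk is ready.
  push(chunk: T): void {
    this.queue.push(chunk);
    this.notify();
  }

  // Producer side: no more chunks will arrive.
  finish(): void {
    this.finished = true;
    this.notify();
  }

  // Producer side: report a failure to the consumer.
  fail(error: Error): void {
    this.error = error;
    this.notify();
  }

  private notify(): void {
    const wake = this.wake;
    this.wake = null;
    wake?.();
  }

  // Consumer side: drain the queue first, then check the terminal state once.
  async *drain(): AsyncGenerator<T> {
    while (true) {
      while (this.queue.length > 0) {
        yield this.queue.shift()!;
      }
      if (this.error) throw this.error;
      if (this.finished) return;
      // Nothing queued and not finished: sleep until the producer notifies.
      await new Promise<void>((resolve) => {
        this.wake = resolve;
      });
    }
  }
}

async function demo(): Promise<string[]> {
  const stream = new ChunkStream<string>();
  const received: string[] = [];
  const consumer = (async () => {
    for await (const chunk of stream.drain()) received.push(chunk);
  })();
  stream.push('a');
  stream.push('b');
  await Promise.resolve(); // let the consumer run before producing more
  stream.push('c');
  stream.finish();
  await consumer;
  return received;
}

demo().then((chunks) => console.log(chunks)); // logs [ 'a', 'b', 'c' ]
```

Because chunks queued before `finish()` are always drained before the finished flag is read, the flag only needs to be checked in one place.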

Comment on lines +98 to +100
if (input.text) {
  moduleInstance.streamInsert(input.text);
}
Collaborator

Is this thread safe? This appends to a std::string on a JS thread while a bg thread reads from it, right?

Comment on lines +86 to +91
# Phonemis
set(LIBS_DIR "${PACKAGE_ROOT}/third-party/android/libs")
set(PHONEMIS_LIBS
  "${LIBS_DIR}/phonemis/${ANDROID_ABI}/libphonemis.a"
)

Collaborator

let's follow the convention

add_library(phonemis STATIC IMPORTED)
set_target_properties(phonemis PROPERTIES
    IMPORTED_LOCATION "${ANDROID_THIRD_PARTY}/phonemis/${ANDROID_ABI}/libphonemis.a"
)

Labels

feature: PRs that implement a new feature
test: Issue and PR related to tests or testing infrastructure

2 participants