Conversation
`.../react-native-executorch/common/rnexecutorch/models/speech_to_text/common/schema/OnlineASR.h` (outdated, resolved review thread)
`packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/ASR.h` (outdated, resolved review thread)
Force-pushed 3ca7f15 to ea943e4
`packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/SpeechToText.h` (outdated, resolved review thread)
Force-pushed 7b1e6ff to 2ee6d1d
msluszniak left a comment:

Some comments are not needed, IMO.
`packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/SpeechToText.h` (outdated, resolved review thread)
`packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/Utils.h` (outdated, resolved review thread)
`...ages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/OnlineASR.cpp` (outdated, resolved review thread)
`...ages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/OnlineASR.cpp` (resolved review thread)
`...act-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/HypothesisBuffer.cpp` (outdated, resolved review thread)
chmjkb left a comment:

Overall solid work, thanks 👏🏻
Left a couple of nits.
`packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/common/schema/ASR.h` (outdated, resolved review thread)
`...act-native-executorch/common/rnexecutorch/models/speech_to_text/common/types/ProcessResult.h` (resolved review thread)
`...-native-executorch/common/rnexecutorch/models/speech_to_text/common/types/GenerationResult.h` (resolved review thread)
`packages/react-native-executorch/common/rnexecutorch/models/speech_to_text/whisper/Constants.h` (outdated, resolved review thread)
`packages/react-native-executorch/src/modules/natural_language_processing/SpeechToTextModule.ts` (outdated, resolved review thread)
```cpp
this->decoder->unload();

  : callInvoker_(std::move(callInvoker)) {
  // Switch between the ASR implementations based on model name
  if (modelName == "whisper") {
```
Food for thought: as we discussed a few days back, think about how we can make this work so that the native side doesn't need the model name, but accepts a set of configurable pipeline steps. No need to do this now, IMO, but just a note.

Maybe we can have different ASR implementations based on whether the model supports timestamps or not?
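One way to read the capability-based idea above: pick the ASR implementation from a metadata flag rather than comparing against a hard-coded model name. This is a minimal C++ sketch; `BaseASR`, `TimestampedASR`, `PlainASR`, and `makeASR` are hypothetical names, not the actual rnexecutorch classes.

```cpp
#include <memory>

// Hypothetical capability interface: implementations declare whether they
// emit timestamps instead of the factory switching on a model name string.
struct BaseASR {
  virtual ~BaseASR() = default;
  virtual bool supportsTimestamps() const = 0;
};

struct TimestampedASR final : BaseASR {
  bool supportsTimestamps() const override { return true; }
};

struct PlainASR final : BaseASR {
  bool supportsTimestamps() const override { return false; }
};

// The flag could come from the exported model's metadata, so the native side
// never needs to know the model's name.
std::unique_ptr<BaseASR> makeASR(bool modelEmitsTimestamps) {
  if (modelEmitsTimestamps) {
    return std::make_unique<TimestampedASR>();
  }
  return std::make_unique<PlainASR>();
}
```

The same shape extends naturally to a list of configurable pipeline steps, each selected by a capability rather than a name.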
```diff
 std::shared_ptr<OwningArrayBuffer>
 SpeechToText::encode(std::span<float> waveform) const {
-  std::vector<float> encoderOutput = this->asr->encode(waveform);
+  std::vector<float> encoderOutput = transcriber_->encode(waveform);
```
I'm wondering whether we need to return `std::vector` from the encoder. Maybe we could just return a span? We wrap this in `OwningArrayBuffer`, which copies the data.
I changed the encode & decode methods to return an ExecuTorch Tensor, so there should be no redundant copies now.
chmjkb left a comment:

Two more things:
- I wasn't able to compile the app for Android (due to Norbert bumping minSdkVersion in RNET). You have to bump the minSdkVersion in the example app.
- Once compiled, it doesn't ask for mic permissions (I'm using a Pixel 10) and silently fails.
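On the silent mic failure: a common cause on Android is a missing `RECORD_AUDIO` declaration in the example app's `AndroidManifest.xml`. This is a hedged guess at the fix, not a confirmed diagnosis of this PR:

```xml
<!-- Declares the microphone permission; on API 23+ it must additionally be
     requested at runtime before recording starts, or capture fails silently. -->
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```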
Force-pushed ef854bc to d253381
Force-pushed a140cfb to bf69627
Force-pushed 816c75a to ae017ef
The current check:

```ts
if (
  !modelSources ||
  !tokenizerSources ||
  !modelSources[0] ||
  !tokenizerSources[0]
) {
```

Suggested change:

```ts
if (
  !modelSources?.[0] ||
  !tokenizerSources?.[0]
) {
```
chmjkb left a comment:

I think you should change the TS side as you're returning a different thing from C++, for example:

```ts
public async encode(waveform: Float32Array): Promise<Float32Array> {
  return new Float32Array(await this.nativeModule.encode(waveform));
}
```

Also, why did you switch back to `type` in SpeechToTextModelConfig?
Description
Various improvements and adjustments in the Speech-to-Text module. The list of changes includes:
Introduces a breaking change?
Type of change
Tested on
Testing instructions
You can run the tests defined for the Speech-to-Text module, as well as test it manually with the 'speech' demo app (SpeechToText screen).
Screenshots
Related issues
Checklist
Additional notes