Add a persistent Parakeet helper for low-latency host integrations#18861
seyeong-han wants to merge 1 commit into main
Conversation
Factor the Parakeet transcription logic out of the one-shot runner so host apps can keep the model warm across requests. Build the new helper alongside the runner and document the helper workflow for app integrations.

Made-with: Cursor
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18861
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 2 New Failures, 1 Cancelled Job, 1 Unrelated Failure (as of commit b54a81c with merge base 411ede2):
- NEW FAILURES: the following jobs have failed.
- CANCELLED JOB: the following job was cancelled. Please retry.
- BROKEN TRUNK: the following job failed but was also failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Summary
- Refactor `parakeet_runner` into a shared `ParakeetTranscriber` class
- Add a `parakeet_helper` binary plus a stdin/stdout helper protocol for long-lived host integrations

Why a helper?
The Voxtral Realtime macOS app (`executorch-examples/voxtral_realtime/macos`) didn't need any changes to the executorch repo because `voxtral_realtime_runner` was already designed as a streaming, long-running process: the app just launches it and feeds audio. `parakeet_runner` is different: it's a one-shot batch CLI tool that loads the model, transcribes one WAV file, prints the result, and exits. There's no way to send it a second request without restarting the process and paying the ~1.4 s model-load cost again.

The ExecuWhisper macOS app (meta-pytorch/executorch-examples#232) runs repeated record-then-transcribe requests via system dictation, so a fresh process per recording is too slow.
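The spawn-once, request-many pattern this motivates can be sketched from the host side. The snippet below is a minimal illustration, not the helper's actual protocol: it uses a toy stand-in subprocess in place of the real `parakeet_helper` binary, and assumes a hypothetical one-line-request, one-line-response framing over stdin/stdout.

```python
import subprocess
import sys

# Toy stand-in for the real parakeet_helper binary: reads one request line
# from stdin, writes one response line to stdout, until EOF. A real helper
# would load the model once at startup and transcribe each WAV path it
# receives, so the ~1.4 s load cost is paid a single time.
TOY_HELPER = r'''
import sys
for line in sys.stdin:
    path = line.strip()
    # (real helper: transcript = transcriber.transcribe(path))
    print("transcript-for:" + path, flush=True)
'''

class HelperClient:
    """Host-side wrapper: spawn the helper once, reuse it across requests."""

    def __init__(self):
        # One long-lived process for the whole session.
        self.proc = subprocess.Popen(
            [sys.executable, "-c", TOY_HELPER],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def transcribe(self, wav_path: str) -> str:
        # One request line out, one response line back.
        self.proc.stdin.write(wav_path + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().strip()

    def close(self):
        # Closing stdin signals EOF; the helper loop exits cleanly.
        self.proc.stdin.close()
        self.proc.wait()

if __name__ == "__main__":
    client = HelperClient()
    print(client.transcribe("a.wav"))
    print(client.transcribe("b.wav"))  # no process restart between requests
    client.close()
```

The second `transcribe` call reuses the warm process, which is exactly what a one-shot CLI like `parakeet_runner` cannot do.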
`parakeet_helper` fills that gap. It's the Parakeet equivalent of what the Voxtral Realtime runner already does natively: stay alive, keep the model warm, and accept multiple requests over stdin/stdout.

Test plan
- `cmake --preset llm-metal-stats -DEXECUTORCH_BUILD_MLX=OFF`
- `cmake --build --preset llm-metal-stats-install`
- `cd examples/models/parakeet && cmake --build --preset parakeet-metal`: both `parakeet_runner` and `parakeet_helper` link successfully

Made-with: Cursor