feat(parakeet-cpp): real segment timestamps (NeMo-faithful) by localai-bot · Pull Request #10207 · mudler/LocalAI

localai-bot · 2026-06-07T08:47:50Z

What

Gives the parakeet-cpp backend real, NeMo-faithful segment timestamps instead of the single synthetic whole-clip segment it emits today.

Offline (/v1/audio/transcriptions):

Words are grouped into segments exactly like NeMo's get_segment_offsets — a new segment starts after sentence-ending punctuation (., ?, !), and each segment carries start/end plus the token ids whose timestamps fall in its window.
Punctuation-only by default (matches NeMo's default segment_gap_threshold=None). An opt-in model option segment_gap_threshold (NeMo's unit: encoder frames, default 0=off) additionally splits on inter-word silence; it's converted to seconds via the new frame_sec the engine reports.
Per-segment words remain gated behind timestamp_granularities=["word"]; a zero-word document falls back to a single text segment (no regression).
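The grouping rule above can be illustrated with a minimal Go sketch. It is not the code from this PR: the `Word` and `Segment` types, the `sketchSplit` name, and the 0.08 s frame duration are assumptions made for the example, and it does not reproduce NeMo's exact branch order or the zero-word fallback. It only shows a punctuation split plus an optional silence-gap split whose threshold is given in frames and converted to seconds.

```go
package main

import (
	"fmt"
	"strings"
)

// Word and Segment are hypothetical types for this sketch; times are in seconds.
type Word struct {
	Text       string
	Start, End float64
}

type Segment struct {
	Text       string
	Start, End float64
}

// endsSentence reports whether a word ends in '.', '?' or '!'.
func endsSentence(w string) bool {
	return w != "" && strings.ContainsAny(w[len(w)-1:], ".?!")
}

// sketchSplit closes a segment after sentence-ending punctuation. When
// gapFrames > 0 it also starts a new segment if the silence before a word
// is at least gapFrames*frameSec seconds.
func sketchSplit(words []Word, gapFrames int, frameSec float64) []Segment {
	var segs []Segment
	var cur []Word
	flush := func() {
		if len(cur) == 0 {
			return
		}
		texts := make([]string, len(cur))
		for i, w := range cur {
			texts[i] = w.Text
		}
		segs = append(segs, Segment{
			Text:  strings.Join(texts, " "),
			Start: cur[0].Start,
			End:   cur[len(cur)-1].End,
		})
		cur = nil
	}
	gapSec := float64(gapFrames) * frameSec // frames -> seconds
	for i, w := range words {
		// len(cur) > 0 implies i > 0, so words[i-1] is safe here.
		if gapFrames > 0 && len(cur) > 0 && w.Start-words[i-1].End >= gapSec {
			flush()
		}
		cur = append(cur, w)
		if endsSentence(w.Text) {
			flush()
		}
	}
	flush() // trailing words without final punctuation
	return segs
}

func main() {
	words := []Word{
		{"Hello", 0.0, 0.4}, {"world.", 0.5, 0.9},
		{"How", 1.5, 1.7}, {"are", 1.8, 1.9}, {"you?", 2.0, 2.3},
	}
	// gapFrames=0 keeps the punctuation-only default; 0.08 is an assumed frame_sec.
	for _, s := range sketchSplit(words, 0, 0.08) {
		fmt.Printf("%.2f-%.2f %q\n", s.Start, s.End, s.Text)
	}
}
```

With `gapFrames` left at 0 the gap check never fires, which mirrors the punctuation-only default described above.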

Streaming (stream=true):

When libparakeet.so exposes the new ABI v4 JSON entry points (probed at load), the backend drives parakeet_capi_stream_feed_json / _finalize_json and accumulates the streamed per-word timestamps into per-utterance segments (EOU stays the boundary), so streaming FinalResult segments now carry start/end. Falls back to the existing text-only feed against an older library — no hard version coupling.
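As a rough illustration of the per-utterance accumulation, the sketch below buffers streamed words and emits one timed segment when an end-of-utterance signal arrives. The `StreamWord`, `Utterance`, and `accumulator` names are hypothetical and the shape of the real streamed JSON is not modeled; this is only one way the accumulation could look.

```go
package main

import (
	"fmt"
	"strings"
)

// StreamWord and Utterance are hypothetical types; times are in seconds.
type StreamWord struct {
	Text       string
	Start, End float64
}

type Utterance struct {
	Text       string
	Start, End float64
}

// accumulator buffers words until the caller signals end of utterance.
type accumulator struct {
	words []StreamWord
}

// Feed appends the words decoded from one streamed chunk.
func (a *accumulator) Feed(ws []StreamWord) {
	a.words = append(a.words, ws...)
}

// EndOfUtterance closes the current utterance and resets the buffer.
// It returns ok=false when no words were buffered.
func (a *accumulator) EndOfUtterance() (Utterance, bool) {
	if len(a.words) == 0 {
		return Utterance{}, false
	}
	texts := make([]string, len(a.words))
	for i, w := range a.words {
		texts[i] = w.Text
	}
	u := Utterance{
		Text:  strings.Join(texts, " "),
		Start: a.words[0].Start,
		End:   a.words[len(a.words)-1].End,
	}
	a.words = a.words[:0]
	return u, true
}

func main() {
	var acc accumulator
	acc.Feed([]StreamWord{{"turn", 0.2, 0.5}})
	acc.Feed([]StreamWord{{"left", 0.6, 0.9}})
	if u, ok := acc.EndOfUtterance(); ok {
		fmt.Printf("%.2f-%.2f %q\n", u.Start, u.End, u.Text)
	}
}
```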

Why

Matches what NeMo produces for these checkpoints (model.transcribe(..., timestamps=True) at the segment level), so downstream consumers get usable segment timing. Diarization/speaker labels are explicitly out of scope — the Parakeet/Nemotron models don't support it.

Depends on

mudler/parakeet.cpp#16 (adds frame_sec to the JSON + the ABI v4 streaming JSON entry points). The Go side probes for the new symbols, so it builds and runs against an older libparakeet.so (punctuation-only, text-only streaming) until that lands.
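The probe-then-fall-back behaviour can be sketched as follows. How the backend actually resolves symbols is not shown in this description, so the lookup is abstracted as a hypothetical `hasSymbol` callback, and `pickStreamMode` and the mode constants are likewise names invented for the example.

```go
package main

import "fmt"

type streamMode int

const (
	modeTextOnly streamMode = iota // older library: text-only feed
	modeJSON                       // newer library: JSON entry points with timestamps
)

// pickStreamMode selects the JSON path only if every required symbol resolves.
// hasSymbol is a hypothetical stand-in for the real dynamic-library lookup.
func pickStreamMode(hasSymbol func(string) bool, required []string) streamMode {
	for _, name := range required {
		if !hasSymbol(name) {
			return modeTextOnly
		}
	}
	return modeJSON
}

func main() {
	// Only the symbol spelled out in full in the description is listed here;
	// the finalize counterpart would be added to this list as well.
	required := []string{"parakeet_capi_stream_feed_json"}

	oldLib := map[string]bool{}
	newLib := map[string]bool{"parakeet_capi_stream_feed_json": true}

	fmt.Println(pickStreamMode(func(n string) bool { return oldLib[n] }, required) == modeTextOnly)
	fmt.Println(pickStreamMode(func(n string) bool { return newLib[n] }, required) == modeJSON)
}
```

Deciding once at load time keeps the per-request path free of version checks.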

Tests

Pure-Go Ginkgo specs (no model needed) cover splitWordsIntoSegments (punctuation + gap rules, NeMo elif order, empty/fallback), transcriptResultFromDoc (multi-segment output, token-window assignment, word-granularity gate, zero-word fallback), and the streaming segmenter. make lint (new-from-merge-base) clean; existing model-gated specs still skip without a model.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Offline: replace the single synthetic whole-clip segment with multiple segments grouped exactly like NeMo's get_segment_offsets - a new segment after sentence-ending punctuation ('. ? !'), each carrying start/end and its time-window token ids. The optional model option segment_gap_threshold (NeMo's unit: encoder FRAMES, default 0=off) adds NeMo's silence-gap split, converted to seconds via the JSON frame_sec the engine now reports. Per-segment words are still gated behind timestamp_granularities=["word"]; a zero-word document falls back to a single text segment.

Streaming: when libparakeet.so exposes the ABI v4 JSON entry points (probed), drive parakeet_capi_stream_feed_json / _finalize_json and accumulate the streamed per-word timestamps into per-utterance segments (EOU stays the boundary), so streaming FinalResult segments now carry start/end. Falls back to the text-only feed against an older library.

Pure-Go specs cover splitWordsIntoSegments (punctuation + gap rules, NeMo elif order, fallback), transcriptResultFromDoc (multi-segment, token windows, word-granularity gate), and the streaming segmenter.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

The offline AudioTranscription specs asserted the old single synthetic segment (Segments HaveLen(1), Segments[0].Text == res.Text). With NeMo-faithful segmentation a multi-sentence clip now yields multiple punctuation-delimited segments, so assert the new contract instead: one-or-more time-ordered segments, each with text and (under word granularity) per-segment words whose span tracks the segment start/end.

Caught by running the model-gated suite on the dgx (GB10) against the real tdt_ctc-110m + realtime_eou models.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
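The new contract described in this commit can be expressed as a small standalone check. This is a sketch, not the Ginkgo spec from the PR: the types and the `checkSegments` helper are hypothetical, and "span tracks the segment start/end" is interpreted here as the first and last word lying within the segment bounds.

```go
package main

import (
	"errors"
	"fmt"
)

// Word and Segment are hypothetical types; times are in seconds.
type Word struct {
	Text       string
	Start, End float64
}

type Segment struct {
	Text       string
	Start, End float64
	Words      []Word
}

// checkSegments verifies: at least one segment, each with text, ordered by
// start time, and (when wantWords is set) words that stay inside the segment.
func checkSegments(segs []Segment, wantWords bool) error {
	if len(segs) == 0 {
		return errors.New("expected at least one segment")
	}
	for i, s := range segs {
		if s.Text == "" {
			return fmt.Errorf("segment %d has no text", i)
		}
		if s.End < s.Start {
			return fmt.Errorf("segment %d ends before it starts", i)
		}
		if i > 0 && s.Start < segs[i-1].Start {
			return fmt.Errorf("segment %d is out of time order", i)
		}
		if !wantWords {
			continue
		}
		if len(s.Words) == 0 {
			return fmt.Errorf("segment %d has no words", i)
		}
		first, last := s.Words[0], s.Words[len(s.Words)-1]
		if first.Start < s.Start || last.End > s.End {
			return fmt.Errorf("segment %d words fall outside its span", i)
		}
	}
	return nil
}

func main() {
	segs := []Segment{
		{"Hello world.", 0.0, 0.9, []Word{{"Hello", 0.0, 0.4}, {"world.", 0.5, 0.9}}},
		{"How are you?", 1.5, 2.3, []Word{{"How", 1.5, 1.7}, {"are", 1.8, 1.9}, {"you?", 2.0, 2.3}}},
	}
	fmt.Println(checkSegments(segs, true) == nil)
}
```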

mudler added 2 commits June 7, 2026 08:47

docs(audio): document parakeet-cpp segment timestamps + segment_gap_threshold

dd04a9b

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

localai-bot mentioned this pull request Jun 7, 2026

feat(capi): segment-timestamp support (frame_sec + streaming JSON, ABI v4) mudler/parakeet.cpp#16

Merged

mudler merged commit a7cb587 into master Jun 7, 2026
74 of 75 checks passed

mudler deleted the feat/parakeet-segment-timestamps branch June 7, 2026 20:08

mudler merged 3 commits into master from feat/parakeet-segment-timestamps
