feat(capi): segment-timestamp support (frame_sec + streaming JSON, ABI v4) #16

Open
localai-bot wants to merge 1 commit into master from feat/segment-timestamps-capi

@localai-bot
Collaborator

What

Adds the C-ABI surface LocalAI needs to emit NeMo-faithful segment timestamps from the parakeet-cpp backend.

  • Offline JSON (parakeet_capi_transcribe_*_json) now includes a top-level "frame_sec" field (the encoder frame stride in seconds, hop * subsampling / sample_rate). Consumers multiply NeMo's frame-unit segment_gap_threshold by it to get the seconds gap between words when forming segments.
  • New streaming JSON entry points parakeet_capi_stream_feed_json / parakeet_capi_stream_finalize_json return {text, eou, frame_sec, words}. They surface the streaming session's existing drain_words() per-word start/end/conf alongside the newly finalized text and EOU flag, so callers can build timestamped per-utterance segments.
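
As a rough illustration of the frame_sec arithmetic described above, the sketch below computes the stride from hop, subsampling, and sample rate, then converts a frame-unit gap threshold into seconds. The helper names and the example values (hop 160, subsampling 8, 16 kHz) are assumptions made for illustration and are not taken from this PR.

```cpp
// Hypothetical helpers; only the formula hop * subsampling / sample_rate
// comes from the PR description.

// Encoder frame stride in seconds.
double frame_sec_from(int hop, int subsampling, int sample_rate) {
    return static_cast<double>(hop) * subsampling / sample_rate;
}

// Convert a frame-unit segment_gap_threshold into a gap in seconds.
double gap_threshold_sec(int segment_gap_threshold_frames, double frame_sec) {
    return segment_gap_threshold_frames * frame_sec;
}
```

With the assumed values, frame_sec_from(160, 8, 16000) is 0.08 s, so a threshold of 10 frames corresponds to a gap of about 0.8 s between words.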

Bumps PARAKEET_CAPI_ABI_VERSION to 4. All existing entry points are unchanged; the new symbols are additive (consumers probe for them, so older libparakeet.so still works).
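
One way a consumer might probe for the additive symbols is with dlsym, sketched below for a POSIX system. Only the two symbol names come from this PR; has_symbol and supports_streaming_json are hypothetical helpers, and the sketch deliberately does not call the entry points, since their signatures are not shown here.

```cpp
#include <dlfcn.h>

// Hypothetical helper: true when `handle` exports `symbol`.
bool has_symbol(void* handle, const char* symbol) {
    dlerror();  // clear any stale error state
    void* addr = dlsym(handle, symbol);
    return dlerror() == nullptr && addr != nullptr;
}

// Hypothetical helper: true when the loaded library offers the new
// streaming JSON entry points named in this PR.
bool supports_streaming_json(void* lib) {
    return has_symbol(lib, "parakeet_capi_stream_feed_json") &&
           has_symbol(lib, "parakeet_capi_stream_finalize_json");
}
```

A caller that gets false here can fall back to the pre-existing text-only streaming path, which is how an older libparakeet.so would keep working.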

Why

The LocalAI parakeet-cpp backend currently collapses everything into one synthetic whole-clip segment offline and emits untimestamped, text-only segments while streaming. This PR exposes the per-word timestamps and frame_sec that the LocalAI side needs to replicate NeMo's get_segment_offsets (punctuation-only by default, optional frame-gap split) and attach real start/end times to segments.
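
The segmentation rule described above could look roughly like the following sketch. It is loosely modelled on the described behaviour (split after sentence-ending punctuation, optionally also on a gap in seconds) and is not NeMo's or LocalAI's actual code. The Word and Segment structs, the delimiter set ('.', '?', '!'), and the convention that a negative gap_sec disables gap splitting are all assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types mirroring the per-word start/end/conf fields.
struct Word { std::string text; double start; double end; double conf; };
struct Segment { std::string text; double start; double end; };

// Assumed delimiter set: a word ending in '.', '?' or '!' closes a segment.
static bool ends_sentence(const std::string& w) {
    if (w.empty()) return false;
    char c = w.back();
    return c == '.' || c == '?' || c == '!';
}

// Group words into segments. gap_sec < 0 means punctuation-only splitting.
std::vector<Segment> build_segments(const std::vector<Word>& words,
                                    double gap_sec = -1.0) {
    std::vector<Segment> out;
    Segment cur{};
    bool open = false;
    for (std::size_t i = 0; i < words.size(); ++i) {
        const Word& w = words[i];
        // Optional gap split: close the open segment before this word
        // when the silence since the previous word is long enough.
        if (open && gap_sec >= 0.0 && (w.start - cur.end) >= gap_sec) {
            out.push_back(cur);
            open = false;
        }
        if (!open) {
            cur = Segment{w.text, w.start, w.end};
            open = true;
        } else {
            cur.text += " " + w.text;
            cur.end = w.end;
        }
        // Punctuation split: close the segment after this word.
        if (ends_sentence(w.text)) {
            out.push_back(cur);
            open = false;
        }
    }
    if (open) out.push_back(cur);  // flush trailing words
    return out;
}
```

Here gap_sec would be the frame-unit threshold multiplied by frame_sec, which is why the JSON documents need to carry that field.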

Tests

tests/test_capi_stream_json.cpp drives the new streaming JSON path on the cache-aware EOU model and asserts that the documents carry frame_sec and per-word timestamps. It skips (exit 77) when PARAKEET_TEST_GGUF_EOU is unset, like the sibling streaming tests. The library and the test build and link cleanly.
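
The skip behaviour mentioned above can be sketched as follows. This is not the contents of tests/test_capi_stream_json.cpp; the helper name and constant are hypothetical, and only the exit code 77 and the environment variable name come from this PR. Exit code 77 is the value the Automake test harness treats as "skipped".

```cpp
#include <cstdlib>

// Exit code reported when a test is skipped rather than failed.
constexpr int kSkipExitCode = 77;

// Hypothetical helper: returns kSkipExitCode when the environment variable
// naming the model file is unset, and 0 when the test may proceed.
int skip_code_if_unset(const char* env_var) {
    const char* value = std::getenv(env_var);
    return value == nullptr ? kSkipExitCode : 0;
}
```

A test's main could return skip_code_if_unset("PARAKEET_TEST_GGUF_EOU") early when it is non-zero, so machines without the model report a skip instead of a failure.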

Consumed by the companion LocalAI PR (real segment timestamps for parakeet-cpp).

…SON)

Add the data LocalAI needs to build NeMo-faithful segment timestamps:

- Offline JSON (transcribe_*_json) now carries "frame_sec", the encoder
  frame stride in seconds, so a consumer can convert NeMo's frame-unit
  segment_gap_threshold into the seconds gap between words.

- New streaming JSON entry points parakeet_capi_stream_feed_json /
  parakeet_capi_stream_finalize_json return {text, eou, frame_sec, words}
  by surfacing the streaming session's existing drain_words() per-word
  start/end/conf alongside the newly-finalized text and EOU flag.

Bumps PARAKEET_CAPI_ABI_VERSION to 4. All existing entry points are
unchanged; the new symbols are additive (consumers probe for them).

tests/test_capi_stream_json.cpp drives the new streaming JSON path on the
EOU model (skips with 77 when PARAKEET_TEST_GGUF_EOU is unset, like the
sibling streaming tests).

Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@localai-bot
Collaborator Author

Consumed by LocalAI PR: mudler/LocalAI#10207
