Batch mode for nemotron: batched causal subsampling + batched target_lang C-API #11
Merged
…ang / _batch_lang)

Add language-aware batched C-API entry points so that a request-coalesced batch on a multilingual (nemotron) model can select one language prompt for the whole batch:

  char* parakeet_capi_transcribe_pcm_batch_json_lang(...)
  int   parakeet_capi_transcribe_pcm_batch_lang(...)

The existing non-lang batch functions now delegate to these with nullptr (model default), mirroring the Phase 4 single-clip pattern, so no logic is duplicated. target_lang is passed through to the C++ batch methods, which already accept it. NULL or "" selects the model default, models without a language prompt ignore it, and an unknown locale is caught by the existing try/catch, which reports failure as NULL (JSON variant) or a nonzero return (int variant), plus last_error. The ABI stays at v3 (unreleased on this branch); the v3 comment now lists the two new symbols.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
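The delegation pattern described in this commit can be sketched as follows. This is a minimal illustration, not the project's code: the real parakeet_capi_* signatures are elided above as "(...)", so every name here (demo_model_transcribe_batch, demo_batch_json_lang, demo_batch_json, g_last_error) is hypothetical, clips are plain strings instead of PCM buffers, and "en"/"de" stand in for whatever locales the model actually supports. It shows the three points the message makes: NULL or "" means the model default, an unknown locale becomes nullptr plus a last-error string instead of an exception crossing the boundary, and the non-lang entry point simply forwards nullptr.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <new>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the C++ batch method that already accepts
// target_lang. Clips are strings here instead of PCM buffers.
static std::vector<std::string> demo_model_transcribe_batch(
        const std::vector<std::string>& clips, const char* target_lang) {
    // NULL or "" selects the model default ("en" in this sketch).
    std::string lang = (target_lang && *target_lang) ? target_lang : "en";
    if (lang != "en" && lang != "de")
        throw std::invalid_argument("unknown target_lang: " + lang);
    std::vector<std::string> texts;
    for (const auto& clip : clips) texts.push_back("[" + lang + "] " + clip);
    return texts;
}

static std::string g_last_error;  // hypothetical last_error slot

// Language-aware entry point: one language for the whole batch. Exceptions
// never cross the boundary; failure is reported as nullptr + last_error.
char* demo_batch_json_lang(const std::vector<std::string>& clips,
                           const char* target_lang) {
    try {
        std::vector<std::string> texts = demo_model_transcribe_batch(clips, target_lang);
        std::string json = "[";
        for (std::size_t i = 0; i < texts.size(); ++i) {
            if (i > 0) json += ",";
            json += "{\"text\":\"" + texts[i] + "\"}";  // no JSON escaping: sketch only
        }
        json += "]";
        char* out = static_cast<char*>(std::malloc(json.size() + 1));
        if (out == nullptr) throw std::bad_alloc();
        std::memcpy(out, json.c_str(), json.size() + 1);
        g_last_error.clear();
        return out;  // caller releases with std::free
    } catch (const std::exception& e) {
        g_last_error = e.what();
        return nullptr;
    }
}

// Non-lang entry point: delegates with nullptr (model default), so the
// batching logic lives in exactly one place.
char* demo_batch_json(const std::vector<std::string>& clips) {
    return demo_batch_json_lang(clips, nullptr);
}
```

Keeping the try/catch only in the _lang function means the old entry points inherit the same error reporting without any duplicated handling.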
… to per-item); enable batched nemotron

The causal branch of build_graph_batched already applies the leading ggml_pad_ext uniformly across the batch and, at every stage, masks each item's trailing padded time frames using the all_paddings=3 valid-length recurrence, so it reproduces the standalone causal boundary for each item. The B>1 guard assert was a conservative leftover; remove it so the multilingual streaming nemotron model can run real batches.

Validated byte-identical: a clip transcribed inside a B>1 batch equals the same clip transcribed standalone, for uniform, mixed-length, and reversed-order batches, and for a batch whose padded/masked clip is a non-empty truncated item. Batched timestamps output also matches the standalone text. Re-enable the positive valid-language (de) 2-clip batch JSON assertion in test_capi_batch_json. No change to the non-causal path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
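Why per-item masking makes a batched clip match its standalone run can be illustrated with the valid-length recurrence alone. The sketch below assumes NeMo-style causal convolution subsampling (kernel 3, stride 2, left pad kernel-1 = 2, right pad stride-1 = 1, hence all_paddings = 3) and three stride-2 stages for 8x subsampling; those constants are assumptions, not values read from this repository. The masking function works on a simplified row-major [B, T] buffer rather than the real multi-dimensional ggml activations, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed causal subsampling geometry (NeMo-style causal conv): kernel 3,
// stride 2, left pad kernel-1 = 2 and right pad stride-1 = 1, which gives
// all_paddings = 3. Three stride-2 stages (8x subsampling) are also assumed.
constexpr int64_t kKernel = 3;
constexpr int64_t kStride = 2;
constexpr int64_t kAllPaddings = 3;
constexpr int kStages = 3;

// One stage of the valid-length recurrence: floor((L + all_paddings - k) / s) + 1.
// The numerator is non-negative here, so integer division is the floor.
int64_t stage_len(int64_t len) {
    return (len + kAllPaddings - kKernel) / kStride + 1;
}

// Per-item valid lengths after all stages. Each item's length depends only on
// its own input length, never on the padded batch length, which is why the
// batched boundary can match the standalone one.
std::vector<int64_t> valid_lengths(std::vector<int64_t> lens) {
    for (int s = 0; s < kStages; ++s)
        for (int64_t& len : lens) len = stage_len(len);
    return lens;
}

// Zero the trailing padded frames of each item in a row-major [B, T] buffer,
// a simplified stand-in for masking the time axis of the real activations.
void mask_trailing_pad(std::vector<float>& x, int64_t T,
                       const std::vector<int64_t>& valid) {
    const int64_t B = static_cast<int64_t>(valid.size());
    for (int64_t b = 0; b < B; ++b)
        for (int64_t t = valid[static_cast<std::size_t>(b)]; t < T; ++t)
            x[static_cast<std::size_t>(b * T + t)] = 0.0f;
}
```

For a batch of 100 and 37 input frames, the padded batch subsamples to 14 frames while the shorter item keeps only its own 6 valid frames; frames 6..13 of that row are zeroed, exactly the frames a standalone run of the 37-frame clip would never produce.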
Summary
Follow-up to #10. Makes the multilingual nemotron model usable in batch mode, so LocalAI's request-coalescing batcher can serve it.
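Because the language-aware batch entry points take a single language for the whole batch, a request-coalescing batcher has to partition pending requests by language before calling them. The sketch below shows one way to do that under stated assumptions: the request and batch structs are hypothetical, "" stands for "model default", and max_batch is an illustrative cap. It is not LocalAI's batcher, which may group requests differently. Requires C++17 for the structured binding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical pending request: an id plus the requested language
// ("" means "use the model default").
struct DemoRequest {
    int id;
    std::string language;
};

// One batch that can be sent to a single language-aware batch call.
struct DemoBatch {
    std::string language;
    std::vector<int> ids;
};

// Group pending requests by language, then split each group into batches
// of at most max_batch items. Every batch therefore carries one language.
std::vector<DemoBatch> coalesce_by_language(const std::vector<DemoRequest>& reqs,
                                            std::size_t max_batch) {
    if (max_batch == 0) max_batch = 1;  // 0 would never advance the loop below
    std::map<std::string, std::vector<int>> groups;  // ordered by language
    for (const DemoRequest& r : reqs) groups[r.language].push_back(r.id);

    std::vector<DemoBatch> batches;
    for (const auto& [lang, ids] : groups) {
        for (std::size_t i = 0; i < ids.size(); i += max_batch) {
            std::size_t end = std::min(ids.size(), i + max_batch);
            batches.push_back({lang, std::vector<int>(
                ids.begin() + static_cast<std::ptrdiff_t>(i),
                ids.begin() + static_cast<std::ptrdiff_t>(end))});
        }
    }
    return batches;
}
```

Keeping arrival order within each language group means a request never waits behind a request for a different language, at the cost of smaller batches when languages are mixed.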
Two changes:
1. Batched causal subsampling. nemotron is a causal model, and the batched subsampling path asserted `!(causal_ && B > 1)`, an uncatchable SIGABRT on any B>1 batch. It turns out the causal branch in `build_graph_batched` was already batch-aware: the leading pad is applied uniformly across the batch, trailing-pad time frames are masked per item on the batch axis, and the `all_paddings=3` valid-length recurrence is tracked per item. The guard was conservative, not a real limitation. Removed it and validated rigorously: a clip in a B>1 batch is byte-for-byte identical to the same clip transcribed standalone (uniform, mixed-length, reversed order, and with timestamps). Since per-item output already matches NeMo at WER 0, batched output matches NeMo too. The non-causal path is untouched (the diff is only the assert line and its comment).
2. Batched target_lang C-API. Added `parakeet_capi_transcribe_pcm_batch_json_lang` and `parakeet_capi_transcribe_pcm_batch_lang` so the batched path can select a language (one language per batch). The existing non-lang batch functions now delegate to these with NULL, so behavior is unchanged for every other model. The ABI stays at v3 (unreleased); the version comment is extended accordingly.

Validation
- `tests/test_subsampling_batch_causal.cpp`: batched vs. per-item output is byte-identical for uniform, mixed-length, reversed, and truncated-clip batches, plus batched timestamps. Exact string equality, no tolerances.
- `tests/test_capi_batch_json.cpp`: a 2-clip `batch_json_lang(..., "de")` returns a length-2 JSON array; an unknown locale returns NULL with last_error set.

Notes
… `_lang` batch variant, pass the request `language`) is tracked separately for the gallery work.

🤖 Generated with Claude Code