Batch mode for nemotron: batched causal subsampling + batched target_lang C-API #11

Merged: mudler merged 2 commits into master from feat/nemotron-batch-mode on Jun 6, 2026.

@localai-bot (Collaborator)

Summary

Follow-up to #10. Makes the multilingual nemotron model usable in batch mode, so LocalAI's request-coalescing batcher can serve it.

Two changes:

  1. Batched causal subsampling. nemotron is a causal model, and the batched subsampling path asserted !(causal_ && B > 1), so any B>1 batch hit an uncatchable SIGABRT. It turns out the causal branch in build_graph_batched was already batch-aware: the leading pad is applied uniformly across the batch, trailing-pad time frames are masked per item on the batch axis, and per-item valid lengths follow the all_paddings=3 recurrence. The guard was conservative, not a real limitation. Removed it and validated the result: a clip in a B>1 batch is byte-for-byte identical to the same clip transcribed standalone (uniform, mixed-length, reversed order, and with timestamps). Since per-item output already matches NeMo at WER 0, batched output matches NeMo too. The non-causal path is untouched (the diff is only the assert line and its comment).

  2. Batched target_lang C-API. Added parakeet_capi_transcribe_pcm_batch_json_lang and parakeet_capi_transcribe_pcm_batch_lang so the batched path can select a language (one language per batch). The existing non-lang batch functions now delegate to these with NULL, so behavior is unchanged for every other model. The ABI stays at v3 (still unreleased); the v3 comment now lists the two new symbols.
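The per-item bookkeeping that item 1 relies on can be sketched outside ggml as plain C++. This is a minimal sketch, not the build_graph_batched code: it assumes a NeMo-style causal conv stage with kernel 3 and stride 2 (left pad kernel-1 = 2, right pad stride-1 = 1, giving all_paddings = 3), and the names next_valid_len, stage_valid_lens and mask_trailing_pad are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed causal conv stage (hypothetical config consistent with all_paddings=3):
// kernel 3, stride 2, left pad 2, right pad 1.
constexpr int kKernel = 3;
constexpr int kStride = 2;
constexpr int kAllPaddings = 3;

// One stage of the valid-length recurrence:
//   len' = floor((len + all_paddings - kernel) / stride) + 1
// Integer division equals floor here because the numerator is non-negative
// (all_paddings == kernel and len >= 0).
int next_valid_len(int len) {
    assert(len >= 0);
    return (len + kAllPaddings - kKernel) / kStride + 1;
}

// Valid length of one batch item after each of `stages` subsampling stages.
std::vector<int> stage_valid_lens(int in_len, int stages) {
    std::vector<int> lens;
    int len = in_len;
    for (int s = 0; s < stages; ++s) {
        len = next_valid_len(len);
        lens.push_back(len);
    }
    return lens;
}

// Zero the trailing pad frames of every item in a row-major [B][T][C] buffer:
// item b keeps frames [0, valid_len[b]); frames from valid_len[b] onward are cleared.
void mask_trailing_pad(std::vector<float>& x, int B, int T, int C,
                       const std::vector<int>& valid_len) {
    assert(static_cast<int>(valid_len.size()) == B);
    assert(x.size() == static_cast<std::size_t>(B) * T * C);
    for (int b = 0; b < B; ++b) {
        for (int t = std::min(valid_len[b], T); t < T; ++t) {
            for (int c = 0; c < C; ++c) {
                x[(static_cast<std::size_t>(b) * T + t) * C + c] = 0.0f;
            }
        }
    }
}
```

Under these assumptions a 100-frame clip and a 64-frame clip padded into the same batch keep valid lengths 51/26/14 and 33/17/9 across three stages, so after each stage the shorter item's frames past its own valid length are masked and never leak into its output. That per-item recurrence is why a batched clip can match its standalone transcription exactly.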

Validation

  • New tests/test_subsampling_batch_causal.cpp: batched vs per-item byte-identical for uniform, mixed-length, reversed, and a truncated-clip batch, plus batched timestamps. Exact string equality, no tolerances.
  • tests/test_capi_batch_json.cpp: a 2-clip batch_json_lang(..., "de") returns a length-2 JSON array; unknown locale returns NULL with last_error.
  • Full suite green (55 tests). The non-causal batched paths are byte-identical by construction.
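The parity criterion above (batched output equals standalone output by exact string equality, also with the batch order reversed) can be sketched as a small generic harness. This is a hypothetical helper, not the code in tests/test_subsampling_batch_causal.cpp; the Clip type and the two transcriber callables are stand-ins you would bind to the real model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins: a clip is raw PCM, and the transcribers are callables.
using Clip = std::vector<float>;
using TranscribeOne = std::function<std::string(const Clip&)>;
using TranscribeBatch =
    std::function<std::vector<std::string>(const std::vector<Clip>&)>;

// True iff every clip's batched transcript equals its standalone transcript.
// Exact string equality, no tolerances.
bool batch_matches_standalone(const std::vector<Clip>& clips,
                              const TranscribeOne& one,
                              const TranscribeBatch& batch) {
    const std::vector<std::string> batched = batch(clips);
    if (batched.size() != clips.size()) return false;
    for (std::size_t i = 0; i < clips.size(); ++i) {
        if (batched[i] != one(clips[i])) return false;
    }
    return true;
}

// Runs the parity check in the given order and again with the batch reversed,
// so a result that depends on batch position is caught.
bool parity_all_orders(std::vector<Clip> clips, const TranscribeOne& one,
                       const TranscribeBatch& batch) {
    if (!batch_matches_standalone(clips, one, batch)) return false;
    std::reverse(clips.begin(), clips.end());
    return batch_matches_standalone(clips, one, batch);
}
```

Passing mixed-length clips exercises the padded and masked case; a truncated clip is just another short entry in the vector.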

Notes

  • One language applies to the whole batch (the prompt one-hot is per-utterance, constant over time).
  • The LocalAI parakeet-cpp backend wiring (call the _lang batch variant, pass the request language) is tracked separately for the gallery work.
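Why one language covers the whole batch can be sketched as follows: a single target_lang yields one one-hot prompt row, and that same row is repeated for every item and every frame. This is a sketch under stated assumptions: the locale table, the choice of index 0 as the "model default", and the [B][T][L] layout are all hypothetical, since the PR does not list the model's languages or its prompt tensor shape.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical locale table; the model's real language list is not given in this PR.
const std::vector<std::string> kLocales = {"en", "de", "fr", "es"};

// One-hot language prompt. An empty locale selects index 0, standing in for the
// model default (an assumption); an unknown locale throws.
std::vector<float> prompt_one_hot(const std::string& locale) {
    std::size_t idx = 0;
    if (!locale.empty()) {
        auto it = std::find(kLocales.begin(), kLocales.end(), locale);
        if (it == kLocales.end()) {
            throw std::invalid_argument("unknown locale: " + locale);
        }
        idx = static_cast<std::size_t>(std::distance(kLocales.begin(), it));
    }
    std::vector<float> v(kLocales.size(), 0.0f);
    v[idx] = 1.0f;
    return v;
}

// Row-major [B][T][L] prompt for a batch: one target_lang per call, so every
// item and every time frame receives the same one-hot row.
std::vector<float> batch_prompt(const std::string& locale, int B, int T) {
    assert(B >= 0 && T >= 0);
    const std::vector<float> row = prompt_one_hot(locale);
    std::vector<float> out;
    out.reserve(static_cast<std::size_t>(B) * T * row.size());
    for (int i = 0; i < B * T; ++i) {
        out.insert(out.end(), row.begin(), row.end());
    }
    return out;
}
```

Mixing languages inside one batch would need a per-item locale argument, which the batched C-API does not take.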

🤖 Generated with Claude Code

mudler and others added 2 commits June 6, 2026 08:37
…ang / _batch_lang)

Add language-aware batched C-API entry points so a request-coalesced batch
can select one language prompt for the whole batch on multilingual
(nemotron) models:

  char* parakeet_capi_transcribe_pcm_batch_json_lang(...)
  int   parakeet_capi_transcribe_pcm_batch_lang(...)

The existing non-lang batch functions now delegate to these with nullptr
(model default), mirroring the Phase 4 single-clip pattern, so no logic is
duplicated. target_lang threads into the C++ batch methods that already
accept it; NULL/"" means the model default, non-prompt models ignore it,
and an unknown locale is caught by the existing try/catch (NULL / nonzero +
last_error). ABI stays v3 (unreleased on this branch); the v3 comment now
lists the two new symbols.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
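The delegation and error-handling pattern this commit describes (the non-lang entry point forwards nullptr, NULL/"" means model default, an unknown locale is caught and reported as NULL plus last_error) can be sketched like this. The real parakeet_capi_* signatures are elided as (...) above, so every name here (demo_batch_json_lang, demo_batch_json, demo_last_error, run_batch_json) is hypothetical. The JSON payload is a placeholder, and freeing with std::free is an assumption about ownership.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical last-error slot, read back by the caller after a NULL return.
static thread_local std::string g_last_error;

extern "C" const char* demo_last_error() { return g_last_error.c_str(); }

// Stand-in for the C++ batch method that already accepts target_lang.
// Empty lang = model default; unknown locales throw.
static std::string run_batch_json(int n_clips, const std::string& lang) {
    assert(n_clips >= 0);
    if (!lang.empty() && lang != "en" && lang != "de") {
        throw std::invalid_argument("unknown locale: " + lang);
    }
    std::string json = "[";
    for (int i = 0; i < n_clips; ++i) json += (i ? ",\"\"" : "\"\"");
    return json + "]";
}

// Heap copy the caller releases with std::free (assumed ownership rule).
static char* dup_cstr(const std::string& s) {
    char* p = static_cast<char*>(std::malloc(s.size() + 1));
    if (p) std::memcpy(p, s.c_str(), s.size() + 1);
    return p;
}

// Language-aware entry point: exceptions never cross the C boundary.
extern "C" char* demo_batch_json_lang(int n_clips, const char* target_lang) {
    try {
        const std::string lang = target_lang ? target_lang : "";  // NULL -> default
        return dup_cstr(run_batch_json(n_clips, lang));
    } catch (const std::exception& e) {
        g_last_error = e.what();
        return nullptr;
    }
}

// Existing non-lang entry point delegates with nullptr, so no logic is duplicated.
extern "C" char* demo_batch_json(int n_clips) {
    return demo_batch_json_lang(n_clips, nullptr);
}
```

Keeping the try/catch only in the _lang variant means the non-lang wrapper inherits the same error contract for free.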
… to per-item); enable batched nemotron

The build_graph_batched causal branch already applied the leading
ggml_pad_ext uniformly across the batch and masked each item's trailing
pad time frames per stage via the all_paddings=3 valid-length
recurrence, so it reproduces the standalone causal boundary per item.
The B>1 guard assert was a conservative leftover; remove it so the
multilingual streaming nemotron model can run real batches.

Validated byte-identical: a clip transcribed inside a B>1 batch (uniform,
mixed-length, reversed order, and a non-empty truncated item as the
padded/masked clip) equals the same clip transcribed standalone, plus
batched timestamps text parity. Re-enable the positive valid-language
(de) 2-clip batch JSON assertion in test_capi_batch_json. No change to
the non-causal path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mudler merged commit 50dfc24 into master on Jun 6, 2026
8 checks passed
mudler deleted the feat/nemotron-batch-mode branch on June 6, 2026 at 10:00