Implement formatter to ensure Nemotron VoiceChat speech decoder reproducibility, speed up training and support half precision inference#15583
Conversation
@data_type_parser(["s2s_duplex_reverse_role"])
def read_s2s_duplex_reverse_role(config) -> Tuple[CutSet, bool]:
I added 3 new unit tests covering all 3 duplex formatters (the duplex formatters were not covered by unit tests before). Let me know if you think that is good enough.
def convert_cut_fn(cut: Cut) -> Cut:
    """Convert a single cut by swapping supervisions and audio streams."""
    new_cut = fastcopy(cut)
Use copy.copy() or deepcopy() instead; fastcopy is a very shallow copy, so when you modify the supervisions later, they will be modified on the original object too.
Alternatively, you can keep using fastcopy() if you construct a new list of supervisions first, i.e. fastcopy(cut, supervisions=[...]).
Done. I also updated the magpietts/tts data formatter.
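The shallow-copy pitfall discussed above can be sketched with plain dataclasses. Lhotse's fastcopy behaves much like dataclasses.replace, so the new object shares its supervisions list with the original; the Cut and Supervision classes below are simplified stand-ins, not lhotse's actual types:

```python
# Minimal sketch of the shallow-copy pitfall (stand-in classes, not lhotse's).
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class Supervision:
    speaker: str


@dataclass
class Cut:
    supervisions: List[Supervision] = field(default_factory=list)


def fastcopy(obj, **kwargs):
    # lhotse.utils.fastcopy is essentially a dataclasses.replace-style helper.
    return replace(obj, **kwargs)


cut = Cut(supervisions=[Supervision("user"), Supervision("agent")])

# BAD: the shallow copy shares the supervisions list with the original,
# so mutating a supervision corrupts the original cut as well.
bad = fastcopy(cut)
bad.supervisions[0].speaker = "agent"
assert cut.supervisions[0].speaker == "agent"  # original was mutated!

# GOOD: build a fresh list of new Supervision objects, then pass it in.
cut = Cut(supervisions=[Supervision("user"), Supervision("agent")])
swap = {"user": "agent", "agent": "user"}
swapped = fastcopy(
    cut,
    supervisions=[replace(s, speaker=swap[s.speaker]) for s in cut.supervisions],
)
assert cut.supervisions[0].speaker == "user"  # original untouched
assert swapped.supervisions[0].speaker == "agent"
```

The second pattern is what the review suggests: fastcopy stays cheap because only the supervisions that actually change are re-created.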
Signed-off-by: Edresson <Edresson@users.noreply.github.com>
Signed-off-by: Edresson Casanova <edresson1@gmail.com>
| ) | ||
| if cfg.get("keep_codec_original_dtype", True): | ||
| model.tts_model.to(dtype=target_dtype) | ||
| model.on_train_epoch_start() # ensures that codec is in the right precision |
I don't like training details leaking into inference; could we create a method model.setup_precision() on NemotronVoicechat?
I will move the ensures_codec_target_dtype logic that is currently called inside on_train_epoch_start() into a method on the Duplex EARTTS class, so that we can call it directly without calling on_train_epoch_start().
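The refactor described above can be sketched as follows. The class, attribute, and dtype handling here are illustrative placeholders, not the actual NeMo API; the point is only that the training hook delegates to a plain method that inference code can call directly:

```python
# Hypothetical sketch: precision setup becomes a directly callable method,
# and the training hook merely delegates to it. Names are illustrative.
class DuplexEARTTS:
    def __init__(self, codec_dtype="float32", target_dtype="bfloat16"):
        self.codec_dtype = codec_dtype
        self.target_dtype = target_dtype

    def ensures_codec_target_dtype(self):
        """Cast the codec to the configured target dtype (idempotent)."""
        if self.codec_dtype != self.target_dtype:
            self.codec_dtype = self.target_dtype

    def on_train_epoch_start(self):
        # The training hook still works, but now just delegates.
        self.ensures_codec_target_dtype()


# Inference code calls the method directly, with no training hook involved:
model = DuplexEARTTS()
model.ensures_codec_target_dtype()
print(model.codec_dtype)  # -> bfloat16
```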
out_path,
wav,
samplerate=model.target_sample_rate,

if cfg.get("debug_dtype", False) and batch_id == 0:
Can debug logic be moved to a separate function and invoked here for better readability / to avoid inflating the inference loop size?
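One way to apply this suggestion is to pull the dtype logging into a helper with an early-out guard, so the inference loop shrinks to a single call. The helper name, cfg keys, and tensor names below are illustrative placeholders:

```python
# Hypothetical helper: keeps the inference loop slim by moving the
# debug logic out, per the review suggestion above.
def maybe_debug_dtypes(cfg, batch_id, tensors):
    """Log tensor dtypes for the first batch only, when debugging is enabled."""
    if not cfg.get("debug_dtype", False) or batch_id != 0:
        return
    for name, t in tensors.items():
        print(f"[debug_dtype] {name}: dtype={t.dtype}")


# Call site inside the inference loop becomes one line, e.g.:
# maybe_debug_dtypes(cfg, batch_id, {"wav": wav, "codes": codes})
```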
pzelasko left a comment
Thanks, minor comments left. Good work!
…sion Signed-off-by: Edresson Casanova <edresson1@gmail.com>
Signed-off-by: Edresson <Edresson@users.noreply.github.com>
Important
The Update branch button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do?
This PR improves the Nemotron VoiceChat speech decoder's reproducibility and training performance through the following changes:
- Adds data formatter: Implements the s2s_duplex_reverse_role formatter to reliably swap speaker roles and audio streams.
- Fixes device mismatch: Forces the RVQ audio codec to instantiate on the main model's device during setup, resolving CPU/GPU DDP synchronization crashes in full precision.
- Accelerates training: Changes the default DDP strategy to find_unused_parameters: false to reduce overhead and speed up the training loop.
- Supports half-precision inference: Updates the evaluation script to support half-precision inference and switches it to a torch DataLoader for simplicity.
Collection: SpeechLM2