
[None][fix] Fix chunked prefill API contract for nemotron nano VL#13025

Open
2ez4bz wants to merge 1 commit into NVIDIA:main from 2ez4bz:dev-nano-v3-fix-chunked-prefill-api

Conversation

2ez4bz (Collaborator) commented Apr 14, 2026

Summary by CodeRabbit

  • Improvements

    • Enhanced diagnostic logging for multimodal embedding handling with more detailed information about returned tensors, embedding types, and token counts.
  • Tests

    • Added comprehensive tests validating multimodal embedding caching behavior, encoder contract compliance, and chunked prefill compatibility.

Description

  • Why?

Opting into the caching functionality for chunked prefill requires the encoder's forward function to satisfy certain assumptions about its return type. These assumptions did not hold for Nemotron Nano VL prior to this commit.

  • What?

This commit fixes the issue and adds tests to catch regressions.
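The contract the caching path assumes can be sketched as follows. This is a hypothetical illustration, not the actual TRT-LLM API: the names `MultimodalParams`, `multimodal_data`, and `num_tokens_in_video` mirror those discussed in this PR, but the encoder logic is a toy stand-in that uses plain Python lists in place of `torch.Tensor`.

```python
# Hypothetical sketch of the encoder contract that chunked-prefill caching
# assumes: the encoder returns exactly one concatenated embedding per
# uncached request, and stashes per-request token counts in that request's
# multimodal_data rather than returning them. Lists stand in for tensors.
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class MultimodalParams:
    # Per-request scratch space; the field name mirrors the PR, the
    # contents here are toy data.
    multimodal_data: Dict[str, Any] = field(default_factory=dict)


def encode_multimodal(uncached: List[MultimodalParams]) -> List[List[float]]:
    """Return one concatenated embedding per uncached request."""
    outputs: List[List[float]] = []
    for params in uncached:
        frames = params.multimodal_data["frames"]  # per-frame embeddings
        concat = [tok for frame in frames for tok in frame]
        # The token count is cached on the request, not returned.
        params.multimodal_data["num_tokens_in_video"] = len(concat)
        outputs.append(concat)
    return outputs


req = MultimodalParams(multimodal_data={"frames": [[0.1, 0.2], [0.3]]})
embeds = encode_multimodal([req])
print(len(embeds), req.multimodal_data["num_tokens_in_video"])  # 1 3
```

The key point is the shape of the return value: a list with one entry per request, nothing else, so the caching layer can associate each entry with its request without unpacking a tuple.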

Test Coverage

See above.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@2ez4bz 2ez4bz requested a review from a team as a code owner April 14, 2026 06:02
@2ez4bz 2ez4bz requested a review from symphonylyh April 14, 2026 06:02
@2ez4bz 2ez4bz force-pushed the dev-nano-v3-fix-chunked-prefill-api branch 2 times, most recently from f4313d5 to 35608fe Compare April 14, 2026 06:09
coderabbitai bot (Contributor) commented Apr 14, 2026

Caution

Review failed

An error occurred during the review process. Please try again later.

📝 Walkthrough

These changes refactor multimodal embedding handling in Nemotron Nano models. The _encode_multimodal method now returns only a list of concatenated tensors instead of a tuple with token counts; token counts are stored directly in multimodal parameters. The forward method reconstructs token counts from context parameters when needed. Enhanced logging provides more detailed warning information for debugging.

Changes

  • Logging Enhancement — tensorrt_llm/_torch/models/modeling_multimodal_utils.py
    Enhanced the logger.warning message to include the number of returned embeddings, embedding type names, shapes, the count of uncached multimodal params, and the encoder function reference when encoder_forward_fn returns multiple tensors.

  • Core API Changes — tensorrt_llm/_torch/models/modeling_nemotron_nano.py
    Modified _encode_multimodal to return List[torch.Tensor] instead of a tuple with token counts. Stores per-request num_tokens_in_video in MultimodalParams.multimodal_data. Updated forward to reconstruct num_tokens_in_videos from context params. Changed the merge_evs_mm_embeds invocation to use sliced context params.

  • Test Updates — tests/unittest/_torch/modeling/test_modeling_nemotron_nano_v2_vl.py
    Added imports and comprehensive tests for the multimodal caching contract. Updated existing dispatch tests for the new return signature. Introduced TestEncodeMultimodalContract, validating the list-of-single-tensor return, and TestChunkedPrefillCaching, verifying encoder caching behavior.
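On the forward side, the walkthrough notes that token counts are now reconstructed from cached context parameters rather than taken from the encoder's return value. A minimal sketch of that step, again with hypothetical names mirroring the PR (`multimodal_data`, `num_tokens_in_video`) and plain dicts standing in for `MultimodalParams`:

```python
from typing import Any, Dict, List


def reconstruct_num_tokens_in_videos(
        context_params: List[Dict[str, Any]]) -> List[int]:
    """Rebuild per-request video token counts from the values the
    encoder cached earlier, so a later prefill chunk never has to
    re-run the encoder just to learn the counts. Dicts stand in for
    MultimodalParams.multimodal_data here."""
    return [p["num_tokens_in_video"] for p in context_params]


ctx = [{"num_tokens_in_video": 3}, {"num_tokens_in_video": 7}]
print(reconstruct_num_tokens_in_videos(ctx))  # [3, 7]
```

Keeping the counts alongside the cached embeddings is what lets `forward` recover them for any chunk of the prefill, which is the compatibility this PR restores.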

Sequence Diagram(s)

sequenceDiagram
    participant Forward as Forward()
    participant Encoder as _encode_multimodal()
    participant Cache as MultimodalParams<br/>(Cache Storage)
    participant Merge as merge_evs_mm_embeds()
    participant Context as Context Params

    Forward->>Encoder: Call with uncached params
    Encoder->>Encoder: Vision/Audio encoding
    Encoder->>Cache: Store embeddings in<br/>multimodal_data
    Encoder->>Cache: Store num_tokens_in_video
    Encoder-->>Forward: Return List[Tensor]
    Forward->>Merge: Pass merged embeddings<br/>+ context params
    Merge-->>Forward: Merged result
    Forward->>Context: Reconstruct num_tokens<br/>from cached values
    Forward-->>Forward: Continue with<br/>merged embeddings

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Title check ✅ Passed — The title clearly summarizes the main change: fixing the chunked prefill API contract for nemotron nano VL, which aligns with the primary objective and code modifications across all three files.

  • Description check ✅ Passed — The description adequately explains the issue (assumptions about the encoder return type not holding for nemotron nano VL), the solution (fixing the API contract and adding tests), and provides test coverage details.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

* Why?

Opting into the caching functionality for chunked prefill requires the
encoder's forward function to satisfy certain assumptions about its
return type. These assumptions did not hold for Nemotron Nano VL prior
to this commit.

* What?

This commit fixes the issue and adds tests to catch regressions.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
@2ez4bz 2ez4bz force-pushed the dev-nano-v3-fix-chunked-prefill-api branch from 35608fe to e7868dd Compare April 14, 2026 16:52
2ez4bz (Collaborator, Author) commented Apr 14, 2026

/bot run

tensorrt-cicd (Collaborator) commented:

PR_Github #43265 [ run ] triggered by Bot. Commit: e7868dd Link to invocation

@2ez4bz 2ez4bz enabled auto-merge (squash) April 14, 2026 17:54
tensorrt-cicd (Collaborator) commented:

PR_Github #43265 [ run ] completed with state SUCCESS. Commit: e7868dd
/LLM/main/L0_MergeRequest_PR pipeline #33818 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation
