[None][fix] Fix chunked prefill API contract for nemotron nano VL#13025
[None][fix] Fix chunked prefill API contract for nemotron nano VL#130252ez4bz wants to merge 1 commit intoNVIDIA:mainfrom
Conversation
f4313d5 to
35608fe
Compare
|
Caution Review failedAn error occurred during the review process. Please try again later. 📝 WalkthroughWalkthroughThese changes refactor multimodal embedding handling in Nemotron Nano models. The Changes
Sequence Diagram(s)sequenceDiagram
participant Forward as Forward()
participant Encoder as _encode_multimodal()
participant Cache as MultimodalParams<br/>(Cache Storage)
participant Merge as merge_evs_mm_embeds()
participant Context as Context Params
Forward->>Encoder: Call with uncached params
Encoder->>Encoder: Vision/Audio encoding
Encoder->>Cache: Store embeddings in<br/>multimodal_data
Encoder->>Cache: Store num_tokens_in_video
Encoder-->>Forward: Return List[Tensor]
Forward->>Merge: Pass merged embeddings<br/>+ context params
Merge-->>Forward: Merged result
Forward->>Context: Reconstruct num_tokens<br/>from cached values
Forward-->>Forward: Continue with<br/>merged embeddings
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
* Why? In order to opt into the caching functionality for chunked prefix, there are certain assumptions on the return type of the encoder's forward function. These assumptions did not hold for nemotron nano VL prior to this commit. * What? This commit fixes this issue, and adds tests to catch regressions. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
35608fe to
e7868dd
Compare
|
/bot run |
|
PR_Github #43265 [ run ] triggered by Bot. Commit: |
|
PR_Github #43265 [ run ] completed with state
|
Summary by CodeRabbit
Improvements
Tests
Description
In order to opt into the caching functionality for chunked prefix, there are certain assumptions on the return type of the encoder's forward function. These assumptions did not hold for nemotron nano VL prior to this commit.
This commit fixes this issue, and adds tests to catch regressions.
Test Coverage
See above.
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.