use static signature for sfd_col_d_srelu_tensor #281

Open
jiemingz wants to merge 1 commit into NVIDIA:develop from jiemingz:jiemingz/dev_recompile

Conversation

jiemingz commented Jun 4, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Improved caching efficiency for grouped matrix multiplication operations with dynamic activation functions, resulting in more precise instance selection and better performance optimization.

Signed-off-by: Jieming Zhang <jiemingz@nvidia.com>
Anerudhan added labels Jun 4, 2026: mod-cutedsl (CuTeDSL kernels, generated kernels, examples, or related integration work), cat-enhancements, orig-nv-eng (Reported or requested by NVIDIA engineering).
@Anerudhan (Collaborator)

@cudnn-ci-bot run

coderabbitai Bot left a comment

🧹 Nitpick comments (1)
python/cudnn/grouped_gemm/grouped_gemm_dsrelu/api.py (1)

1483-1493: ⚡ Quick win

Add documentation explaining the dimension selection logic.

The helper omits dimension 4 from the static shape and marks stride dimensions 2 and 5 as dynamic, but the rationale for this specific selection is not documented. Consider adding a docstring or inline comment explaining:

  • Why dimension 4 (rest_m) is excluded from the static shape
  • Why stride dimensions 2 and 5 are treated as dynamic
  • How this relates to the SFD column tensor structure

This would improve maintainability and help future developers understand the cache key granularity design. As per coding guidelines, documentation is a key focus area for this module.
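A minimal sketch of what such a documented helper could look like follows. It is not the code under review: the body of `dynamic_m_tensor_signature`, its parameter names, the 6-D layout, and the string dtype are all assumptions made for illustration. Only the function names, the omission of dimension 4 (rest_m) from the static shape, and `dynamic_stride_dims=(2, 5)` come from the comment above.

```python
from typing import Sequence, Tuple


def dynamic_m_tensor_signature(
    shape: Sequence[int],
    stride: Sequence[int],
    dtype: str,
    static_shape_dims: Sequence[int],
    dynamic_stride_dims: Sequence[int],
) -> Tuple:
    """Hypothetical stand-in for the real helper, whose body is not shown here.

    Returns a hashable key built only from the parts treated as static.
    """
    # Keep only the shape extents that are part of the cache key.
    static_shape = tuple(shape[d] for d in static_shape_dims)
    # Mask strides that are allowed to vary without forcing a recompile.
    static_stride = tuple(
        None if d in dynamic_stride_dims else s for d, s in enumerate(stride)
    )
    return (static_shape, static_stride, dtype)


def dynamic_sfd_col_tensor_signature(
    shape: Sequence[int], stride: Sequence[int], dtype: str
) -> Tuple:
    """Cache-key signature for the SFD column tensor (illustrative only).

    Assumption: the tensor is 6-D and dimension 4 is rest_m.

    * Dimension 4 (rest_m) is left out of the static shape, so tensors that
      differ only in rest_m map to the same cache key.
    * Stride dimensions 2 and 5 are marked dynamic, so their values are
      masked out of the key as well.
    """
    static_shape_dims = tuple(d for d in range(len(shape)) if d != 4)
    return dynamic_m_tensor_signature(
        shape,
        stride,
        dtype,
        static_shape_dims=static_shape_dims,
        dynamic_stride_dims=(2, 5),
    )


# Two tensors that differ only in rest_m and in the dynamic strides.
key_a = dynamic_sfd_col_tensor_signature(
    (32, 4, 8, 4, 2, 3), (16, 4, 100, 1, 512, 2000), "float8"
)
key_b = dynamic_sfd_col_tensor_signature(
    (32, 4, 8, 4, 7, 3), (16, 4, 350, 1, 512, 7000), "float8"
)
# A tensor that differs in a static dimension.
key_c = dynamic_sfd_col_tensor_signature(
    (64, 4, 8, 4, 2, 3), (16, 4, 100, 1, 512, 2000), "float8"
)
```

Under these assumptions `key_a` and `key_b` compare equal while `key_c` does not, which is the cache key granularity the docstring is meant to spell out.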

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cudnn/grouped_gemm/grouped_gemm_dsrelu/api.py` around lines 1483 -
1493, Add a short docstring or inline comment to
dynamic_sfd_col_tensor_signature explaining the dimension-selection rationale:
state that static_shape intentionally omits dimension 4 (rest_m) because rest_m
varies per-column and should not be part of the cache key, and that
dynamic_stride_dims=(2, 5) marks the stride-related dimensions (the inner M
chunk and the leading stride for packed layout) as dynamic because their strides
can vary even when logical sizes match; also mention how this choice maps to the
SFD column tensor layout and why it yields the desired cache key granularity
when delegating to dynamic_m_tensor_signature.

Source: Coding guidelines

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 941018c2-14e8-4ce1-a369-5b4cb4767865

📥 Commits

Reviewing files that changed from the base of the PR and between 035b520 and bf8b797.

📒 Files selected for processing (1)
  • python/cudnn/grouped_gemm/grouped_gemm_dsrelu/api.py

NVIDIA deleted a comment from coderabbitai Bot Jun 5, 2026

Labels

  • cat-enhancements
  • mod-cutedsl: CuTeDSL kernels, generated kernels, examples, or related integration work.
  • orig-nv-eng: Reported or requested by NVIDIA engineering.

2 participants