vpietila-amd commented Jan 29, 2026

Proposed changes

Added conv group merging to the (universal) V3 fwd conv pipeline. The new instance improves fwd conv performance when the number of input/output channels per group is low.

On MI300 (gfx942), we get the following results:

| CK profiler command | Baseline (TFLOPS) | V3 group merging (TFLOPS) |
| --- | --- | --- |
| grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 4 4 3 3 200 200 1 1 1 1 1 1 1 1 | 3.86035 | 8.36796 |
| grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 8 8 3 3 200 200 2 2 1 1 1 1 1 1 | 10.1867 | 13.4677 |
| grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 8 8 3 3 100 100 1 2 1 1 1 1 1 1 | 11.7875 | 16.3657 |
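
For a quick read of the numbers above, the relative speedup per command can be computed as the V3 group merging TFLOPS divided by the baseline TFLOPS. The snippet below is only an illustrative sketch: the TFLOPS values are copied from the table, while the `speedup` helper and the `results` list are hypothetical names introduced here, not part of CK.

```python
# TFLOPS values copied from the table above: (command, baseline, V3 group merging).
results = [
    ("grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 4 4 3 3 200 200 1 1 1 1 1 1 1 1", 3.86035, 8.36796),
    ("grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 8 8 3 3 200 200 2 2 1 1 1 1 1 1", 10.1867, 13.4677),
    ("grouped_conv_fwd 1 1 1 0 1 0 1 2 32 32 8 8 3 3 100 100 1 2 1 1 1 1 1 1", 11.7875, 16.3657),
]


def speedup(baseline_tflops: float, merged_tflops: float) -> float:
    """Return the throughput ratio of the group-merging instance to the baseline."""
    return merged_tflops / baseline_tflops


# One ratio per table row, in the same order as the table.
speedups = [speedup(base, merged) for _, base, merged in results]

for (command, _, _), ratio in zip(results, speedups):
    print(f"{ratio:.2f}x  {command}")
```

With these figures the ratios come out to roughly 2.17x, 1.32x, and 1.39x, so the largest gain is on the first command.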
