Clean up Oink benchmarks and document SM103 results #132

Merged
Laurawly merged 4 commits into main from laura/refactor-oink
Apr 29, 2026

@Laurawly
Contributor

Summary

This PR cleans up the Oink benchmark/docs surface and updates the repo for current SM10x / GB300 usage.
Changes include:

  • Make the top-level Oink README more concise and current.
  • Update benchmark docs for SM103 / GB300 setup and measured roofline context.
  • Add DSv4 norm benchmark coverage for RMSNorm, RMSNorm backward, fused-add RMSNorm, and LayerNorm.
  • Add fused-add RMSNorm DSv3/DSv4 speedup tables vs Quack to benchmarks/README.md.
  • Generate and include a current SM103 BF16 benchmark SVG.
  • Centralize CuTeDSL cache-dir setup in _cutedsl_cache.py.
  • Update package/docs wording from SM100-only to Blackwell SM10x / SM100-SM103 where appropriate.
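
As an illustration of the cache-dir centralization item above, here is a minimal sketch of what a single shared helper could look like. It is not the contents of `_cutedsl_cache.py`, which this description does not show. The function name `ensure_cache_dir` and the environment variable `EXAMPLE_CUTEDSL_CACHE_DIR` are hypothetical placeholders, and the fallback location is an assumption.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical variable name, used only for this sketch. The variable that
# _cutedsl_cache.py actually reads or sets is not shown in this PR description.
CACHE_ENV_VAR = "EXAMPLE_CUTEDSL_CACHE_DIR"


def ensure_cache_dir(env_var: str = CACHE_ENV_VAR) -> Path:
    """Return a writable cache directory, creating it if needed.

    A value already present in ``env_var`` wins. Otherwise a per-user
    directory under the system temp dir is chosen (an assumed default).
    The resolved path is exported so child processes see the same location.
    """
    configured = os.environ.get(env_var)
    if configured:
        cache_dir = Path(configured).expanduser()
    else:
        user = os.environ.get("USER") or os.environ.get("USERNAME") or "unknown"
        cache_dir = Path(tempfile.gettempdir()) / f"cutedsl-cache-{user}"
    cache_dir.mkdir(parents=True, exist_ok=True)
    os.environ[env_var] = str(cache_dir)
    return cache_dir
```

With one helper of this kind, benchmarks and tests would call it once at import time instead of each repeating its own directory logic.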

Validation

Ran on GB300 / SM103 with CUTE_DSL_ARCH=sm_103a:

git diff --check
PYTHONNOUSERSITE=1 CUTE_DSL_ARCH=sm_103a python -m compileall -q src benchmarks tests
PYTHONNOUSERSITE=1 CUTE_DSL_ARCH=sm_103a python -m pytest tests -q

Result:

15 passed
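
The three validation commands above could also be driven from one script. The following is a rough sketch, not something included in this PR: the names `VALIDATION_ENV`, `validation_commands`, and `run_validation` are hypothetical. It applies the same environment to every step, including `git diff --check`, which the original runs without those variables; that should be harmless since git ignores them.

```python
import os
import subprocess
import sys

# Environment settings taken from the commands listed in the PR description.
VALIDATION_ENV = {"PYTHONNOUSERSITE": "1", "CUTE_DSL_ARCH": "sm_103a"}


def validation_commands(python: str = sys.executable) -> list[list[str]]:
    """Return the validation steps as argument lists, in the order listed."""
    return [
        ["git", "diff", "--check"],
        [python, "-m", "compileall", "-q", "src", "benchmarks", "tests"],
        [python, "-m", "pytest", "tests", "-q"],
    ]


def run_validation() -> None:
    """Run each step in order, stopping at the first failure."""
    env = {**os.environ, **VALIDATION_ENV}
    for cmd in validation_commands():
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True, env=env)
```

Calling `run_validation()` from the repository root on an SM103 machine would reproduce the sequence; on other hardware the `CUTE_DSL_ARCH` value would presumably need to change.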

@meta-cla (bot) added the CLA Signed label (managed by the Meta Open Source bot) on Apr 29, 2026
@Laurawly merged commit 2d8ed0d into main on Apr 29, 2026
9 checks passed
2 participants