[metal] Add elementwise kernel tests #1796
Closed
aditvenk wants to merge 5 commits into aditvenk/stack/24 from
aditvenk added a commit that referenced this pull request on Mar 24, 2026:
Extend MslAstWalker to handle the Triton AST patterns produced by elementwise operations:
- libdevice.func / tl_math.func → MSL intrinsics (exp via exp2 + ln2)
- triton_helpers.maximum/minimum → max/min
- tl.* catch-all: strip _rn suffix, inline sigmoid, pass through others
- ast.Pow → sqrt(x) for **0.5, pow(x, y) otherwise
- Strip Triton broadcasting subscripts ([:, None]) as no-ops

Tests cover arithmetic (add/sub/mul/div/neg), scalar args (saxpy), activations (relu/silu/gelu_approx), math ops (exp/log/sqrt/abs/sin+cos/clamp), dtypes (float16/bfloat16/int32), bounds masking, and >1D tensors (2D aligned + non-aligned, 3D).

stack-info: PR: #1796, branch: aditvenk/stack/14
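The rewrite rules listed in this commit message can be illustrated with a small standalone translator. This is only a sketch under assumptions, not the PR's MslAstWalker: the class name `ExprSketch`, the helper `translate`, the exact emitted strings, and the reading of "exp via exp2 + ln2" as exp(x) = exp2(x / ln 2) are all hypothetical choices made for illustration.

```python
import ast

# 1 / ln(2); assumes "exp via exp2 + ln2" means exp(x) == exp2(x / ln 2).
LOG2_E = 1.4426950408889634


class ExprSketch(ast.NodeVisitor):
    """Hypothetical expression translator; not the PR's MslAstWalker."""

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        return repr(node.value)

    def visit_BinOp(self, node):
        left, right = self.visit(node.left), self.visit(node.right)
        if isinstance(node.op, ast.Pow):
            # x ** 0.5 becomes sqrt(x); any other power becomes pow(x, y).
            if isinstance(node.right, ast.Constant) and node.right.value == 0.5:
                return f"sqrt({left})"
            return f"pow({left}, {right})"
        ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
        return f"({left} {ops[type(node.op)]} {right})"

    def visit_Subscript(self, node):
        # Broadcasting subscripts such as x[:, None] are treated as no-ops.
        return self.visit(node.value)

    def visit_Call(self, node):
        args = [self.visit(a) for a in node.args]
        func = ast.unparse(node.func)
        module, _, name = func.rpartition(".")
        joined = ", ".join(args)
        if module in ("libdevice", "tl_math"):
            if name == "exp":
                return f"exp2({args[0]} * {LOG2_E}f)"
            return f"{name}({joined})"
        if module == "triton_helpers" and name in ("maximum", "minimum"):
            return f"{'max' if name == 'maximum' else 'min'}({joined})"
        if module == "tl":
            name = name.removesuffix("_rn")  # strip the rounding-mode suffix
            if name == "sigmoid":
                return f"(1.0f / (1.0f + exp(-({args[0]}))))"
            return f"{name}({joined})"
        return f"{func}({joined})"


def translate(src):
    """Translate one Python expression string into an MSL-like string."""
    return ExprSketch().visit(ast.parse(src, mode="eval").body)
```

For example, `translate("x ** 0.5")` yields `sqrt(x)` and `translate("triton_helpers.maximum(a, b)")` yields `max(a, b)`. Unhandled node types (unary operators, comparisons, and so on) are left out to keep the sketch short.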
Add tests for arithmetic (add/sub/mul/div/neg), scalar args (saxpy), activations (relu/silu/gelu_approx), math ops (exp/log/sqrt/abs/sin+cos/clamp), dtypes (float16/bfloat16/int32), bounds masking, and >1D tensors (2D aligned + non-aligned, 3D).

stack-info: PR: #1796, branch: aditvenk/stack/14
Add msl_ast_walker.py, which translates Python AST to MSL C++ source. Handles statement-level translation (assignments, if/for, etc.), tl.load/tl.store → pointer dereferences, and C++ namespace restoration (metal.precise.sin → metal::precise::sin). This is a standalone library module that is not yet wired into the backend.

stack-info: PR: #1794, branch: aditvenk/stack/13
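The namespace restoration step mentioned here (metal.precise.sin → metal::precise::sin) could look roughly like the following. It is a minimal sketch, assuming that only attribute chains rooted at the name `metal` are rewritten; the function name `restore_namespace` is hypothetical and is not taken from msl_ast_walker.py.

```python
import ast


def restore_namespace(node):
    """Turn a dotted attribute chain rooted at `metal` into a C++ qualified name.

    Hypothetical helper: chains with any other root keep their dotted spelling.
    """
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if not isinstance(node, ast.Name):
        raise ValueError("expected a plain dotted name")
    parts.append(node.id)
    parts.reverse()
    separator = "::" if parts[0] == "metal" else "."
    return separator.join(parts)


expr = ast.parse("metal.precise.sin", mode="eval").body
qualified = restore_namespace(expr)  # "metal::precise::sin"
```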
Add metal_jit decorator that JIT-compiles a Python function to an MSL Metal shader on first call. The decorator:
1. Parses the decorated function's source to recover the Python AST
2. Calls _generate_msl to translate the AST body to MSL C++ source
3. Compiles the MSL via torch.mps.compile_shader
4. Caches the compiled library for subsequent calls

Metadata (tensor arg dtypes, block sizes) is passed as decorator arguments by Helion's codegen: @metal_jit(args=[...], block_sizes=[...])

stack-info: PR: #1991, branch: aditvenk/stack/25
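The four steps of the decorator can be sketched as a compile-on-first-call wrapper. This is an illustration under assumptions, not the PR's metal_jit: the name `metal_jit_sketch` and the `translate`, `compile_source`, and `source_of` parameters are hypothetical hooks that stand in for _generate_msl and torch.mps.compile_shader so the sketch runs without a Metal device. Returning the cached library from the wrapper is also an assumption.

```python
import ast
import functools
import inspect
import textwrap


def metal_jit_sketch(args, block_sizes, translate, compile_source,
                     source_of=inspect.getsource):
    """Hypothetical stand-in for a JIT decorator that compiles on first call."""

    def decorator(fn):
        cache = {}

        @functools.wraps(fn)
        def wrapper(*call_args):
            if "lib" not in cache:
                # 1. Recover the Python AST from the function's source text.
                tree = ast.parse(textwrap.dedent(source_of(fn)))
                func_def = tree.body[0]
                # 2. Translate the AST to shader source (hook is an assumption).
                msl = translate(func_def, args, block_sizes)
                # 3. Compile the shader source (hook is an assumption).
                # 4. Cache the result so later calls skip steps 1 to 3.
                cache["lib"] = compile_source(msl)
            return cache["lib"]

        return wrapper

    return decorator
```

Passing the translator and compiler in as callables keeps the caching logic testable with stubs; on a machine with MPS, `compile_source` could plausibly be `torch.mps.compile_shader`, as the commit message states for the real decorator.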
- MetalBackend.function_decorator returns "metal_jit"
- Add Backend.function_decorator_expr hook; MetalBackend overrides it to serialize arg metadata and block sizes into the decorator call
- device_function.py calls backend.function_decorator_expr(self)
- Launcher simplified: metal_jit returns compiled lib directly, no more source hashing or compile_shader in the launcher
- 3D threadgroup dispatch model with _block_dims

stack-info: PR: #1992, branch: aditvenk/stack/26
Add a macOS Metal entry to matrix.json (macos-m2-26, cpu runtime, metal backend) and update test.yml to handle macOS:
- Gate apt-get on runner.os == 'Linux'
- Add cpu runtime branch for PyTorch install
- Add MPS availability check for metal backend
- Run test/test_metal.py without xdist for metal

stack-info: PR: #1862, branch: aditvenk/stack/24
Comment from the PR author (Contributor): Combined into #1992
Stacked PRs:
[metal] Add elementwise kernel tests
Add tests for arithmetic (add/sub/mul/div/neg), scalar args (saxpy),
activations (relu/silu/gelu_approx), math ops (exp/log/sqrt/abs/
sin+cos/clamp), dtypes (float16/bfloat16/int32), bounds masking,
and >1D tensors (2D aligned + non-aligned, 3D).
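
The bounds-masking and non-aligned cases in this list exercise the situation where the element count is not a multiple of the block size. A pure-Python model of that pattern, written only as an illustration (the function `masked_add` is hypothetical and is not one of the PR's tests), might be:

```python
def masked_add(x, y, block_size):
    """Add two equal-length sequences block by block, masking the tail block."""
    n = len(x)
    out = [0.0] * n
    num_blocks = (n + block_size - 1) // block_size  # ceiling division
    for block in range(num_blocks):
        for lane in range(block_size):
            idx = block * block_size + lane
            if idx < n:  # bounds mask: lanes past the end do nothing
                out[idx] = x[idx] + y[idx]
    return out


# 5 elements with a block size of 4: the second block has 3 masked-off lanes.
result = masked_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], block_size=4)
```

Without the `idx < n` check the last block would index past the end of the inputs, which is the failure a non-aligned test shape is meant to catch.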