
NKI Beta 2 Interpreter #296

Draft
latentCall145 wants to merge 10 commits into main from nki-namespace

@latentCall145 (Collaborator) commented Feb 28, 2026

NOTE: This PR is only a reference. I'll add the NKI Beta 2 interpreter in several stages, following the plan listed here.

Summary

  • Adds the NKI Beta 2 interpreter
  • Adds NKI Beta 2 examples
  • Does NOT connect with triton-viz yet

Test Plan

  • lots of unit tests in `tests/nki/test_nki_beta2.py`
  • e2e tests using:
    • test_nkilib_interpreter_e2e.py (uses nkilib, which is bundled with nki and provides NKI kernels for various ops like RMSNorm, projection, etc.)
    • running the NKI Beta 2 kernels in examples/nki (matmul, rmsnorm, rope, softmax, attention) to make sure their outputs match NumPy implementations

Related Issues

This will be merged in several stages.

stage 1: dtypes foundation (current PR)

Scope:

  • Add triton_viz/utils/dtypes.py with STORAGE_DTYPES backed by ml_dtypes for low-precision formats.
  • Add unit tests that validate alias resolution and cast behavior for low-precision formats.

Files:

  • triton_viz/utils/dtypes.py
  • tests/unit/test_dtypes.py

Validation:

  • pytest tests/unit/test_dtypes.py
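
To make "alias resolution and cast behavior" concrete, here is a minimal sketch of the kind of check stage 1 describes. It is not the contents of triton_viz/utils/dtypes.py: the alias table, `resolve_dtype`, and `cast_to_bfloat16` are hypothetical names, and the bfloat16 rounding is emulated with the standard library so the sketch runs without ml_dtypes, which the real module is stated to use.

```python
import struct

# Hypothetical alias table (assumption). The real STORAGE_DTYPES mapping is
# backed by ml_dtypes and may use different keys and values.
_ALIASES = {
    "bf16": "bfloat16",
    "bfloat16": "bfloat16",
    "fp32": "float32",
    "float32": "float32",
}


def resolve_dtype(name: str) -> str:
    """Map a dtype alias to its canonical name, ignoring case and whitespace."""
    key = name.strip().lower()
    if key not in _ALIASES:
        raise ValueError(f"unknown dtype alias: {name!r}")
    return _ALIASES[key]


def cast_to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    bfloat16 keeps the upper 16 bits of the float32 encoding, so the cast
    rounds away the lower 16 bits. NaN is not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Adding 0x7FFF plus the lowest kept bit implements round-half-to-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + rounding_bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

A unit test in this style would assert that every alias resolves to one canonical name and that casts round the way the format requires, for example that 3.14159 becomes 3.140625 in bfloat16.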

stage 2: beta2 interpreter scaffold (side-by-side)

Scope:

  • Add triton_viz/core/nki_beta2.py without routing trace/patch to it yet.
  • Add minimal beta2 tests/examples that exercise direct behavior.

stage 3: runtime wiring switch

Scope:

  • Route NKI tracing/patching to beta2 interpreter.
  • Include small compatibility/typing adjustments (trace.py, patch.py, client.py).

stage 4: op coverage expansion

Scope:

  • Land broad nl/nisa operation support in nki_beta2.py.

stage 5: hardening cleanup

Scope:

  • Apply type-fix and debloat commits (590d058, 9873ccd, 0692573) as cleanup-only PRs.

stage 6: test expansion

Scope:

  • Land large beta2 test coverage and nkilib e2e tests (b218865 + quantization-related updates from 8ecd095).

stage 7: beta2 examples

Scope:

  • Land beta2 examples (rmsnorm_beta2.py, rope_beta2.py, softmax_beta2.py, tiled_attention_beta2.py, nki2.py).

Breaking Changes

Once this is fully merged with triton-viz support, I'll remove the old NKI interpreter, since it's deprecated and incomplete.

Checklist

  • I added tests to all new functionality I added/bugs I fixed.
  • I verified that a human has reviewed all code in this PR.
  • I ran npm run build:frontend if the PR modified any TypeScript code.
  • I made sure that my code is well documented (comments explaining strange code, docstrings for functions, website modified if new functionality added).

@latentCall145 latentCall145 marked this pull request as draft February 28, 2026 21:26
@github-actions

Sanitizer Performance Benchmark

| Benchmark | main (median) | PR (median) | Change |
|---|---|---|---|
| simple_load_store | 0.006s | 0.005s | -14.8% |
| gemm | 0.025s | 0.024s | -3.3% |
| gemm_oob | 0.026s | 0.026s | -2.7% |
| indirect_load | 0.082s | 0.081s | -2.1% |
| nested_loop | 0.027s | 0.027s | -0.2% |
| block_pointer_loop_advance | 0.008s | 0.008s | +2.1% |
| liger_jsd | 0.160s | 0.162s | +1.3% |
| flaggems_layernorm | 3.119s | 3.087s | -1.0% |
| Total | 3.454s | 3.420s | -1.0% |

Threshold: >15% regression flagged with ⚠️
Iterations: 1 warmup + 20 measured
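
For readers interpreting the Change column, the sketch below shows one plausible way such a figure could be derived: take the median of the measured iterations on each branch, discard the warmup run, and flag a slowdown above the threshold. The benchmark script itself is not shown in this thread, so the function name `summarize` and the exact formula are assumptions. The table's percentages appear to be computed from unrounded medians, so they will not always match the rounded times shown.

```python
from statistics import median

# Threshold taken from the comment above: regressions over 15% are flagged.
REGRESSION_THRESHOLD_PCT = 15.0


def summarize(main_times, pr_times, warmup=1):
    """Return (main_median, pr_median, change_pct, flagged).

    The first `warmup` samples of each run are discarded. A positive
    change_pct means the PR is slower than main.
    """
    main_med = median(main_times[warmup:])
    pr_med = median(pr_times[warmup:])
    change_pct = (pr_med - main_med) / main_med * 100.0
    return main_med, pr_med, change_pct, change_pct > REGRESSION_THRESHOLD_PCT
```

Under this reading, none of the rows above would be flagged, since the largest slowdown is +2.1%.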

@latentCall145 latentCall145 mentioned this pull request Mar 10, 2026