kernels: nvfp4: make scale constants tl.constexpr for triton>=3.6 #29

Open
zhitwang17 wants to merge 1 commit into main from zhitao/fix-global-var

@zhitwang17 (Collaborator)

Summary

NVFP4 quantization kernels failed to compile on triton >= 3.6 with
NameError: Cannot access global variable F4_E2M1_MAX from within @jit'ed function.
triton >= 3.6 forbids @triton.jit kernels from reading plain (non-constexpr)
module globals, which broke the module-level scale constants
(F4_E2M1_MAX, F8E4M3_MAX, E4M3_EPS) used inside _calculate_nvfp4_scales.
Previously this required the TRITON_ALLOW_NON_CONSTEXPR_GLOBALS=1 env var as a
workaround.

Fix

Split the constants into two clearly-scoped definitions instead of overloading a
single name:

  • Host-facing F4_E2M1_MAX / F8E4M3_MAX / E4M3_EPS stay plain Python
    floats, so eager/host code (e.g. compute_dynamic_outer_scale) uses them
    directly with no accessor.
  • Kernel-facing _F4_E2M1_MAX / _F8E4M3_MAX / _E4M3_EPS are
    tl.constexpr mirrors, referenced only inside @triton.jit kernels.
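
A minimal sketch of the two-tier layout described above, not the code from the patch. The numeric values, the helper `host_outer_scale`, and the device helper `_block_scale` are hypothetical and chosen only for illustration. The `try/except` import guard exists solely so the host half of the sketch runs on a machine without Triton.

```python
# Sketch only: values and the helpers marked "hypothetical" are assumptions,
# not copied from the patch.
try:
    import triton
    import triton.language as tl
except ImportError:  # guard used only so the host half runs without Triton
    triton = tl = None

# Host-facing constants: plain Python floats (values assumed).
F4_E2M1_MAX = 6.0      # largest finite FP4 E2M1 magnitude
F8E4M3_MAX = 448.0     # largest finite FP8 E4M3 (OCP "fn") magnitude
E4M3_EPS = 2.0 ** -9   # assumed placeholder: smallest positive E4M3 subnormal


def host_outer_scale(amax: float) -> float:
    """Hypothetical host helper: uses the floats directly, no .value needed."""
    return max(amax, E4M3_EPS) / (F8E4M3_MAX * F4_E2M1_MAX)


if tl is not None:
    # Kernel-facing mirrors: tl.constexpr globals may be read inside @triton.jit.
    _F4_E2M1_MAX = tl.constexpr(F4_E2M1_MAX)
    _F8E4M3_MAX = tl.constexpr(F8E4M3_MAX)
    _E4M3_EPS = tl.constexpr(E4M3_EPS)

    @triton.jit
    def _block_scale(block_amax):  # hypothetical device helper
        # Reads only the underscore-prefixed constexpr mirrors.
        scale = block_amax / _F4_E2M1_MAX
        return tl.minimum(tl.maximum(scale, _E4M3_EPS), _F8E4M3_MAX)
```

With these assumed values, `host_outer_scale(2688.0)` returns `1.0`, since 448 × 6 = 2688. The point of the split is that host code never sees a `tl.constexpr` object and kernel code never reads a plain global.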

This removes the need for TRITON_ALLOW_NON_CONSTEXPR_GLOBALS=1. It also
eliminates a class of bugs where host code that touches a tl.constexpr object
raises an AttributeError, and it avoids the fragile pattern of appending .value
at every host-side use.

No behavioral change: the inlined constant values are identical, and the
generated AMDGCN is byte-for-byte equal (only DWARF debug line numbers shift),
so there is no performance impact.

Test status

  • Full NVFP4 op-level suite + test_decomposed_linear.py: 512 passed,
    run without TRITON_ALLOW_NON_CONSTEXPR_GLOBALS.
  • Host-side smoke check: the three constants are plain floats and work directly
    in arithmetic / .clamp(...) calls.
  • Lint: clean.

Note: tests/unittest/nvfp4/test_real_data_snr.py is excluded because it depends
on the amdfp4 module from a separate branch and is unrelated to this change.

zhitwang17 self-assigned this on Jun 12, 2026