[FEAT] Lazy fake-to-real tensor materialization#303

Draft
mark14wu wants to merge 12 commits into main from feat/lazy-fake-to-real-tensor

Conversation


@mark14wu mark14wu commented Mar 2, 2026

Summary

  • Start with fake tensors (no CPU copies) by default and lazily retry with real tensors only when IndirectSymbolicExprBase.concretize() is called (i.e., when an indirect load requires concrete data)
  • Avoids expensive host copies for kernels without data dependencies while transparently handling kernels that need them
  • SANITIZER_ENABLE_FAKE_TENSOR semantics: unset = auto (lazy retry), 1 = force fake, 0 = force real
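
A minimal sketch of the three-way env-var semantics above (the helper name and return values are illustrative, not the PR's actual parsing code):

```python
import os

def resolve_tensor_strategy(env=os.environ):
    """Map SANITIZER_ENABLE_FAKE_TENSOR to a strategy string.

    unset -> "auto" (start fake, lazily retry with real tensors)
    "1"   -> "fake" (force fake tensors)
    "0"   -> "real" (force real tensors, upfront CPU copies)
    """
    value = env.get("SANITIZER_ENABLE_FAKE_TENSOR")
    if value is None:
        return "auto"
    if value == "1":
        return "fake"
    if value == "0":
        return "real"
    raise ValueError(f"unrecognised SANITIZER_ENABLE_FAKE_TENSOR={value!r}")
```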

Fixes #111

Test plan

  • All 205 existing tests pass (uv run pytest tests/ -x)
  • Verified SANITIZER_ENABLE_FAKE_TENSOR=0 (force real) passes all sanitizer tests
  • Verified SANITIZER_ENABLE_FAKE_TENSOR=1 (force fake) passes all sanitizer tests
  • Default auto mode correctly retries with real tensors on indirect load kernels (e.g., test_gemm_oob_call_stack)

Start with fake tensors (no CPU copies) by default and lazily retry
with real tensors only when indirect loads require concrete data.
This avoids expensive host copies for kernels without data
dependencies while transparently handling kernels that need them.

SANITIZER_ENABLE_FAKE_TENSOR semantics: unset = auto (lazy retry),
1 = force fake, 0 = force real.

github-actions bot commented Mar 2, 2026

Sanitizer Performance Benchmark

| Benchmark | main (min) | PR (min) | Change |
| --- | --- | --- | --- |
| gemm | 0.167s | 0.169s | +1.4% |
| gemm_oob | 0.175s | 0.175s | +0.2% |
| indirect_load | 0.255s | 0.257s | +0.7% |
| nested_loop | 0.332s | 0.335s | +0.7% |
| block_pointer_loop_advance | 0.162s | 0.160s | -1.3% |
| liger_jsd | 0.138s | 0.140s | +1.7% |
| flaggems_layernorm | 0.414s | 0.417s | +0.7% |
| swiglu | 0.170s | 0.171s | +1.0% |
| cross_entropy | 0.160s | 0.161s | +0.8% |
| fused_linear_jsd | 0.208s | 0.210s | +0.6% |
| **Total** | 2.181s | 2.195s | +0.7% |

Iterations: 1 warmup + 20 measured

@mark14wu mark14wu marked this pull request as draft March 2, 2026 05:53
@mark14wu mark14wu force-pushed the feat/lazy-fake-to-real-tensor branch from 2d654b8 to 3245e1b Compare March 2, 2026 05:54
mark14wu added 3 commits March 3, 2026 00:24
Replace the coarse-grained retry strategy (NeedRealTensorsError + full
kernel re-run) with fine-grained lazy materialization: a TensorMaterializer
rebases GPU pointers to on-demand CPU copies only when
IndirectSymbolicExprBase.concretize() is called, avoiding unnecessary
full-tensor copies for kernels with indirect loads.

- Add TensorMaterializer class to patch.py
- Update concretize() to rebase pointers via materializer
- Remove NeedRealTensorsError, reset_for_retry(), retry logic
- Simplify virtual_memory config back to bool
# Conflicts:
#	triton_viz/core/client.py
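
The fine-grained lazy materialization described in this commit message could be sketched roughly as follows. TensorMaterializer is the class name from the commit; the storage representation, method names, and rebasing details here are illustrative guesses, not the PR's actual code:

```python
class TensorMaterializer:
    """Toy sketch: lazily copy GPU storages to CPU and rebase raw
    pointers into the copies. GPU storages are stood in for by
    (base_addr, size, fetch_fn) tuples."""

    def __init__(self, storages):
        # storages registered upfront; nothing is copied yet
        self._storages = storages
        self._cpu_copies = {}  # base_addr -> materialised CPU buffer

    def _find_base(self, addr):
        for base, size, _ in self._storages:
            if base <= addr < base + size:
                return base
        raise RuntimeError(f"unmappable pointer 0x{addr:x}")

    def materialize(self, base):
        # Copy the storage to CPU only on first use (lazy, on demand).
        if base not in self._cpu_copies:
            fetch = next(f for b, _, f in self._storages if b == base)
            self._cpu_copies[base] = fetch()
        return self._cpu_copies[base]

    def rebase(self, addr):
        # Translate a raw GPU address into (cpu_buffer, offset).
        base = self._find_base(addr)
        return self.materialize(base), addr - base
```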
@mark14wu mark14wu marked this pull request as ready for review March 6, 2026 16:48

mark14wu commented Mar 6, 2026

I think tests are still missing on this branch.

@mark14wu mark14wu marked this pull request as draft March 6, 2026 18:21
mark14wu and others added 6 commits March 15, 2026 19:09
rebase_pointers() assumed all pointers came from a single GPU storage,
breaking when a pointer tensor spans multiple storages. It also crashed
on masked-out garbage addresses because concretize() called rebase
before computing the mask.

- Make rebase_pointers() mask-aware with per-storage-group rebasing
- Reorder concretize() to compute mask before calling rebase_pointers
- Add regression tests for TensorMaterializer and Config env parsing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix mask broadcast bug where scalar/lower-dimensional masks were not
expanded to match ptr_data shape, leaving most lanes unrebased. Add
np.broadcast_to before flattening. Fix README documenting wrong default
for SANITIZER_ENABLE_FAKE_TENSOR (0 → 1). Add 6 regression tests for
non-zero offset rebasing and broadcast masks. Rename test classes to
match pytest.ini python_classes = *Test pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
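A minimal illustration of the np.broadcast_to fix described above (the function name and element-selection semantics are illustrative; the real rebase_pointers rebases pointers rather than selecting elements):

```python
import numpy as np

def apply_mask(ptr_data, mask):
    # Broadcast scalar / lower-dimensional masks to ptr_data's shape
    # BEFORE flattening, so every lane gets a mask bit. The bug was that
    # an unexpanded mask left most lanes untouched after ravel().
    mask = np.broadcast_to(np.asarray(mask, dtype=bool), ptr_data.shape)
    return ptr_data.ravel()[mask.ravel()]
```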
…llback

Replace boolean `virtual_memory` config with explicit `TensorMode` enum:
  - FORCE_REAL  (env=0):      always copy tensors to CPU upfront
  - LAZY_AUTO   (unset/auto): fake tensors + lazy materialization with
                               eager fallback on unmappable pointers
  - FORCE_FAKE  (env=1):      fake tensors + lazy materialization,
                               errors on unmappable pointers

Unrecognised env values now emit a warning and default to LAZY_AUTO.
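
The TensorMode parsing described above might look roughly like this. The enum member names and env semantics follow this commit message; the function name is hypothetical, and note that a later commit in this PR flips the default to FORCE_REAL:

```python
import enum
import os
import warnings

class TensorMode(enum.Enum):
    FORCE_REAL = "force_real"  # env=0: copy tensors to CPU upfront
    LAZY_AUTO = "lazy_auto"    # unset/auto: fake + lazy, eager fallback
    FORCE_FAKE = "force_fake"  # env=1: fake + lazy, errors on unmappable

def parse_tensor_mode(env=os.environ):
    value = env.get("SANITIZER_ENABLE_FAKE_TENSOR")
    if value is None or value.lower() == "auto":
        return TensorMode.LAZY_AUTO
    if value == "0":
        return TensorMode.FORCE_REAL
    if value == "1":
        return TensorMode.FORCE_FAKE
    # Unrecognised values warn and fall back to the default.
    warnings.warn(
        f"unrecognised SANITIZER_ENABLE_FAKE_TENSOR={value!r}; "
        "defaulting to LAZY_AUTO"
    )
    return TensorMode.LAZY_AUTO
```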

Fix rebase_pointers mask handling: cast to dtype=bool and use .ravel()
so scalar, broadcastable, and same-shape masks all work correctly per
Triton load/store semantics.

Add _eager_materialise_all() fallback in LAZY_AUTO mode: when _find_base
fails, materialise every registered storage to CPU before retrying,
instead of surfacing a raw RuntimeError.

Extract _rebase_core() helper to avoid duplicating fast/slow path logic
between the normal path and the fallback retry.

Tests:
  - 7 new config tests (0/1/unset/auto/AUTO/unrecognised + warning)
  - scalar False mask, failure-path tests (FORCE_FAKE raises,
    LAZY_AUTO fallback materialises all storages)
  - Update e2e fixture from _isolate_virtual_memory to _isolate_tensor_mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…Error, e2e tests

Address code-review blockers and high-priority items:

Blocker 1 — Thread safety:
  _cpu_offset() now uses double-checked locking so concurrent workers
  (TRITON_VIZ_NUM_SMS > 1) never materialise the same storage twice.
  _eager_materialise_all() also runs under the lock.
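
The double-checked locking from Blocker 1, as a toy sketch (the class and the placeholder offset computation are illustrative; only the locking pattern mirrors the commit):

```python
import threading

class Materializer:
    """Concurrent workers check the cache without the lock first, and
    only the first to acquire the lock pays the materialisation cost."""

    def __init__(self):
        self._lock = threading.Lock()
        self._offsets = {}
        self.copies_made = 0  # instrumentation for this sketch

    def _cpu_offset(self, storage_id):
        off = self._offsets.get(storage_id)  # first (unlocked) check
        if off is None:
            with self._lock:
                off = self._offsets.get(storage_id)  # second check
                if off is None:
                    # The expensive copy happens exactly once.
                    self.copies_made += 1
                    off = storage_id * 4096  # placeholder materialisation
                    self._offsets[storage_id] = off
        return off
```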

Blocker 2 — Fallback contract:
  Replace catch-all `except RuntimeError` with dedicated
  UnmappablePointerError.  Rename the LAZY_AUTO fallback from
  "eager-real fallback" to "pre-materialise all storages + retry" in
  docs, Config docstring, and README — no real-tensor rebuild or kernel
  re-run happens, so the old wording was misleading.
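
The fallback contract from Blocker 2, sketched with a stub (StubMaterializer and rebase_with_fallback are hypothetical stand-ins; only the UnmappablePointerError name comes from the commit):

```python
class UnmappablePointerError(RuntimeError):
    """Pointer does not fall inside any registered storage."""

class StubMaterializer:
    # Toy stand-in: rebasing succeeds only after materialize_all().
    def __init__(self):
        self.materialized = False
    def materialize_all(self):
        self.materialized = True
    def rebase(self, addr):
        if not self.materialized:
            raise UnmappablePointerError(hex(addr))
        return ("cpu", addr)

def rebase_with_fallback(m, addr, mode):
    try:
        return m.rebase(addr)
    except UnmappablePointerError:
        if mode != "lazy_auto":
            raise  # FORCE_FAKE: surface the dedicated error
        # LAZY_AUTO: pre-materialise all storages, then retry once.
        m.materialize_all()
        return m.rebase(addr)
```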

High — Default flipped to FORCE_REAL:
  Unset / "0" → FORCE_REAL (safe default, same as main).
  "auto" → LAZY_AUTO (opt-in).  "1" → FORCE_FAKE.
  Unrecognised values warn and default to FORCE_REAL.
  This separates "new mechanism lands" from "default behaviour changes".

Medium — Backwards compat:
  config.virtual_memory is now a deprecation shim via __getattr__ /
  __setattr__.  Reads map FORCE_REAL→False, else True.  Writes map
  False→FORCE_REAL, True→LAZY_AUTO.  DeprecationWarning on every access.
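
The deprecation shim described above could be sketched as follows (this Config is a toy: the attribute names virtual_memory and tensor_mode follow the commit, everything else is illustrative):

```python
import warnings

class Config:
    def __init__(self):
        object.__setattr__(self, "tensor_mode", "force_real")

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for virtual_memory.
        if name == "virtual_memory":
            warnings.warn("virtual_memory is deprecated; use tensor_mode",
                          DeprecationWarning, stacklevel=2)
            # Reads map FORCE_REAL -> False, anything else -> True.
            return self.tensor_mode != "force_real"
        raise AttributeError(name)

    def __setattr__(self, name, value):
        if name == "virtual_memory":
            warnings.warn("virtual_memory is deprecated; use tensor_mode",
                          DeprecationWarning, stacklevel=2)
            # Writes map False -> FORCE_REAL, True -> LAZY_AUTO.
            object.__setattr__(self, "tensor_mode",
                               "lazy_auto" if value else "force_real")
        else:
            object.__setattr__(self, name, value)
```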

E2E tests (5 new):
  - LAZY_AUTO indirect load — no false OOB
  - FORCE_FAKE indirect load — no false OOB
  - LAZY_AUTO indirect store — no false OOB
  - LAZY_AUTO + num_sms=2 concurrent — no crash, no false OOB
  - LAZY_AUTO OOB indirect — sanitizer detects and aborts

Unit tests (7 new):
  - virtual_memory deprecation shim (read/write, 5 cases)
  - Thread safety: concurrent _cpu_offset / rebase_pointers (2 cases)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

[BUG] Virtual memory should not be always on
