Development updates 20260409 by jduprat · Pull Request #8 · facebookresearch/tensor-layouts

jduprat · 2026-04-09T17:58:40Z

[New] Negative stride support across Layout, Tensor, analysis, and visualization — cosize() and compose() handle negative strides via magnitude decomposition; Tensor.view() preserves base offset; analysis functions rebase the addressed footprint to a local origin; TV visualization rebases negative offsets and avoids Python negative-index wraparound in explicit cell labels
[New] CuTe C++ oracle test suite — compiles regression cases directly against installed CUTLASS headers for compose(), logical_divide(), zipped_divide(), tiled_divide(), flat_divide(), left_inverse(), and logical_product(); gracefully skips when CUTLASS or a C++ compiler is unavailable
[New] tensor[:] whole-view full slice on Tensor, matching the explicit tensor[:, :] behavior
[New] Vertical arrangement in draw_swizzle for wide layouts — before/after grids stack top-to-bottom when columns exceed a threshold
[New] Paper examples test suite (arXiv:2603.02298v1) with --draw pytest option for visual output
[Fix] left_inverse for non-contiguous (padded) layouts
[Fix] compose() to truncate unreachable modes before the divisibility check
[Fix] logical_divide() to support Layout tilers embedded in by-mode tuples
[Fix] compose() and logical_divide() for nested tuple tilers
[Fix] zipped_divide(), tiled_divide(), flat_divide() to preserve Layout tiler strides instead of silently degrading to shape tilers
[Fix] Canonicalize stride to 0 for unit-extent modes in logical_divide()
[Fix] Layout.__call__(None) as full-slice identity
[Fix] Preserve swizzle attribute in slice_and_offset() sublayout results
[Fix] explain(compose) crash on tuple tilers
[Fix] explain(logical_product) to use cosize(B) for complement bound
[Fix] Duplicate test name shadowing draw_swizzle coverage — one of two identically-named tests was silently never running
[Robustness] Reject free coordinates (slices, None) in Tensor.__setitem__ with a clear TypeError guiding users to the slice-then-index pattern
[Cleanup] Move exhaustive introspection helpers (image(), is_injective(), is_surjective(), is_bijective(), is_contiguous(), functionally_equal()) from layouts.py to analysis.py, which keeps the core module efficient and makes O(size) enumeration opt-in
[Cleanup] Configure Ruff with correct src/tensor_layouts/ paths, fix lint warnings across the codebase
[Docs] im2col() figure and CONV→GEMM mapping clarification in applications notebook
[Docs] Document shape_div() strict scalar divisibility policy

The SW128 swizzle visualization (8×128) was too wide when drawn side-by-side. Add arrangement parameter to draw_swizzle and _build_swizzle_figure ("horizontal" default, "vertical" stacks panels). Use vertical arrangement in the notebook for this cell.

- Add parameterized _generate_im2col() to docs/generate_figures.py (accepts H, W, R, S; defaults to 4×4 input with 2×2 filter)
- Replace ASCII art in algorithms.ipynb cell 65 with generated figure
- Restructure 2D CONV section: explain general N-D mapping first (defining K, C, T, R, S, N, Z, P, Q), then specialize to 2D

_logical_divide_by_shape tried to compare Layout objects with <= against integers, causing TypeError when the tiler tuple contained Layout elements like (Layout(4,1), Layout(8,2)). The fix detects Layout elements and dispatches them through the compose/complement path per mode, matching CuTe C++ which treats tiler elements as Layouts (layout.hpp:1562). Found via arXiv:2603.02298v1 §3.5.2 examples.

When (remaining_shape-1)*remaining_stride < curr_shape, all of B fits within the current mode and higher modes are unreachable. The old code raised a divisibility error instead of absorbing B into this mode. This matches CuTe C++ layout.hpp:1077 which allows rest_stride < curr_shape as a valid composition case. The paper (§3.3.3) calls these "apparent violations" resolved by truncation. Example: compose((4,2,8):(3,12,97), 3:3) now correctly returns 3:9 instead of raising ValueError. Found via arXiv:2603.02298v1 §3.3.3 examples, verified against CuTe C++ composition_impl.
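The example above can be checked pointwise without the library. The sketch below is an illustration only: `layout_at` is a hypothetical helper (not part of tensor_layouts) that evaluates a flat shape:stride layout at a 1-D index in colexicographic order, and the check compares A(B(i)) against the claimed result 3:9.

```python
def layout_at(shape, stride, index):
    """Evaluate a flat layout shape:stride at a 1-D index (hypothetical helper)."""
    offset = 0
    for extent, step in zip(shape, stride):
        # Peel off the coordinate for this mode, leftmost mode varying fastest.
        index, coord = divmod(index, extent)
        offset += coord * step
    return offset


A_SHAPE, A_STRIDE = (4, 2, 8), (3, 12, 97)
B_SHAPE, B_STRIDE = (3,), (3,)

# Pointwise composition: (A o B)(i) = A(B(i)) for every i in B's domain.
composed = [
    layout_at(A_SHAPE, A_STRIDE, layout_at(B_SHAPE, B_STRIDE, i))
    for i in range(3)
]

# The claimed result 3:9, evaluated over the same domain.
expected = [layout_at((3,), (9,), i) for i in range(3)]

print(composed, expected)
```

Both lists agree, so 3:9 is consistent with the pointwise definition of composition for this input. The third mode of A (stride 97) is never reached, which is the sense in which it is unreachable.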

Rewrote left_inverse to match the CuTe C++ algorithm (layout.hpp:1324) instead of using right_inverse(Layout(L, complement(L))) which produced wrong results when complement coalesced away stride information.

The C++ algorithm:
1. Coalesce, extract shapes/strides
2. Compute prefix products of shapes
3. Sort modes by stride (ascending)
4. Build inverse: new_shape = stride / result_size_so_far, new_stride = prefix_product[original_mode_index]
5. Append last sorted mode's shape, coalesce

Example: left_inverse((4,8):(1,5)) now correctly returns (5,8):(1,4) matching Table 6 of arXiv:2603.02298v1, instead of the incorrect 4:1. Note: pycute also returns 4:1 for this case (same bug). Our implementation now matches the C++ ground truth and the paper.
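The left-inverse property for this example can be verified by brute force. This is a minimal sketch that checks the property R(L(i)) = i for every i in L's domain; it does not reproduce the C++ algorithm, and `layout_at` is a hypothetical helper rather than a library function.

```python
def layout_at(shape, stride, index):
    """Evaluate a flat layout shape:stride at a 1-D index (hypothetical helper)."""
    offset = 0
    for extent, step in zip(shape, stride):
        index, coord = divmod(index, extent)
        offset += coord * step
    return offset


L_SHAPE, L_STRIDE = (4, 8), (1, 5)      # the padded layout (4,8):(1,5)
INV_SHAPE, INV_STRIDE = (5, 8), (1, 4)  # the claimed left inverse (5,8):(1,4)

size = 4 * 8

# Apply L, then the candidate inverse, to every index in L's domain.
roundtrip = [
    layout_at(INV_SHAPE, INV_STRIDE, layout_at(L_SHAPE, L_STRIDE, i))
    for i in range(size)
]

is_left_inverse = roundtrip == list(range(size))
print(is_left_inverse)
```

The round trip recovers every index, which is the defining property of a left inverse. The padding (stride 5 over an extent of 4) is why the inverse needs a first mode of extent 5 rather than 4.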

56 tests derived from concrete examples in the CuTe paper:
- Figures 1-3, 5, 10: layout construction, folding, slicing
- Tables 2, 4-7: COPY, partitioning, inverses, complement
- §3.1-3.5.2: concatenation, coalesce, composition, inverse, complement, logical product, logical divide, zipped divide

Tests cite specific figure/table/equation numbers. Run with --draw to generate corresponding paper figures as SVGs:

    pytest tests/paper_examples.py --draw

The old check `len(data) >= cosize(layout)` only works for zero-offset, nonnegative-stride layouts. It silently accepted Tensors that would read/write out of bounds (e.g. offset=10 with a 4-element buffer, or negative strides without a compensating offset). Replace with _address_bounds() which computes the actual min/max storage indices from (offset, layout, swizzle) and validates that the storage covers the full range.
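A rough sketch of the bounds idea follows. It assumes a flat layout with extents of at least 1 and ignores swizzles entirely; `address_bounds` and `fits` are hypothetical names for illustration, not the signature of the private _address_bounds() helper.

```python
def address_bounds(offset, shape, stride):
    """Return (lowest, highest) storage index reachable from offset + layout.

    Assumes every extent is >= 1 and that no swizzle is applied.
    """
    lowest = highest = offset
    for extent, step in zip(shape, stride):
        span = (extent - 1) * step
        if span < 0:
            lowest += span    # negative strides extend the range downward
        else:
            highest += span   # nonnegative strides extend it upward
    return lowest, highest


def fits(buffer_len, offset, shape, stride):
    """True when every addressed index lies inside [0, buffer_len)."""
    lowest, highest = address_bounds(offset, shape, stride)
    return lowest >= 0 and highest < buffer_len


# All three cases pass the old `len(data) >= cosize` test with a 4-element buffer.
print(fits(4, 10, (4,), (1,)))   # offset pushes the range past the buffer
print(fits(4, 0, (4,), (-1,)))   # negative stride with no compensating offset
print(fits(4, 3, (4,), (-1,)))   # negative stride with a compensating offset
```

Only the last case is in bounds; the first two are the kinds of Tensors the old check accepted.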

cosize() now uses abs(stride) so negative-stride layouts report the correct memory span. _composition_1d carries the stride sign separately, matching CuTe's signed composition rules. Tensor.view() preserves the parent's base offset, enabling reverse-order views like Layout(4, -1).
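To illustrate why the magnitude matters, here is a small sketch for flat layouts. The `cosize` function below is a stand-in written for this example (it is not the library's implementation), and the reversed view is modeled with a plain list and an explicit base offset of 3.

```python
def cosize(shape, stride):
    """Memory span of a flat layout, using stride magnitudes (illustrative)."""
    return 1 + sum((extent - 1) * abs(step) for extent, step in zip(shape, stride))


def signed_span(shape, stride):
    """The same formula without abs(), shown only for comparison."""
    return 1 + sum((extent - 1) * step for extent, step in zip(shape, stride))


print(cosize((4,), (-1,)))       # span of four elements
print(signed_span((4,), (-1,)))  # not a meaningful span

# A reverse-order view: layout 4:-1 anchored at base offset 3.
data = [10, 20, 30, 40]
base_offset = 3
reversed_view = [data[base_offset + i * -1] for i in range(4)]
print(reversed_view)
```

Without the preserved base offset the view would start at index 0 and walk off the front of the buffer, which is why the two changes go together.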

Coalescing and segment analyses now rebase accessed offsets to the group's minimum, so reversed dense layouts analyze identically to their forward equivalents. Permutation analysis (cycles, order) rebases dense shifted images to [0, n) before decomposing. to_F2_matrix rejects negative strides (affine, not F2-linear) and fixes swizzle matrix construction for negative shift values.
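The rebasing step can be sketched in a few lines. The helpers `rebase` and `cycles` are hypothetical names used for illustration; the actual analysis functions may be organized differently.

```python
def rebase(offsets):
    """Shift offsets so that the smallest one becomes 0."""
    base = min(offsets)
    return [offset - base for offset in offsets]


def cycles(perm):
    """Decompose a permutation of range(n) into cycles."""
    seen, result = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, node = [], start
        while node not in seen:
            seen.add(node)
            cycle.append(node)
            node = perm[node]
        result.append(cycle)
    return result


forward = [i * 1 for i in range(4)]    # offsets of 4:1
backward = [i * -1 for i in range(4)]  # offsets of 4:-1

rebased = rebase(backward)
print(rebased)                             # same footprint as the forward layout
print(sorted(rebased) == sorted(forward))
print(cycles(rebased))                     # a valid permutation of [0, 4) after rebasing
```

Before rebasing, the reversed image contains negative values and is not a permutation of [0, n), so cycle decomposition would not apply to it directly.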

Layouts with negative strides produce negative output indices which broke cell-label lookup (Python wraparound), TV grid sizing (based on cosize which is always positive), and TV mapping (assumed image starts at 0). Fix by rebasing: _tv_output_bounds() finds the actual min/max offsets, _compute_tv_mapping() shifts all indices so the minimum maps to cell 0, and _lookup_cell_label() guards against negative-index wraparound on user-provided label lists.
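The wraparound hazard is a plain Python behavior: a negative list index counts from the end instead of failing. A guard along the following lines avoids it. This is a sketch; `lookup_cell_label` here is a hypothetical stand-in, and returning None for out-of-range indices is an assumption rather than the documented behavior of _lookup_cell_label().

```python
def lookup_cell_label(labels, index):
    """Return labels[index] only for in-range, nonnegative indices."""
    if 0 <= index < len(labels):
        return labels[index]
    return None  # assumption: out-of-range lookups yield no label


labels = ["a", "b", "c", "d"]

print(labels[-1])                     # Python wraps around to the last label
print(lookup_cell_label(labels, -1))  # the guard refuses the negative index
print(lookup_cell_label(labels, 2))
```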

Previously t[:, 3] = 99 silently computed an offset from the slice object, writing to an arbitrary location. Now __setitem__ checks for slice/None markers anywhere in the key (including inside hierarchical tuples) and raises TypeError with guidance to write through a sliced sub-Tensor instead.
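A recursive check of this kind might look like the sketch below. `has_free_coordinate` and `TinyTensor` are hypothetical names, and the toy class uses a fixed column-major layout purely for illustration; it is not the library's Tensor.

```python
def has_free_coordinate(key):
    """True if key contains a slice or None at any nesting depth."""
    if key is None or isinstance(key, slice):
        return True
    if isinstance(key, tuple):
        return any(has_free_coordinate(part) for part in key)
    return False


class TinyTensor:
    """Toy 2-D column-major tensor used only to demonstrate the check."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.data = [0] * (rows * cols)

    def __setitem__(self, key, value):
        if has_free_coordinate(key):
            raise TypeError(
                "free coordinates are not assignable; slice first, then index"
            )
        row, col = key
        self.data[row + col * self.rows] = value


t = TinyTensor(4, 4)
t[1, 2] = 5  # fully specified coordinate: allowed

try:
    t[:, 3] = 99  # free coordinate: rejected before any write happens
except TypeError as exc:
    message = str(exc)
    print(message)
```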

Nested tuple tilers like ((2, 3), 4) were flattened into Layout((2,3), 1) by the is_pure_shape path, collapsing mode structure that should recurse. Now _normalize_compose_tiler_element preserves tuple nesting so each level composes against its corresponding mode, matching CuTe semantics.

zipped_divide, tiled_divide, and flat_divide were reducing Layout tilers to their shape, silently discarding non-unit strides (e.g. Layout(4, 2) became just 4). Now Layout tilers take CuTe's tile_unzip terminal path, preserving their stride structure. tiled_divide and flat_divide are rebuilt on top of zipped_divide.
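The difference between the two tilers is visible even on a 1-D vector. The sketch below models the tile's addressing by hand, without calling the library, so it only illustrates which elements each tiler selects.

```python
vector = list(range(8))

# Layout(4, 2): four elements, two apart.
tile_as_layout = [vector[i * 2] for i in range(4)]

# Shape tiler 4, i.e. 4:1: four adjacent elements.
tile_as_shape = [vector[i * 1] for i in range(4)]

print(tile_as_layout)
print(tile_as_shape)
```

Degrading Layout(4, 2) to the shape 4 therefore changes which elements land in the tile, not just how the result is printed.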

The complement bound for Layout tilers should be size(A) * cosize(B), not size(A) * size(B). With non-unit strides (e.g. Layout(3, 2), cosize=5) the old formula underestimated the codomain and produced wrong complement layouts in the explanation output.
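The arithmetic for the Layout(3, 2) case can be spelled out as follows. The helpers are written for this example, and size(A) = 8 is a hypothetical value chosen only to make the two bounds concrete.

```python
import math


def size(shape):
    """Number of elements in the layout's domain."""
    return math.prod(shape)


def cosize(shape, stride):
    """Memory span of a flat layout (illustrative, stride magnitudes)."""
    return 1 + sum((extent - 1) * abs(step) for extent, step in zip(shape, stride))


size_a = 8                       # hypothetical size(A)
b_shape, b_stride = (3,), (2,)   # Layout(3, 2)

old_bound = size_a * size(b_shape)
new_bound = size_a * cosize(b_shape, b_stride)

print(cosize(b_shape, b_stride), old_bound, new_bound)
```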

explain(compose, A, (2, 4)) called B(i) on a plain tuple, raising TypeError. Now detects non-Layout tilers and shows the mode-by-mode decomposition instead of the pointwise A(B(i)) trace.

Update pyproject.toml: fix stale layout_algebra paths to tensor_layouts, add analysis.py to wildcard-import exceptions, exclude notebooks from the lint surface. Fix all reported warnings: f-strings with no interpolation, unused imports and variables, ambiguous variable name 'l'. Add Ruff instructions to README.

Two tests shared the name test_draw_swizzle_delegates_to_shared_builder, so pytest silently ran only the second. Rename the second to test_draw_swizzle_saves_figure_from_shared_builder and update the first to properly mock _save_figure through the full delegation path.
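The shadowing itself is ordinary Python name rebinding, which the following sketch reproduces in isolation. The function name `test_same_name` is made up for the example.

```python
source = '''
def test_same_name():
    return "first"

def test_same_name():
    return "second"
'''

namespace = {}
exec(source, namespace)

# Only one binding survives, so a collector that walks the module sees one test.
collected = [name for name in namespace if name.startswith("test_")]
print(collected)
print(namespace["test_same_name"]())
```

Ruff's redefinition check (F811) flags this pattern, which ties in with the lint cleanup elsewhere in this PR.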

CuTe C++ sets stride=0 on any extent-1 tile or rest mode produced by logical_divide (e.g. logical_divide(4:3, 4) = (4,1):(3,0)). Our implementation kept the original stride, producing (4,1):(3,3) which is functionally equivalent but breaks exact-match tests against CuTe.
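The functional equivalence is easy to confirm pointwise: an extent-1 mode only ever has coordinate 0, so its stride never contributes to the offset. `layout_at` is a hypothetical helper, not a library function.

```python
def layout_at(shape, stride, index):
    """Evaluate a flat layout shape:stride at a 1-D index (hypothetical helper)."""
    offset = 0
    for extent, step in zip(shape, stride):
        index, coord = divmod(index, extent)
        offset += coord * step
    return offset


canonical = [layout_at((4, 1), (3, 0), i) for i in range(4)]    # (4,1):(3,0)
uncanonical = [layout_at((4, 1), (3, 3), i) for i in range(4)]  # (4,1):(3,3)

print(canonical, uncanonical)
print(((4, 1), (3, 0)) == ((4, 1), (3, 3)))  # exact structural comparison differs
```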

Compiles a small C++ program against CUTLASS/CuTe headers and compares exact layout strings for the cases fixed in recent commits: nested tuple tiler composition, unit-mode stride canonicalization, and oversize tiler division. Skipped automatically when CUTLASS headers are not installed.

When slice_and_offset builds the sublayout for a partial hierarchical slice, it now propagates the parent layout's swizzle to the result. Previously the swizzle was silently dropped, causing incorrect address computations for slices of swizzled layouts. Add tests verifying swizzle preservation for both the low-level slice_and_offset function and Tensor.__getitem__ hierarchical slicing.

Layout.__call__(None) — a bare None rather than a tuple containing None — now returns the layout unchanged, matching CuTe's slice(_, layout) identity operation. Previously bare-None fell through to has_none(), which does not handle a non-tuple None correctly. Add C++ oracle tests verifying slice(_, layout) for both rank-2 and scalar layouts, and add external compatibility tests for regular, scalar, and swizzled layouts.

Tensor.__getitem__ now recognizes a bare slice(None) (i.e. tensor[:]) and returns a new Tensor view with the same layout, offset, and data. This matches the behavior of the explicit tensor[:, :] full slice and provides the natural Python idiom for creating a view of the whole tensor. Previously tensor[:] fell through to _slice_single, which does not handle slice(None) correctly for this purpose. Update docs/tensor_api.md with the tensor[:] entry and add tests for both regular and swizzled tensor full slices.
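The reason tensor[:] needs its own branch is that Python passes a bare slice object, not a tuple, to __getitem__. The sketch below shows the keys Python actually delivers; `KeyEcho` and `is_bare_full_slice` are hypothetical names for illustration.

```python
class KeyEcho:
    """Returns whatever key Python hands to __getitem__."""

    def __getitem__(self, key):
        return key


def is_bare_full_slice(key):
    """True only for the key produced by obj[:]."""
    return isinstance(key, slice) and key == slice(None)


echo = KeyEcho()
bare = echo[:]     # a single slice object
full = echo[:, :]  # a tuple of two slice objects

print(bare, full)
print(is_bare_full_slice(bare), is_bare_full_slice(full))
```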

Add a "Shape Factorization" section to docs/layout_api.md documenting shape_div and shape_mod, including the intentional policy difference from dynamic CuTe C++: this Python implementation requires exact scalar divisibility (b|a or a|b), raising ValueError for pairs like shape_div(6, 4) where CuTe C++ would return ceil_div(6, 4) = 2. Update the shape_div docstring with the same strict-policy explanation and add a ValueError example to the docstring examples.
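The policy difference can be sketched for positive scalars as follows. `strict_shape_div` is a hypothetical stand-in for the scalar case of shape_div; in particular, returning 1 when a divides b is an assumption chosen to agree with ceil_div, not something stated in this PR.

```python
def ceil_div(a, b):
    """Ceiling division for positive integers."""
    return -(-a // b)


def strict_shape_div(a, b):
    """Scalar shape division that insists on exact divisibility (illustrative)."""
    if a % b == 0:
        return a // b
    if b % a == 0:
        return 1  # assumption: matches ceil_div(a, b) when a divides b
    raise ValueError(f"shape_div({a}, {b}): neither value divides the other")


print(strict_shape_div(12, 4))
print(strict_shape_div(4, 12))
print(ceil_div(6, 4))  # the value the PR attributes to dynamic CuTe C++

try:
    strict_shape_div(6, 4)
    raised = False
except ValueError:
    raised = True
print(raised)
```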

image, is_injective, is_surjective, is_bijective, is_contiguous, and functionally_equal are O(size) exhaustive enumerations (analysis-tier operations, not core algebra). Moving them to analysis.py keeps the cost model clear: the core layouts module stays efficient, and exhaustive introspection is explicitly opt-in via the analysis module. Updated all imports, tests, examples, notebooks, and documentation.

jduprat added 24 commits April 8, 2026 16:20

Fix explain(compose) crash on tuple tilers

6a0ce0e


meta-cla bot added the CLA Signed label Apr 9, 2026

jduprat merged commit c305ec9 into facebookresearch:main Apr 9, 2026
3 of 7 checks passed

jduprat merged 24 commits into facebookresearch:main from jduprat:dev
