2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -28,4 +28,4 @@ outlined on that page and do not file a public issue.

## License
By contributing to tensor-layouts, you agree that your contributions will be licensed
-under the LICENSE file in the root directory of this source tree.
+under the LICENSE file in the root directory of this source tree.
53 changes: 50 additions & 3 deletions docs/analysis_api.md
@@ -1,3 +1,27 @@
<!--
MIT License

Copyright (c) 2026 Meta Platforms, Inc. and affiliates.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
-->

# Analysis API

GPU kernel performance lives or dies by memory access patterns. Two
@@ -44,7 +68,7 @@ offset_table(Layout((4, 2), (0, 1)))
# 1: [(0,1), (1,1), (2,1), (3,1)]}
```

-## bank_conflicts(layout, *, num_banks=32, element_bytes=2, bank_width_bytes=4, group_size=32)
+## bank_conflicts(layout, *, element_bytes, num_banks=32, bank_width_bytes=4, group_size=32)

Analyze shared memory bank conflicts for a thread-to-offset layout.

@@ -75,6 +99,18 @@ The `max_ways` value is the worst-case serialization factor: 1 means no
conflicts, N means N-way serialization. Two threads accessing the
*same* word get a broadcast (no conflict on NVIDIA hardware).

For multi-mode (TV) layouts where mode 0 is the thread dimension and
mode 1+ are value dimensions, all values per thread are included in the
analysis. This models vectorized loads where each thread accesses
multiple elements:

```python
# TV layout: 32 threads, each loading 2 fp16 elements
tv = Layout((32, 2), (1, 32))
result = bank_conflicts(tv, element_bytes=2)
result['conflict_free'] # True: values land in distinct banks
```
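
The bank arithmetic behind `max_ways` can be sketched in plain Python. This is a minimal model, not the library's implementation: `bank_conflict_ways` is a hypothetical helper, and it assumes the usual mapping of a byte address to a bank (`word = byte // bank_width_bytes`, `bank = word % num_banks`) with all values of all threads analyzed together. Offsets are written out by hand from each layout's shape and stride.

```python
from collections import defaultdict


def bank_conflict_ways(offsets_per_thread, element_bytes,
                       num_banks=32, bank_width_bytes=4):
    """Worst-case serialization factor for one group of threads.

    offsets_per_thread: one list of element offsets per thread.
    Hypothetical helper for illustration, not the library's implementation.
    """
    words_in_bank = defaultdict(set)
    for offsets in offsets_per_thread:
        for offset in offsets:
            word = (offset * element_bytes) // bank_width_bytes
            words_in_bank[word % num_banks].add(word)
    # Accesses to the same word are a broadcast, so only distinct words count.
    return max(len(words) for words in words_in_bank.values())


# 1D layout Layout(32, 1) with fp32: one word per bank.
linear = [[t] for t in range(32)]
# 1D layout Layout(32, 32) with fp32: every thread maps to bank 0.
strided = [[32 * t] for t in range(32)]
# TV layout Layout((32, 2), (1, 32)) with fp16: offset = t*1 + v*32.
tv = [[t + 32 * v for v in range(2)] for t in range(32)]

print(bank_conflict_ways(linear, element_bytes=4))   # 1
print(bank_conflict_ways(strided, element_bytes=4))  # 32
print(bank_conflict_ways(tv, element_bytes=2))       # 1
```

In the TV case, pairs of adjacent fp16 elements share a 4-byte word, so value 0 covers banks 0 to 15 and value 1 covers banks 16 to 31, each bank with a single distinct word.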

Returns a dict:

| Key | Type | Description |
@@ -83,7 +119,7 @@ Returns a dict:
| `max_ways` | int | Worst-case serialization factor across all banks |
| `bank_to_threads` | dict | `{bank_id: [thread_ids...]}` for all accessed banks |

-## coalescing_efficiency(layout, *, warp_size=32, element_bytes=2, cache_line_bytes=128)
+## coalescing_efficiency(layout, *, element_bytes, warp_size=32, cache_line_bytes=128)

Analyze global memory coalescing for a thread-to-offset layout.

@@ -98,7 +134,7 @@ result['transactions'] # 1
result['efficiency'] # 1.0 (128 unique useful bytes / 128 transferred)

# Worst case: each thread hits a separate cache line
-result = coalescing_efficiency(Layout(32, 64))
+result = coalescing_efficiency(Layout(32, 64), element_bytes=2)
result['transactions'] # 32
result['efficiency'] # 0.016 (64 unique useful bytes / 4096 transferred)
```
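
The worst-case figures above can be checked by hand. The arithmetic below assumes that transferred bytes equal the number of transactions times `cache_line_bytes`, which matches the numbers quoted in the example. With stride 64 and 2-byte elements, thread $t$ starts at byte $128t$, so each of the 32 threads lands on its own 128-byte line.

```latex
\begin{aligned}
\text{transactions} &= 32 \\
\text{unique useful bytes} &= 32 \times 2 = 64 \\
\text{transferred bytes} &= 32 \times 128 = 4096 \\
\text{efficiency} &= \frac{64}{4096} = \frac{1}{64} \approx 0.016
\end{aligned}
```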
@@ -111,6 +147,17 @@ Returns a dict:
| `efficiency` | float | Unique useful bytes / transferred bytes (1.0 = perfect) |
| `cache_lines` | list | Sorted cache line indices touched |

For multi-mode (TV) layouts, all values per thread are included,
modeling vectorized loads:

```python
# TV layout: 32 threads, 4 values each, contiguous within each thread
tv = Layout((32, 4), (4, 1))
result = coalescing_efficiency(tv, element_bytes=2)
result['transactions'] # 2 (256 bytes spans 2 cache lines)
result['efficiency'] # 1.0 (256 unique bytes / 256 transferred)
```
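
The same counts can be reproduced with a small stand-alone model. This is a sketch under stated assumptions, not the library's code: `coalescing` is a hypothetical helper that treats every byte touched by any thread as useful, counts each distinct cache line as one transaction, and takes hand-written offsets derived from the layout's shape and stride.

```python
def coalescing(offsets_per_thread, element_bytes, cache_line_bytes=128):
    """Count cache-line transactions and efficiency for one warp.

    Hypothetical helper for illustration, not the library's implementation.
    """
    touched_bytes = set()
    for offsets in offsets_per_thread:
        for offset in offsets:
            start = offset * element_bytes
            touched_bytes.update(range(start, start + element_bytes))
    cache_lines = sorted({b // cache_line_bytes for b in touched_bytes})
    transferred = len(cache_lines) * cache_line_bytes
    return {
        "transactions": len(cache_lines),
        "efficiency": len(touched_bytes) / transferred,
        "cache_lines": cache_lines,
    }


# TV layout Layout((32, 4), (4, 1)) with fp16: offset = t*4 + v*1.
tv = [[4 * t + v for v in range(4)] for t in range(32)]
# 1D layout Layout(32, 64) with fp16: offset = t*64.
scattered = [[64 * t] for t in range(32)]

print(coalescing(tv, element_bytes=2))
# {'transactions': 2, 'efficiency': 1.0, 'cache_lines': [0, 1]}
print(coalescing(scattered, element_bytes=2)["efficiency"])  # 0.015625
```

Tracking touched bytes in a set means overlapping accesses are counted once, which is what "unique useful bytes" in the table suggests.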

## Permutation Analysis

When a layout is bijective (every offset is hit exactly once), it defines
22 changes: 22 additions & 0 deletions docs/generate_figures.py
@@ -1,3 +1,25 @@
# MIT License
#
# Copyright (c) 2026 Meta Platforms, Inc. and affiliates.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

#!/usr/bin/env python3
"""Regenerate all PNG figures used in the documentation.

24 changes: 24 additions & 0 deletions docs/layout_api.md
@@ -1,3 +1,27 @@
<!--
MIT License

Copyright (c) 2026 Meta Platforms, Inc. and affiliates.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
-->

# Layout Algebra API

This document covers the core `tensor_layouts` API: constructing layouts,
37 changes: 37 additions & 0 deletions docs/viz_api.md
@@ -1,3 +1,27 @@
<!--
MIT License

Copyright (c) 2026 Meta Platforms, Inc. and affiliates.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
-->

# Visualization API

This document covers the `tensor_layouts.viz` module for drawing layouts,
@@ -257,6 +281,12 @@ layout = Layout(((3, 2), ((2, 3), 2)), ((4, 1), ((2, 15), 100)))
draw_slice(layout, ((1, None), ((None, 0), None)), title="((1,:),((:,0),:))")
```

For 1D layouts, wrap the slice in a single-element tuple:

```python
draw_slice(Layout(8, 1), (slice(2, 5),), title="1D slice [2:5]")
```

![draw_slice](images/draw_slice.png)

**Parameters:**
@@ -302,6 +332,13 @@ draw_composite(panels, "comparison.png",
| `panel_size` | `(w, h)` | `(4, 4)` | Size per panel |
| `colorize` | `bool` | `False` | Rainbow colors |
| `tv_mode` | `bool` | `False` | Use TV-layout rendering |
| `flatten_hierarchical` | `bool` | `True` | Flatten nested shapes to 2D grid |
| `label_hierarchy_levels` | `bool` | `False` | In nested hierarchical mode, annotate hierarchy levels |

Per-panel options (`(Layout, opts_dict)` tuples) override the top-level
defaults: `colorize`, `color_layout`, `num_colors`, `tv_mode`,
`flatten_hierarchical`, `label_hierarchy_levels`, and the TV-specific
`grid_rows`, `grid_cols`, `thr_id_layout`, `col_major`.
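
The override rule can be pictured as a dictionary merge in which per-panel keys win. The snippet below is only a sketch of that idea: `resolve_panel_options` is a hypothetical helper, the real `draw_composite` logic is not shown here, and plain strings stand in for `Layout` objects.

```python
def resolve_panel_options(defaults, panel):
    """Merge top-level defaults with optional per-panel overrides.

    `panel` is either a layout or a `(layout, opts_dict)` tuple.
    Hypothetical helper for illustration, not part of tensor_layouts.viz.
    """
    if isinstance(panel, tuple) and len(panel) == 2 and isinstance(panel[1], dict):
        layout, opts = panel
    else:
        layout, opts = panel, {}
    # Later keys win, so per-panel values replace the defaults.
    return layout, {**defaults, **opts}


defaults = {"colorize": False, "tv_mode": False, "flatten_hierarchical": True}
# Strings stand in for Layout objects in this sketch.
panels = ["layout_a", ("layout_b", {"colorize": True})]

for panel in panels:
    print(resolve_panel_options(defaults, panel))
```

Only `layout_b` is drawn with `colorize=True`; every option it does not mention falls back to the top-level value.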

## draw_tiled_grid

2 changes: 1 addition & 1 deletion examples/layouts.py
@@ -763,7 +763,7 @@ def example_analysis():
f"efficiency {r1['efficiency']:.0%}")

scattered = Layout(32, 64)
-r2 = coalescing_efficiency(scattered)
+r2 = coalescing_efficiency(scattered, element_bytes=2)
print(f" Stride-64 (fp16): {r2['transactions']} transactions, "
f"efficiency {r2['efficiency']:.1%}")
