Skip to content

gstoner/tesserae

Repository files navigation

Tessera

Tessera is a next-generation programming model and compiler stack for deep learning, HPC, and scientific workloads.
It introduces a multi-level IR (Graph → Schedule → Tile → Target) and a DSL for operator-based modeling, enabling deterministic, scalable, and portable execution across NVIDIA, AMD, Intel, and CPU backends.


🔎 Overview Diagrams

1. Tessera Execution Flow

Tessera Overview

  • Graph IR: Operator algebra, autodiff, symbolic transformations
  • Schedule IR: Fusion, tiling, autotuning, pipeline scheduling
  • Tile IR: Explicit GPU mapping (blocks, warps, Tensor Cores)
  • Target IR: Lowered to PTX, ROCm LLVM, Intel Level Zero, CPU LLVM

2. Tessera Execution Hierarchy (like CUDA’s Grid/Block/Thread Pyramid)

Tessera Hierarchy

  • Graph Level → Operator graph, autodiff
  • Schedule Level → Fusion, tiling, pipeline
  • Tile Level → Block/thread mapping, Tensor Core tiles
  • Thread Level → Individual GPU threads, SIMT execution

📚 Documentation

The full documentation set is organized by topic:

Tessera Model Examples

This section provides end-to-end examples of mapping real-world models and reasoning frameworks onto the Tessera programming model.
Each example illustrates how Tessera’s multi-level IR (Graph IR → Schedule IR → Tile IR → Target IR) and runtime abstractions support scalable, efficient, and interpretable model design.


Available Examples

  • Hierarchical Reasoning Model (HRM)
    Mapping HRM’s Planner–Decomposer–Executor architecture into Tessera’s Graph IR and Schedule IR layers. Includes diagram of HRM ↔ Tessera IR mapping.

  • GPT-OSS-120B
    Illustrates how Tessera handles large-scale transformer models with distributed tensors, checkpointing, and efficient sharding strategies.

  • Physics-Informed Neural Networks (PINN)
    A 2D Navier–Stokes example with incompressibility constraints. Demonstrates Tessera’s operator adjoints for coupled PDE systems.

  • Spectral Mixture-of-Experts (Spectral-MoE)
    Combines FFT-based spectral decomposition, recursive operators, and MoE routing. Highlights Tessera’s strength in operator factorization and distributed expert parallelism.


🚀 Quick Example

from tessera import op, dist

# Create a distributed mesh across 8 GPUs
mesh = dist.Mesh(axes=["dp"], devices=range(8))

# Define a sharded tensor
X = dist.tensor((1024, 1024), layout=dist.ShardSpec(("row",), ("dp",)), mesh=mesh)

# Apply a fused operator pipeline
Y = op.pipeline([
    op.matmul(X, X.T),
    op.relu,
    op.layernorm
])

📌 Repository Structure

tessera/
├── docs/
│   ├── overview/                 # High-level diagrams
│   ├── programming_guide/        # Main programming guide
│   ├── performance/              # Performance best practices
│   ├── numerical/                # Numerical behavior guide
│   ├── runtime_abi/              # Runtime & ABI spec
│   ├── hardware_mapping/         # GPU mapping guide
│   ├── tutorials/                # Hands-on tutorials
│   ├── interop/                  # Interop & tooling guide
│   └── reference/                # Operator reference
└── README.md                     # This file
docs/models_examples/
 ├── HRM/  
 │    ├── Tessera_HRM_Mapping.md  
 │    └── Tessera_HRM_Mapping_Diagram.png  
 ├── GPT-OSS-120B/  
 │    └── Tessera_GPTOSS_Example.md  
 ├── PINN/  
 │    └── Tessera_PINN_NavierStokes.md  
 ├── MoE/  
 │    └── Tessera_Spectral_MoE.md  
 └── README.md   ← index listing all example models

Tessera Standard Operator Library (TSOL)

docs/TSOL/
├── TSOL_Guide.md
├── tessera/
│   └── ops.pyi
├── include/
│   └── tessera/
│       └── ops.hpp
├── tests/
│   └── test_stubs.py
└── mypy.ini

Tessera Collectives & Distributed Systems Guide

docs/Tessera_Collectives_Distributed/
├── Tessera_Collectives_Distributed.md
├── README.md
└── images/
    ├── mesh_collectives.png
    └── zero_flow.png

🔮 Roadmap

  • Expand operator libraries (cuBLAS, cuDNN, cuFFT equivalents in Tessera).
  • Add integration with Hugging Face models for inference.
  • Optimize distributed training at 128+ node scale.
  • Extend tooling (profiler, debugger, autotuner caches).

About

Tesserae Project - DL Learning Compiler Concept

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors