Commit 5cb9674

docs: clarify licensing and add third-party attributions
- Add code-level attribution in backbones.py for the foundation models.
1 parent e84a9de commit 5cb9674

Showing 28 changed files with 76 additions and 53 deletions.

README.md

Lines changed: 21 additions & 9 deletions
````diff
@@ -3,7 +3,10 @@
 > [!WARNING]
 > **Work in Progress**: This project is under active development. Core architectures, CLI flags, and data formats are subject to major changes.
 
-**SpatialTranscriptFormer** bridges histology and biological pathways through a high-performance transformer architecture. By modeling the dense interplay between morphological features and gene expression signatures, it provides an interpretable and spatially-coherent mapping of the tissue microenvironment.
+> [!TIP]
+> **Framework Release**: SpatialTranscriptFormer has been restructured from a research codebase into a robust framework. You can now use the Python API to train on your own spatial transcriptomics data with custom backbones and architectures.
+
+**SpatialTranscriptFormer** is a modular deep learning framework designed to bridge histology and biological pathways. It leverages transformer architectures to model the interplay between morphological features and gene expression signatures, providing interpretable mapping of the tissue microenvironment.
 
 ## Key Technical Pillars
 
@@ -35,13 +38,13 @@ This project requires [Conda](https://docs.conda.io/en/latest/).
 1. Clone the repository.
 2. Run the automated setup script:
 3. On Windows: `.\setup.ps1`
-- On Linux/HPC: `bash setup.sh`
+4. On Linux/HPC: `bash setup.sh`
 
-## Usage
+## Usage: HEST-1k Benchmark Recipe
 
-### Dataset Access
+While the core `SpatialTranscriptFormer` framework can be integrated programmatically with any dataset (see the **[Python API Reference](docs/API.md)** and **[Bring Your Own Data Guide](src/spatial_transcript_former/recipes/custom/README.md)**), this repository includes a complete, out-of-the-box CLI pipeline specifically for reproducing our benchmarks on the [HEST-1k dataset](https://huggingface.co/datasets/MahmoodLab/hest).
 
-The model uses the **HEST1k** dataset. You can download specific subsets (by organ, technology, etc.) or the entire dataset using the `stf-download` utility:
+### Dataset Access
 
 ```bash
 # List available filtering options
@@ -94,10 +97,19 @@ Visualization plots will be saved to the `./results` directory.
 
 ## Documentation
 
-- [Models](docs/MODELS.md): Detailed model architectures and scaling parameters.
-- [Data Structure](docs/DATA_STRUCTURE.md): Organization of HEST data on disk.
-- [Pathway Mapping](docs/PATHWAY_MAPPING.md): Clinical interpretability and pathway integration.
-- [Gene Analysis](docs/GENE_ANALYSIS.md): Modeling strategies for high-dimensional gene space.
+### Framework APIs & Usage
+
+- **[Python API Reference](docs/API.md)**: Full documentation for `Trainer`, `Predictor`, and `SpatialDataset`.
+- **[Bring Your Own Data Guide](src/spatial_transcript_former/recipes/custom/README.md)**: Templates and examples for training on your own non-HEST spatial transcriptomics data.
+- **[HEST Recipe Docs](src/spatial_transcript_former/recipes/hest/README.md)**: Detailed documentation for the included HEST-1k dataset recipe.
+- **[Training Guide](docs/TRAINING_GUIDE.md)**: Complete list of configuration flags and preset configurations for HEST models.
+
+### Theory & Interpretability
+
+- **[Models & Architecture](docs/MODELS.md)**: Deep dive into the quad-flow interaction logic and network scaling.
+- **[Pathway Mapping](docs/PATHWAY_MAPPING.md)**: Clinical interpretability, pathway bottleneck design, and MSigDB integration.
+- **[Gene Analysis](docs/GENE_ANALYSIS.md)**: Modeling strategies for mapping morphology to high-dimensional gene spaces.
+- **[Data Structure](docs/DATA_STRUCTURE.md)**: Detailed breakdown of the HEST data structure on disk, metadata conventions, and preprocessing invariants.
 
 ## Development
 
````
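The "Bring Your Own Data" split advertised in the README above (a dataset-agnostic `SpatialDataset` interface in the core, with HEST specifics isolated in a recipe) can be sketched roughly as follows. Every name in this sketch is an illustrative stand-in, not the project's actual API:

```python
# Minimal sketch of a core/recipe split: the framework core depends only on
# an abstract dataset interface; each dataset "recipe" registers a subclass.
from abc import ABC, abstractmethod

class SpatialDataset(ABC):
    """Dataset-agnostic base: the (hypothetical) core only sees this."""
    @abstractmethod
    def __len__(self): ...
    @abstractmethod
    def __getitem__(self, idx):
        """Return (morphology_features, gene_expression) for one spot."""

RECIPES = {}  # recipe name -> SpatialDataset subclass

def register_recipe(name):
    def decorator(cls):
        RECIPES[name] = cls
        return cls
    return decorator

@register_recipe("hest")
class HESTDemoDataset(SpatialDataset):
    """Recipe subclass: owns all dataset-specific parsing conventions."""
    def __init__(self, spots):
        self.spots = spots
    def __len__(self):
        return len(self.spots)
    def __getitem__(self, idx):
        return self.spots[idx]

ds = RECIPES["hest"]([([0.1, 0.2], [5, 0, 3])])
print(len(ds), ds[0][1])  # -> 1 [5, 0, 3]
```

With this shape, training on a new dataset means writing one subclass and registering it, which is the restructuring this commit performs for HEST.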

scripts/diagnose_collapse.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -21,7 +21,7 @@
 import numpy as np
 
 from spatial_transcript_former.models.interaction import SpatialTranscriptFormer
-from spatial_transcript_former.data.dataset import (
+from spatial_transcript_former.recipes.hest.dataset import (
     HEST_FeatureDataset,
     load_global_genes,
 )
```

scripts/download_hest.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,7 +5,7 @@
 
 # Add src to path
 sys.path.append(os.path.abspath("src"))
-from spatial_transcript_former.data.download import (
+from spatial_transcript_former.recipes.hest.download import (
     download_hest_subset,
     download_metadata,
 )
```

scripts/inspect_outputs.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -4,7 +4,7 @@
 import argparse
 import numpy as np
 from spatial_transcript_former.models import SpatialTranscriptFormer
-from spatial_transcript_former.data.utils import get_sample_ids, setup_dataloaders
+from spatial_transcript_former.recipes.hest.utils import get_sample_ids, setup_dataloaders
 
 
 class Args:
```

scripts/inspect_sample.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,7 +5,7 @@
 
 # Add src to path
 sys.path.append(os.path.abspath("src"))
-from spatial_transcript_former.data.io import get_hest_data_dir, load_h5ad_metadata
+from spatial_transcript_former.recipes.hest.io import get_hest_data_dir, load_h5ad_metadata
 from spatial_transcript_former.config import get_config
 from spatial_transcript_former.data.pathways import (
     download_msigdb_gmt,
```

src/spatial_transcript_former/data/__init__.py

Lines changed: 0 additions & 5 deletions
```diff
@@ -20,8 +20,3 @@
     apply_dihedral_to_tensor,
     normalize_coordinates,
 )
-
-# HEST-specific (backward compatibility)
-from .dataset import HEST_Dataset, get_hest_dataloader
-from .splitting import split_hest_patients
-from .download import download_hest_subset, download_metadata, filter_samples
```
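The hunk above drops the backward-compatibility re-exports outright, so old `spatial_transcript_former.data` imports of HEST classes now fail loudly. A common alternative, sketched here with stand-in module and class names (not the project's real ones), is a PEP 562 deprecation shim that keeps the old import path working while warning users toward the new one:

```python
# Illustrative deprecation-shim pattern using a PEP 562 module __getattr__.
# "legacy_data" and HEST_Dataset below are stand-ins built at runtime so the
# sketch is self-contained; a real shim would live in the old module's file.
import sys
import types
import warnings

class HEST_Dataset:  # stand-in for the class that moved to the hest recipe
    pass

legacy = types.ModuleType("legacy_data")  # stand-in for the old data module

def _module_getattr(name):
    """Called only when normal attribute lookup on the module fails."""
    if name == "HEST_Dataset":
        warnings.warn(
            "legacy_data.HEST_Dataset is deprecated; import it from the "
            "hest recipe package instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return HEST_Dataset
    raise AttributeError(name)

legacy.__getattr__ = _module_getattr  # PEP 562 hook lives in the module dict
sys.modules["legacy_data"] = legacy

import legacy_data

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cls = legacy_data.HEST_Dataset  # resolves, but emits DeprecationWarning
```

Removing the re-exports instead, as this commit does, trades that migration cushion for a cleaner dependency graph between `data` and the recipes.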
src/spatial_transcript_former/recipes/hest/README.md

Lines changed: 16 additions & 0 deletions

```diff
@@ -0,0 +1,16 @@
+# HEST-1k Recipe
+
+This directory contains the recipe for training `SpatialTranscriptFormer` on the **HEST-1k** benchmark dataset.
+
+While the core `SpatialTranscriptFormer` framework is dataset-agnostic, this recipe provides a complete, out-of-the-box pipeline for reproducing our benchmarks, including data downloading, preprocessing, and specialized dataloaders.
+
+## Components
+
+- **`dataset.py`**: Contains `HEST_Dataset` and `HEST_FeatureDataset`, which subclass `SpatialDataset` to handle the specific `.h5ad` structure and metadata conventions of the HEST dataset.
+- **`io.py`**: Utilities for reading spatial graphs, coordinates, and `.h5ad` matrices.
+- **`utils.py`**: HEST-specific dataset setup routines, splitting logic, and vocabulary loading.
+- **`download.py`**: Logic for fetching subsets of the gated HEST dataset from Hugging Face.
+
+## Usage
+
+For complete CLI usage and training preset commands, refer to the main **[README.md](../../../../README.md)** and the **[Training Guide](../../../../docs/TRAINING_GUIDE.md)**.
```
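The "splitting logic" this recipe README mentions (implemented by `split_hest_patients` elsewhere in the diff) is typically patient-level: all samples from one patient land in the same split so morphology from a held-out patient never leaks into training. A hedged, stdlib-only sketch, with hypothetical function name, signature, and ratio:

```python
# Illustrative patient-level split; the project's real split_hest_patients
# signature and defaults are not shown in this commit and may differ.
import random

def split_by_patient(sample_to_patient, val_frac=0.2, seed=0):
    """Split sample ids so no patient appears in both train and val."""
    patients = sorted(set(sample_to_patient.values()))
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_val = max(1, int(len(patients) * val_frac))
    val_patients = set(patients[:n_val])
    train = [s for s, p in sample_to_patient.items() if p not in val_patients]
    val = [s for s, p in sample_to_patient.items() if p in val_patients]
    return train, val

samples = {"s1": "P1", "s2": "P1", "s3": "P2", "s4": "P3", "s5": "P3"}
train_ids, val_ids = split_by_patient(samples)
# Every patient is wholly inside exactly one split.
assert not ({samples[s] for s in train_ids} & {samples[s] for s in val_ids})
```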

src/spatial_transcript_former/recipes/hest/__init__.py

Lines changed: 6 additions & 6 deletions
```diff
@@ -19,7 +19,7 @@
 """
 
 # Dataset classes and DataLoader factories
-from spatial_transcript_former.data.dataset import (
+from spatial_transcript_former.recipes.hest.dataset import (
     HEST_Dataset,
     HEST_FeatureDataset,
     get_hest_dataloader,
@@ -29,31 +29,31 @@
 )
 
 # I/O utilities
-from spatial_transcript_former.data.io import (
+from spatial_transcript_former.recipes.hest.io import (
     get_hest_data_dir,
     load_h5ad_metadata,
     get_image_from_h5ad,
     decode_h5_string,
 )
 
 # Download
-from spatial_transcript_former.data.download import (
+from spatial_transcript_former.recipes.hest.download import (
     download_hest_subset,
     download_metadata,
     filter_samples,
 )
 
 # Sample discovery and dataloader setup
-from spatial_transcript_former.data.utils import (
+from spatial_transcript_former.recipes.hest.utils import (
     get_sample_ids,
     setup_dataloaders,
 )
 
 # Splitting
-from spatial_transcript_former.data.splitting import split_hest_patients
+from spatial_transcript_former.recipes.hest.splitting import split_hest_patients
 
 # Vocab building
-from spatial_transcript_former.data.build_vocab import scan_h5ad_files
+from spatial_transcript_former.recipes.hest.build_vocab import scan_h5ad_files
 
 __all__ = [
     # Datasets
```

src/spatial_transcript_former/data/build_vocab.py renamed to src/spatial_transcript_former/recipes/hest/build_vocab.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -11,7 +11,7 @@
 
 # Add src to path
 sys.path.append(os.path.abspath("src"))
-from spatial_transcript_former.data.io import get_hest_data_dir, load_h5ad_metadata
+from spatial_transcript_former.recipes.hest.io import get_hest_data_dir, load_h5ad_metadata
 from spatial_transcript_former.config import get_config
 from spatial_transcript_former.data.pathways import (
     download_msigdb_gmt,
```

src/spatial_transcript_former/data/dataset.py renamed to src/spatial_transcript_former/recipes/hest/dataset.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -23,7 +23,7 @@
 import pandas as pd
 import numpy as np
 from .io import decode_h5_string, load_h5ad_metadata
-from .base import (
+from spatial_transcript_former.data.base import (
     SpatialDataset,
     apply_dihedral_augmentation,
     apply_dihedral_to_tensor,
@@ -37,7 +37,7 @@
 
 # Augmentation helpers and normalize_coordinates are now in data.base
 # and imported above. Kept here for backward compatibility:
-# from spatial_transcript_former.data.dataset import apply_dihedral_augmentation
+# from spatial_transcript_former.recipes.hest.dataset import apply_dihedral_augmentation
 # still works via the import at the top of this file.
 
 
```
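For orientation on the `apply_dihedral_augmentation` helper referenced in the hunk above: dihedral augmentation applies one of the eight symmetries of the square (four rotations, each optionally mirrored) to spot coordinates. A self-contained sketch of the idea, with hypothetical name and signature that need not match the helper in `data.base`:

```python
# Illustrative D4 (dihedral group of the square) coordinate augmentation.
def apply_dihedral(coords, k):
    """Apply symmetry k in 0..7 to (x, y) points in the unit square:
    k % 4 quarter-turns counterclockwise about the center, then a
    horizontal flip if k >= 4."""
    out = []
    for x, y in coords:
        for _ in range(k % 4):   # 90-degree CCW rotation about (0.5, 0.5)
            x, y = 1.0 - y, x
        if k >= 4:               # mirror across the vertical axis
            x = 1.0 - x
        out.append((x, y))
    return out

pts = [(0.0, 0.0), (1.0, 0.0)]
assert apply_dihedral(pts, 0) == pts                        # identity
assert apply_dihedral(pts, 1) == [(1.0, 0.0), (1.0, 1.0)]   # one quarter-turn
```

The same index `k` would be applied to the paired image tensor (the role of `apply_dihedral_to_tensor`) so coordinates and morphology stay aligned.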
