End-to-end spatial transcriptomics analysis pipeline for 10x Genomics Visium data, demonstrating tissue architecture deconvolution, spatially-resolved marker gene expression, neighborhood enrichment analysis, and Moran's I spatial autocorrelation in human brain glioblastoma FFPE tissue.
- Overview
- Notebooks
- Dataset
- Pipeline Overview
- Key Results
- Repository Structure
- Installation & Usage
- Methods
- Figures
- Future Directions
- References
- Author
This repository demonstrates a complete, reproducible spatial transcriptomics analysis workflow applied to 10x Genomics Visium human brain glioblastoma FFPE data. The repository contains two complementary analysis notebooks:
- A fully portable simulation pipeline (
01_visium_spatial_analysis.py) that runs end-to-end without any data download — ideal for testing and demonstrating pipeline logic - A real data analysis notebook (
02_visium_real_data_analysis.ipynb) applied to the actual 10x Genomics Human Brain Cancer Visium dataset, with rendered outputs and full biological interpretation
Together, these demonstrate the full spectrum of spatial transcriptomics expertise: from pipeline engineering and methodology to hands-on biological discovery on real tissue data.
Key technical skills demonstrated:
- Production-grade NGS analysis pipeline design
- Single-cell/spatial RNA-seq preprocessing, QC, and normalization
- Highly variable gene selection and dimensionality reduction (PCA + UMAP)
- Graph-based clustering with Leiden algorithm
- Spatially-resolved marker gene visualization on H&E tissue
- Differential expression with multiple testing correction (Wilcoxon + Benjamini-Hochberg)
- Spatial neighborhood enrichment analysis (Squidpy)
- Spatial autocorrelation (Moran's I) — identifying spatially structured gene programs
- Reproducible research practices (seed setting, layer preservation, versioned outputs)
| Notebook | Description | Data Required | Outputs |
|---|---|---|---|
notebooks/01_visium_spatial_analysis.py |
Full 9-step pipeline on biologically realistic simulated Visium data. Runs immediately, no download needed. | None | 8 figures + 2 tables in results/simulated/ |
notebooks/02_visium_real_data_analysis.ipynb |
End-to-end analysis on real 10x Genomics Human Brain Glioblastoma FFPE dataset. Rendered outputs included — view directly on GitHub without running. | 10x Visium dataset | 9 figures + 2 tables in results/real_data/ |
Tip: The notebook (
02) includes rendered figures and cell outputs — you can view the full analysis results directly on GitHub without running any code.
| Property | Value |
|---|---|
| Source | 10x Genomics Visium CytAssist — Human Brain Cancer (FFPE) |
| Technology | Visium Spatial Gene Expression |
| Tissue | Human Brain — Glioblastoma multiforme |
| Download | 10x Genomics Dataset Portal |
| Spots under tissue | 10,878 |
| Median UMI per spot | 8,339 |
| Median genes per spot | 4,600 |
| Sequencing depth | ~805M reads (Illumina NovaSeq 6000) |
| License | CC BY 4.0 |
Raw SpaceRanger Output (.h5 + spatial/)
│
▼
┌─────────────────────────────────────────┐
│ 1. Data Loading │ sc.read_visium() — counts + coordinates + H&E
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 2. Quality Control │ Filter: UMI ≥500, genes ≥200, MT% ≤20
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 3. Normalization & Log1p │ Library-size normalize → 10,000 CPM → log(X+1)
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 4. HVG Selection (n=3,000) │ Seurat dispersion method
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 5. Dimensionality Reduction │ PCA (50 PCs) → KNN graph → UMAP (top 20 PCs)
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 6. Leiden Clustering │ Community detection on KNN graph (res=0.5)
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 7. Spatial Visualization │ Cluster maps + marker expression on H&E tissue
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 8. Differential Expression │ Wilcoxon rank-sum + BH correction per cluster
└─────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ 9. Spatial Neighborhood Analysis │ Squidpy: neighborhood enrichment + Moran's I
└─────────────────────────────────────────┘
│
▼
Results & Figures + Saved AnnData (.h5ad)
| Cluster | Cell Type / Region | Key Markers | N Spots |
|---|---|---|---|
| 0 | Tumor Core | MKI67, TOP2A, PCNA, CDK1 | ~216 |
| 1 | Proliferating Glia | GFAP, VIM, NES, SOX2 | ~375 |
| 2 | Neurons (L2/3) | RBFOX3, CUX1, CUX2, LAMP5 | ~282 |
| 3 | Neurons (L5/6) | RORB, TLE4, FEZF2, BCL11B | ~617 |
| 4 | Oligodendrocytes | MBP, MOG, MOBP, PLP1 | ~188 |
| 5 | Astrocytes | GFAP, AQP4, ALDH1L1, SLC1A2 | ~179 |
| 6 | Microglia / Immune | AIF1, CSF1R, TREM2, P2RY12 | ~127 |
| 7 | Vascular / Endothelial | CLDN5, PECAM1, VWF, FLT1 | ~279 |
| Gene | Moran's I | Interpretation |
|---|---|---|
| CLDN5 | 0.71 | Highly spatially restricted (vascular) |
| AIF1 | 0.60 | Microglia spatially clustered |
| MBP | 0.55 | White matter tracts |
| GFAP | 0.54 | Reactive glia zones |
| AQP4 | 0.45 | Astrocyte endfeet |
| RBFOX3 | 0.30 | Neuronal layers |
| MKI67 | 0.23 | Proliferating zone |
Real data results: See rendered outputs in
02_visium_real_data_analysis.ipynb
spatial-transcriptomics-visium/
├── notebooks/
│ ├── 01_visium_spatial_analysis.py # Portable simulation pipeline (no data needed)
│ └── 02_visium_real_data_analysis.ipynb # Real data — rendered outputs included
├── src/
│ ├── qc_utils.py # QC helper functions
│ ├── spatial_utils.py # Spatial analysis utilities
│ └── plotting.py # Custom visualization functions
├── results/
│ ├── simulated/ # Outputs from simulation pipeline
│ │ ├── figures/
│ │ │ ├── 01_QC_metrics.png
│ │ │ ├── 02_HVG_selection.png
│ │ │ ├── 03_PCA_elbow.png
│ │ │ ├── 04_UMAP_clusters.png
│ │ │ ├── 05_spatial_clusters_and_markers.png
│ │ │ ├── 06_marker_spatial_grid.png
│ │ │ ├── 07_dotplot_markers.png
│ │ │ └── 08_morans_I.png
│ │ ├── tables/
│ │ │ ├── DE_cluster_markers.csv
│ │ │ └── morans_I_spatial_autocorrelation.csv
│ │ └── analysis_summary.json
│ └── real_data/ # Outputs from real Visium dataset
│ ├── figures/
│ │ ├── 01_QC_violin.png
│ │ ├── 02_HVG_selection.png
│ │ ├── 03_PCA_elbow.png
│ │ ├── 04_UMAP_clusters.png
│ │ ├── 05_spatial_clusters.png ← clusters on H&E tissue
│ │ ├── 06_spatial_markers.png ← marker expression on tissue
│ │ ├── 07_marker_dotplot.png
│ │ ├── 08_marker_heatmap.png
│ │ └── 09_neighborhood_enrichment.png ← Squidpy spatial analysis
│ ├── tables/
│ │ ├── DE_marker_genes_per_cluster.csv
│ │ └── morans_I_spatial_autocorrelation.csv
│ └── visium_brain_analyzed.h5ad ← saved AnnData (gitignored)
├── data/
│ └── README.md # Data download instructions
├── environment.yml # Conda environment specification
├── requirements.txt # pip dependencies
├── .gitignore
└── README.md
git clone https://github.com/pandeyravi15/spatial-transcriptomics-visium.git
cd spatial-transcriptomics-visiumOption A — Conda (recommended):
conda env create -f environment.yml
conda activate spatial-txOption B — pip:
pip install -r requirements.txtOption C — Install from within Jupyter Lab:
import sys
!{sys.executable} -m pip install scanpy[leiden] squidpy umap-learn anndataRestart the kernel after installation before importing packages.
conda activate spatial-tx
cd notebooks
python 01_visium_spatial_analysis.py
# Outputs saved to results/simulated/Download the dataset first:
1. Go to: https://www.10xgenomics.com/datasets/human-brain-cancer-11-mm-capture-area-ffpe-2-standard
2. Download: filtered_feature_bc_matrix.h5
3. Download: spatial/ folder (tissue_positions.csv, scalefactors_json.json, tissue images)
4. Place all files in: data/visium_human_brain/
Run in Jupyter Lab:
jupyter lab
# Open: notebooks/02_visium_real_data_analysis.ipynb
# Run all cells top to bottomresults/simulated/figures/ → Simulated pipeline outputs (8 figures)
results/real_data/figures/ → Real data outputs (9 figures including H&E overlays)
results/real_data/tables/ → DE markers + Moran's I statistics (CSV)
Spots filtered on three standard criteria: total UMI ≥ 500, genes detected ≥ 200, and mitochondrial reads ≤ 20%. Raw counts preserved in adata.layers['counts'] before any transformation.
Library-size normalization to 10,000 counts per spot followed by log1p transformation log(X + 1), consistent with Visium best practices (Luecken & Theis, 2019).
Top 3,000 highly variable genes using the Seurat dispersion method. HVG subset used exclusively for PCA to reduce noise from lowly expressed genes.
PCA on scaled HVG matrix (50 components). UMAP computed on top 20 PCs selected by elbow plot, via umap-learn for superior global structure preservation.
KNN graph (k=15) in PCA space. Leiden algorithm (Traag et al., 2019) at resolution=0.5 for community detection. Resolution tunable for coarser or finer boundaries.
Wilcoxon rank-sum test (one cluster vs. all others) with Benjamini-Hochberg FDR correction — recommended for non-normally distributed count data.
- Neighborhood enrichment: Permutation-based test identifying cluster pairs significantly co-localized in tissue space
- Moran's I: Per-gene spatial autocorrelation using KNN-weighted spatial adjacency (k=6 hexagonal neighbors), quantifying spatially structured expression patterns
| Figure | Description |
|---|---|
01_QC_metrics.png |
Violin plots — UMI counts, genes per spot, % mitochondrial reads |
02_HVG_selection.png |
Mean-variance relationship with HVGs highlighted |
03_PCA_elbow.png |
Variance explained per PC + cumulative variance |
04_UMAP_clusters.png |
2D embedding colored by cluster and total UMI |
05_spatial_clusters_and_markers.png |
Spatial cluster map + GFAP expression overlay |
06_marker_spatial_grid.png |
8-panel spatial expression grid for key cell-type markers |
07_dotplot_markers.png |
Dot plot: mean expression × % spots expressing per cluster |
08_morans_I.png |
Spatial autocorrelation (Moran's I) bar chart per marker gene |
| Figure | Description |
|---|---|
01_QC_violin.png |
Per-spot QC distributions on real tissue data |
02_HVG_selection.png |
HVG selection on real gene expression data |
03_PCA_elbow.png |
PCA variance explained — real Visium data |
04_UMAP_clusters.png |
UMAP embedding — Leiden clusters + QC metrics |
05_spatial_clusters.png |
Leiden clusters overlaid on H&E tissue image |
06_spatial_markers.png |
MKI67, GFAP, MBP, AIF1 expression on tissue |
07_marker_dotplot.png |
Top 5 marker genes per cluster — dot plot |
08_marker_heatmap.png |
Marker gene heatmap across clusters |
09_neighborhood_enrichment.png |
Squidpy spatial neighborhood enrichment matrix |
- Cell-type deconvolution with RCTD or Cell2location
- Spatially variable gene analysis (SpatialDE, SPARK-X)
- Ligand-receptor interaction analysis (CellChat, NicheNet)
- Integration with paired scRNA-seq reference atlas (Human Cell Atlas)
- Trajectory / pseudotime analysis of spatially adjacent clusters
- Xenium sub-cellular resolution analysis
- Multi-sample integration and batch correction
- Stahl, P.L. et al. (2016). Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science, 353(6294), 78–82.
- Luecken, M.D. & Theis, F.J. (2019). Current best practices in single-cell RNA-seq analysis. Mol Syst Biol, 15(6), e8746.
- Traag, V.A. et al. (2019). From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep, 9, 5233.
- Palla, G. et al. (2022). Squidpy: a scalable framework for spatial omics analysis. Nature Methods, 19, 171–178.
- Moran, P.A.P. (1950). Notes on continuous stochastic phenomena. Biometrika, 37(1/2), 17–23.
- 10x Genomics Visium Spatial Gene Expression. https://www.10xgenomics.com/products/spatial-gene-expression
Ravi Shanker Pandey, Ph.D. Computational Biologist | The Jackson Laboratory for Genomic Medicine
This repository was developed as part of a computational biology portfolio demonstrating spatial transcriptomics expertise. All code is original and reproducible under the MIT License.