🧠 Spatial Transcriptomics Analysis: 10x Genomics Visium Human Brain

End-to-end spatial transcriptomics analysis pipeline for 10x Genomics Visium data, demonstrating tissue architecture deconvolution, spatially-resolved marker gene expression, neighborhood enrichment analysis, and Moran's I spatial autocorrelation in human brain glioblastoma FFPE tissue.

Overview

This repository demonstrates a complete, reproducible spatial transcriptomics analysis workflow applied to 10x Genomics Visium human brain glioblastoma FFPE data. The repository contains two complementary analysis notebooks:

A fully portable simulation pipeline (01_visium_spatial_analysis.py) that runs end-to-end without any data download — ideal for testing and demonstrating pipeline logic
A real data analysis notebook (02_visium_real_data_analysis.ipynb) applied to the actual 10x Genomics Human Brain Cancer Visium dataset, with rendered outputs and full biological interpretation

Together, these demonstrate the full spectrum of spatial transcriptomics expertise: from pipeline engineering and methodology to hands-on biological discovery on real tissue data.

Key technical skills demonstrated:

Production-grade NGS analysis pipeline design
Single-cell/spatial RNA-seq preprocessing, QC, and normalization
Highly variable gene selection and dimensionality reduction (PCA + UMAP)
Graph-based clustering with Leiden algorithm
Spatially-resolved marker gene visualization on H&E tissue
Differential expression with multiple testing correction (Wilcoxon + Benjamini-Hochberg)
Spatial neighborhood enrichment analysis (Squidpy)
Spatial autocorrelation (Moran's I) — identifying spatially structured gene programs
Reproducible research practices (seed setting, layer preservation, versioned outputs)

Notebooks

Notebook	Description	Data Required	Outputs
`notebooks/01_visium_spatial_analysis.py`	Full 9-step pipeline on biologically realistic simulated Visium data. Runs immediately, no download needed.	None	8 figures + 2 tables in `results/simulated/`
`notebooks/02_visium_real_data_analysis.ipynb`	End-to-end analysis on real 10x Genomics Human Brain Glioblastoma FFPE dataset. Rendered outputs included — view directly on GitHub without running.	10x Visium dataset	9 figures + 2 tables in `results/real_data/`

Tip: The notebook (02) includes rendered figures and cell outputs — you can view the full analysis results directly on GitHub without running any code.

Dataset

Property	Value
Source	10x Genomics Visium CytAssist — Human Brain Cancer (FFPE)
Technology	Visium Spatial Gene Expression
Tissue	Human Brain — Glioblastoma multiforme
Download	10x Genomics Dataset Portal
Spots under tissue	10,878
Median UMI per spot	8,339
Median genes per spot	4,600
Sequencing depth	~805M reads (Illumina NovaSeq 6000)
License	CC BY 4.0

Pipeline Overview

Raw SpaceRanger Output (.h5 + spatial/)
              │
              ▼
┌─────────────────────────────────────────┐
│  1. Data Loading                         │  sc.read_visium() — counts + coordinates + H&E
└─────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────┐
│  2. Quality Control                      │  Filter: UMI ≥500, genes ≥200, MT% ≤20
└─────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────┐
│  3. Normalization & Log1p               │  Library-size normalize → 10,000 CPM → log(X+1)
└─────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────┐
│  4. HVG Selection (n=3,000)             │  Seurat dispersion method
└─────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────┐
│  5. Dimensionality Reduction            │  PCA (50 PCs) → KNN graph → UMAP (top 20 PCs)
└─────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────┐
│  6. Leiden Clustering                   │  Community detection on KNN graph (res=0.5)
└─────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────┐
│  7. Spatial Visualization               │  Cluster maps + marker expression on H&E tissue
└─────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────┐
│  8. Differential Expression             │  Wilcoxon rank-sum + BH correction per cluster
└─────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────┐
│  9. Spatial Neighborhood Analysis       │  Squidpy: neighborhood enrichment + Moran's I
└─────────────────────────────────────────┘
              │
              ▼
       Results & Figures + Saved AnnData (.h5ad)

Key Results

Spatial Domains Identified (Simulated pipeline)

Cluster	Cell Type / Region	Key Markers	N Spots
0	Tumor Core	MKI67, TOP2A, PCNA, CDK1	~216
1	Proliferating Glia	GFAP, VIM, NES, SOX2	~375
2	Neurons (L2/3)	RBFOX3, CUX1, CUX2, LAMP5	~282
3	Neurons (L5/6)	RORB, TLE4, FEZF2, BCL11B	~617
4	Oligodendrocytes	MBP, MOG, MOBP, PLP1	~188
5	Astrocytes	GFAP, AQP4, ALDH1L1, SLC1A2	~179
6	Microglia / Immune	AIF1, CSF1R, TREM2, P2RY12	~127
7	Vascular / Endothelial	CLDN5, PECAM1, VWF, FLT1	~279

Spatial Autocorrelation — Moran's I (Simulated data)

Gene	Moran's I	Interpretation
CLDN5	0.71	Highly spatially restricted (vascular)
AIF1	0.60	Microglia spatially clustered
MBP	0.55	White matter tracts
GFAP	0.54	Reactive glia zones
AQP4	0.45	Astrocyte endfeet
RBFOX3	0.30	Neuronal layers
MKI67	0.23	Proliferating zone

Real data results: See rendered outputs in 02_visium_real_data_analysis.ipynb

Repository Structure

spatial-transcriptomics-visium/
├── notebooks/
│   ├── 01_visium_spatial_analysis.py               # Portable simulation pipeline (no data needed)
│   └── 02_visium_real_data_analysis.ipynb          # Real data — rendered outputs included
├── src/
│   ├── qc_utils.py                                  # QC helper functions
│   ├── spatial_utils.py                             # Spatial analysis utilities
│   └── plotting.py                                  # Custom visualization functions
├── results/
│   ├── simulated/                                   # Outputs from simulation pipeline
│   │   ├── figures/
│   │   │   ├── 01_QC_metrics.png
│   │   │   ├── 02_HVG_selection.png
│   │   │   ├── 03_PCA_elbow.png
│   │   │   ├── 04_UMAP_clusters.png
│   │   │   ├── 05_spatial_clusters_and_markers.png
│   │   │   ├── 06_marker_spatial_grid.png
│   │   │   ├── 07_dotplot_markers.png
│   │   │   └── 08_morans_I.png
│   │   ├── tables/
│   │   │   ├── DE_cluster_markers.csv
│   │   │   └── morans_I_spatial_autocorrelation.csv
│   │   └── analysis_summary.json
│   └── real_data/                                   # Outputs from real Visium dataset
│       ├── figures/
│       │   ├── 01_QC_violin.png
│       │   ├── 02_HVG_selection.png
│       │   ├── 03_PCA_elbow.png
│       │   ├── 04_UMAP_clusters.png
│       │   ├── 05_spatial_clusters.png              ← clusters on H&E tissue
│       │   ├── 06_spatial_markers.png               ← marker expression on tissue
│       │   ├── 07_marker_dotplot.png
│       │   ├── 08_marker_heatmap.png
│       │   └── 09_neighborhood_enrichment.png       ← Squidpy spatial analysis
│       ├── tables/
│       │   ├── DE_marker_genes_per_cluster.csv
│       │   └── morans_I_spatial_autocorrelation.csv
│       └── visium_brain_analyzed.h5ad               ← saved AnnData (gitignored)
├── data/
│   └── README.md                                    # Data download instructions
├── environment.yml                                  # Conda environment specification
├── requirements.txt                                 # pip dependencies
├── .gitignore
└── README.md

Installation & Usage

1. Clone the repository

git clone https://github.com/pandeyravi15/spatial-transcriptomics-visium.git
cd spatial-transcriptomics-visium

2. Set up environment

Option A — Conda (recommended):

conda env create -f environment.yml
conda activate spatial-tx

Option B — pip:

pip install -r requirements.txt

Option C — Install from within Jupyter Lab:

import sys
!{sys.executable} -m pip install scanpy[leiden] squidpy umap-learn anndata

Restart the kernel after installation before importing packages.

3. Run Notebook 1 — Simulated pipeline (no data download needed)

conda activate spatial-tx
cd notebooks
python 01_visium_spatial_analysis.py
# Outputs saved to results/simulated/

4. Run Notebook 2 — Real Visium data

Download the dataset first:

1. Go to: https://www.10xgenomics.com/datasets/human-brain-cancer-11-mm-capture-area-ffpe-2-standard
2. Download: filtered_feature_bc_matrix.h5
3. Download: spatial/ folder (tissue_positions.csv, scalefactors_json.json, tissue images)
4. Place all files in: data/visium_human_brain/

Run in Jupyter Lab:

jupyter lab
# Open: notebooks/02_visium_real_data_analysis.ipynb
# Run all cells top to bottom

5. View results

results/simulated/figures/   → Simulated pipeline outputs (8 figures)
results/real_data/figures/   → Real data outputs (9 figures including H&E overlays)
results/real_data/tables/    → DE markers + Moran's I statistics (CSV)

Methods

Quality Control

Spots filtered on three standard criteria: total UMI ≥ 500, genes detected ≥ 200, and mitochondrial reads ≤ 20%. Raw counts preserved in adata.layers['counts'] before any transformation.

Normalization

Library-size normalization to 10,000 counts per spot followed by log1p transformation log(X + 1), consistent with Visium best practices (Luecken & Theis, 2019).

HVG Selection

Top 3,000 highly variable genes using the Seurat dispersion method. HVG subset used exclusively for PCA to reduce noise from lowly expressed genes.

Dimensionality Reduction

PCA on scaled HVG matrix (50 components). UMAP computed on top 20 PCs selected by elbow plot, via umap-learn for superior global structure preservation.

Clustering

KNN graph (k=15) in PCA space. Leiden algorithm (Traag et al., 2019) at resolution=0.5 for community detection. Resolution tunable for coarser or finer boundaries.

Differential Expression

Wilcoxon rank-sum test (one cluster vs. all others) with Benjamini-Hochberg FDR correction — recommended for non-normally distributed count data.

Spatial Neighborhood Analysis (Squidpy)

Neighborhood enrichment: Permutation-based test identifying cluster pairs significantly co-localized in tissue space
Moran's I: Per-gene spatial autocorrelation using KNN-weighted spatial adjacency (k=6 hexagonal neighbors), quantifying spatially structured expression patterns

Figures

Simulated Pipeline (`results/simulated/figures/`)

Figure	Description
`01_QC_metrics.png`	Violin plots — UMI counts, genes per spot, % mitochondrial reads
`02_HVG_selection.png`	Mean-variance relationship with HVGs highlighted
`03_PCA_elbow.png`	Variance explained per PC + cumulative variance
`04_UMAP_clusters.png`	2D embedding colored by cluster and total UMI
`05_spatial_clusters_and_markers.png`	Spatial cluster map + GFAP expression overlay
`06_marker_spatial_grid.png`	8-panel spatial expression grid for key cell-type markers
`07_dotplot_markers.png`	Dot plot: mean expression × % spots expressing per cluster
`08_morans_I.png`	Spatial autocorrelation (Moran's I) bar chart per marker gene

Real Data (`results/real_data/figures/`)

Figure	Description
`01_QC_violin.png`	Per-spot QC distributions on real tissue data
`02_HVG_selection.png`	HVG selection on real gene expression data
`03_PCA_elbow.png`	PCA variance explained — real Visium data
`04_UMAP_clusters.png`	UMAP embedding — Leiden clusters + QC metrics
`05_spatial_clusters.png`	Leiden clusters overlaid on H&E tissue image
`06_spatial_markers.png`	MKI67, GFAP, MBP, AIF1 expression on tissue
`07_marker_dotplot.png`	Top 5 marker genes per cluster — dot plot
`08_marker_heatmap.png`	Marker gene heatmap across clusters
`09_neighborhood_enrichment.png`	Squidpy spatial neighborhood enrichment matrix

Future Directions

Cell-type deconvolution with RCTD or Cell2location
Spatially variable gene analysis (SpatialDE, SPARK-X)
Ligand-receptor interaction analysis (CellChat, NicheNet)
Integration with paired scRNA-seq reference atlas (Human Cell Atlas)
Trajectory / pseudotime analysis of spatially adjacent clusters
Xenium sub-cellular resolution analysis
Multi-sample integration and batch correction

References

Stahl, P.L. et al. (2016). Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science, 353(6294), 78–82.
Luecken, M.D. & Theis, F.J. (2019). Current best practices in single-cell RNA-seq analysis. Mol Syst Biol, 15(6), e8746.
Traag, V.A. et al. (2019). From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep, 9, 5233.
Palla, G. et al. (2022). Squidpy: a scalable framework for spatial omics analysis. Nature Methods, 19, 171–178.
Moran, P.A.P. (1950). Notes on continuous stochastic phenomena. Biometrika, 37(1/2), 17–23.
10x Genomics Visium Spatial Gene Expression. https://www.10xgenomics.com/products/spatial-gene-expression

Author

Ravi Shanker Pandey, Ph.D. Computational Biologist | The Jackson Laboratory for Genomic Medicine

This repository was developed as part of a computational biology portfolio demonstrating spatial transcriptomics expertise. All code is original and reproducible under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data		data
notebooks		notebooks
results		results
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

🧠 Spatial Transcriptomics Analysis: 10x Genomics Visium Human Brain

📋 Table of Contents

Overview

Notebooks

Dataset

Pipeline Overview

Key Results

Spatial Domains Identified (Simulated pipeline)

Spatial Autocorrelation — Moran's I (Simulated data)

Repository Structure

Installation & Usage

1. Clone the repository

2. Set up environment

3. Run Notebook 1 — Simulated pipeline (no data download needed)

4. Run Notebook 2 — Real Visium data

5. View results

Methods

Quality Control

Normalization

HVG Selection

Dimensionality Reduction

Clustering

Differential Expression

Spatial Neighborhood Analysis (Squidpy)

Figures

Simulated Pipeline (results/simulated/figures/)

Real Data (results/real_data/figures/)

Future Directions

References

Author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 1

Languages

Simulated Pipeline (`results/simulated/figures/`)

Real Data (`results/real_data/figures/`)

Packages