🧬 scRNAseq-Figure-Replication

Project Overview

This project replicates key figures from the study:

Rubenstein, A. B., Smith, G. R., Raue, U., Begue, G., Minchev, K., Ruf-Zamojski, F., Nair, V. D., Wang, X., Zhou, L., Zaslavsky, E., Trappe, T. A., Trappe, S., & Sealfon, S. C. (2020). Single-cell transcriptional profiles in human skeletal muscle. Scientific reports, 10(1), 229. https://doi.org/10.1038/s41598-019-57110-6

The original paper investigates gene signatures across mononuclear and multinucleated skeletal muscle cell types using single-cell RNA sequencing (scRNA-seq).

Unlike bulk RNA-seq, single-cell analysis enables:

Identification of cell-type–specific transcriptional signatures
Detection of heterogeneous gene expression patterns
Separation of changes in cell-type proportions from gene expression differences

Objective

The goal of this project was to computationally replicate:

Supplementary Figure S9
Figure 3a

using:

Hierarchical clustering
PCA (Principal Component Analysis)
Louvain clustering
tSNE visualization
Marker gene dot plots
Heatmaps

All analyses were performed in Python using Scanpy.

Data Source

Raw count matrices were downloaded from GEO (Gene Expression Omnibus).

Four skeletal muscle scRNA-seq count files were merged
Final dataset contained 2,876 cells, matching the published study

Computational Workflow

Data Processing

Counts Matrix Generation

Raw count .csv.gz files were loaded using pandas
Gene names were set as the dataframe index
Datasets were merged using an outer join
Missing values were removed
Data was transposed to cell × gene format
Converted into an AnnData object

Final AnnData object:

2,876 cell observations
Gene expression matrix
Metadata annotations

Preprocessing (Scanpy)

The following Scanpy preprocessing steps were applied:

normalize_total() → library size normalization
log1p() → log transformation
highly_variable_genes() → feature selection
pca(n_comps=10) → dimensionality reduction
neighbors() → graph construction using 10 PCs

Unsupervised Clustering

Louvain clustering (resolution = 0.6)
Clusters initially unlabeled
tSNE computed for visualization

sc.tl.louvain()
sc.tl.tsne()

Figure Replication

A. tSNE Clustering (Figure 2A Equivalent)

Unlabeled clusters were visualized using tSNE.

Cluster identities were determined using:

Marker gene expression
Dot plots
Hierarchical clustering via dendrogram

Clusters were manually annotated by mapping:

cluster ID → biological cell type

Final labeled tSNE shows:

Endothelial subtypes
Satellite cells
Pericytes
NK cells
T + B cells
Myeloid cells
Smooth muscle cells
FAP subtypes

B. Marker Gene Dot Plot

To identify cell types:

Top marker genes per cluster (from Supplementary Table 1) were used
Scanpy dotplot() visualized:
- Expression level (color intensity)
- Fraction of expressing cells (dot size)
Dendrogram computed using PCA correlations between clusters

This step guided manual cluster annotation.

C. Hierarchical Clustering

Hierarchical clustering was computed using:

sc.tl.dendrogram()

Clustering was based on:

Correlation of PCA components
Cluster-level aggregation

This matches the hierarchical grouping described in the paper.

D. Heatmap (Figure 3a Replication)

A curated marker gene dictionary was constructed using:

Top-ranked differentially expressed genes
Supplementary Table 1 from the paper

Heatmap generated using:

sc.pl.heatmap()

The resulting heatmap:

Displays marker expression across annotated clusters
Recapitulates published cell-type–specific transcriptional signatures
Reflects hierarchical relationships between cell types

Biological Interpretation

This replication demonstrates:

Clear transcriptional separation of skeletal muscle cell subtypes
Distinct endothelial subpopulations
Immune cell presence (NK, T, B, Myeloid)
Fibro-adipogenic progenitor (FAP) heterogeneity
Satellite cell signature specificity

The results confirm the original study’s findings regarding cell-type–specific gene expression patterns.

Technologies Used

Python
Scanpy
Pandas
NumPy
scikit-learn
Matplotlib
Seaborn
SciPy

Key Learning Outcomes

This project demonstrates:

✔ Single-cell RNA-seq preprocessing

✔ AnnData object construction

✔ Feature selection via highly variable genes

✔ Dimensionality reduction (PCA, tSNE)

✔ Graph-based clustering (Louvain)

✔ Marker-based cell type annotation

✔ Hierarchical clustering

✔ Figure replication from published literature

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
BF550-Final-Project.ipynb		BF550-Final-Project.ipynb
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🧬 scRNAseq-Figure-Replication

Project Overview

Objective

Data Source

Computational Workflow

Data Processing

Counts Matrix Generation

Preprocessing (Scanpy)

Unsupervised Clustering

Figure Replication

A. tSNE Clustering (Figure 2A Equivalent)

B. Marker Gene Dot Plot

C. Hierarchical Clustering

D. Heatmap (Figure 3a Replication)

Biological Interpretation

Technologies Used

Key Learning Outcomes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🧬 scRNAseq-Figure-Replication

Project Overview

Objective

Data Source

Computational Workflow

Data Processing

Counts Matrix Generation

Preprocessing (Scanpy)

Unsupervised Clustering

Figure Replication

A. tSNE Clustering (Figure 2A Equivalent)

B. Marker Gene Dot Plot

C. Hierarchical Clustering

D. Heatmap (Figure 3a Replication)

Biological Interpretation

Technologies Used

Key Learning Outcomes

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages