We applied the scGPT transformer-based model to single-cell RNA-seq data to classify and annotate immune cell types from blood samples. This end-to-end pipeline processed ~20,000 cells and leveraged scGPT embeddings for dimensionality reduction, clustering, and immune subtype classification.
After highly variable gene (HVG) selection, we identified ~10 immune subpopulations and validated them against canonical marker genes. Our results demonstrate the power of large-scale generative models for high-resolution immune profiling, showcasing how transformer-based deep learning can effectively analyze complex, high-dimensional biological data.
- Loaded raw single-cell RNA-seq count matrix.
- Performed quality control: filtered low-quality cells and genes.
- Normalized and log-transformed gene expression values.
- Selected highly variable genes (HVGs) for downstream analysis.
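In code, these preprocessing steps could look roughly like the following Scanpy sketch; the input file name, filtering thresholds, and HVG count are placeholders, not the exact values used in this run:

```python
import scanpy as sc

# Load the raw single-cell count matrix (file name is a placeholder).
adata = sc.read_h5ad("pbmc_raw_counts.h5ad")

# Quality control: drop low-quality cells and rarely detected genes
# (thresholds are typical defaults, not necessarily those used here).
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

# Keep a copy of the raw counts, then normalize per cell and log-transform.
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# Select highly variable genes (HVGs) for downstream analysis.
sc.pp.highly_variable_genes(adata, n_top_genes=2000, subset=True)
```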
- Used the pre-trained `whole_human` checkpoint from scGPT.
- Extracted cell embeddings for all ~20K cells using the `embed_data` function.
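A minimal sketch of the embedding step, assuming a locally downloaded `whole_human` checkpoint; the checkpoint path, `gene_col` value, and the `.obsm` key under which embeddings are stored are assumptions and may differ between scGPT versions:

```python
from pathlib import Path
from scgpt.tasks import embed_data

# Local path to the downloaded "whole_human" checkpoint (an assumption).
model_dir = Path("checkpoints/scGPT_human")

# Compute scGPT cell embeddings for all ~20K cells.
# gene_col should point at the adata.var column holding gene symbols;
# argument names may vary slightly between scGPT releases.
adata = embed_data(
    adata,
    model_dir,
    gene_col="gene_name",
    batch_size=64,
)

# Embeddings are attached to the AnnData object (commonly adata.obsm["X_scGPT"]).
print(adata.obsm["X_scGPT"].shape)
```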
- Applied UMAP for 2D visualization of the scGPT-generated cell embeddings.
- Performed Leiden clustering to detect distinct cell populations.
*Figure: UMAP of scGPT embeddings with Leiden clusters*
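The neighbor graph, UMAP, and Leiden steps can be run directly on the scGPT embedding with Scanpy; the embedding key and clustering resolution below are assumptions, not the exact settings used here:

```python
import scanpy as sc

# Build the neighbor graph on the scGPT embedding instead of PCA space.
sc.pp.neighbors(adata, use_rep="X_scGPT", n_neighbors=15)

# 2D UMAP for visualization and Leiden clustering for population detection.
sc.tl.umap(adata)
sc.tl.leiden(adata, resolution=1.0, key_added="leiden")

sc.pl.umap(adata, color="leiden", title="scGPT embeddings, Leiden clusters")
```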
- Used canonical marker genes to assign immune cell identities.
- Grouped cell types into three major immune categories:
- Lymphocyte
- Myeloid
- Platelet
- Visualized both detailed annotations and grouped categories on UMAP plots.
*Figure: Annotated UMAP with immune subtypes and categories*
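A sketch of how the annotation and grouping could be implemented; the marker panel and the `cell_type`/`immune_category` column names are illustrative assumptions, not the exact lists used in this analysis:

```python
import scanpy as sc

# Canonical marker panel (illustrative); keep only markers that survived HVG selection.
marker_genes = {
    "CD4+ T cells": ["CD3D", "CD4", "IL7R"],
    "CD8+ T cells": ["CD3D", "CD8A", "CD8B"],
    "B cells": ["MS4A1", "CD79A"],
    "NK cells": ["NKG7", "GNLY"],
    "Monocytes": ["CD14", "LYZ", "FCGR3A"],
    "Dendritic cells": ["FCER1A", "CST3"],
    "Platelets": ["PPBP", "PF4"],
}
marker_genes = {k: [g for g in v if g in adata.var_names] for k, v in marker_genes.items()}
marker_genes = {k: v for k, v in marker_genes.items() if v}

# Inspect marker expression per Leiden cluster to assign identities.
sc.pl.dotplot(adata, marker_genes, groupby="leiden")

# After assigning labels (stored in adata.obs["cell_type"]), map them onto
# the three broad categories used in this report.
category_map = {
    "CD4+ T cells": "Lymphocyte", "CD8+ T cells": "Lymphocyte",
    "Regulatory T cells": "Lymphocyte", "B cells": "Lymphocyte",
    "Plasma cells": "Lymphocyte", "NK cells": "Lymphocyte",
    "Monocytes": "Myeloid", "Dendritic cells": "Myeloid",
    "Macrophages": "Myeloid", "Neutrophils": "Myeloid",
    "Platelets": "Platelet",
}
adata.obs["immune_category"] = adata.obs["cell_type"].map(category_map).astype("category")

# Visualize both the detailed annotation and the grouped categories.
sc.pl.umap(adata, color=["cell_type", "immune_category"])
```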
---
- UMAP visualization revealed clear separation of major immune cell types.
- Leiden clustering detected distinct subpopulations with high biological relevance.
- Marker gene-based annotation identified the following cell types:
- CD4+ T cells
- CD8+ T cells
- Regulatory T cells (Tregs)
- B cells
- Plasma cells
- Natural Killer (NK) cells
- Monocytes
- Dendritic cells
- Macrophages
- Neutrophils
- Platelets
- Erythrocytes
- Main immune categories (Lymphocyte, Myeloid, Platelet) were visualized and quantified.
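One simple way to quantify the category composition mentioned above; the column names follow the annotation sketch earlier and are assumptions:

```python
import pandas as pd

# Per-category cell counts and fractions.
counts = adata.obs["immune_category"].value_counts()
print(pd.DataFrame({"n_cells": counts, "fraction": (counts / counts.sum()).round(3)}))

# Breakdown of detailed cell types within each category.
print(pd.crosstab(adata.obs["cell_type"], adata.obs["immune_category"]))
```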
- Transformer-powered: scGPT models complex, nonlinear, and sparse single-cell data.
- Pre-trained on large-scale data: the `whole_human` checkpoint enables strong generalization and transfer learning.
- Gene-gene and cell-cell interactions: captures long-range biological dependencies.
- Superior embeddings: Clearer separation of cell types and subtypes.
- Highly flexible: Adaptable to multi-modal data, metadata integration, and custom tasks.
- Scalable and efficient: Optimized for large-scale datasets.
- Effective for rare cell types: Better annotation in underrepresented populations.
- Minimal fine-tuning required: Transfer learning enables use on new datasets with limited labeled data.
- Requires careful preprocessing and thoughtful marker selection.
- Performance depends on pre-trained model quality and comprehensive marker gene lists.
- Computationally intensive, particularly when training from scratch.
- Transformer models are less interpretable than simpler alternatives (e.g., PCA, scVI).