This project analyzes single-cell RNA-sequencing (scRNA-seq) data from a study of chronic house dust mite (HDM) exposure in mice. The analysis compares wild-type (WT) and IL-1β knockout (KO) mice under both vehicle (VEH) and HDM treatment conditions.
The primary goal is to identify cell type-specific transcriptional changes and compositional shifts in the lung immune microenvironment in response to HDM exposure and to understand the role of IL-1β in this process.
The analysis is structured as a series of R Markdown (.Rmd) scripts, designed to be executed sequentially. Each script performs a distinct step in the analysis pipeline, from initial data preprocessing to downstream functional analysis and visualization.
Raw FASTQ/metadata accession number: GSE300531 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE300531)
- 00_install_packages.Rmd: Installs all required R packages from CRAN, Bioconductor, and GitHub.
- 01_Ambient_RNA_Removal.Rmd: Removes ambient RNA contamination from the raw count data using SoupX.
- 02_preprocessing.Rmd: Applies quality control (QC) filters to remove low-quality cells based on metrics such as feature counts, UMI counts, and mitochondrial gene percentage.
- 03_DoubletFinder_QC.Rmd: Identifies and removes potential doublets from each sample using the DoubletFinder package.
- 04_Integration.Rmd: Integrates the eight individual datasets using Seurat's SCTransform-based integration workflow to correct for batch effects.
- 05_Initial_Cluster_Annotation.Rmd: Performs initial clustering and assigns broad cell type identities using SingleR for automated annotation and canonical marker genes for manual curation.
- 06_Subclustering_*: A series of scripts dedicated to subclustering major cell lineages to identify finer cell subtypes:
  - 06_Subclustering_B_cells.Rmd
  - 06_Subclustering_T_NK_cells.Rmd
  - 06_Subclustering_neutrophils.Rmd
  - 06_Subclustering_MNPs.Rmd (Mononuclear Phagocytes and other stromal cells)
- 07_Merged_cluster.Rmd: Merges the refined subcluster annotations back into the main Seurat object.
- 08_Reintegration.Rmd: Re-runs the integration on the fully annotated dataset to improve visualization and downstream analysis.
- 09_*_DGE_*: A series of scripts dedicated to differential gene expression (DGE) analysis:
  - 09_1_DGE_GO_KEGG.Rmd: Conducts the initial DGE analysis between experimental conditions for each cell type. It also includes Gene Ontology (GO) and KEGG pathway enrichment analysis.
  - 09_2_DGE_GSEA_Reactome.Rmd: Performs Gene Set Enrichment Analysis (GSEA) and Reactome pathway analysis on the DGE results.
  - 09_3_DGE_decoupleR.Rmd: Infers transcription factor (TF) and pathway activities using the decoupleR package.
- 10_compositional_analysis.Rmd: Performs statistical analysis to identify significant changes in cell type proportions across the different experimental conditions.
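
To give a rough idea of the quantity the compositional analysis works with, the sketch below computes per-sample cell type proportions in base R. It is only an illustration: the data frame, its column names (sample, cell_type), and all values are made up, and it does not reproduce the statistical test used in 10_compositional_analysis.Rmd, which is not described here.

```r
# Toy per-cell metadata (hypothetical values). In practice this table would
# come from the annotated object's per-cell metadata.
meta <- data.frame(
  sample    = c("WT_VEH", "WT_VEH", "WT_VEH", "WT_VEH",
                "WT_HDM", "WT_HDM", "WT_HDM", "WT_HDM"),
  cell_type = c("B cell", "B cell", "T cell", "Neutrophil",
                "B cell", "Neutrophil", "Neutrophil", "Neutrophil")
)

# Count cells per sample (rows) and cell type (columns).
counts <- table(meta$sample, meta$cell_type)

# Convert counts to within-sample proportions; each row sums to 1.
props <- prop.table(counts, margin = 1)

print(round(props, 2))
```

Proportions like these are the usual starting point for comparing conditions, but they are compositional (an increase in one cell type lowers the others), which is why a dedicated statistical method is typically applied on top of them.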
- Setup: Begin by running the 00_install_packages.Rmd script to ensure all dependencies are installed.
- Execution Order: Run the Rmd scripts in the numerical order of their prefixes (from 01 to 10). The output of each script serves as the input for the next.
- Data Paths: Before running, ensure that the project_dir variable in each script is set to the correct path of the project directory on the local machine.
- Outputs: Processed data, intermediate results, and figures will be saved in the processed_files directory within each script's respective folder.
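
The execution order above could be automated with a small driver script. The following is a minimal sketch, not part of the project: list_pipeline_scripts and run_pipeline are hypothetical helper names, and it assumes that all scripts live somewhere under the project directory and that sorting by file name gives a valid order (the relative order of scripts sharing a prefix, such as the 06_Subclustering_* scripts, is an assumption). It does not set project_dir inside the scripts; that still has to be done by hand.

```r
# Hypothetical helper: find the pipeline's .Rmd scripts and sort them by prefix.
list_pipeline_scripts <- function(project_dir) {
  scripts <- list.files(project_dir, pattern = "\\.Rmd$",
                        recursive = TRUE, full.names = TRUE)
  # Keep only files whose names start with a two-digit prefix (00_ to 10_).
  scripts <- scripts[grepl("^[0-9]{2}_", basename(scripts))]
  # Zero-padded prefixes sort correctly as plain strings; radix ordering
  # does not depend on the system locale.
  scripts[order(basename(scripts), method = "radix")]
}

# Hypothetical driver: render each script in order.
# With dry_run = TRUE it only reports what would be run.
run_pipeline <- function(project_dir, dry_run = TRUE) {
  scripts <- list_pipeline_scripts(project_dir)
  for (script in scripts) {
    message("Running: ", basename(script))
    if (!dry_run) {
      # Render in a fresh environment so objects do not leak between scripts.
      rmarkdown::render(script, envir = new.env(), quiet = TRUE)
    }
  }
  invisible(scripts)
}
```

Calling run_pipeline("path/to/project") first with the default dry_run = TRUE is a cheap way to check the resolved order before committing to a full run, since the later steps depend on the outputs of the earlier ones.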