OVisium

Analysis Pipeline for Spatial Transcription Profiling of Fallopian Tube Tissues from Germline BRCA1/2 Mutation Carriers

OVisium data analysis workflow overview:

Analysis of 10X Visium Spatial data

1. SpaceRanger and Loupe Browser

spaceRanger count default output includes feature-barcode matrices (gene x UMI counts) in tsv.gz or h5 format, a cloup file which can be opened in the Loupe Browser and visualize the analysis result ex. graph-based clustering and differential gene expression spatially, and a web_summary.html file to overview the QC matrices of the analysis. Manually annotatioin based on morphology can be exported and integrated to the Seurat analysis.

SpaceRanger also provide a pipeline called spaceranger aggr to allow user to aggregate multiple capture areas. The command takes a CSV file specifying a list of output files from individual samples and additional categories information for Loupe Browser to view or subset.

library_id	molecule_h5	cloupe_file	spatial_folder	sample_info	slide_info	seq_info
PO20	/path/to/molecule_info.h5	/path/to/cloupe.cloupe	/path/to/spatial	BRCA_Fimbrial	V10T06-110	CTG_2021_075
PO20_2	/path/to/molecule_info.h5	/path/to/cloupe.cloupe	/path/to/spatial	BRCA_Proximal	V10T06-110	CTG_2021_075
PO20a	/path/to/molecule_info.h5	/path/to/cloupe.cloupe	/path/to/spatial	BRCA_Otherside	V10S21-050	CTG_2021_099

In the spaceranger aggr function, one parameter can be adjusted is either perform a sequencing depth normalization or not before merging, which is recommended by 10XGenomics to avoid batch effect introduced by sequencing depth. The normalization can be turned off to maximize sensitivity (keep the original amount of reads of each sample) and perform batch correction in the downstream analysis (ex. Harmony). The output is basically same as the output for a single capture area except for the feature-barcode matrix which contains tissue related barcode nucleotide sequence plus number id to distinguish same spots from different tissues/capture areas. For example, AAACAACGAATAGTTC-1 and AAACAACGAATAGTTC-2. The numbering order reflects the sample order in the 'Aggregation CSV' above.

#' Run spaceranger on individual sample
./OVisium/SpaceRanger-2.0.1/SR_count.sh

#' Run spaceranger to aggregate all samples
./OVisium/SpaceRanger-2.0.1/SR_aggregation.sh

#' Generate a samplesheet of the spaceranger outputs for downstream analysis
./OVisium/SpaceRanger-2.0.1/infoTable.sh

We also have two samples from two different patients and the capture areas were manually selected in the Loupe Browser using the Visium manual alignment -> Fiducial Alignment and obtained individual jason files and re-run the spaceRanger count on fastQ analysis, and lastly generated two separated outputs instead of one.

2. Visium Data Analysis with Seurat

Based on the default graph-based clustering results from Spaceranger and Loupe Browser, we will need to run some batch correctionor sample integration to bring different sequencing libraries as well as the FTEs clusters into alignment. The workflow below is adapted to the 10XGenomics batch correction method on Visium data, Seurat R package for analysis of spatial datasets, Harmony and other R packages.

2.1. Workflow Details

The newer version of Seurat introduces the SCTransform workflow, which is an alternative to the NormalizeData, FindVariableFeatures, ScaleData workflow. This procedure skips heuristic steps such as pseudocount addition or log-transformation and improves downstream analysis such as variable gene selection, dimensional reduction and differential expression.

STutility is an R-package with the goal of providing an easy-to-use visualization and analysis tool kit for spatial transcriptomics data. The package was developed by Ludvig Larsson and Joseph Bergenstråhle in the Genomics research group spatialresearch.org at the Royal Institute of Technology (KTH).

#' Manually Merge individual spacerange outputs instead of using the `spacerange agg`
#' Use the `infoTable` as mentioned above to merge the individuals in the same order  
./OVisium/R/Data_Preprocessing/Merge_Visium_Samples.R

2.2. Imaging and QC

Visualize the QC matrices of individuals in the merged object including "nFeature_RNA", "nCount_RNA", "Mito.percent", "Ribo.percent" and "Hb.percent" by spatial plots and violin plots as well as overall density histogram of the merged object splited by tissue origins (fimbrial, proximal and other fimbrial). "log10FeaturesPerUMI", the overall complexity of the gene expression (novelty score) by visualizing the number of genes detected per UMI. Additionally, we also checked which genes contribute the most reads by plotting the percentage of counts per gene per spot.

#' This script generate QC plots in the supplementary figure 3 & 4
Rscript ./OVisium/R/Data_Preprocessing/Visualization_Data_Quality_Distribution.R

2.3. Subset, Filter, SCTransform and Select Variable Features

Filter, SCTransform individual samples and merge them through STutility. The variable features will be identified by the SCTransform from the individuals (3000) and then combined:

#' According to our data QC distribution, filter the spots as below before SCTransform:
#' nCount_RNA > 500 &
#' nFeature_RNA > 500 &
#' log10GenesPerUMI > 0.83 &
#' Mito.percent < 15 &
#' Ribo.percent >5 &
#' Ribo.percent < 50 
Rscript ./OVisium/R/Data_Preprocessing/Subset_SCT_Merge.R

#' Output file: OVisium_SCT_merged.rds

#' Get statistic summary of the merged data
Rscript ./OVisium/R/Help_functions/Statistic_Result.R

Alternative, filter, SCTransform individual samples and integrate or merge them through Seurat. Gene symbols will be first converted to Ensembl ids. Individuals will be integrated and 3000 variable features will be selected after integration. Individuals can be also merged and individual variable features will be combined instead. The output files can be used in the deconvolution later.

#' According to our data distribution, filter the spots as below before SCTransform:
#' nCount_RNA > 500 &
#' nFeature_RNA > 500 &
#' log10GenesPerUMI > 0.83 &
#' Mito.percent < 15 &
#' Ribo.percent >5 &
#' Ribo.percent < 50 
Rscript ./OVisium/R/Data_Preprocessing/Subset_SCT_Integrate_Seurat.R

#' Output file 1: OVisium_ens_SCT_integrated.rds and
#' Output file 2: OVisium_ens_SCT_merged.rds

2.4. PCA Dimensional Reduction and Harmony Integration (Standard Seurat)

Using the standard seurat analysis pipeline to process the data. The PCA is performed on the scale.data from the variable features. The scale.data of the SCTransform data is obtained through GetResidual function. The optimal number of principle components is determined by quantitative elbowplot method. The PCA step is applied in the previous step 2.3 right after merging.

The PCs threshold in the quantitative elbowplot:

Cutoff_1: The point where the PCs contribute less than 5% of standard deviation and the PCs cumulatively contribute over 90% of the standard deviation.
Cutoff_2: The point where the percent change in variation between the consecutive PCs is less than 0.1%.

#' This script is sourced in the step 2.3
#' Generate supplementary figure 8A
Rscript ./OVisium/R/Data_Preprocessing/Choose_Num_PCs.R

Next, Harmony R package will be used to integrates individual Visium samples (sample ids as covarient) by adjusting PCA embedding value of the features accordingly instead of the actual expression level. Further dimensional reduction ex. TSNE, UMAP and UWOT will be performed on both the PCA and the harmony embedding values so the data can be visualized on 2-dimensional plots. More details of how to use Harmony in Seurat can be found in this vignette.

#' Covarient: group.by.vars = "sample"
#' In the covergence plot, Harmony objective function value should descend over each harmony_idx
#' Generate plots in the supplementary figure 8B-D
Rscript ./OVisium/R/Data_Preprocessing/Harmony_Integration_DimReduction.R

#' Output file: OVisium_SCT_merged_dims.rds

2.5. Determine the Number of Clusters

The optimal number of clusters for the Visium data can be determined by the clustree R package, which utilizes C7 (Gene730) house keeping gene and SC3 stability to visualize the clusterings, together with the morphological appearance of the tissue. The resolution of clustering is tested between 0.1 and 1 with 0.1 increasement in the Seurat FindClusters function.

#' Perform clustering on PCA and Harmony data
#' clustree plots for the supplementary figure 9
Rscript ./OVisium/R/Data_Preprocessing/Choose_Number_Clusters.R

#' Output file: OVisium_SCT_merged_clust.rds

#' Export clusters annotation for Loupe Browser
Rscript ./OVisium/R/Help_functions/Export_Clusters_LoupeBrowser.R

#' Subset and rename clusters based on the morphology
Rscript ./OVisium/R/Help_functions/Subset_Rename_Clusters.R

#' Plot clusters annotation spatially
#' The optimal resolution based on morphology and preliminary DEG is 0.6 for the OVisium data 
Rscript ./OVisium/R/Help_functions/Plot_Clusters_Spatial.R

2.6 Differential Expression Analaysis

Before running the DEA using the standard Seurat FindMarkers function, we will perform log2 transformation for better fold-change calculation and harmony batch correction on the expression matrix (SCTransformed count data) to remove batch effects from the Visium preparation.

The detail steps for the OVisium data:

Use all spots from 12 clusters in the OVisium_SCT_merged_clust.rds.
Subset SCTcounts from 6189 most variable genes united from 18 samples.
Calculate log2(SCTcounts+1) on spot x gene matrix.
Perform HarmonyMatrix using sample id as covariant.

$$ HarmonyLog2Data = HarmonyMatrix(log2(SCT\ Counts_{variable\ features} + 1)) $$

Subset away spots from cluster_4
Filter variable genes based on distribution of the OVisium data:
- Total counts per gene >500 &
- Max counts > 4 &
- Percentage of spots > 1% &
- 98 Percentile of log2(Data+1) > log2(2.5) &
- Remove genes contain: "^MT-", "^RP[SL]", "^MTRNR", "^LINC"

#' Input file: OVisium_SCT_merged_clust.rds
Rscript ./OVisium/R/Differential_Expression_Analysis/HarmonyMatrix_log2DataPlus1_QC_Filter_VariableGenes.R

#' Output file: Variable_features_filt_SCT_log2counts+1_harmony.RData
#' Additional output files for the manuscript:
#' 11clusters_filt_vfeatureas_SCTcounts_log2+1_harmony_samples.csv
#' Filtered_spots_sample_cluster_annotation.csv

2.6.1 `FindMarkers` between Clusters using MAST

As a default, Seurat performs differential expression based on the non-parameteric Wilcoxon rank sum test. Here, we choose “MAST”, GLM-framework that treates cellular detection rate as a covariate (Finak et al, Genome Biology, 2015).

The following steps are used for the OVisium data:

Scale and center HarmonyLog2Data.
Average difference of HarmonyLog2Data between the two groups (ident.1, ident.2, 11x10).
Calculate the rank: -log10(p_val_adj+2.225074e-308)*avg_log2FC
Filter with min.diff.pct > 0.1 or 0.25, abs(avg_log2FC) > log2(1.5 or 2.5) and abs(rank) > -log10(10E-5)* log2(1.5 or 2.5). Depends on ident.1 & ident.2 are from the same tissue compartment or not.
Combine the results with the same ident.1.
Order they by avg_log2FC descending.
Remove duplicates with smaller avg_log2FC.
Top20 of each: group by ident.2 and slice max by rank`.
Merge all results from different ident.1.
Order they by avg_log2FC descending.
Remove duplicates with smaller avg_log2FC.
Top20 of all: Filter avg_log2FC > log2(2), then group by ident.1 and slice max by rank.

#' DEA between 11 clusters
#' Generate complexheatmap Figure 3A in the manuscript
Rscript ./OVisium/R/Differential_Expression_Analysis/FindMarkers_ComplexHeatmap.R

#' Output path: /DE_functional_analysis/OVisium_SCT_merged/harmony_SCT_res_0.6/Variable_features_filt/MAST/FindMarker
#' Output folder name: 2024-04-20_harmony_sample_log2SCTcounts1

The results data frame has the following columns :

p_val : p_val (unadjusted)
avg_log2FC : log2 fold-chage of the average expression between the two groups. Positive values indicate that the feature is more highly expressed in the first group.
pct.1 : The percentage of cells where the feature is detected in the first group ident.1
pct.2 : The percentage of cells where the feature is detected in the second group ident.2
p_val_adj : Adjusted p-value, based on bonferroni correction using all features in the dataset.

2.6.2 `FindMarkers` between Clusters within the Same Tissue Compartment

In the OVisium data, we have 2 clusters from the epithelial compartment and 8 clusters from the stroma compartment. One mix cluster located on conjunction of epithelial and stroma is excluded in this analysis.

FTE Clusters:

Subset FTE clusters (0_FTE and 1_FTE)
Scale and center HarmonyLog2Data.
Average difference of HarmonyLog2Data between the two groups (ident.1, ident.2, 2x1).
Calculate the rank: -log10(p_val_adj+2.225074e-308)*avg_log2FC
Filter with min.diff.pct > 0.1, abs(avg_log2FC) > 0.25 and abs(rank) > -log10(10E-5)* 0.25.
Combine the results with the same ident.1.
Order they by avg_log2FC descending.
Remove duplicates with smaller avg_log2FC.
Top20 of each: group by ident.2 and slice max by rank.
Merge all results from different ident.1.
Order they by avg_log2FC descending.
Remove duplicates with smaller avg_log2FC.
Top20 of all: Filter avg_log2FC > log2(1.5), then group by ident.1 and slice max by rank.

#' DEA between 2 clusters
#' Generate complexheatmap
Rscript ./OVisium/R/Differential_Expression_Analysis/FindMarkers_FTE_ComplexHeatmap.R

#' Output fold name: 2024-05-16_harmony_sample_log2SCTcounts1_FTE_only

Stroma Clusters:

Subset Stroma clusters (2, 5-11_Stroma)
Scale and center HarmonyLog2Data.
Average difference of HarmonyLog2Data between the two groups (ident.1, ident.2, 8x7).
Calculate the rank: -log10(p_val_adj+2.225074e-308)*avg_log2FC
Filter with min.diff.pct > 0.1, abs(avg_log2FC) > log2(1.5) and abs(rank) > -log10(10E-5)* log2(1.5).
Combine the results with the same ident.1.
Order they by avg_log2FC descending.
Remove duplicates with smaller avg_log2FC.
Top20 of each: group by ident.2 and slice max by rank.
Merge all results from different ident.1.
Order they by avg_log2FC descending.
Remove duplicates with smaller avg_log2FC.
Top20 of all: Filter avg_log2FC > log2(2), then group by ident.1 and slice max by rank.

#' DEA between 8 clusters
#' Generate complexheatmap
Rscript ./OVisium/R/Differential_Expression_Analysis/FindMarkers_Stroma_ComplexHeatmap.R

#' Output fold name: 2024-05-04_harmony_sample_log2SCTcounts1_Stroma

2.6.3 `FindMarkers` between Fimbrial and Proximal Tissues

DEA on Visium samples from different part of the fallopian tubes from the same patient. Only 4 patients in our cohort have paired samples.

The following steps are used in our OVisium data:

Subset individual patient (no. 1, 3, 6 and 10)
Scale and center HarmonyLog2Data.
Average difference of HarmonyLog2Data between the two groups (ident.1:Fimbrial, ident.2:Proximal).
Calculate the rank: -log10(p_val_adj+2.225074e-308)*avg_log2FC
Filter with min.diff.pct > 0.1.
Top20: Filter abs(avg_log2FC) >= 0.25 and slice max by abs(rank).
Valcano plot: all DEGs with pCutoff = 10E-5 (adj_p_val) and FCcutoff = log2(1.5)
Optional: bulk analysis through AverageExpression function to plot Top20 genes on fimbrial or proximal of the same patient.

#' Generate volcano plot for each patient
#' Optional: generate complexheatmap for the bulk analysis
Rscript ./OVisium/R/Differential_Expression_Analysis/FindMarkers_Patient_Volcano_ComplexHeatmap.R

#' Output fold name: 2024-05-16_patient_pair

#' Optional bulk analysis
#' Complexheatmap split by ident.1
Rscript ./OVisium/R/Differential_Expression_Analysis/AverageExpresion_id1_ComplexHeatmap.R
#' Complexheatmap split by ident.2
Rscript ./OVisium/R/Differential_Expression_Analysis/AverageExpresion_id2_ComplexHeatmap.R

2.6.4 `FindMarkers` between BRCA1 and BRCA2 Fimbrial

We have 8 BRCA1 and 3 BRCA2 mutation carriers and total 14 fimbrial tissues.

The follow steps are used in our OVisium data:

Subset Fimbrial samples (14).
Scale and center HarmonyLog2Data.
Average difference of HarmonyLog2Data between the two groups (ident.1:BRCA1, ident.2:BRCA2).
Calculate the rank: -log10(p_val_adj+2.225074e-308)*avg_log2FC
Filter with min.diff.pct > 0.1.
Top20: Filter abs(avg_log2FC) >= 0.25 and slice max by abs(rank).
Valcano plot: all DEGs with pCutoff = 10E-5 (adj_p_val) and FCcutoff = log2(1.5)
Optional: bulk analysis through AverageExpression function to plot Top20 genes on patient or mutational variant.

#' Generate volcano plot for each patient
#' Optional: generate complexheatmap for the bulk analysis
Rscript ./OVisium/R/Differential_Expression_Analysis/FindMarkers_BRCA_Volcano_ComplexHeatmap.R

#' Output fold name: 2024-05-16_mutation

#' Optional bulk analysis
#' Complexheatmap split by ident.1
Rscript ./OVisium/R/Differential_Expression_Analysis/AverageExpresion_id1_ComplexHeatmap.R
#' Complexheatmap split by ident.2
Rscript ./OVisium/R/Differential_Expression_Analysis/AverageExpresion_id2_ComplexHeatmap.R

Other sources:

Seurat v4.3 DEA
DEA workshop provided by Harvard Chan Bioinformatics Core

2.7 Functional Annotation

We use gene set enrichment analysis (GSEA) as well as gene over-representation analysis (Enrich) in the clusterProfiler to map the function annotation of those differential expressed genes. The functional annotation is obtained from Molecular Signatures Database (MSigDB) and CellMarker 2.0 which are manually curated from over 10 000 publications as well as scRNAseq data.

MSigDB contains 9 major collections and the latest version can be downloaded from the homepage or the R package msigdbr:

H: hallmark
C1: positional
C2: curated
C3: motif
C4: computational
C5: ontology
C6: oncogenic signature
C7: immunologic signature
C8: cell type signature

In order to run the GSEA in the clusterProfiler, we need to convert the gene name to NCBI Entrez gene id first. As the gene name/symbol from the 10X data can be different to the ones in the database due to multiple name or new name), we will use the ensembl id in our data to make it easier but they don't always link to each other well either. There are tools to convert multiple features at once however none of them can convert all at once. Many of those features failed to convert are pseudogene or none coding RNA or due to outdated database (usually for uncharacterized or noval genes). By combining different tools and database, we were able to convert 25312 of 36601 features from aggregated 10X Visium data:

#' Spaceranger aggregated data folder: AGG_BRCA_18_20231018
Rscript ./OVisium/R/Help_functions/Ensemble_To_Entrezid.R

2.7.1 Gene Set Enrichment Analysis (GSEA):

The individual DEA results from individual ident.1 to others will be annotated to 9 major collections and 34550 gene sets from the MSigDB and over 15,000 Cell type biomarkers curated from the literatures as well as scRNAseq cell markers CellMarker 2.0.

For each DE gene list, the GSEA generates outputs in 36 folders with the names of annotation collections ex. "c1.all.v2023.2", and subfolders in each collection:

rds: results save in rds format
runplot: gseaplot2 with running enrichment score on the y axis and preranked gene list on the x axis showing top 5 enriched gene annotations
cnet: cnetplot with gene names of the top 5 enriched gene annotations
tsbar: two-side barplot with normalized enrichment scores of top 10 enriched gene annotations

#' Individual DE gene lists with comparison of `ident.1` to others 
Rscript ./OVisium/R/Functional_Annotation_Analysis/GSEA_DEA_id1.R

#' DE gene lists after combined the results from the same `ident.1` and differ by the `ident.2`
Rscript ./OVisium/R/Functional_Annotation_Analysis/GSEA_DEA_id2.R

#' Prepare figure 3 in the manuscript
Rscript ./OVisium/R/Functional_Annotation_Analysis/GSEA_GO_Heatmaps_in_Fig3.R

#' Generate feature plot, spatial feature plot, violin plot and dotplot for genes in the select annotation for individual sample
#' Output folders are sorted by gene name or sample id
Rscript ./OVisium/R/Functional_Annotation_Analysis/GSEA_NewMarkers_Sample_Spatial_Feature_Vln_Dot_Plot.R

#' DE gene lists from the 2.6.3
Rscript ./OVisium/R/Functional_Annotation_Analysis/GSEA_Patients.R

2.7.2 Over-Representation Analysis (ORA):

The over-representation analysis tests the DE genes whether significantly over represented in the gene set by hypergeometric distribution without consideration of fold change, so we first split the genes into "Up" (FC > 1.5), "Down" (FC < 0.75) and everything between as "No Change".

#' DE gene lists from the 2.6.3
Rscript ./OVisium/R/Functional_Annotation_Analysis/Enrichment_Patients.R

2.8 Deconvolution

Since it is not single cell resolution, we perfomed cell type deconvolution on individual spots using the scRNAseq reference data from normal-like fallopian tubes (Ulrich et al. Development Cell 2022).

The methods in this study are adapted from 10XGenomics using the spacexr to map normal cell types in the normal-like fallopian tubes tissues, and ecotyper to discover cell states and ecotypes in the tumor tissues. Seurat also provides workflow to integrate spatial and scRNAseq data.

2.8.1 QC, filtering and integration of reference scRNAseq data

Prepare the reference RNA expression data in a similar way described in section 2.3 using Seurat SCTransform and integration method to process the data. Before SCTransform, the reference data was splited by donor id, and then applied additional filterings Ribo.percent > 5 & log10GenesPerUMI > 0.8. After SCT, data from different donors were merged and integrated. A very small portion of cells are missed annotated based on their original cell type annotation, but overall the clustering looks good.

#' Note, the reference data used ensembl id instead of gene symbol
Rscript ./OVisium/R/Deconvolution/Spacexr_Ref_HealthyTube_Ulrich.R

#' Output file
healthyTubes_SCT_vIntegrated.rds

2.8.2 Robust cell type decomposition (RCTD) on the raw expression counts

RCTD applys supervised Parametric probability model to estimate cell type composition in each spot in the visium data. More details of the workflow can be found in one of their tutorials.

The RCTD applies directly on the raw RNA expression matrix of the individual Visium sample as well as the aggregated Visium data from the spaceranger standard output.

Output files for individual sample:

xxx_weights.png, cell type weight spatial plots.
xxx_binary weights.png, cell type binary weight spatial plots with thresholds 75th percentile and median.
cell_type_nUMI.png, Number of UMI counts spatial plot.
cell_type_occur.png, Cell type occur histogram.
SpatialPie_cell_type_occur.png, cell type occur in mini piechart with spatial coordinates for individual spots.
count_cell_type_clusters_table.csv, a table with cell type counts in individual clusters.
weight_clusters_table.csv, a table with cell type weight in individual spots.
norm_weight_clusters_table.csv, a table with cell type normalized weight (probabilities sum to 1) in individual spots.

Output boxplots for the multiple samples:

aggStack_cluster_pair_aggregated.png, stack boxplot with percentage of aggregated counts of cell type in each cluster for patients with paired fimbrial and proximal samples.
aggFill_cluster_pair_aggregated.png, stack boxplot with aggregated counts of cell type in each cluster for patients with paired fimbrial and proximal samples.
countStack_cluster_pair_aggregated.png, stack boxplot with percentage of filtered counts of cell type in each cluster for patients with paired fimbrial and proximal samples.
countFill_cluster_pair_aggregated.png, stack boxplot with filtered counts of cell type in each cluster for patients with paired fimbrial and proximal samples.
aggStack_cluster_all_aggregated.png, stack boxplot with percentage of aggregated counts of cell type in each cluster for all samples.
aggFill_cluster_all_aggregated.png, stack boxplot with aggregated counts of cell type in each cluster for all samples.
countStack_cluster_all_aggregated.png, stack boxplot with percentage of filtered counts of cell type in each cluster for all samples.
countFill_cluster_all_aggregated.png, stack boxplot with filtered counts of cell type in each cluster for all samples.

Here we can compare the cell type composition to the morphology and the clusters annotation of the same spots. We can plot cell type composition in each cluster of individuals, which further comfirm the morphological and cell type differences between the clusters.

#' RCTD analysis
Rscript ./OVisium/R/Deconvolution/Spacexr_Visium_Full_Mode.R

#' Option: comparison of BRCA1 and BRCA2 in Cell type count percentage
Rscript ./OVisium/R/Deconvolution/Spacexr_CellType_Total_Count.R

#' Option: pairwise comparison of Fimbrial and Proximal from BRCA1 carriers in Cell type count percentage
Rscript ./OVisium/R/Deconvolution/Spacexr_CellType_Total_Count_Tissue.R

2.8.3 DEGs Pearson correlation to identify cell type specific signatures

Inspired by the coldspring harbor protocol originated from Cid et al. (2021) to extract cell type sigantures in bulk-tissue RNAseq using reference scRNAseq data, we apply Pearson correlation to identify cell type sigantures from the differential expressed genes in the Visium data:

Select DEGs from all 11 clusters in OVisium analysis.
Used all DEGs from section 2.6.1.
Paired-wise correlations of all individual cells using scRNAseq expression data for selected DEGs.
PCA analysis with selected DEGs show some separation of different cell types.
Filter genes with correlation > 0.5.
Unsupervised hierarchical clustering (Pearson correlation, average linkage) to extract cell type DEG signatures using the cell type annotations in the reference data.
Validate the cell type DEG signatures in the Visium clusters.

#' Use all DEGs
Rscript ./OVisium/R/Deconvolution/CellType_Signatures_scRNAseq_correlation_all_DEGs.R

#' Split the DEGs by log2FC > or <0 to upregulated or downregulated signatures
Rscript ./OVisium/R/Deconvolution/CellType_Signatures_scRNAseq_correlation.R

2.8.4 Ecotyper on HGSC

For High grade serous carcinoma Visium samples, we can use ecoTyper, a machine learning framework to identify cell states and multicellular communities. Through 12 major cell lineages across 16 types of human carcinoma, EcoTyper identified 69 transcriptionally defined cell states.

#' 1. Setup docker on terminal and add non-root user:
./OVisium/R/Deconvolution/ecotyper/docker.sh

#' 2. Select the working directory and download the ecotyper github files:
#' 3. Edit config yml file to config_recovery_OVisium.yml:
./OVisium/R/Deconvolution/ecotyper/download.sh
./OVisium/R/Deconvolution/ecotyper/config_recovery_OVisium.yml

#' 4. Manually copy filtered feature bc matrices and spatial tissue position list and high resolution image
#' 5. In terminal run ecotyper Visium method for each sample:
#' Edit the yml file: ./OVisium/R/Deconvolution/ecotyper/config_recovery_OVisium.yml
#' Input Visium directory : "OVisium_data/OVisium_Carcinoma/`SampleID`"
./OVisium/R/Deconvolution/ecotyper/recovery.sh

#' 6. (Optional) In terminal run ecotyper scRNAseq method on individual sample:
#' Use loupe browser to export matrices and cluster annotation files 
Rscript EcoTyper_recovery_visium.R --discovery 'carcinoma' --matrix OVisium_data/OVisium_Carcinoma/`SampleID`

Name		Name	Last commit message	Last commit date
Latest commit History 308 Commits
Data		Data
R		R
SpaceRanger-2.0.1		SpaceRanger-2.0.1
man		man
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OVisium

Analysis Pipeline for Spatial Transcription Profiling of Fallopian Tube Tissues from Germline BRCA1/2 Mutation Carriers

OVisium data analysis workflow overview:

Analysis of 10X Visium Spatial data

1. SpaceRanger and Loupe Browser

2. Visium Data Analysis with Seurat

2.1. Workflow Details

2.2. Imaging and QC

2.3. Subset, Filter, SCTransform and Select Variable Features

2.4. PCA Dimensional Reduction and Harmony Integration (Standard Seurat)

2.5. Determine the Number of Clusters

2.6 Differential Expression Analaysis

2.6.1 `FindMarkers` between Clusters using MAST

2.6.2 `FindMarkers` between Clusters within the Same Tissue Compartment

FTE Clusters:

Stroma Clusters:

2.6.3 `FindMarkers` between Fimbrial and Proximal Tissues

2.6.4 `FindMarkers` between BRCA1 and BRCA2 Fimbrial

2.7 Functional Annotation

2.7.1 Gene Set Enrichment Analysis (GSEA):

2.7.2 Over-Representation Analysis (ORA):

2.8 Deconvolution

2.8.1 QC, filtering and integration of reference scRNAseq data

2.8.2 Robust cell type decomposition (RCTD) on the raw expression counts

2.8.3 DEGs Pearson correlation to identify cell type specific signatures

2.8.4 Ecotyper on HGSC

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

OVisium

Analysis Pipeline for Spatial Transcription Profiling of Fallopian Tube Tissues from Germline BRCA1/2 Mutation Carriers

OVisium data analysis workflow overview:

Analysis of 10X Visium Spatial data

1. SpaceRanger and Loupe Browser

2. Visium Data Analysis with Seurat

2.1. Workflow Details

2.2. Imaging and QC

2.3. Subset, Filter, SCTransform and Select Variable Features

2.4. PCA Dimensional Reduction and Harmony Integration (Standard Seurat)

2.5. Determine the Number of Clusters

2.6 Differential Expression Analaysis

2.6.1 FindMarkers between Clusters using MAST

2.6.2 FindMarkers between Clusters within the Same Tissue Compartment

FTE Clusters:

Stroma Clusters:

2.6.3 FindMarkers between Fimbrial and Proximal Tissues

2.6.4 FindMarkers between BRCA1 and BRCA2 Fimbrial

2.7 Functional Annotation

2.7.1 Gene Set Enrichment Analysis (GSEA):

2.7.2 Over-Representation Analysis (ORA):

2.8 Deconvolution

2.8.1 QC, filtering and integration of reference scRNAseq data

2.8.2 Robust cell type decomposition (RCTD) on the raw expression counts

2.8.3 DEGs Pearson correlation to identify cell type specific signatures

2.8.4 Ecotyper on HGSC

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

2.6.1 `FindMarkers` between Clusters using MAST

2.6.2 `FindMarkers` between Clusters within the Same Tissue Compartment

2.6.3 `FindMarkers` between Fimbrial and Proximal Tissues

2.6.4 `FindMarkers` between BRCA1 and BRCA2 Fimbrial

Packages