Analysis Pipeline for Spatial Transcription Profiling of Fallopian Tube Tissues from Germline BRCA1/2 Mutation Carriers
spaceRanger count default output includes feature-barcode matrices (gene x UMI counts) in tsv.gz or h5 format, a cloup file which can be opened in the Loupe Browser and visualize the analysis result ex. graph-based clustering and differential gene expression spatially, and a web_summary.html file to overview the QC matrices of the analysis. Manually annotatioin based on morphology can be exported and integrated to the Seurat analysis.
SpaceRanger also provide a pipeline called spaceranger aggr to allow user to aggregate multiple capture areas. The command takes a CSV file specifying a list of output files from individual samples and additional categories information for Loupe Browser to view or subset.
| library_id | molecule_h5 | cloupe_file | spatial_folder | sample_info | slide_info | seq_info |
|---|---|---|---|---|---|---|
| PO20 | /path/to/molecule_info.h5 | /path/to/cloupe.cloupe | /path/to/spatial | BRCA_Fimbrial | V10T06-110 | CTG_2021_075 |
| PO20_2 | /path/to/molecule_info.h5 | /path/to/cloupe.cloupe | /path/to/spatial | BRCA_Proximal | V10T06-110 | CTG_2021_075 |
| PO20a | /path/to/molecule_info.h5 | /path/to/cloupe.cloupe | /path/to/spatial | BRCA_Otherside | V10S21-050 | CTG_2021_099 |
In the spaceranger aggr function, one parameter can be adjusted is either perform a sequencing depth normalization or not before merging, which is recommended by 10XGenomics to avoid batch effect introduced by sequencing depth. The normalization can be turned off to maximize sensitivity (keep the original amount of reads of each sample) and perform batch correction in the downstream analysis (ex. Harmony). The output is basically same as the output for a single capture area except for the feature-barcode matrix which contains tissue related barcode nucleotide sequence plus number id to distinguish same spots from different tissues/capture areas. For example, AAACAACGAATAGTTC-1 and AAACAACGAATAGTTC-2. The numbering order reflects the sample order in the 'Aggregation CSV' above.
#' Run spaceranger on individual sample
./OVisium/SpaceRanger-2.0.1/SR_count.sh
#' Run spaceranger to aggregate all samples
./OVisium/SpaceRanger-2.0.1/SR_aggregation.sh
#' Generate a samplesheet of the spaceranger outputs for downstream analysis
./OVisium/SpaceRanger-2.0.1/infoTable.shWe also have two samples from two different patients and the capture areas were manually selected in the Loupe Browser using the Visium manual alignment -> Fiducial Alignment and obtained individual jason files and re-run the spaceRanger count on fastQ analysis, and lastly generated two separated outputs instead of one.
Based on the default graph-based clustering results from Spaceranger and Loupe Browser, we will need to run some batch correctionor sample integration to bring different sequencing libraries as well as the FTEs clusters into alignment. The workflow below is adapted to the 10XGenomics batch correction method on Visium data, Seurat R package for analysis of spatial datasets, Harmony and other R packages.
The newer version of Seurat introduces the SCTransform workflow, which is an alternative to the NormalizeData, FindVariableFeatures, ScaleData workflow. This procedure skips heuristic steps such as pseudocount addition or log-transformation and improves downstream analysis such as variable gene selection, dimensional reduction and differential expression.
STutility is an R-package with the goal of providing an easy-to-use visualization and analysis tool kit for spatial transcriptomics data. The package was developed by Ludvig Larsson and Joseph Bergenstråhle in the Genomics research group spatialresearch.org at the Royal Institute of Technology (KTH).
#' Manually Merge individual spacerange outputs instead of using the `spacerange agg`
#' Use the `infoTable` as mentioned above to merge the individuals in the same order
./OVisium/R/Data_Preprocessing/Merge_Visium_Samples.R
Visualize the QC matrices of individuals in the merged object including "nFeature_RNA", "nCount_RNA", "Mito.percent", "Ribo.percent" and "Hb.percent" by spatial plots and violin plots as well as overall density histogram of the merged object splited by tissue origins (fimbrial, proximal and other fimbrial). "log10FeaturesPerUMI", the overall complexity of the gene expression (novelty score) by visualizing the number of genes detected per UMI. Additionally, we also checked which genes contribute the most reads by plotting the percentage of counts per gene per spot.
#' This script generate QC plots in the supplementary figure 3 & 4
Rscript ./OVisium/R/Data_Preprocessing/Visualization_Data_Quality_Distribution.R
Filter, SCTransform individual samples and merge them through STutility. The variable features will be identified by the SCTransform from the individuals (3000) and then combined:
#' According to our data QC distribution, filter the spots as below before SCTransform:
#' nCount_RNA > 500 &
#' nFeature_RNA > 500 &
#' log10GenesPerUMI > 0.83 &
#' Mito.percent < 15 &
#' Ribo.percent >5 &
#' Ribo.percent < 50
Rscript ./OVisium/R/Data_Preprocessing/Subset_SCT_Merge.R
#' Output file: OVisium_SCT_merged.rds
#' Get statistic summary of the merged data
Rscript ./OVisium/R/Help_functions/Statistic_Result.R
Alternative, filter, SCTransform individual samples and integrate or merge them through Seurat. Gene symbols will be first converted to Ensembl ids. Individuals will be integrated and 3000 variable features will be selected after integration. Individuals can be also merged and individual variable features will be combined instead. The output files can be used in the deconvolution later.
#' According to our data distribution, filter the spots as below before SCTransform:
#' nCount_RNA > 500 &
#' nFeature_RNA > 500 &
#' log10GenesPerUMI > 0.83 &
#' Mito.percent < 15 &
#' Ribo.percent >5 &
#' Ribo.percent < 50
Rscript ./OVisium/R/Data_Preprocessing/Subset_SCT_Integrate_Seurat.R
#' Output file 1: OVisium_ens_SCT_integrated.rds and
#' Output file 2: OVisium_ens_SCT_merged.rds
Using the standard seurat analysis pipeline to process the data. The PCA is performed on the scale.data from the variable features. The scale.data of the SCTransform data is obtained through GetResidual function. The optimal number of principle components is determined by quantitative elbowplot method. The PCA step is applied in the previous step 2.3 right after merging.
The PCs threshold in the quantitative elbowplot:
-
Cutoff_1: The point where the PCs contribute less than 5% of standard deviation and the PCs cumulatively contribute over 90% of the standard deviation.
-
Cutoff_2: The point where the percent change in variation between the consecutive PCs is less than 0.1%.
#' This script is sourced in the step 2.3
#' Generate supplementary figure 8A
Rscript ./OVisium/R/Data_Preprocessing/Choose_Num_PCs.R
Next, Harmony R package will be used to integrates individual Visium samples (sample ids as covarient) by adjusting PCA embedding value of the features accordingly instead of the actual expression level. Further dimensional reduction ex. TSNE, UMAP and UWOT will be performed on both the PCA and the harmony embedding values so the data can be visualized on 2-dimensional plots. More details of how to use Harmony in Seurat can be found in this vignette.
#' Covarient: group.by.vars = "sample"
#' In the covergence plot, Harmony objective function value should descend over each harmony_idx
#' Generate plots in the supplementary figure 8B-D
Rscript ./OVisium/R/Data_Preprocessing/Harmony_Integration_DimReduction.R
#' Output file: OVisium_SCT_merged_dims.rds
The optimal number of clusters for the Visium data can be determined by the clustree R package, which utilizes C7 (Gene730) house keeping gene and SC3 stability to visualize the clusterings, together with the morphological appearance of the tissue. The resolution of clustering is tested between 0.1 and 1 with 0.1 increasement in the Seurat FindClusters function.
#' Perform clustering on PCA and Harmony data
#' clustree plots for the supplementary figure 9
Rscript ./OVisium/R/Data_Preprocessing/Choose_Number_Clusters.R
#' Output file: OVisium_SCT_merged_clust.rds
#' Export clusters annotation for Loupe Browser
Rscript ./OVisium/R/Help_functions/Export_Clusters_LoupeBrowser.R
#' Subset and rename clusters based on the morphology
Rscript ./OVisium/R/Help_functions/Subset_Rename_Clusters.R
#' Plot clusters annotation spatially
#' The optimal resolution based on morphology and preliminary DEG is 0.6 for the OVisium data
Rscript ./OVisium/R/Help_functions/Plot_Clusters_Spatial.R
Before running the DEA using the standard Seurat FindMarkers function, we will perform log2 transformation for better fold-change calculation and harmony batch correction on the expression matrix (SCTransformed count data) to remove batch effects from the Visium preparation.
The detail steps for the OVisium data:
- Use all spots from 12 clusters in the OVisium_SCT_merged_clust.rds.
- Subset SCTcounts from 6189 most variable genes united from 18 samples.
- Calculate log2(SCTcounts+1) on spot x gene matrix.
- Perform HarmonyMatrix using sample id as covariant.
- Subset away spots from cluster_4
- Filter variable genes based on distribution of the OVisium data:
- Total counts per gene >500 &
- Max counts > 4 &
- Percentage of spots > 1% &
- 98 Percentile of log2(Data+1) > log2(2.5) &
- Remove genes contain: "^MT-", "^RP[SL]", "^MTRNR", "^LINC"
#' Input file: OVisium_SCT_merged_clust.rds
Rscript ./OVisium/R/Differential_Expression_Analysis/HarmonyMatrix_log2DataPlus1_QC_Filter_VariableGenes.R
#' Output file: Variable_features_filt_SCT_log2counts+1_harmony.RData
#' Additional output files for the manuscript:
#' 11clusters_filt_vfeatureas_SCTcounts_log2+1_harmony_samples.csv
#' Filtered_spots_sample_cluster_annotation.csv
As a default, Seurat performs differential expression based on the non-parameteric Wilcoxon rank sum test. Here, we choose “MAST”, GLM-framework that treates cellular detection rate as a covariate (Finak et al, Genome Biology, 2015).
The following steps are used for the OVisium data:
- Scale and center HarmonyLog2Data.
- Average difference of HarmonyLog2Data between the two groups (
ident.1,ident.2, 11x10). - Calculate the
rank: -log10(p_val_adj+2.225074e-308)*avg_log2FC - Filter with
min.diff.pct> 0.1 or 0.25, abs(avg_log2FC) > log2(1.5 or 2.5) and abs(rank) > -log10(10E-5)* log2(1.5 or 2.5). Depends onident.1&ident.2are from the same tissue compartment or not. - Combine the results with the same
ident.1. - Order they by
avg_log2FCdescending. - Remove duplicates with smaller
avg_log2FC. - Top20 of each: group by
ident.2 and slice max byrank`. - Merge all results from different
ident.1. - Order they by
avg_log2FCdescending. - Remove duplicates with smaller
avg_log2FC. - Top20 of all: Filter
avg_log2FC> log2(2), then group byident.1and slice max byrank.
#' DEA between 11 clusters
#' Generate complexheatmap Figure 3A in the manuscript
Rscript ./OVisium/R/Differential_Expression_Analysis/FindMarkers_ComplexHeatmap.R
#' Output path: /DE_functional_analysis/OVisium_SCT_merged/harmony_SCT_res_0.6/Variable_features_filt/MAST/FindMarker
#' Output folder name: 2024-04-20_harmony_sample_log2SCTcounts1
The results data frame has the following columns :
p_val: p_val (unadjusted)avg_log2FC: log2 fold-chage of the average expression between the two groups. Positive values indicate that the feature is more highly expressed in the first group.pct.1: The percentage of cells where the feature is detected in the first groupident.1pct.2: The percentage of cells where the feature is detected in the second groupident.2p_val_adj: Adjusted p-value, based on bonferroni correction using all features in the dataset.
In the OVisium data, we have 2 clusters from the epithelial compartment and 8 clusters from the stroma compartment. One mix cluster located on conjunction of epithelial and stroma is excluded in this analysis.
- Subset FTE clusters (0_FTE and 1_FTE)
- Scale and center HarmonyLog2Data.
- Average difference of HarmonyLog2Data between the two groups (
ident.1,ident.2, 2x1). - Calculate the
rank: -log10(p_val_adj+2.225074e-308)*avg_log2FC - Filter with
min.diff.pct> 0.1, abs(avg_log2FC) > 0.25 and abs(rank) > -log10(10E-5)* 0.25. - Combine the results with the same
ident.1. - Order they by
avg_log2FCdescending. - Remove duplicates with smaller
avg_log2FC. - Top20 of each: group by
ident.2and slice max byrank. - Merge all results from different
ident.1. - Order they by
avg_log2FCdescending. - Remove duplicates with smaller
avg_log2FC. - Top20 of all: Filter
avg_log2FC> log2(1.5), then group byident.1and slice max byrank.
#' DEA between 2 clusters
#' Generate complexheatmap
Rscript ./OVisium/R/Differential_Expression_Analysis/FindMarkers_FTE_ComplexHeatmap.R
#' Output fold name: 2024-05-16_harmony_sample_log2SCTcounts1_FTE_only
- Subset Stroma clusters (2, 5-11_Stroma)
- Scale and center HarmonyLog2Data.
- Average difference of HarmonyLog2Data between the two groups (
ident.1,ident.2, 8x7). - Calculate the
rank: -log10(p_val_adj+2.225074e-308)*avg_log2FC - Filter with
min.diff.pct> 0.1, abs(avg_log2FC) > log2(1.5) and abs(rank) > -log10(10E-5)* log2(1.5). - Combine the results with the same
ident.1. - Order they by
avg_log2FCdescending. - Remove duplicates with smaller
avg_log2FC. - Top20 of each: group by
ident.2and slice max byrank. - Merge all results from different
ident.1. - Order they by
avg_log2FCdescending. - Remove duplicates with smaller
avg_log2FC. - Top20 of all: Filter
avg_log2FC> log2(2), then group byident.1and slice max byrank.
#' DEA between 8 clusters
#' Generate complexheatmap
Rscript ./OVisium/R/Differential_Expression_Analysis/FindMarkers_Stroma_ComplexHeatmap.R
#' Output fold name: 2024-05-04_harmony_sample_log2SCTcounts1_Stroma
DEA on Visium samples from different part of the fallopian tubes from the same patient. Only 4 patients in our cohort have paired samples.
The following steps are used in our OVisium data:
- Subset individual patient (no. 1, 3, 6 and 10)
- Scale and center HarmonyLog2Data.
- Average difference of HarmonyLog2Data between the two groups (
ident.1:Fimbrial,ident.2:Proximal). - Calculate the
rank: -log10(p_val_adj+2.225074e-308)*avg_log2FC - Filter with
min.diff.pct> 0.1. - Top20: Filter abs(
avg_log2FC) >= 0.25 and slice max by abs(rank). - Valcano plot: all DEGs with
pCutoff= 10E-5 (adj_p_val) andFCcutoff= log2(1.5) - Optional: bulk analysis through
AverageExpressionfunction to plot Top20 genes on fimbrial or proximal of the same patient.
#' Generate volcano plot for each patient
#' Optional: generate complexheatmap for the bulk analysis
Rscript ./OVisium/R/Differential_Expression_Analysis/FindMarkers_Patient_Volcano_ComplexHeatmap.R
#' Output fold name: 2024-05-16_patient_pair
#' Optional bulk analysis
#' Complexheatmap split by ident.1
Rscript ./OVisium/R/Differential_Expression_Analysis/AverageExpresion_id1_ComplexHeatmap.R
#' Complexheatmap split by ident.2
Rscript ./OVisium/R/Differential_Expression_Analysis/AverageExpresion_id2_ComplexHeatmap.R
We have 8 BRCA1 and 3 BRCA2 mutation carriers and total 14 fimbrial tissues.
The follow steps are used in our OVisium data:
- Subset Fimbrial samples (14).
- Scale and center HarmonyLog2Data.
- Average difference of HarmonyLog2Data between the two groups (
ident.1:BRCA1,ident.2:BRCA2). - Calculate the
rank: -log10(p_val_adj+2.225074e-308)*avg_log2FC - Filter with
min.diff.pct> 0.1. - Top20: Filter abs(
avg_log2FC) >= 0.25 and slice max by abs(rank). - Valcano plot: all DEGs with
pCutoff= 10E-5 (adj_p_val) andFCcutoff= log2(1.5) - Optional: bulk analysis through
AverageExpressionfunction to plot Top20 genes on patient or mutational variant.
#' Generate volcano plot for each patient
#' Optional: generate complexheatmap for the bulk analysis
Rscript ./OVisium/R/Differential_Expression_Analysis/FindMarkers_BRCA_Volcano_ComplexHeatmap.R
#' Output fold name: 2024-05-16_mutation
#' Optional bulk analysis
#' Complexheatmap split by ident.1
Rscript ./OVisium/R/Differential_Expression_Analysis/AverageExpresion_id1_ComplexHeatmap.R
#' Complexheatmap split by ident.2
Rscript ./OVisium/R/Differential_Expression_Analysis/AverageExpresion_id2_ComplexHeatmap.R
Other sources:
- Seurat v4.3 DEA
- DEA workshop provided by Harvard Chan Bioinformatics Core
We use gene set enrichment analysis (GSEA) as well as gene over-representation analysis (Enrich) in the clusterProfiler to map the function annotation of those differential expressed genes. The functional annotation is obtained from Molecular Signatures Database (MSigDB) and CellMarker 2.0 which are manually curated from over 10 000 publications as well as scRNAseq data.
MSigDB contains 9 major collections and the latest version can be downloaded from the homepage or the R package msigdbr:
- H: hallmark
- C1: positional
- C2: curated
- C3: motif
- C4: computational
- C5: ontology
- C6: oncogenic signature
- C7: immunologic signature
- C8: cell type signature
In order to run the GSEA in the clusterProfiler, we need to convert the gene name to NCBI Entrez gene id first. As the gene name/symbol from the 10X data can be different to the ones in the database due to multiple name or new name), we will use the ensembl id in our data to make it easier but they don't always link to each other well either. There are tools to convert multiple features at once however none of them can convert all at once. Many of those features failed to convert are pseudogene or none coding RNA or due to outdated database (usually for uncharacterized or noval genes). By combining different tools and database, we were able to convert 25312 of 36601 features from aggregated 10X Visium data:
#' Spaceranger aggregated data folder: AGG_BRCA_18_20231018
Rscript ./OVisium/R/Help_functions/Ensemble_To_Entrezid.R
The individual DEA results from individual ident.1 to others will be annotated to 9 major collections and 34550 gene sets from the MSigDB and over 15,000 Cell type biomarkers curated from the literatures as well as scRNAseq cell markers CellMarker 2.0.
For each DE gene list, the GSEA generates outputs in 36 folders with the names of annotation collections ex. "c1.all.v2023.2", and subfolders in each collection:
- rds: results save in rds format
- runplot: gseaplot2 with running enrichment score on the y axis and preranked gene list on the x axis showing top 5 enriched gene annotations
- cnet: cnetplot with gene names of the top 5 enriched gene annotations
- tsbar: two-side barplot with normalized enrichment scores of top 10 enriched gene annotations
#' Individual DE gene lists with comparison of `ident.1` to others
Rscript ./OVisium/R/Functional_Annotation_Analysis/GSEA_DEA_id1.R
#' DE gene lists after combined the results from the same `ident.1` and differ by the `ident.2`
Rscript ./OVisium/R/Functional_Annotation_Analysis/GSEA_DEA_id2.R
#' Prepare figure 3 in the manuscript
Rscript ./OVisium/R/Functional_Annotation_Analysis/GSEA_GO_Heatmaps_in_Fig3.R
#' Generate feature plot, spatial feature plot, violin plot and dotplot for genes in the select annotation for individual sample
#' Output folders are sorted by gene name or sample id
Rscript ./OVisium/R/Functional_Annotation_Analysis/GSEA_NewMarkers_Sample_Spatial_Feature_Vln_Dot_Plot.R
#' DE gene lists from the 2.6.3
Rscript ./OVisium/R/Functional_Annotation_Analysis/GSEA_Patients.R
The over-representation analysis tests the DE genes whether significantly over represented in the gene set by hypergeometric distribution without consideration of fold change, so we first split the genes into "Up" (FC > 1.5), "Down" (FC < 0.75) and everything between as "No Change".
#' DE gene lists from the 2.6.3
Rscript ./OVisium/R/Functional_Annotation_Analysis/Enrichment_Patients.R
Since it is not single cell resolution, we perfomed cell type deconvolution on individual spots using the scRNAseq reference data from normal-like fallopian tubes (Ulrich et al. Development Cell 2022).
The methods in this study are adapted from 10XGenomics using the spacexr to map normal cell types in the normal-like fallopian tubes tissues, and ecotyper to discover cell states and ecotypes in the tumor tissues. Seurat also provides workflow to integrate spatial and scRNAseq data.
Prepare the reference RNA expression data in a similar way described in section 2.3 using Seurat SCTransform and integration method to process the data.
Before SCTransform, the reference data was splited by donor id, and then applied additional filterings Ribo.percent > 5 & log10GenesPerUMI > 0.8. After SCT, data from different donors were merged and integrated. A very small portion of cells are missed annotated based on their original cell type annotation, but overall the clustering looks good.
#' Note, the reference data used ensembl id instead of gene symbol
Rscript ./OVisium/R/Deconvolution/Spacexr_Ref_HealthyTube_Ulrich.R
#' Output file
healthyTubes_SCT_vIntegrated.rds
RCTD applys supervised Parametric probability model to estimate cell type composition in each spot in the visium data. More details of the workflow can be found in one of their tutorials.
The RCTD applies directly on the raw RNA expression matrix of the individual Visium sample as well as the aggregated Visium data from the spaceranger standard output.
Output files for individual sample:
- xxx_weights.png, cell type weight spatial plots.
- xxx_binary weights.png, cell type binary weight spatial plots with thresholds 75th percentile and median.
- cell_type_nUMI.png, Number of UMI counts spatial plot.
- cell_type_occur.png, Cell type occur histogram.
- SpatialPie_cell_type_occur.png, cell type occur in mini piechart with spatial coordinates for individual spots.
- count_cell_type_clusters_table.csv, a table with cell type counts in individual clusters.
- weight_clusters_table.csv, a table with cell type weight in individual spots.
- norm_weight_clusters_table.csv, a table with cell type normalized weight (probabilities sum to 1) in individual spots.
Output boxplots for the multiple samples:
- aggStack_cluster_pair_aggregated.png, stack boxplot with percentage of aggregated counts of cell type in each cluster for patients with paired fimbrial and proximal samples.
- aggFill_cluster_pair_aggregated.png, stack boxplot with aggregated counts of cell type in each cluster for patients with paired fimbrial and proximal samples.
- countStack_cluster_pair_aggregated.png, stack boxplot with percentage of filtered counts of cell type in each cluster for patients with paired fimbrial and proximal samples.
- countFill_cluster_pair_aggregated.png, stack boxplot with filtered counts of cell type in each cluster for patients with paired fimbrial and proximal samples.
- aggStack_cluster_all_aggregated.png, stack boxplot with percentage of aggregated counts of cell type in each cluster for all samples.
- aggFill_cluster_all_aggregated.png, stack boxplot with aggregated counts of cell type in each cluster for all samples.
- countStack_cluster_all_aggregated.png, stack boxplot with percentage of filtered counts of cell type in each cluster for all samples.
- countFill_cluster_all_aggregated.png, stack boxplot with filtered counts of cell type in each cluster for all samples.
Here we can compare the cell type composition to the morphology and the clusters annotation of the same spots. We can plot cell type composition in each cluster of individuals, which further comfirm the morphological and cell type differences between the clusters.
#' RCTD analysis
Rscript ./OVisium/R/Deconvolution/Spacexr_Visium_Full_Mode.R
#' Option: comparison of BRCA1 and BRCA2 in Cell type count percentage
Rscript ./OVisium/R/Deconvolution/Spacexr_CellType_Total_Count.R
#' Option: pairwise comparison of Fimbrial and Proximal from BRCA1 carriers in Cell type count percentage
Rscript ./OVisium/R/Deconvolution/Spacexr_CellType_Total_Count_Tissue.R
Inspired by the coldspring harbor protocol originated from Cid et al. (2021) to extract cell type sigantures in bulk-tissue RNAseq using reference scRNAseq data, we apply Pearson correlation to identify cell type sigantures from the differential expressed genes in the Visium data:
- Select DEGs from all 11 clusters in OVisium analysis.
- Used all DEGs from section 2.6.1.
- Paired-wise correlations of all individual cells using scRNAseq expression data for selected DEGs.
- PCA analysis with selected DEGs show some separation of different cell types.
- Filter genes with correlation > 0.5.
- Unsupervised hierarchical clustering (Pearson correlation, average linkage) to extract cell type DEG signatures using the cell type annotations in the reference data.
- Validate the cell type DEG signatures in the Visium clusters.
#' Use all DEGs
Rscript ./OVisium/R/Deconvolution/CellType_Signatures_scRNAseq_correlation_all_DEGs.R
#' Split the DEGs by log2FC > or <0 to upregulated or downregulated signatures
Rscript ./OVisium/R/Deconvolution/CellType_Signatures_scRNAseq_correlation.R
For High grade serous carcinoma Visium samples, we can use ecoTyper, a machine learning framework to identify cell states and multicellular communities. Through 12 major cell lineages across 16 types of human carcinoma, EcoTyper identified 69 transcriptionally defined cell states.
#' 1. Setup docker on terminal and add non-root user:
./OVisium/R/Deconvolution/ecotyper/docker.sh
#' 2. Select the working directory and download the ecotyper github files:
#' 3. Edit config yml file to config_recovery_OVisium.yml:
./OVisium/R/Deconvolution/ecotyper/download.sh
./OVisium/R/Deconvolution/ecotyper/config_recovery_OVisium.yml
#' 4. Manually copy filtered feature bc matrices and spatial tissue position list and high resolution image
#' 5. In terminal run ecotyper Visium method for each sample:
#' Edit the yml file: ./OVisium/R/Deconvolution/ecotyper/config_recovery_OVisium.yml
#' Input Visium directory : "OVisium_data/OVisium_Carcinoma/`SampleID`"
./OVisium/R/Deconvolution/ecotyper/recovery.sh
#' 6. (Optional) In terminal run ecotyper scRNAseq method on individual sample:
#' Use loupe browser to export matrices and cluster annotation files
Rscript EcoTyper_recovery_visium.R --discovery 'carcinoma' --matrix OVisium_data/OVisium_Carcinoma/`SampleID`







