- Download Dory from GitHub and navigate into the directory:
git clone https://github.com/mzxj/Dory.git && cd Dory
- Make the bash script executable:
chmod +x Dory
- To make Dory globally runnable, add its path to your environment's PATH variable. Append
export PATH=/path/to/Dory/:$PATHto your~/.bashrcor~/.bash_profile. Replace/path/to/Dorywith the actual path to the Dory directory. Then reload your shell:
source ~/.bashrc # or source ~/.bash_profile
- Install required R packages manually, one by one, for the most reliable setup, including
ggplot2,optparse,reshape2,dplyr,data.table,future,parallel, andfurrr. Alternatively, you can install all of them automatically using the provided script:
Rscript ./R/install_packages.R
or inside R: source("./R/install_packages.R"). Note: If installation fails for some packages in all of the above methods, please update your R installation to a newer version (R ≥ 4.4 recommended).
Test the Dory is available globally:
Dory -h
You should see the help message printed in your terminal.
Dory -m $FileFormat -a $Input1 -b $Input2 [-o $OutputPath] [-c $ChrNum] [--norm [--NormFactorA $x --NormFactorB $y]]
-m $FileFormat: (Required) Format of the input data ["4DN" or "csv"].- "4DN": If the input data is directly downloaded from the 4DN data portal.
- "csv": If the input data is standard .csv file. The 3D_coordinates.csv file requires these columns:
- Trace_ID: The ID for each chromatin trace.
- X, Y, Z: 3D spatial coordinates of the genomic region.
- Chrom, Chrom_Start, Chrom_End or Region_ID: Indicate genomic location either by chromosome name, start and end coordinate, or by the genomic region ID (e.g., 1,2, ...). If "Chrom, Chrom_Start Chrom_End" columns are not provided, then the "Region_ID" column is required.
-a $Input1: (Required). File containing 3D coordinates of genomic regions for foreground condition.-b $Input2: (Required). File containing 3D coordinates of genomic regions for background condition.-o $OutputPath: (Optional). Output directory path. Defaults to the current directory "./" if not specified.-c $ChrNum: (Optional). Number of traced chromosomes ["one" or "more"]. Default is "one". If only Region_ID is used, chromosome information cannot be determined; therefore, all region IDs are assumed to be on one chromosome.--norm: (Optional). Apply normalization to the distance matrix. Normalization is performed only when--normis specified; otherwise, no normalization is applied. By default, when--normis used without --NormFactorA or --NormFactorB, the normalization factor is calculated as the mean of the mean distances from each trace across all traces. To use user-defined normalization factors, specify--norm --NormFactorA $x --NormFactorB $y, where$xand$yare two numeric values corresponding to the normalization factors for-aand-b, respectively.--nboot $bootnum: (Optional). Number of resamplings for bootstrapping. Default is 1000.--CIpct $ci: (Optional). Confidence interval for bootstrapping [numeric values from 0 to 1]. Default is 0.9.--seed $seed: (Optional). Seed for bootstrapping. Default is 1.
Dory -m $FileFormat -i $InputCoordFile -l $LabelFile [-n $LabelColame] [-o $OutputPath] [-c $ChrNum] [--norm [--NormFactorI $NormFactorFile]]
-
-m $FileFormat: (Required) Format of the input data ["4DN" or "csv"].- "4DN": If input files are directly downloaded from the 4DN data portal.
- "csv": If the input data is standard .csv file.
- The 3D_coordinates.csv file requires these columns:
- Trace_ID: The ID for each chromatin trace.
- X, Y, Z: 3D spatial coordinates of the genomic region.
- Chrom, Chrom_Start, Chrom_End or Region_ID: Indicate genomic location either by chromosome name, start and end coordinate, or by the genomic region ID (e.g., 1,2, ...). If "Chrom, Chrom_Start Chrom_End" columns are not provided, then the "Region_ID" column is required.
- Cell_ID: The ID for the cell.
- The cell_label.csv file requires these columns:
- Cell_ID: The ID for the cell.
- Cell_Type: The cell-type label or classification assigned to the corresponding cell.
- The 3D_coordinates.csv file requires these columns:
-
-i $InputCoordFile: (Required). File containing 3D coordinates of genomic regions for all cells. -
-l $LabelFile: (Optional) File with cell-type labels or classifications for cells. Required if cell-type labels or classifications are not included in the input 3D coordinate file. -
-n $LabelColname: (Optional). The column name for the cell-type label or classification. Defaults to "Cell_Type" if not specified. -
-o $OutputPath: (Optional). Output directory path. Defaults to the current directory "./" if not specified. -
-c $ChrNum: (Optional). Number of traced chromosomes ["one" or "more"]. Default is "one". If only "Region_ID" is provided without "Chrom" information, all region IDs are assumed to be on one chromosome. -
--norm: (Optional). Apply normalization to the distance matrix. Normalization is performed only when--normis specified; otherwise, no normalization is applied. By default, when--normis used without --NormFactorI, the normalization factor is calculated as the mean of the mean distances from each trace across all traces. To use user-defined normalization factors, specify--norm --NormFactorI $NormFactorFile, where$NormFactorFileis a.csvfile with two columns:"$LabelColname"and"Norm_factor". The"$LabelColname"column specificies the cell-type label or classification, and the"Norm_factor"column specifies the corresponding numeric normalization factor.
If the traced genomic regions are located on one chromosome, the output includes:
-
S0_DataInfo
- Region.tsv: Contains traced genomic regions with columns for Chrom, Chrom_Start, Chrom_End and Region_ID.
- TraceCount.pdf: Displays the number of traces for each condition.
-
S1_Distance
- RegionPairByTrace_CT_x.tsv: The Euclidean distances between region pairs (rows) across all chromatin traces (columns) for each condition (CT_x).
-
S2_DiffScore
- DiffScore_CT_xVSCT_y.tsv: The lower triangle of the DiffScore matrix generated by Dory for comparing condition CT_x (foreground) with CT_y (background). A positive DiffScore indicates that the region pair distance is greater in CT_x than in CT_y; while a negative DiffScore indicates the opposite.
- DiffScoreHeatmap_CT_xVSCT_y.pdf/.png: The heatmap visualization of the DiffScore matrix.
- DRP_greater_CT_xVSCT_y.tsv: The significantly differential (p < 0.05) region pairs with greater distance in CT_x than in CT_y. Region pairs are ranked according to their DiffScore values.
- DRP_less_CT_xVSCT_y.tsv: The significantly differential (p < 0.05) region pairs with less distance in CT_x than in CT_y. Region pairs are ranked according to their DiffScore values.
Note: 1. These DRP lists are defined using the most relaxed significance threshold (p < 0.05). For a more stringent selection, users may choose the top-ranked region pairs from these lists based on DiffScore. 2. If normalization is performed, the output will be written to the folders NML_S0_DataInfo, NML_S1_Distance, NML_S2_DiffScore.
If the traced genomic regions span multiple chromosomes, the output directory structure changes slightly:
- Because Dory performs intra-chromosomal analysis, the results are separated by chromosome (Chrom).
- Each main output folder (S0_DataInfo, S1_Distance and S2_DiffScore) contains subdirectories named "chr_Chrom" (e.g., chr_chr1, chr_chr2, etc.), corresponding to each chromosome.
- Each "chr_Chrom" subdirectory contains the same types of output files as described above, but limited to the genomic regions on corresponding chromosome.
Using the two datasets 4DNFIHSXQZIV.csv and 4DNFITZBMT6Q.csv downloaded from the 4DN data portal as an example. Both files are located in the folder tests.
Dory -m 4DN -a /path/to/4DNFIHSXQZIV.csv -b /path/to/4DNFITZBMT6Q.csv -o output/to/path
For any questions regarding Dory, please contact Zhaoxia Ma (prm2zt@virginia.edu).