DeepRho v2.0
DeepRho: software accompaniment for "DeepRho: Accurate Estimation of Recombination Rate from Inferred Genealogies using Deep Learning", Haotian Zhang and Yufeng Wu, manuscript, 2021.
DeepRho constructs images from population genetic data and uses the power of convolutional neural networks (CNNs) in image classification to estimate recombination rates. The key idea of DeepRho is to generate genetically informative images based on inferred gene genealogies and linkage disequilibrium from population genetic data.
deeprho is open-source software for per-base recombination rate estimation from inferred genealogies using deep learning. It makes estimates based on LD patterns and local genealogical trees inferred by RENT+.
- OS: Linux, Windows, macOS
- Software: Conda
- Device: CUDA-Enabled GPU (optional, default set to use CPU)
- Clone from GitHub:
    git clone https://github.com/haotianzh/deeprho_v2.git
  or download and unzip the file to your local directory.
- Enter root directory:
    cd deeprho_v2
- Create a virtual environment through conda:
    conda create -n deeprho python=3.7 openjdk=11 msprime
- Activate conda environment:
    conda activate deeprho
- Install:
    pip install .
- Validate:
    deeprho -v
- [Optional] See GPU support if you want to use a GPU.
- ms-formatted input (the first line lists the positions, separated by spaces, followed by the haplotype sequences; check examples/data.ms for details)
- VCF file (check examples/data.vcf)
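As a rough illustration of the ms-style layout described above, the sketch below parses a small in-memory example. It assumes that the first line holds space-separated positions and that every following non-empty line is one haplotype written as a string of 0/1 characters. The function `parse_ms_like` and the sample text are hypothetical and not part of deeprho; examples/data.ms remains the reference for the real layout.

```python
def parse_ms_like(text):
    """Parse positions and haplotypes from a simplified ms-style string.

    Assumption: line 1 is whitespace-separated positions; each later
    non-empty line is one haplotype made of 0/1 characters.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    positions = [float(tok) for tok in lines[0].split()]
    haplotypes = lines[1:]
    for hap in haplotypes:
        if len(hap) != len(positions):
            raise ValueError("haplotype length does not match number of positions")
        if set(hap) - {"0", "1"}:
            raise ValueError("haplotypes must contain only 0 and 1")
    return positions, haplotypes


# Hypothetical toy input: 4 SNP positions, 3 haplotypes.
sample = """10 250 900 1400
0101
1100
0011
"""
positions, haplotypes = parse_ms_like(sample)
print(len(positions), "sites,", len(haplotypes), "haplotypes")
```

A check like this can catch ragged haplotype rows before a long estimation run is started.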
- # save a precalculated lookup table for a user-provided demography
  deeprho maketable --demography examples/YRI_pop_sizes.csv --out YRI_pop_table
- # estimate recombination rates
  deeprho estimate --file examples/example_YRI.vcf --ploidy 2 --table YRI_pop_table --num-thread 8 --plot --verbose
- # generate a test case under a given evolutionary setting
  deeprho test --demography examples/YRI_pop_sizes.csv --rate-map examples/test_recombination_map.txt --npop 50 --ploidy 2 --out test.vcf
demography is a .csv file which contains at least three columns: label, x (time) and y (size). label is the population name, and a single file should contain only one population; time is measured in generations. See examples/ACB_pop_sizes.csv for an example.
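The column requirements for the demography file can be checked with a few lines of Python before running deeprho. This is only a sketch under the assumption that the file has a header row naming the columns label, x and y; `read_demography` is a hypothetical helper and the toy values are made up for illustration.

```python
import csv
import io


def read_demography(handle):
    """Return (label, [(time, size), ...]) from a demography CSV.

    Assumption: the file has a header row with at least the columns
    label, x (time in generations) and y (population size).
    """
    reader = csv.DictReader(handle)
    missing = {"label", "x", "y"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    labels = set()
    history = []
    for row in reader:
        labels.add(row["label"])
        history.append((float(row["x"]), float(row["y"])))
    if len(labels) != 1:
        raise ValueError("expected exactly one population per file")
    history.sort()  # order epochs by time
    return labels.pop(), history


# Hypothetical toy content; the numbers are invented for illustration.
toy_csv = io.StringIO("label,x,y\nPOP,0,10000\nPOP,500,5000\nPOP,2000,20000\n")
label, history = read_demography(toy_csv)
print(label, history)
```

For a real file, pass an open file handle such as `open("examples/ACB_pop_sizes.csv", newline="")` instead of the in-memory string.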
Default output name is formatted as <FILE>.rate[.txt|.png|.npy] in the same directory as your input.
- The .txt file consists of 3 columns, Start, End and Rate, separated by tabs. A simple output looks like:
    # your_vcf_file_name.rate.txt
    Start   End     Rate
    0       8       0.0
    8       1822    2.862294427352283e-08
    1822    4321    2.3297465959039865e-08
    4321    7125    1.6098357471351787e-08
    7125    10570   4.027717518356611e-09
    10570   14312   2.1394376828669226e-09
    14312   17689   2.2685986706092933e-09
    17689   19928   1.6854787948356243e-09
- The .png file shows a simple plot of the estimated recombination map.
- The .npy file stores an ndarray object recording the recombination rate per base; the i-th element of the ndarray denotes the rate from base i to base (i+1).
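The relationship between the interval table and the per-base array can be sketched as follows. The helpers `read_rate_table` and `per_base_rates` are hypothetical, and the sketch assumes each row covers the half-open range [Start, End), which this README does not state explicitly. Plain Python lists are used instead of NumPy to keep the example dependency-free.

```python
def read_rate_table(text):
    """Parse Start/End/Rate rows, skipping comment and header lines."""
    intervals = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3 or line.startswith("#") or fields[0] == "Start":
            continue
        intervals.append((int(fields[0]), int(fields[1]), float(fields[2])))
    return intervals


def per_base_rates(intervals):
    """Expand intervals into one rate per base.

    Assumption: each row covers the half-open range [Start, End).
    """
    if not intervals:
        return []
    rates = [0.0] * max(end for _, end, _ in intervals)
    for start, end, rate in intervals:
        for i in range(start, end):
            rates[i] = rate
    return rates


# First rows of the sample output shown above.
table = (
    "# your_vcf_file_name.rate.txt\n"
    "Start\tEnd\tRate\n"
    "0\t8\t0.0\n"
    "8\t1822\t2.862294427352283e-08\n"
    "1822\t4321\t2.3297465959039865e-08\n"
)
intervals = read_rate_table(table)
rates = per_base_rates(intervals)
print(len(intervals), "intervals expanded to", len(rates), "bases")
```

The saved .npy file itself can be read with `numpy.load`, which avoids this expansion step entirely.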
GPU Support (more)
- First check if your graphics card is CUDA-enabled.
- Check the compatibility table to find an appropriate combination of Python, TensorFlow, CUDA and cuDNN versions.
- Install cudatoolkit and cudnn:
    conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
- (For Linux) Set env:
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
  (this step has to be repeated every time you restart the session)
- Verify install:
    python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
- deeprho maketable [-h] [--ne NE] [--demography DEMOGRAPHY] [--npop NPOP] [--ploidy PLOIDY] [--rmin RMIN] \
                    [--rmax RMAX] [--repeat REPEAT] [--draw DRAW] [--num-thread NUM_THREAD] [--verbose]

  Arguments                   Descriptions
  --ploidy <PLOIDY>           Ploidy (default 2)
  --ne <NE>                   Effective population size (default 10^5)
  --demography <DEMOGRAPHY>   Demography file if no lookup table provided
  --npop <NPOP>               Number of individuals or samples
  --num-thread <NUM_THREAD>   Number of parallel workers (default 4)
  --rmin <RMIN>               Min of recombination rate per base per generation
  --rmax <RMAX>               Max of recombination rate per base per generation
  --repeat <REPEAT>           Number of repeats in simulation
  --draw <DRAW>               Number of repeats in simulation
  --verbose                   Show logs in console
  --help, -h                  Show usage

- deeprho estimate [-h] [--file FILE] [--length LENGTH] [--ne NE] [--ploidy PLOIDY] [--res RES] \
                   [--threshold THRESHOLD] [--gws GWS] [--ws WS] [--ss SS] [--m1 MODEL_FINE] \
                   [--m2 MODEL_LARGE] [--num-thread NUM_THREAD] [--plot] [--savenp] [--verbose]

  Arguments                   Descriptions
  --file <FILE>               Input file
  --ploidy <PLOIDY>           Ploidy (default 1)
  --ne <NE>                   Effective population size (default 10^5)
  --demography <DEMOGRAPHY>   Demography file if no lookup table provided
  --gws <GWS>                 Window size for inferring genealogy (default 10^3 SNPs)
  --ws <WS>                   Window size for performing deeprho (fixed at 50 SNPs)
  --ss <SS>                   Step size for performing deeprho (default 25 SNPs)
  --length <LENGTH>           Length of chromosome
  --m1 <MODEL_FINE>           Path of fine model
  --m2 <MODEL_LARGE>          Path of large model
  --threshold <THRESHOLD>     Threshold of recombination hotspot (default 5x10^-8)
  --savenp                    Save estimated rates as numpy ndarray (saved as <FILE>.out.npy)
  --plot                      Plot recombination map (saved as <FILE>.out.png)
  --num-thread <NUM_THREAD>   Number of parallel workers (default 4)
  --verbose                   Show logs in console
  --help, -h                  Show usage

  - <LENGTH> can be either explicitly specified or inferred from the input. If inferred, <LENGTH> = Sn - S1, where Sn is the physical position of the last SNP site and S1 is the position of the first SNP site.
  - <MODEL_FINE> and <MODEL_LARGE> are two pretrained models. deeprho uses a two-stage strategy to estimate recombination rates: <MODEL_FINE> is applied to estimate rates in recombination background regions, while <MODEL_LARGE> is used to fine-tune hotspot regions. Two default models trained under a constant demographic model are included in this repo; users can also train their own models by following the sections below.
  - <THRESHOLD> defines the rate above which a region is regarded as a hotspot. The default is 5x10^-8.
  - <GWS> controls how large a region the genealogies are inferred from. In our tests, 1000 is a good choice for including as much information as possible to improve local genealogical inference.
- deeprho test [-h] [--ne NE] [--demography DEMOGRAPHY] [--npop NPOP] [--ploidy PLOIDY] [--rate-map RATEMAP] \
               [--recombination-rate RATE] [--sequence-length LENGTH] [--num-thread NUM_THREAD] [--verbose]

  Arguments                       Descriptions
  --ploidy <PLOIDY>               Ploidy (default 2)
  --ne <NE>                       Effective population size (default 10^5)
  --demography <DEMOGRAPHY>       Demography file if no lookup table provided
  --npop <NPOP>                   Number of individuals or samples
  --sequence-length <LENGTH>      Length of simulated genome
  --recombination-rate <RRATE>    Recombination rate
  --rate-map <RATEMAP>            Recombination rate map
  --mutation-rate <MRATE>         Mutation rate (default 2.5x10^-8)
  --help, -h                      Show usage

- Demography settings: several tools can infer demographic history, such as PSMC, SMC++ and MSMC. deeprho takes SMC++ output as its input, restricted to a single population; see the SMC++ documentation for more information about its output.
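To make the `--ws`/`--ss` windowing, the <LENGTH> rule and the <THRESHOLD> option of `deeprho estimate` more concrete, here is a minimal sketch. All three helpers are hypothetical and do not reproduce deeprho's internal code. In particular, the sketch assumes that only full windows are kept and that a rate strictly above the threshold counts as a hotspot.

```python
def infer_length(positions):
    """Chromosome length as Sn - S1 (last SNP position minus first)."""
    return positions[-1] - positions[0]


def window_ranges(num_snps, ws=50, ss=25):
    """Return (start, end) SNP index pairs for windows of ws SNPs stepped by ss.

    Assumption: only full windows are kept; how deeprho handles a
    trailing partial window is not stated in this README.
    """
    return [(s, s + ws) for s in range(0, num_snps - ws + 1, ss)]


def flag_hotspots(rates, threshold=5e-8):
    """Mark each rate that lies strictly above the hotspot threshold."""
    return [rate > threshold for rate in rates]


# Hypothetical toy values for illustration.
length = infer_length([120, 480, 950, 19928])
windows = window_ranges(120)
flags = flag_hotspots([1e-8, 6e-8, 5e-8])
print(length, windows, flags)
```

With the default step of 25 SNPs, consecutive 50-SNP windows overlap by half, so most SNPs fall into two windows.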
TIPS: If you are not familiar with these parameter settings, leave them at their defaults where possible.
Feel free to contact us at haotianzh@uconn.edu.