GitHub - haotianzh/deeprho: deeprho is a method for estimating recombination rate given population genetic data

██████╗ ███████╗███████╗██████╗ ██████╗ ██╗  ██╗ ██████╗ 
██╔══██╗██╔════╝██╔════╝██╔══██╗██╔══██╗██║  ██║██╔═══██╗
██║  ██║█████╗  █████╗  ██████╔╝██████╔╝███████║██║   ██║
██║  ██║██╔══╝  ██╔══╝  ██╔═══╝ ██╔══██╗██╔══██║██║   ██║
██████╔╝███████╗███████╗██║     ██║  ██║██║  ██║╚██████╔╝
╚═════╝ ╚══════╝╚══════╝╚═╝     ╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝    v2.0

DeepRho: software accompaniment for "DeepRho: Accurate Estimation of Recombination Rate from Inferred Genealogies using Deep Learning", Haotian Zhang and Yufeng Wu, manuscript, 2021.

DeepRho constructs images from population genetic data and uses the power of convolutional neural networks (CNNs) in image classification to estimate recombination rates. The key idea of DeepRho is to generate genetically informative images based on inferred gene genealogies and linkage disequilibrium from population genetic data.

Code

deeprho is open-source software for per-base recombination rate estimation from inferred genealogies using deep learning. deeprho makes estimates based on LD patterns and local genealogical trees inferred by RENT+.

Prerequisites

OS: Linux, Windows, MacOS
Software: Conda
Device: CUDA-Enabled GPU (optional, default set to use CPU)

Installations

Clone from GitHub: git clone https://github.com/haotianzh/deeprho_v2.git or download & unzip the file to your local directory.
Enter root directory: cd deeprho_v2
Create a virtual environment through conda: conda create -n deeprho python=3.7 openjdk=11 msprime
Activate conda environment: conda activate deeprho
Install: pip install .
Validate: deeprho -v
[Optional] See GPU Support if you want to use a GPU

Input Formats

ms-formatted input (the first line lists positions, separated by spaces, followed by haplotype sequences; see examples/data.ms for details)
VCF file (check examples/data.vcf)
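
As a rough illustration of the ms-formatted layout described above, the following sketch parses a toy input. It assumes exactly what the description says (one line of space-separated positions, then one 0/1 haplotype string per line); the function name `parse_positions_and_haplotypes` and the toy data are hypothetical, and examples/data.ms remains the authoritative reference for the real format.

```python
from typing import List, Tuple


def parse_positions_and_haplotypes(text: str) -> Tuple[List[float], List[str]]:
    """Parse one line of space-separated positions followed by
    one haplotype string (0/1 characters) per line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("input is empty")
    # Positions are read as floats so integer and fractional values both work.
    positions = [float(tok) for tok in lines[0].split()]
    haplotypes = lines[1:]
    for hap in haplotypes:
        # Every haplotype must carry exactly one allele per listed position.
        if len(hap) != len(positions):
            raise ValueError("haplotype length does not match number of positions")
        if set(hap) - {"0", "1"}:
            raise ValueError("haplotypes must contain only 0 and 1")
    return positions, haplotypes


# Hypothetical toy input: 4 SNP positions and 3 haplotypes.
toy_input = """10 250 1300 4800
0101
1100
0011
"""
positions, haplotypes = parse_positions_and_haplotypes(toy_input)
print(len(positions), "sites,", len(haplotypes), "haplotypes")
```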

Usages (Examples)

deeprho maketable

# save a precalculated lookup table for a user provided demography 
deeprho maketable --demography examples/YRI_pop_sizes.csv --out YRI_pop_table

deeprho estimate

# estimate recombination rates
deeprho estimate --file examples/example_YRI.vcf --ploidy 2 --table YRI_pop_table --num-thread 8 --plot --verbose

deeprho test
```
# generate a test case under a given evolutionary setting
deeprho test --demography examples/YRI_pop_sizes.csv --rate-map examples/test_recombination_map.txt --npop 50 --ploidy 2 --out test.vcf
```
demography is a .csv file that contains at least three columns: label, x (time) and y (size). label is the population name, and a single file should contain only one population. Time is measured in generations. See examples/ACB_pop_sizes.csv for an example.
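
The sketch below writes a small demography table with the three required columns and checks the single-population constraint. The population sizes and times are placeholders chosen only for illustration, and `check_demography` is a hypothetical helper, not part of deeprho; any further columns or ordering that deeprho may expect are not covered here.

```python
import csv
import io

# Hypothetical population-size history; the numbers are placeholders only.
rows = [
    {"label": "YRI", "x": 0, "y": 20000},
    {"label": "YRI", "x": 1000, "y": 15000},
    {"label": "YRI", "x": 10000, "y": 10000},
]

# Write the table into an in-memory buffer instead of a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["label", "x", "y"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()


def check_demography(text: str) -> str:
    """Return the single population label, or raise if the required
    columns are missing or more than one population is present."""
    reader = csv.DictReader(io.StringIO(text))
    missing = {"label", "x", "y"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    labels = {row["label"] for row in reader}
    if len(labels) != 1:
        raise ValueError(f"expected exactly one population, found {sorted(labels)}")
    return labels.pop()


print(check_demography(csv_text))  # prints "YRI"
```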

Outputs

The default output name is formatted as <FILE>.rate[.txt|.png|.npy], in the same directory as your input.

The .txt file consists of 3 columns, Start, End and Rate, separated by tabs. A sample output looks like:

# your_vcf_file_name.rate.txt
Start	End	Rate
0	8	0.0
8	1822	2.862294427352283e-08
1822	4321	2.3297465959039865e-08
4321	7125	1.6098357471351787e-08
7125	10570	4.027717518356611e-09
10570	14312	2.1394376828669226e-09
14312	17689	2.2685986706092933e-09
17689	19928	1.6854787948356243e-09

The .png file shows a simple plot of the estimated recombination map.
The .npy file stores an ndarray object recording the per-base recombination rate; the i-th element of the ndarray denotes the rate from base i to base (i+1).
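
For downstream analysis, the .txt output can be read with a few lines of Python. This is a minimal sketch using the sample rows shown above; the helper names are hypothetical, and treating each row as a half-open interval [Start, End) is an assumption made here to match the per-base convention of the .npy file.

```python
from typing import List, Tuple

Interval = Tuple[int, int, float]

# The sample output from above, with tab-separated columns.
sample_output = (
    "Start\tEnd\tRate\n"
    "0\t8\t0.0\n"
    "8\t1822\t2.862294427352283e-08\n"
    "1822\t4321\t2.3297465959039865e-08\n"
    "4321\t7125\t1.6098357471351787e-08\n"
    "7125\t10570\t4.027717518356611e-09\n"
    "10570\t14312\t2.1394376828669226e-09\n"
    "14312\t17689\t2.2685986706092933e-09\n"
    "17689\t19928\t1.6854787948356243e-09\n"
)


def read_rate_table(text: str) -> List[Interval]:
    """Parse the tab-separated rate table into (start, end, rate) tuples."""
    intervals = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        start, end, rate = line.split("\t")
        intervals.append((int(start), int(end), float(rate)))
    return intervals


def rate_at(intervals: List[Interval], base: int) -> float:
    """Rate for the step from `base` to `base + 1`, assuming [Start, End)."""
    for start, end, rate in intervals:
        if start <= base < end:
            return rate
    raise ValueError(f"base {base} is outside the estimated region")


def mean_rate(intervals: List[Interval]) -> float:
    """Length-weighted average rate over all intervals."""
    total = sum(end - start for start, end, _ in intervals)
    return sum((end - start) * rate for start, end, rate in intervals) / total


intervals = read_rate_table(sample_output)
print(rate_at(intervals, 5000), mean_rate(intervals))
```

If NumPy is available, the .npy file can be loaded directly with `numpy.load`.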

GPU Support (more)

First check if your graphics card is CUDA-enabled.
Check the compatibility table to find an appropriate combination of Python, TensorFlow, CUDA, and cuDNN versions.
Install cudatoolkit and cudnn: conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
(For Linux) Set env: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/ (this step must be repeated every time you restart the session)
Verify install: python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Docs

Make lookup table

  deeprho maketable [-h] [--ne NE] [--demography DEMOGRAPHY] [--npop NPOP] [--ploidy PLOIDY] [--rmin RMIN] \
                    [--rmax RMAX] [--repeat REPEAT] [--draw DRAW] [--num-thread NUM_THREAD] [--verbose]

Arguments	Descriptions
`--ploidy <PLOIDY>`	Ploidy (default 2)
`--ne <NE>`	Effective population size (default 10⁵)
`--demography <DEMOGRAPHY>`	Demography file if no lookup table provided
`--npop <NPOP>`	Number of individuals or samples
`--num-thread <NUMTHREAD>`	Number of parallel workers (default 4)
`--rmin <RMIN>`	Min of recombination rate per base per generation
`--rmax <RMAX>`	Max of recombination rate per base per generation
`--repeat <REPEAT>`	Number of repeats in simulation
`--draw <DRAW>`	Number of repeats in simulation
`--verbose`	Show logs in console
`--help, -h`	Show usage

Estimate

  deeprho estimate [-h] [--file FILE] [--length LENGTH] [--ne NE] [--ploidy PLOIDY] [--res RES] \
                    [--threshold THRESHOLD] [--gws GWS] [--ws WS] [--ss SS] [--m1 MODEL_FINE] \
                    [--m2 MODEL_LARGE] [--num-thread NUM_THREAD] [--plot] [--savenp] [--verbose]

Arguments	Descriptions
`--file <FILE>`	Input file
`--ploidy <PLOIDY>`	Ploidy (default 1)
`--ne <NE>`	Effective population size (default 10⁵)
`--demography <DEMOGRAPHY>`	Demography file if no lookup table provided
`--gws <GWS>`	Window size for inferring genealogy (default 10³ SNPs)
`--ws <WS>`	Window size for performing `deeprho` (fixed at 50 SNPs)
`--ss <SS>`	Step size for performing `deeprho` (default 25 SNPs)
`--length <LENGTH>`	Length of chromosome
`--m1 <MODELFINE>`	Path of fine model
`--m2 <MODELLARGE>`	Path of large model
`--threshold <THRESHOLD>`	Threshold for recombination hotspots (default 5×10⁻⁸)
`--savenp`	Save estimated rates as numpy ndarray (saved as `<FILE>.out.npy`)
`--plot`	Plot recombination map (saved as `<FILE>.out.png`)
`--num-thread <NUMTHREAD>`	Number of parallel workers (default 4)
`--verbose`	Show logs in console
`--help, -h`	Show usage

<LENGTH> can be either explicitly specified or inferred from the input. In the latter case, <LENGTH> = Sₙ − S₁, where Sₙ is the physical position of the last SNP site and S₁ is the position of the first SNP site.
<MODELFINE> and <MODELLARGE> are two pretrained models. deeprho uses a two-stage strategy to estimate recombination rates: <MODELFINE> is applied to estimate rates in recombination background regions, while <MODELLARGE> is used to fine-tune hotspot regions. Two default models for a constant demographic model are included in this repo; users can also train their own models.
<THRESHOLD> defines the rate above which a region is regarded as a hotspot. The default is 5×10⁻⁸.
<GWS> sets the size of the region (in SNPs) from which genealogies are inferred. In our tests, 1000 works well because it includes as much information as possible for improving local genealogical inference.
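
The two rules above (inferring the length from SNP positions and separating hotspots from background by a threshold) can be sketched as follows. This is not deeprho's implementation; the function names, the toy positions, and the toy intervals are all hypothetical, and "above the threshold" is read here as a strict comparison.

```python
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[int, int, float]

DEFAULT_THRESHOLD = 5e-8  # default hotspot threshold stated above


def infer_length(positions: Sequence[int], length: Optional[int] = None) -> int:
    """Use the explicit length if given, else last SNP position minus first."""
    if length is not None:
        return length
    return positions[-1] - positions[0]


def split_by_threshold(
    intervals: List[Interval], threshold: float = DEFAULT_THRESHOLD
) -> Tuple[List[Interval], List[Interval]]:
    """Split (start, end, rate) intervals into background and hotspot lists."""
    background, hotspots = [], []
    for interval in intervals:
        # A region counts as a hotspot only if its rate exceeds the threshold.
        if interval[2] > threshold:
            hotspots.append(interval)
        else:
            background.append(interval)
    return background, hotspots


# Hypothetical SNP positions and rate estimates.
toy_positions = [100, 350, 900, 2100]
toy_intervals = [(100, 350, 1e-8), (350, 900, 2e-7), (900, 2100, 3e-9)]

print(infer_length(toy_positions))  # prints 2000
background, hotspots = split_by_threshold(toy_intervals)
print(hotspots)
```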

Test

  deeprho test [-h] [--ne NE] [--demography DEMOGRAPHY] [--npop NPOP] [--ploidy PLOIDY] [--rate-map RATEMAP] \
                    [--recombination-rate RATE] [--sequence-length LENGTH] [--num-thread NUM_THREAD] [--verbose]

Arguments	Descriptions
`--ploidy <PLOIDY>`	Ploidy (default 2)
`--ne <NE>`	Effective population size (default 10⁵)
`--demography <DEMOGRAPHY>`	Demography file if no lookup table provided
`--npop <NPOP>`	Number of individuals or samples
`--sequence-length <LENGTH>`	Length of simulated genome
`--recombination-rate <RRATE>`	Recombination rate
`--rate-map <RATEMAP>`	Recombination rate map
`--mutation-rate <MRATE>`	Mutation rate (default 2.5×10⁻⁸)
`--help, -h`	Show usage

Demography settings: several tools can infer demographic history, such as PSMC, SMC++, and MSMC. deeprho takes SMC++ output as its input, and the file must contain only one population. See the SMC++ documentation for more information about its output.

TIP: If you are not familiar with these parameters, leave them at their defaults where possible.

Contact:

Feel free to contact us at haotianzh@uconn.edu.
