A pipeline to identify meiotic recombination events using trio samples of 10x Genomics Long Ranger VCF outputs
Author: Peng Xu
Email: peng.xu@mssm.edu
Draft date: April 30, 2018
To cite: Xu, P., Kennell, T., Gao, M., Human Genome Structural Variation Consortium, Kimberly, R.P., and Chong, Z. (2020). MRLR: unraveling high-resolution meiotic recombination by linked reads. Bioinformatics 36, 10-16.
This pipeline identifies meiotic recombination events that occur during gamete formation and are transmitted to the next generation. The input is a trio (father, mother, child) of VCF files generated by the 10x Genomics Long Ranger software. The pipeline can also be used to achieve whole-chromosome phasing of the child genome based on pedigree haplotype comparison.
The program was tested on an x86_64 Linux system with 8 GB of physical memory. A run usually finishes within half an hour. Bedtools (https://github.com/arq5x/bedtools2) is required.
git clone https://github.com/penguab/MRLR.git
Then, please also add this directory to your PATH:
export PATH=$PWD/MRLR/:$PATH
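Before the first run it can help to confirm that both the dependency and the pipeline script resolve on PATH. The following is a minimal sketch, assuming a POSIX shell; `require_tool` is a hypothetical helper introduced here for illustration and is not part of MRLR.

```shell
# require_tool is a hypothetical helper (not shipped with MRLR).
# It returns 0 when the named command resolves on PATH, 1 otherwise.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# bedtools is the stated dependency; MRLR.sh is the pipeline entry point.
for tool in bedtools MRLR.sh; do
    require_tool "$tool" || echo "Please install $tool or add it to PATH." >&2
done
```

If `MRLR.sh` is reported missing, the `export PATH` line above was probably run from a different directory than the one containing the clone.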
Three VCF files, one per member of the trio, from 10x Genomics Long Ranger outputs are required for analysis. The pipeline writes three output files, where <prefix> is the value given to -o: final_<prefix>_child.vcf (reconstructed gamete genomes), final_<prefix>_F_C_sum (recombination events from the father) and final_<prefix>_M_C_sum (recombination events from the mother).
MRLR.sh -f <Father_vcf> -m <Mother_vcf> -c <Child_vcf> [-oablps]
-f father vcf file from longranger output
-m mother vcf file from longranger output
-c child vcf file from longranger output
------------optional--------
-o output file prefix; default='trio'
-a min arm length (kb); default=20
-b min supporting barcode; default=4
-l min block length (kb); default=500
-p max breakpoint region length (kb); default=100
-s min SNV number; default=20
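A quick pre-flight check of the three files passed to -f, -m and -c can catch a wrong path or a still-compressed file before the run starts. This is a sketch under two assumptions: that the inputs are supplied as plain-text VCFs (the test instructions decompress them first), and that each starts with the standard `##fileformat=VCF` header line. `check_vcf` and the file names are hypothetical placeholders, not part of MRLR.

```shell
# check_vcf is a hypothetical helper (not part of MRLR).
# It succeeds only if the file is readable, non-empty, and its first
# line is a VCF "##fileformat=VCF..." header.
check_vcf() {
    if [ ! -r "$1" ] || [ ! -s "$1" ]; then
        echo "unreadable or empty: $1" >&2
        return 1
    fi
    # A gzip-compressed file fails this test, a hint to decompress it first.
    if ! head -n 1 "$1" | grep -q '^##fileformat=VCF'; then
        echo "not a plain-text VCF: $1" >&2
        return 1
    fi
}

# Placeholder file names; substitute the three trio VCFs.
for vcf in father.vcf mother.vcf child.vcf; do
    check_vcf "$vcf" || echo "Fix $vcf before running MRLR.sh." >&2
done
```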
To test the pipeline, decompress the bundled test files and run it on the chr20 trio:
gzip -d *.gz
MRLR.sh -f NA12891_chr20.vcf -m NA12892_chr20.vcf -c NA12878_chr20.vcf -o NA12878_chr20
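After the test run finishes, the three output files described earlier should exist for the chosen prefix. The sketch below checks for them, assuming the files are written to the current working directory and follow the final_<prefix>_* naming pattern; `check_outputs` is a hypothetical helper, not part of MRLR.

```shell
# check_outputs is a hypothetical helper (not part of MRLR).
# Given an output prefix, it reports which of the three expected files
# exist in the current directory and returns non-zero if any is missing.
check_outputs() {
    prefix=$1
    status=0
    for f in "final_${prefix}_child.vcf" "final_${prefix}_F_C_sum" "final_${prefix}_M_C_sum"; do
        if [ -e "$f" ]; then
            echo "ok: $f"
        else
            echo "missing: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Same prefix as passed to -o in the test command.
check_outputs NA12878_chr20 || echo "Test run did not produce all expected files." >&2
```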
11/28/2018: updated the test files for the pipeline