Post-processing

Here we introduce a few common post-processing steps that help you get the most out of your PopIns2 analysis, give a better overview of the results, and save some disk space (without data loss).

Constructing a multi-VCF file of your entire population

You can obtain a VCF file that summarizes the genotypes of all variants in all samples by following the steps below. They require the UNIX tool cat, bgzip and tabix (both distributed with HTSlib), and vcf-sort and vcf-merge from VCFtools. The steps can be translated into a workflow language of your choice.

Sort the VCF files

input: {PATH_TO_YOUR_PROJECT}/{SAMPLE}/insertions.vcf
output: {PATH_TO_YOUR_PROJECT}/{SAMPLE}/insertions_sorted.vcf
shell: cat {input} | vcf-sort -c > {output}

Compress the sorted VCF files

input: {PATH_TO_YOUR_PROJECT}/{SAMPLE}/insertions_sorted.vcf
output: {PATH_TO_YOUR_PROJECT}/{SAMPLE}/insertions_sorted.vcf.gz
shell: bgzip {input}

Index the compressed VCF files

input: {PATH_TO_YOUR_PROJECT}/{SAMPLE}/insertions_sorted.vcf.gz
output: {PATH_TO_YOUR_PROJECT}/{SAMPLE}/insertions_sorted.vcf.gz.tbi
shell: tabix -p vcf {input}
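
The three per-sample steps above (sort, compress, index) can be wrapped in a single shell function. This is a minimal sketch, assuming bash and that vcf-sort, bgzip and tabix are on the PATH. The function name prepare_sample and its argument are illustrative and not part of PopIns2.

```shell
# prepare_sample is a hypothetical helper, not part of PopIns2.
# Argument: a sample directory that contains insertions.vcf.
prepare_sample() {
    local dir="$1"
    local sorted="${dir}/insertions_sorted.vcf"

    # Sort; reading from stdin is equivalent to `cat {input} | vcf-sort -c`.
    vcf-sort -c < "${dir}/insertions.vcf" > "${sorted}" || return 1

    # Compress; by default bgzip replaces its input with insertions_sorted.vcf.gz.
    bgzip "${sorted}" || return 1

    # Index; tabix writes insertions_sorted.vcf.gz.tbi next to the compressed file.
    tabix -p vcf "${sorted}.gz" || return 1
}
```

It could be called once per sample, for example in a loop over the sample directories below {PATH_TO_YOUR_PROJECT}. The function stops at the first failing step, so a missing index signals that a sample needs attention.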

Merge all VCF files

input: {PATH_TO_YOUR_PROJECT}/*/insertions_sorted.vcf.gz
output: {PATH_TO_YOUR_PROJECT}/insertions_all.vcf.gz
shell: vcf-merge {input} | bgzip -c > {output}
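
vcf-merge expects every input file to be bgzip-compressed and tabix-indexed, so it can help to check for the index files before merging. The following is a minimal sketch, assuming bash and that vcf-merge and bgzip are on the PATH. The function name merge_population is illustrative and not part of PopIns2.

```shell
# merge_population is a hypothetical helper, not part of PopIns2.
# Argument: the project directory that holds one subdirectory per sample.
merge_population() {
    local project="$1"
    local vcf

    # Refuse to merge if any per-sample VCF lacks its tabix index.
    for vcf in "${project}"/*/insertions_sorted.vcf.gz; do
        if [ ! -f "${vcf}.tbi" ]; then
            echo "missing index for ${vcf}" >&2
            return 1
        fi
    done

    # Same command as in the merge step above.
    vcf-merge "${project}"/*/insertions_sorted.vcf.gz \
        | bgzip -c > "${project}/insertions_all.vcf.gz"
}
```

If no sample directory contains insertions_sorted.vcf.gz, the glob stays unexpanded and the index check reports it as missing, so no output file is written.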

Each record (insertion) in the final insertions_all.vcf.gz file contains the genotypes of all samples. Once steps 1 and 2 (sorting and compressing) have completed successfully, you can safely delete the original {PATH_TO_YOUR_PROJECT}/{SAMPLE}/insertions.vcf to save some disk space.
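
Before deleting the original file, it is worth confirming that the compressed copy is intact. The following is a minimal sketch, assuming bash. It relies on the fact that bgzip output is gzip-compatible, so gzip -t can test it. The function name remove_unsorted_vcf is illustrative and not part of PopIns2.

```shell
# remove_unsorted_vcf is a hypothetical helper, not part of PopIns2.
# Argument: a sample directory.
remove_unsorted_vcf() {
    local dir="$1"
    local gz="${dir}/insertions_sorted.vcf.gz"

    # Delete the original only if the compressed copy exists, is non-empty
    # and passes the gzip integrity test.
    if [ -s "${gz}" ] && gzip -t "${gz}" 2>/dev/null; then
        rm -f "${dir}/insertions.vcf"
    else
        echo "keeping ${dir}/insertions.vcf: ${gz} is missing or corrupt" >&2
        return 1
    fi
}
```

The check only verifies the integrity of the compressed stream. It does not compare the records against the original file.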
