
OptionsAndArguments

Chris edited this page Aug 12, 2019 · 12 revisions

Options_and_Arguments


Options to vcfMerger2.py

required arguments:

--vcfs VCFS List of vcf files delimited by the DELIM character; the default DELIM is "|" (pipe character); if needed, reassign DELIM using the --delim option. The PATH to each VCF can be relative or absolute.

--toolnames TOOLNAMES List of tool names delimited by the DELIM character; the default DELIM is the pipe character unless it is reassigned using the --delim option.

-g REFGENOME, --refgenome REFGENOME reference genome .fa and its .fai index are used with bcftools norm; must match reference used for alignment.

--dict dictionary file of the reference genome fasta file, generated by the tool 'picard'. This file usually has the extension .dict. It is used to properly sort the contigs in both the header and the data section of the vcf file.

--prep-outfilenames PREP_OUTFILENAMES delim-separated names of the tool-specific prepared vcf files; Important: this option is mandatory if, and only if, the input vcfs are "RAW" vcfs. The PATH to the VCFs can be relative or absolute. If only the name is given, the vcf will be written either in the current working directory or in the directory defined using the option -d. See Example_1 and the QuickStart section.

-o MERGED_VCF_OUTFILENAME, --merged-vcf-outfilename MERGED_VCF_OUTFILENAME output filename for the merged vcf (path can be relative or absolute; if only a name is given, the merged file will be written in the working directory)

additional required arguments if merging SOMATIC VCFs:

--tumor-sname TUMOR_SNAME expected name of tumor sample in vcf file (already present in VCF; used to make sure of the right column)

--normal-sname NORMAL_SNAME expected name of normal sample in vcf file (already present in VCF; used to make sure of the right column)

--bams BAMS List of full-path bam files necessary for capturing contigs if they are not present in the input vcf; otherwise put empty_string as value for each tool [may be deprecated in the future, as ALL the vcfs MUST contain the exact same contigs in their headers (if not, check your input vcfs and update them)]
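As a rough illustration of how the required arguments fit together, the sketch below assembles a somatic command line in Python. The flag names come from the descriptions above; everything else (tool names, file names, sample names, and invoking the script as `python vcfMerger2.py`) is a placeholder assumption, not a tested invocation.

```python
import shlex

DELIM = "|"  # default delimiter, per the --delim description


def build_command(toolnames, vcfs, prep_outfilenames, bams):
    """Assemble an argument list; all values are hypothetical placeholders."""
    return [
        "python", "vcfMerger2.py",  # assumed way of invoking the script
        "--toolnames", DELIM.join(toolnames),
        "--vcfs", DELIM.join(vcfs),
        "--prep-outfilenames", DELIM.join(prep_outfilenames),
        "--bams", DELIM.join(bams),
        "--refgenome", "reference.fa",   # placeholder file name
        "--dict", "reference.dict",      # placeholder file name
        "--tumor-sname", "TUMOR",        # placeholder sample name
        "--normal-sname", "NORMAL",      # placeholder sample name
        "--merged-vcf-outfilename", "merged.vcf",
        "--dry-run",                     # only show the commands
    ]


cmd = build_command(
    ["tool1", "tool2"],
    ["tool1.raw.vcf", "tool2.raw.vcf"],
    ["tool1.prepped.vcf", "tool2.prepped.vcf"],
    ["/path/to/tumor.bam", "/path/to/tumor.bam"],
)
# shlex.join quotes the pipe-delimited values so a shell would not
# interpret "|" as a pipeline operator.
print(shlex.join(cmd))
```

Quoting matters here: because the default delimiter is the pipe character, each delimited value has to be quoted when the command is typed in a shell.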

optional arguments:

--germline consider the input VCFs as GERMLINE-only VCFs (and not somatic). This implies using the right toolnames associated with germline calls. As of 2019, we have implemented preprocessing for three (3) germline variant callers: HaplotypeCaller, Freebayes and Deepvariant (see the exact names using --list-tools). WARNING: This feature is still under heavy development and must be used at your own risk. By default, vcfMerger2 considers input vcfs as somatic vcfs; if you merge germline vcfs, you must enable this option.

--germline-snames ordered list of sample name(s) that should be present in the VCF beyond column 9. As of 2019-04-04, ONLY one sample per VCF is taken into account; So this option should be a list of ONE name only

-c PRECEDENCE, --precedence PRECEDENCE sorted delim-separated list of the toolnames as listed in --toolnames; This list stipulates an order of precedence for the tools different from the default order given by the --toolnames list

--contigs-file-for-vcf-header CONTIGS_FILE_FOR_VCF_HEADER List of contigs to be added to the tool vcf header if needed; otherwise put empty_string as value for each tool; do not provide if a bam file is given instead [may be deprecated in the future, as ALL the vcfs MUST contain the exact same contigs in their headers (if not, check your input vcfs and update them)]

-a TOOLACRONYMS, --toolacronyms TOOLACRONYMS List of Acronyms for toolnames to be used as PREFIXES in INFO field ; same DELIM as --vcfs

--delim DELIM delimiter which will be used to create the arguments value for the vcfMerger2.0 tool ; default is "|" (a.k.a pipe character). DO NOT CHANGE UNLESS

--lossy This will create a lossy merged vcf by only keeping the information from the tool with precedence

--skip-prep-vcfs skip the step for preparing vcfs up to specs and only run the merge step [i.e. which is the purpose of that tool :-) ]; implies all << given >> vcfs are already up-to-specs (uts);

--skip-merge enabling this flag prevents doing the merging step [useful if only the prep step needs to be run but not the merging stage ]

--threshold-AR Threshold Allele Ratio (AR) to assign genotype GT value with 0/1 or 1/1 ; GT=0/1 if below threshold, GT=1/1 if equal or above threshold [ default is 0.90 ; valid range is (0,1], i.e. greater than 0 and up to and including 1 ]

-n, --dry-run perform a dry run to just see the command lines that will be run behind the scenes
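The thresholding rule described for --threshold-AR can be sketched as follows. This is a minimal illustration of the rule as stated above, not vcfMerger2's actual code; the function name `genotype_from_ar` is hypothetical.

```python
def genotype_from_ar(ar: float, threshold: float = 0.90) -> str:
    """Return the GT string implied by an allele ratio (hypothetical helper).

    GT is 0/1 when AR is below the threshold and 1/1 when AR is equal
    to or above it. The threshold must be > 0 and <= 1.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be greater than 0 and at most 1")
    return "1/1" if ar >= threshold else "0/1"


print(genotype_from_ar(0.42))  # below the default threshold
print(genotype_from_ar(0.90))  # equal to the default threshold
```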

--filter-by-pass filter the variants by PASS. This implies that the keyword PASS must be present in column 7 of the VCF, and not a dot. This filtering step is performed before the step that prepares the vcfs to specs; if you want to filter on PASS after the prep step and before merging, use --filter FILTER;

--filter FILTER filter variants using snpSift in the backend; This filtering process is performed ONLY on up-to-specs vcfs (so after the prep-vcf stage [if run] ); A string must be provided as if you were using snpSift (see snpSift user manual); One string must be provided for each tool; If you have 3 tools, three strings must be provided; Hint: if you want to apply the exact same filtering to all tools because they possess the exact same flags, you may provide only one string.

--path-jar-snpsift FULL PATH to the SnpSift JAR file; MUST be provided if any of the filter options is used.

--do-venn Will make a Venn diagram using the vcf files provided to the merging step. This is a simple vennDiagram; if list <=4, venn ; if list>=5, upset plot selected [hardcoded].

--venn-title Add a custom title to the Venn; Works only with the "venn" module and not with "upset" [this is an intervene issue, not a vcfMerger2 one]; Default is empty string.

-d DIR_OUT, --dir-out DIR_OUT, --dir-temp DIR_OUT directory where the temporary files and/or outputs of vcfMerger2 will be written

--delete-temps if set, temporary files created during the "prep_vcf" stage will be deleted

--beds If the user already has the data in bed format to make the Venn diagram, the bed files can be provided here; The number of bed files MUST be the same as the number of toolnames and VCF files. If bed files are not provided and --do-venn is enabled, vcfMerger2 will try to create the bed files on the fly from the vcf files; If --beds is used, --do-venn must also be used, otherwise the Venn diagram won't be created.
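The plot selection stated for --do-venn and the count constraint stated for --beds can be sketched as below. Both helper names are hypothetical and the code only mirrors the rules as described above; it does not reproduce vcfMerger2's internals.

```python
def choose_plot_type(n_inputs: int) -> str:
    """Venn diagram for 4 or fewer inputs, upset plot for 5 or more."""
    return "venn" if n_inputs <= 4 else "upset"


def check_bed_count(beds, toolnames, vcfs) -> None:
    """Raise if the number of bed files differs from the tools and VCFs."""
    if not (len(beds) == len(toolnames) == len(vcfs)):
        raise ValueError(
            f"expected matching counts, got {len(beds)} beds, "
            f"{len(toolnames)} toolnames and {len(vcfs)} vcfs"
        )


tools = ["tool1", "tool2", "tool3"]           # placeholder names
vcfs = ["vcf1", "vcf2", "vcf3"]               # placeholder names
beds = ["tool1.bed", "tool2.bed", "tool3.bed"]  # placeholder names
check_bed_count(beds, tools, vcfs)
print(choose_plot_type(len(vcfs)))
```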


NOTE_1: What does PRECEDENCE mean?
A vcf contains information in the INFO and FORMAT columns. Unfortunately, some of that information is redundant from one tool to another. For instance, the AR field may exist in the FORMAT column of ALL the given vcfs, but its value may vary from one tool to another, and only one value can be kept in the AR field of the merged vcf. So which one would the user prefer to keep in the merged vcf, i.e. which tool do you trust the most? This is where the precedence is used: it gives an order of preference among the tools when a variant is called by more than one tool. This precedence is subjective and left to the user.
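To make the idea concrete, here is a minimal sketch of resolving a shared field by precedence. The function `pick_by_precedence`, the tool names and the AR values are all made up for illustration; this is not how vcfMerger2 is necessarily implemented.

```python
def pick_by_precedence(values_by_tool: dict, precedence: list):
    """Return (tool, value) from the first tool in the precedence order
    that actually called the variant (hypothetical helper)."""
    for tool in precedence:
        if tool in values_by_tool:
            return tool, values_by_tool[tool]
    raise KeyError("none of the tools in the precedence list called this variant")


# A variant called by toolB and toolC only, with different AR values.
ar_values = {"toolB": 0.42, "toolC": 0.47}
precedence = ["toolA", "toolB", "toolC"]

tool, ar = pick_by_precedence(ar_values, precedence)
print(tool, ar)  # toolB comes first among the tools that called the variant
```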

NOTE_2: Ordering the Information in the piped-delimited arguments and/or given values
The order of information within the options --vcfs, --toolnames, --precedence, --toolacronyms, --prep-outfilenames and --beds MUST be linked to each other; vcfMerger2 uses a 1-to-1 mapping between the piped-delimited values. The order must therefore be kept, or vcfMerger2 may fail or output wrong information. For instance, --toolnames "tool1|tool2|tool3" --vcfs "vcf1|vcf2|vcf3" --toolacronyms "a1|a2|a3" means tool1 will be matched with vcf1 and with a1; same for tool2 <-> vcf2 <-> a2, etc.
You MUST NOT swap the information, otherwise the link between the data is disrupted; e.g., if --vcfs is "vcf2|vcf3|vcf1", it will lead to errors or incorrect merging, as tool1 will be mapped to vcf2 and not vcf1.
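The positional 1-to-1 mapping can be pictured with a few lines of Python. The helper `link_arguments` is hypothetical and only illustrates the principle of splitting on the delimiter and pairing by position.

```python
def link_arguments(toolnames: str, vcfs: str, toolacronyms: str, delim: str = "|"):
    """Split delimited values and pair them by position (hypothetical helper)."""
    columns = [arg.split(delim) for arg in (toolnames, vcfs, toolacronyms)]
    if len({len(col) for col in columns}) != 1:
        raise ValueError("all delimited lists must have the same number of items")
    return list(zip(*columns))


for tool, vcf, acronym in link_arguments("tool1|tool2|tool3",
                                         "vcf1|vcf2|vcf3",
                                         "a1|a2|a3"):
    print(tool, "<->", vcf, "<->", acronym)

# Swapping the order of one list silently changes the pairing:
swapped = link_arguments("tool1|tool2|tool3", "vcf2|vcf3|vcf1", "a1|a2|a3")
print(swapped[0])  # tool1 is now paired with vcf2
```

Note that a length check catches a missing item but cannot detect a swapped order, which is why the order has to be verified by the user.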

