OptionsAndArguments
--vcfs VCFS List of VCF files delimited by the DELIM character;
default DELIM is "|" (pipe character); if needed,
reassign DELIM using the --delim option. The PATH to the VCFs can be relative or absolute.
--toolnames TOOLNAMES
List of tool names delimited by the DELIM character;
default DELIM is the pipe character unless the --delim option is used
-g REFGENOME, --refgenome REFGENOME
reference genome .fa and its .fai index are used with bcftools norm; must match
reference used for alignment.
--dict
dictionary file of the reference genome fasta file, generated by the tool 'picard'.
This file usually has the extension .dict.
It is used to properly sort the contigs in both the header and the data section of the vcf file.
--prep-outfilenames PREP_OUTFILENAMES
delim-separated names for the tool-specific prepared
vcf files; Important: This option is mandatory ONLY if the input vcfs are "RAW" vcfs.
The PATH to the VCFs can be relative or absolute. If only the name is given, the vcf will be written in
either the current working directory or the directory defined using the option -d.
See Example_1 and the QuickStart section.
-o MERGED_VCF_OUTFILENAME, --merged-vcf-outfilename MERGED_VCF_OUTFILENAME
output filename for the merged vcf (path can be relative or absolute; if only a name is given, the merged file will be written in the working directory)
--tumor-sname TUMOR_SNAME
expected name of tumor sample in vcf file (already present in VCF; used to make sure of the right column)
--normal-sname NORMAL_SNAME
expected name of normal sample in vcf file (already present in VCF; used to make sure of the right column)
--bams BAMS List of full-path bams necessary for capturing contigs if not
present in the input vcf; otherwise put empty_string as
value for each tool [may be deprecated in the future, as ALL the vcfs MUST contain the exact same contigs in their headers (if not, check your input vcfs and update them)]
--germline
consider the input VCFs as GERMLINE-only VCFs (and not somatic).
This implies using the right toolnames associated with germline calls. As of 2019, preprocessing is
implemented for three (3) germline variant callers: HaplotypeCaller, Freebayes and Deepvariant
(see exact names using --list-tools); WARNING: This feature is still under heavy development and must be used at your own risk.
By default, vcfMerger2 considers input vcfs as somatic vcfs. If you merge germline vcfs, you must enable this option.
--germline-snames
ordered list of sample name(s) that should be present in the VCF beyond column 9.
As of 2019-04-04, ONLY one sample per VCF is taken into account; So this option should be a list of ONE name only
-c PRECEDENCE, --precedence PRECEDENCE
sorted delim-separated list of the toolnames as listed
in --toolnames; This list stipulates an order of precedence for the tools different from the
default order given by the --toolnames list
--contigs-file-for-vcf-header CONTIGS_FILE_FOR_VCF_HEADER
List of contigs, used to add them to the
tool vcf header if needed; otherwise put empty_string
as value for each tool; do not provide if a bam file is
given instead [may be deprecated in the future, as ALL the vcfs MUST contain the exact same contigs in their headers (if not, check your input vcfs and update them)]
-a TOOLACRONYMS, --toolacronyms TOOLACRONYMS
List of Acronyms for toolnames to be used as PREFIXES
in INFO field ; same DELIM as --vcfs
--delim DELIM delimiter which will be used to create the arguments
value for the vcfMerger2.0 tool ; default is "|"
(a.k.a pipe character). DO NOT CHANGE UNLESS
--lossy This will create a lossy merged vcf by only keeping
the information from the tool with precedence
--skip-prep-vcfs skip the step that prepares the vcfs up to specs and only
run the merge step [i.e. the actual purpose of this tool :-) ]; implies all << given >> vcfs are
already up-to-specs (uts);
--skip-merge enabling this flag prevents doing the merging step
[useful if only the prep step needs to be run but not the merging stage ]
--threshold-AR Threshold Allele Ratio (AR) to assign genotype GT value with 0/1 or 1/1 ;
GT=0/1 if below threshold, GT=1/1 if equal or above threshold [ default is 0.90 ; range ]0,1] ]
-n, --dry-run perform a dry run to just see the command lines that
will be run behind the scenes
--filter-by-pass filter the variants by PASS. This implies that the keyword PASS must be present in column 7 of the VCF, and not a dot.
This filtering step is performed before the step that prepares the vcfs to specs; if you want to filter on PASS after the prep step and before merging, use --filter FILTER;
--filter FILTER filter variants using snpSift in the backend; This filtering process is performed ONLY on up-to-specs vcfs (so after the prep-vcf stage [if run]);
A string must be provided as if you were using snpSift (see snpSift user manual); One string must be provided for each tool; If you have 3 tools, three strings must be provided;
Hint: if you want to apply the exact same filtering to all tools because they possess the exact same flags, you may provide only one string.
--path-jar-snpsift FULL PATH to the SnpSift JAR file MUST be provided if any of the filter option is used.
--do-venn Will make a Venn diagram using the vcf files provided to the merging step. This is a simple Venn diagram; with 4 or fewer vcfs, a venn is drawn; with 5 or more, an upset plot is selected [hardcoded].
--venn-title Add a custom title to the Venn; Works only with the "venn" module and not with "upset" [this is a limitation of intervene, not of vcfMerger2]; Default is empty string.
-d DIR_OUT, --dir-out DIR_OUT, --dir-temp DIR_OUT directory where the temporary files and/or outputs of vcfMerger2 will be written
--delete-temps if set, temporary files created during the "prep_vcf" stage will be deleted
--beds If the user already has the data in bed format to make the Venn diagram, the user can provide these bed files here; The number of bed files MUST be the same as the number of toolnames and VCF files.
If bed files are not provided and --do-venn is enabled, vcfMerger2 will try to create the bed files on the fly from the vcf files;
If --beds is used, --do-venn must also be used, otherwise the venn won't be created.
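The --threshold-AR rule listed above can be illustrated with a minimal Python sketch. This is not vcfMerger2's own code: the function name genotype_from_ar is hypothetical, and only the rule itself (GT=0/1 below the threshold, GT=1/1 at or above it, default 0.90, range ]0,1]) comes from the option description.

```python
def genotype_from_ar(ar: float, threshold: float = 0.90) -> str:
    """Return the GT string for an allele ratio (hypothetical helper).

    Rule taken from the --threshold-AR description:
    GT=0/1 if AR is below the threshold, GT=1/1 if equal or above.
    """
    # The documented range for the threshold is ]0,1] (0 excluded, 1 included).
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in the range ]0,1]")
    return "0/1" if ar < threshold else "1/1"


if __name__ == "__main__":
    for ar in (0.35, 0.90, 0.97):
        print(ar, genotype_from_ar(ar))
```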
NOTE_1: What does PRECEDENCE mean?
A vcf contains information in the INFO and FORMAT columns. Unfortunately, redundant information exists from one tool to another.
For instance, the AR field may exist in ALL the given vcfs in the FORMAT columns, but the values may vary from one tool to another. Only one value can be kept in the AR field within the merged vcf.
So which one would the user prefer to keep in the merged vcf? Which tool do you trust the most? This is where the precedence is used. It gives an order of preference for the tools when a variant is called
by more than one tool. This precedence is subjective and up to the user.
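As an illustration of the idea only, the following Python sketch picks the value to keep for one variant from the first tool in the precedence order that called it. The function name pick_by_precedence and the sample values are hypothetical; vcfMerger2's internal implementation may differ.

```python
def pick_by_precedence(values_by_tool, precedence):
    """Return (tool, value) for the highest-precedence tool that called the variant.

    values_by_tool: dict mapping tool name -> field value (e.g. AR) for one variant;
                    tools that did not call the variant are simply absent.
    precedence:     list of tool names, most preferred first.
    """
    for tool in precedence:
        if tool in values_by_tool:
            return tool, values_by_tool[tool]
    raise KeyError("no tool in the precedence list called this variant")


if __name__ == "__main__":
    # Hypothetical AR values: tool1 did not call this variant.
    ar_values = {"tool2": 0.41, "tool3": 0.38}
    print(pick_by_precedence(ar_values, ["tool1", "tool3", "tool2"]))
```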
NOTE_2: Ordering the information in the pipe-delimited arguments and/or given values
The order of information within the options --vcfs, --toolnames, --precedence, --toolacronyms, --prep-outfilenames, --beds
MUST be linked to each other; vcfMerger2 uses a 1-to-1 mapping between the pipe-delimited information; The order must therefore be kept or vcfMerger2 may fail or output wrong information.
For instance, --toolnames "tool1|tool2|tool3" --vcfs "vcf1|vcf2|vcf3" --toolacronyms "a1|a2|a3" means tool1 will be matched with vcf1 and with a1; same for tool2 <-> vcf2 <-> a2, etc.
You MUST NOT swap the information, otherwise the link between the data is disrupted; e.g., if --vcfs is "vcf2|vcf3|vcf1", it will lead to errors or incorrect merging, as tool1 will be mapped to vcf2 and not vcf1.
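The positional 1-to-1 mapping can be sketched in Python as below. This is an assumption-laden illustration, not vcfMerger2's code: the helper link_arguments is hypothetical, and it only shows how values at the same position in each delimited string end up linked, and why a mismatch in item counts should be rejected.

```python
def link_arguments(delim="|", **options):
    """Split each delimited option value and link items by position (hypothetical helper)."""
    split = {name: value.split(delim) for name, value in options.items()}
    lengths = {name: len(items) for name, items in split.items()}
    # Every option must carry the same number of items for a 1-to-1 mapping.
    if len(set(lengths.values())) > 1:
        raise ValueError(f"options have different numbers of items: {lengths}")
    names = list(split)
    return [dict(zip(names, row)) for row in zip(*split.values())]


if __name__ == "__main__":
    linked = link_arguments(
        toolnames="tool1|tool2|tool3",
        vcfs="vcf1|vcf2|vcf3",
        toolacronyms="a1|a2|a3",
    )
    for entry in linked:
        print(entry)
```

Note that such a check can only catch a wrong number of items; a swapped order such as "vcf2|vcf3|vcf1" has the right length and would be linked silently to the wrong tools, which is why the order must be verified by the user.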