Simple workflow to reannotate Kids First VEPs to be compatible with the RADIANT framework. However, this can also be used to annotate/reannotate any other VCF as needed to ensure the following requirements are met:
- Old annotations are stripped
- VCF is normalized
- VCF is annotated with VEP
- Exomiser is run
bcftools strip: Usesannotateto remove existing VEP (CSQ) and gnomad 3.1.1 (gnomad_3_1_1*) annotationsbcftools norm: Usesnormto split (or join) mulit-allelics and normalize indelsVEP: Uses cache 111 and meets the following specific criteria:--canonical --format vcf --hgvs --hgvsg --no_stats --numbers --offline --fields Allele,Consequence,IMPACT,SYMBOL,Feature_type,Gene,PICK,Feature,EXON,BIOTYPE,INTRON,HGVSc,HGVSp,STRAND,CDS_position,cDNA_position,Protein_position,Amino_acids,Codons,VARIANT_CLASS,HGVSg,CANONICAL,RefSeq,MANE,MANE_SELECT,MANE_PLUS --flag_pick --mane --mane_select --pick_order rank,biotype,mane_select,mane_plus_clinical,canonical,appris,tsl,ccds,length,ensembl,refseq --symbol --variant_class --vcf --xref_refseq --compress_output bgzip --fasta Homo_sapiens_assembly38.fasta --assembly GRCh38 --species homo_sapiens --cache --cache_version 111exomiser: "The Exomiser is a Java program that finds potential disease-causing variants from whole-exome or whole-genome sequencing data." See https://github.com/exomiser/Exomiser for more details
What is "required" or not depends upon the input. The Required inputs are based on Kids First Processed VCFs. You may use the disable_(insert_step) flags to skip any step not needed based on the input.
vcf: KF VCF to processvcf_index= Index ofvcfoutput_basename: String prefix of file outputs. Technically optional, but strongly recommended. Default isTEST
vep_cache: VEP cache for annotation. Recommend 111,homo_sapiens_vep_111_GRCh38.tar.gzfasta: FASTA used during variant calling. RecommendHomo_sapiens_assembly38.fasta
analysis_file: YAML with analysis options. See https://exomiser.readthedocs.io/en/latest/advanced_analysis.html#analysis. Recommenddefault_exomiser_WGS_analysis.ymldatadir_file: TAR GZ with referenece files. Example contents:data/ data/remm/ data/remm/ReMM.v0.4.hg38.tsv.gz.tbi data/remm/ReMM.v0.4.hg38.tsv.gz data/2406_hg38.sha256 data/2406_hg38/ data/2406_hg38/2406_hg38_transcripts_refseq.ser data/2406_hg38/2406_hg38_transcripts_ensembl.ser data/2406_hg38/2406_hg38_genome.mv.db data/2406_hg38/2406_hg38_variants.mv.db data/2406_hg38/2406_hg38_transcripts_ucsc.ser data/2406_hg38/2406_hg38_clinvar.mv.db data/cadd/ data/cadd/1.7/ data/cadd/1.7/whole_genome_SNVs.tsv.gz data/cadd/1.7/whole_genome_SNVs.tsv.gz.tbi data/cadd/1.7/gnomad.genomes.r4.0.indel.tsv.gz data/cadd/1.7/gnomad.genomes.r4.0.indel.tsv.gz.tbi data/2406_phenotype/ data/2406_phenotype/2406_phenotype.mv.db data/2406_phenotype/rw_string_10.mv data/2406_phenotype/phenix/ data/2406_phenotype/phenix/out/ data/2406_phenotype/phenix/out/10.out data/2406_phenotype/phenix/out/9_symmetric.out data/2406_phenotype/phenix/out/14.out data/2406_phenotype/phenix/out/13_symmetric.out data/2406_phenotype/phenix/out/20.out data/2406_phenotype/phenix/out/19.out data/2406_phenotype/phenix/out/16_symmetric.out data/2406_phenotype/phenix/out/7.out data/2406_phenotype/phenix/out/3_symmetric.out data/2406_phenotype/phenix/out/8.out data/2406_phenotype/phenix/out/17_symmetric.out data/2406_phenotype/phenix/out/15_symmetric.out data/2406_phenotype/phenix/out/5_symmetric.out data/2406_phenotype/phenix/out/11.out data/2406_phenotype/phenix/out/14_symmetric.out data/2406_phenotype/phenix/out/8_symmetric.out data/2406_phenotype/phenix/out/10_symmetric.out data/2406_phenotype/phenix/out/1.out data/2406_phenotype/phenix/out/20_symmetric.out data/2406_phenotype/phenix/out/11_symmetric.out data/2406_phenotype/phenix/out/18_symmetric.out data/2406_phenotype/phenix/out/4.out data/2406_phenotype/phenix/out/19_symmetric.out data/2406_phenotype/phenix/out/2.out data/2406_phenotype/phenix/out/12.out data/2406_phenotype/phenix/out/15.out data/2406_phenotype/phenix/out/2_symmetric.out data/2406_phenotype/phenix/out/9.out data/2406_phenotype/phenix/out/17.out data/2406_phenotype/phenix/out/16.out data/2406_phenotype/phenix/out/6_symmetric.out data/2406_phenotype/phenix/out/13.out data/2406_phenotype/phenix/out/18.out data/2406_phenotype/phenix/out/3.out data/2406_phenotype/phenix/out/4_symmetric.out data/2406_phenotype/phenix/out/1_symmetric.out data/2406_phenotype/phenix/out/7_symmetric.out data/2406_phenotype/phenix/out/6.out data/2406_phenotype/phenix/out/5.out data/2406_phenotype/phenix/out/12_symmetric.out data/2406_phenotype/phenix/hp.obo data/2406_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt data/2406_phenotype/hp.obopheno_file: YAML file with sample information- Individual example:
--- id: FM_ZCF17CPC proband: subject: id: BS_4JPDCXVR sex: MALE phenotypicFeatures: - type: id: HP:0009733 label: Glioma pedigree: persons: - individualId: BS_4JPDCXVR sex: MALE affectedStatus: AFFECTED metaData: resources: - id: hp name: human phenotype ontology url: http://purl.obolibrary.org/obo/hp.owl version: hp/releases/2019-11-08 namespacePrefix: HP iriPrefix: 'http://purl.obolibrary.org/obo/HP_' phenopacketSchemaVersion: 2.0
- Family example:
--- id: FM_ASMHHE2N proband: subject: id: BS_KPAR4N94 sex: MALE phenotypicFeatures: - type: id: HP:0009733 pedigree: persons: - individualId: BS_KPAR4N94 sex: MALE paternalId: BS_264QQD38 maternalId: BS_EQ98CJJT affectedStatus: AFFECTED - individualId: BS_264QQD38 sex: MALE affectedStatus: UNAFFECTED - individualId: BS_EQ98CJJT sex: FEMALE affectedStatus: UNAFFECTED metaData: resources: - id: hp name: human phenotype ontology url: http://purl.obolibrary.org/obo/hp.owl version: hp/releases/2019-11-08 namespacePrefix: HP iriPrefix: 'http://purl.obolibrary.org/obo/HP_' phenopacketSchemaVersion: 2.0
- Individual example:
rm_fields_csv: List of existing annotation to remove. Default:INFO/CSQ,INFO/gnomad_3_1_1_AC,INFO/gnomad_3_1_1_AN,INFO/gnomad_3_1_1_AF,INFO/gnomad_3_1_1_nhomalt,INFO/gnomad_3_1_1_AC_popmax,INFO/gnomad_3_1_1_AN_popmax,INFO/gnomad_3_1_1_AF_popmax,INFO/gnomad_3_1_1_nhomalt_popmax,INFO/gnomad_3_1_1_AC_controls_and_biobanks,INFO/gnomad_3_1_1_AN_controls_and_biobanks,INFO/gnomad_3_1_1_AF_controls_and_biobanks,INFO/gnomad_3_1_1_AF_non_cancer,INFO/gnomad_3_1_1_primate_ai_score,INFO/gnomad_3_1_1_splice_ai_consequence,INFO/gnomad_3_1_1_FILTER,INFO/gnomad_3_1_1_AF_non_cancer_afr,INFO/gnomad_3_1_1_AF_non_cancer_ami,INFO/gnomad_3_1_1_AF_non_cancer_asj,INFO/gnomad_3_1_1_AF_non_cancer_eas,INFO/gnomad_3_1_1_AF_non_cancer_fin,INFO/gnomad_3_1_1_AF_non_cancer_mid,INFO/gnomad_3_1_1_AF_non_cancer_nfe,INFO/gnomad_3_1_1_AF_non_cancer_oth,INFO/gnomad_3_1_1_AF_non_cancer_raw,INFO/gnomad_3_1_1_AF_non_cancer_sas,INFO/gnomad_3_1_1_AF_non_cancer_amr,INFO/gnomad_3_1_1_AF_non_cancer_popmax,INFO/gnomad_3_1_1_AF_non_cancer_all_popmax
assembly:GRCh38vep_cache_version:111vep_species:homo_sapiensvep_cpus:32
datadir_name:datacadd_indelname:gnomad.genomes.r4.0.indel.tsv.gzcadd_snvname:whole_genome_SNVs.tsv.gzcadd_version:1.7exomiser_version:2406exomiser_genome:hg38exomiser_mem:16remm_filename:ReMM.v0.4.hg38.tsv.gzremm_version:v0.4
annotate_vcf: VCF to use to add annotations using bcftoolsannotate_vcf_index: Index ofannotate_vcfannot_fields_csv: CSV list of fields to use fromannotate_vcfand/or list of fields to renamebcftools_strip_extra_args: Extra args to add to the step. Consult tool manual if needed
check_ref: For BCFTOOLS norm. Check REF alleles and exit (e), warn (w), exclude (x), or set (s) bad sites. Default iswmultiallelics: Split multiallelics (-) or join biallelics (+), type: snps|indels|both|any. Default is-any
vep_buffer_size:100000
local_frequency: custom frequency source filelocal_frequency_index: Index oflocal_frequency