
Command-line interface of TRGT

Commands:

  • genotype
  • plot
  • merge
  • validate
  • deepdive

Options:

  • -v, --verbose Specify multiple times to increase verbosity level (e.g., -vv for more verbosity)
  • --color <COLOR> Enable or disable color output in logging (always, auto, never); supports NO_COLOR/FORCE_COLOR environment variables [default: auto]
  • -h, --help Print help
  • -V, --version Print version

Most file input arguments accept remote URIs (http://, https://, s3://, gs://). See remote file support for details and authentication tips.
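As an illustration, the hypothetical helper below checks whether a path uses one of the URI schemes listed above. `is_remote` is not part of TRGT. It only inspects the scheme, so it does not check that the file exists or that credentials are configured.

```python
from urllib.parse import urlparse

# URI schemes this document lists as accepted by most file inputs.
REMOTE_SCHEMES = {"http", "https", "s3", "gs"}


def is_remote(path: str) -> bool:
    """Return True if `path` starts with one of the remote schemes above."""
    return urlparse(path).scheme.lower() in REMOTE_SCHEMES
```

For example, `is_remote("s3://bucket/sample.bam")` is True and `is_remote("/data/sample.bam")` is False. A check like this can catch remote inputs before they are passed to a subcommand that needs local files, such as merge.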

Environment variables

  • TRGT_ENABLE_HTSLIB_LOGGING Set this environment variable to re-enable HTSlib error messages, which are disabled by default in TRGT. This can be helpful for debugging unknown errors that might be related to HTSlib.
  • NO_COLOR Set this environment variable to disable color output.
  • FORCE_COLOR Set this environment variable to force color output.
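A minimal sketch of preparing the environment for a TRGT run from Python. `trgt_env` is a hypothetical helper. Setting each variable to "1" is an assumption: the text above only says to set the variables, not which values are expected.

```python
import os
from typing import Dict, Optional


def trgt_env(htslib_logging: bool = False,
             color: Optional[bool] = None) -> Dict[str, str]:
    """Return a copy of os.environ adjusted for a TRGT run.

    color=None leaves color detection to TRGT (auto), False sets NO_COLOR,
    and True sets FORCE_COLOR.
    """
    env = dict(os.environ)
    if htslib_logging:
        # Assumption: setting the variable is what matters; "1" is arbitrary.
        env["TRGT_ENABLE_HTSLIB_LOGGING"] = "1"
    if color is False:
        env["NO_COLOR"] = "1"
        env.pop("FORCE_COLOR", None)  # avoid conflicting color settings
    elif color is True:
        env["FORCE_COLOR"] = "1"
        env.pop("NO_COLOR", None)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when invoking `trgt`. This leaves the parent process environment unchanged.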

For cloud storage and remote file access, additional environment variables are available. See remote file support for more details.

Genotype command-line

Options:

  • -g, --genome <GENOME> Path to the FASTA file containing the reference genome. This must be the same reference genome that was used for read alignment
  • -r, --reads <READS> BAM file with alignments of HiFi reads
  • -b, --repeats <REPEATS> BED file with reference coordinates and structure of tandem repeats
  • -o, --output-prefix <OUTPUT_PREFIX> Prefix for output files. TRGT generates an unsorted VCF file (<OUTPUT_PREFIX>.vcf.gz) and an unsorted BAM file containing the portions of HiFi reads that overlap the repeats (<OUTPUT_PREFIX>.spanning.bam)
  • -k, --karyotype <KARYOTYPE> Sample karyotype (XX, XY, or a file path) [default: XX]
  • -t, --threads <THREADS> Number of threads [default: 1]
  • --preset <PRESET> Parameter preset (wgs or targeted) [default: wgs]
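A sketch of assembling a `trgt genotype` invocation from the options above, together with the output file names derived from the prefix. `genotype_cmd` and `genotype_outputs` are hypothetical helpers, and the file names in the test are placeholders.

```python
from typing import List, Tuple


def genotype_cmd(genome: str, reads: str, repeats: str, prefix: str,
                 karyotype: str = "XX", threads: int = 1,
                 preset: str = "wgs") -> List[str]:
    """Build an argv list for `trgt genotype` from the options listed above."""
    if preset not in ("wgs", "targeted"):
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        "trgt", "genotype",
        "--genome", genome,        # must match the reference used for alignment
        "--reads", reads,
        "--repeats", repeats,
        "--output-prefix", prefix,
        "--karyotype", karyotype,  # XX, XY, or a file path
        "--threads", str(threads),
        "--preset", preset,
    ]


def genotype_outputs(prefix: str) -> Tuple[str, str]:
    """Files written for `prefix`; both are unsorted."""
    return f"{prefix}.vcf.gz", f"{prefix}.spanning.bam"
```

Running `subprocess.run(genotype_cmd(...), check=True)` would then be expected to produce the two files named by `genotype_outputs`. The BAM file is not written if --disable-bam-output is added.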

Advanced:

  • --sample-name <SAMPLE_NAME> Sample name. If not provided, it is extracted from the input BAM or from its file stem
  • --genotyper <GENOTYPER> Genotyping algorithm (size or cluster) [default: size; targeted preset default: cluster]
  • --flank-len <FLANK_LEN> Minimum length of the flanking sequence [default: 250; targeted preset default: 200]
  • --output-flank-len <FLANK_LEN> Length of flanking sequence to report on output [default: 50]
  • --disable-bam-output Disable BAM output
  • --max-depth <MAX_DEPTH> Maximum locus depth [default: 250; targeted preset default: 10000]
  • --fetcher-threads <THREADS> Number of threads for querying input BAM files [default: half the number of analysis threads, max 8]
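The `--fetcher-threads` default can be expressed as a small function. The text above does not say how odd thread counts are rounded or whether the value can drop to zero. Integer division and a floor of one thread are therefore assumptions, not confirmed TRGT behavior.

```python
def default_fetcher_threads(threads: int) -> int:
    """Half the analysis threads, capped at 8.

    Assumptions: integer division, and at least one fetcher thread.
    """
    if threads < 1:
        raise ValueError("threads must be positive")
    return max(1, min(threads // 2, 8))
```

Under these assumptions, 1 analysis thread gives 1 fetcher thread, 4 give 2, and 16 or more give 8.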

Preset-dependent defaults:

  • --preset targeted changes several defaults:
    • --genotyper: default becomes cluster (instead of size)
    • --flank-len: default becomes 200 (instead of 250)
    • --max-depth: default becomes 10000 (instead of 250)
    • Advanced (hidden):
      • --aln-scoring: default becomes 1,0,1 (instead of 2,5,1)
      • --min-flank-id-frac: default becomes 0.8 (instead of 0.7)
      • --min-read-quality: default becomes -1.0 (instead of 0.98)
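A sketch of how the two presets' defaults relate, using the values listed above, including the hidden options. It assumes that options set explicitly on the command line take precedence over preset defaults. `effective_settings` is a hypothetical helper, not TRGT code.

```python
from typing import Any, Dict

# wgs defaults, as listed in this document.
WGS_DEFAULTS: Dict[str, Any] = {
    "genotyper": "size",
    "flank_len": 250,
    "max_depth": 250,
    "aln_scoring": "2,5,1",
    "min_flank_id_frac": 0.7,
    "min_read_quality": 0.98,
}

# Values that --preset targeted changes.
TARGETED_OVERRIDES: Dict[str, Any] = {
    "genotyper": "cluster",
    "flank_len": 200,
    "max_depth": 10000,
    "aln_scoring": "1,0,1",
    "min_flank_id_frac": 0.8,
    "min_read_quality": -1.0,
}


def effective_settings(preset: str = "wgs", **explicit: Any) -> Dict[str, Any]:
    """Resolve preset defaults, then apply explicitly set options on top."""
    if preset not in ("wgs", "targeted"):
        raise ValueError(f"unknown preset: {preset!r}")
    settings = dict(WGS_DEFAULTS)
    if preset == "targeted":
        settings.update(TARGETED_OVERRIDES)
    unknown = set(explicit) - set(settings)
    if unknown:
        raise ValueError(f"not a preset-dependent option: {sorted(unknown)}")
    # Assumption: explicit options override whatever the preset chose.
    settings.update(explicit)
    return settings
```

For example, `effective_settings("targeted", genotyper="size")` keeps the size genotyper but still uses the targeted flank length (200) and maximum depth (10000).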

Plot command-line

Options:

  • -g, --genome <GENOME> Path to the FASTA file containing the reference genome
  • -b, --repeats <REPEATS> BED file with repeat coordinates and structure
  • -f, --vcf <VCF> VCF file generated by TRGT
  • -r, --spanning-reads <SPANNING_READS> BAM file with spanning reads generated by TRGT
  • -i, --repeat-id <REPEAT_ID> ID of the repeat to visualize
  • -o, --image <OUTPUT_PATH> Output image path; must end with .pdf, .svg, or .png

Plotting:

  • --plot-type <PLOT_TYPE> Type of plot to generate: allele plot or waterfall plot. Allele plots show alignments of reads to each repeat allele. Waterfall plots display the repeat sequences of reads without aligning them to the (consensus) allele; they are especially useful for QC of repeat calls and for visualizing mosaic expansions [default: allele]
  • --squished Horizontally compress the plot; useful for visualization of high-coverage targeted data
  • --show <SHOW> What to visualize: motifs (motifs) or methylation (meth) [default: motifs]
  • --font-family <FONT> Font family to use for text elements [default: Roboto Mono]

Advanced:

  • --flank-len <FLANK_LEN> Length of flanking regions [default: 50]
  • --max-allele-reads <MAX_READS> Maximum number of reads per allele to plot
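A sketch of building a `trgt plot` invocation that checks the documented constraints before running: the image extension and the `--show` value. `plot_cmd` is a hypothetical helper. The value `waterfall` for `--plot-type` is inferred from the description above and is an assumption. The repeat ID must match an ID in the repeats BED file; "HTT" in the test is only a placeholder.

```python
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".pdf", ".svg", ".png"}
SHOW_VALUES = {"motifs", "meth"}
PLOT_TYPES = {"allele", "waterfall"}  # "waterfall" spelling is an assumption


def plot_cmd(genome: str, repeats: str, vcf: str, spanning_reads: str,
             repeat_id: str, image: str, plot_type: str = "allele",
             show: str = "motifs", squished: bool = False) -> List[str]:
    """Build an argv list for `trgt plot`, validating documented options."""
    if Path(image).suffix not in IMAGE_SUFFIXES:
        raise ValueError(f"image must end with .pdf, .svg, or .png: {image}")
    if show not in SHOW_VALUES:
        raise ValueError(f"--show must be 'motifs' or 'meth', got {show!r}")
    if plot_type not in PLOT_TYPES:
        raise ValueError(f"unknown plot type: {plot_type!r}")
    cmd = [
        "trgt", "plot",
        "--genome", genome,
        "--repeats", repeats,
        "--vcf", vcf,                         # VCF written by trgt genotype
        "--spanning-reads", spanning_reads,   # spanning BAM from trgt genotype
        "--repeat-id", repeat_id,
        "--image", image,
        "--plot-type", plot_type,
        "--show", show,
    ]
    if squished:
        cmd.append("--squished")  # compress horizontally for deep coverage
    return cmd
```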

Merge command-line

The merge subcommand currently requires local files.

Options:

  • --vcf <VCF> VCF files to merge
  • --vcf-list <VCF_LIST> File containing paths of VCF files to merge (one per line)
  • -g, --genome <FASTA> Path to the FASTA file containing the reference genome
  • -o, --output <FILE> Write output to a file [default: standard output]
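`--vcf-list` expects a file with one VCF path per line. The hypothetical helpers below render and write such a file and build a `trgt merge` argv. Because merge currently requires local files, URI-like paths are rejected; the `://` test is a simple heuristic, not an exhaustive check.

```python
from pathlib import Path
from typing import Iterable, List


def vcf_list_text(vcf_paths: Iterable[str]) -> str:
    """Render paths one per line, as --vcf-list expects."""
    paths = [str(p) for p in vcf_paths]
    if not paths:
        raise ValueError("no VCF files given")
    for p in paths:
        # trgt merge currently requires local files; reject URI-like paths.
        if "://" in p:
            raise ValueError(f"trgt merge requires local files: {p}")
    return "\n".join(paths) + "\n"


def write_vcf_list(vcf_paths: Iterable[str], list_path: str) -> None:
    """Write the list file that --vcf-list will read."""
    Path(list_path).write_text(vcf_list_text(vcf_paths))


def merge_cmd(list_path: str, genome: str, output: str) -> List[str]:
    """Build an argv list for `trgt merge` reading inputs from a list file."""
    return ["trgt", "merge",
            "--vcf-list", list_path,
            "--genome", genome,
            "--output", output]
```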

Advanced:

  • -O, --output-type <OUTPUT_TYPE> Output type: u|b|v|z (u/b: uncompressed/compressed BCF; v/z: uncompressed/compressed VCF)
  • --skip-n <SKIP_N> Skip the first N records
  • --process-n <PROCESS_N> Only process N records
  • --print-header Print only the merged header and exit
  • --force-single Run even if there is only one file on input
  • --no-version Do not append version and command line to the header
  • --quit-on-errors Quit immediately on errors during merging
  • --contig <CONTIG> Process only the specified contigs (comma-separated list)
  • -t, --threads <THREADS> Number of threads for (de)compressing input/output VCF files (shared across readers and writer) [default: 2]
  • --no-index Stream VCFs without loading their indexes (contig order must match across inputs)
  • -W, --write-index Write index for the output compressed VCF/BCF file
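The `-O/--output-type` codes resemble those of bcftools' `-O` option. As a sketch, a hypothetical helper can choose the code from the output file name and add `--write-index` only when the output is compressed, since indexing applies to compressed VCF/BCF files. Mapping `.bcf` to compressed BCF (`b`) is an assumption.

```python
from typing import Dict, List, Tuple

# -O/--output-type codes: (format, compressed?)
OUTPUT_TYPES: Dict[str, Tuple[str, bool]] = {
    "u": ("BCF", False),
    "b": ("BCF", True),
    "v": ("VCF", False),
    "z": ("VCF", True),
}


def output_type_for(path: str) -> str:
    """Guess an output-type code from the file name (hypothetical helper)."""
    if path.endswith(".vcf.gz"):
        return "z"
    if path.endswith(".vcf"):
        return "v"
    if path.endswith(".bcf"):
        return "b"  # assumption: .bcf output is written compressed
    raise ValueError(f"cannot infer output type from {path!r}")


def merge_output_args(path: str, write_index: bool = False) -> List[str]:
    """Output-related arguments for `trgt merge`."""
    code = output_type_for(path)
    args = ["--output", path, "--output-type", code]
    if write_index:
        _, compressed = OUTPUT_TYPES[code]
        if not compressed:
            raise ValueError("--write-index applies to compressed VCF/BCF output")
        args.append("--write-index")
    return args
```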

Validate command-line

Options:

  • -g, --genome <FASTA> Path to the FASTA file containing the reference genome
  • -b, --repeats <REPEATS> BED file with repeat coordinates and structure

Advanced:

  • --flank-len <FLANK_LEN> Length of flanking regions [default: 50]

Deepdive command-line

Options:

  • -g, --genome <FASTA> Path to the reference genome FASTA file
  • -f, --vcf <VCF> VCF file generated by trgt genotype
  • -b, --repeats <REPEATS> BED file with repeat coordinates
  • -r, --spanning-reads <SPANNING_READS> BAM file with spanning reads generated by trgt genotype
  • -i, --repeat-id <REPEAT_ID> ID of the repeat to realign
  • -o, --output-prefix <OUTPUT_PREFIX> Prefix for output files (.fasta, .bed and .bam)
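A sketch of chaining deepdive after genotype. The VCF and spanning-reads BAM are the `<OUTPUT_PREFIX>.vcf.gz` and `<OUTPUT_PREFIX>.spanning.bam` files written by `trgt genotype`. `deepdive_cmd` and `deepdive_outputs` are hypothetical helpers. Naming the outputs as the prefix followed by each listed extension is an assumption based on the option description.

```python
from typing import Dict, List


def deepdive_cmd(genome: str, vcf: str, repeats: str, spanning_reads: str,
                 repeat_id: str, prefix: str) -> List[str]:
    """Build an argv list for `trgt deepdive` from the options above."""
    return [
        "trgt", "deepdive",
        "--genome", genome,
        "--vcf", vcf,                         # from trgt genotype
        "--repeats", repeats,
        "--spanning-reads", spanning_reads,   # from trgt genotype
        "--repeat-id", repeat_id,
        "--output-prefix", prefix,
    ]


def deepdive_outputs(prefix: str) -> Dict[str, str]:
    """Assumed output names: the prefix followed by each listed extension."""
    return {ext: f"{prefix}{ext}" for ext in (".fasta", ".bed", ".bam")}
```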