happy-cli

A lightweight CLI wrapper for running hap.py in Docker. Built for lab teams that need to run benchmarking periodically when sequencing technology or protocols change.

hap.py (Haplotype Comparison Tool) was created by Peter Krusche at Illumina.

Prerequisites

  • Python >= 3.9
  • Docker installed and running
  • The pkrusche/hap.py Docker image pulled (docker pull pkrusche/hap.py)
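The first two prerequisites can be checked from Python before a run. This is a minimal sketch, not part of happy-cli: the function name check_prerequisites is hypothetical, and the check only confirms that a docker executable is on PATH, not that the daemon is running or that the image has been pulled.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 9)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        found = sys.version.split()[0]
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )
    # shutil.which only confirms the docker executable is on PATH;
    # it does not prove the daemon is running or the image is pulled.
    if shutil.which("docker") is None:
        problems.append("docker executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```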

Installation

git clone https://github.com/MedGenOL/happy-cli.git
cd happy-cli
pip3 install -e .

If pip3 is not found, install it first with python3 -m ensurepip --user or your system's package manager.

Usage

Whole Genome (WGS)

happy \
  data/PlatinumGenomesIllumina/vcf/NA12877.vcf.gz \
  /path/to/your_pipeline_output.vcf.gz \
  -r /path/to/hg38/genome.fa \
  -f data/ConfidentRegions/ConfidentRegions.bed \
  -o /path/to/output/NA12877_vs_pipeline

Exome (WES)

For exome benchmarking, add -T with your capture kit target regions BED. Use --engine vcfeval and --pass-only for best results:

happy \
  data/PlatinumGenomesIllumina/vcf/NA12877.vcf.gz \
  /path/to/your_exome_output.vcf.gz \
  -r /path/to/hg38/genome.fa \
  -f data/ConfidentRegions/ConfidentRegions.bed \
  -T /path/to/exome_capture_targets.bed \
  -o /path/to/output/NA12877_vs_pipeline \
  --engine vcfeval \
  --pass-only

The -f flag defines where the truth set is reliable (confident regions). The -T flag restricts analysis to your exome capture footprint. hap.py intersects the two internally, so there is no need to pre-intersect them with bedtools.
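To make "intersect" concrete, the sketch below computes the overlap of two interval lists on a single chromosome. It only illustrates the concept and is not how hap.py implements it; the function name and the example coordinates are made up, and intervals are assumed to be sorted, non-overlapping, and half-open as in BED.

```python
def intersect_intervals(a, b):
    """Intersect two sorted, non-overlapping lists of (start, end) half-open intervals."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical coordinates on one chromosome.
confident = [(100, 500), (800, 1200)]   # stands in for the -f BED
targets = [(400, 900), (1100, 1500)]    # stands in for the -T BED
print(intersect_intervals(confident, targets))
# [(400, 500), (800, 900), (1100, 1200)]
```

Only positions covered by both files remain, which is the region where calls are both evaluable against the truth set and expected from the capture kit.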

Background mode

Add -bg to run in the background. Output is logged to happy_YYYYMMDD_HHMMSS.log in the current directory:

happy ... -bg
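
The log name follows a timestamp pattern. A minimal sketch of how such a name can be produced, assuming the pattern maps directly to strftime fields (the helper log_filename is hypothetical and the tool's own code may differ):

```python
from datetime import datetime


def log_filename(now=None):
    """Build a name of the form happy_YYYYMMDD_HHMMSS.log."""
    now = now or datetime.now()
    return now.strftime("happy_%Y%m%d_%H%M%S.log")


print(log_filename(datetime(2024, 3, 5, 14, 7, 9)))  # happy_20240305_140709.log
```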

All paths are normal host paths; the tool handles Docker volume mounting automatically. Use --dry-run to preview the Docker command without executing it.
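
The idea behind automatic mounting can be sketched as follows: each host directory gets a bind mount, and each path is rewritten to its location inside the container. This is an illustration under assumptions, not the happy-cli implementation. The function name, the /mnt/volN mount points, and the in-container path /opt/hap.py/bin/hap.py are assumptions; compare against the real command printed by --dry-run.

```python
import os


def build_docker_command(truth, query, reference, output_prefix,
                         image="pkrusche/hap.py"):
    """Map host paths to container paths and assemble a docker run command."""
    mounts = {}  # host directory -> container directory (hypothetical layout)

    def to_container(host_path):
        host_dir, name = os.path.split(os.path.abspath(host_path))
        # Reuse the mount if this directory was already seen.
        container_dir = mounts.setdefault(host_dir, f"/mnt/vol{len(mounts)}")
        return f"{container_dir}/{name}"

    args = [
        to_container(truth),
        to_container(query),
        "-r", to_container(reference),
        "-o", to_container(output_prefix),
    ]
    cmd = ["docker", "run", "--rm"]
    for host_dir, container_dir in mounts.items():
        cmd += ["-v", f"{host_dir}:{container_dir}"]
    # Assumed location of hap.py inside the image.
    cmd += [image, "/opt/hap.py/bin/hap.py"] + args
    return cmd


print(" ".join(build_docker_command(
    "/data/truth.vcf.gz", "/data/query.vcf.gz", "/ref/genome.fa", "/out/run1")))
```

Building the command as a list rather than a single string avoids shell-quoting problems when it is later passed to subprocess.run.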

Run happy --help for all options.

Data

The data/ directory includes curated Platinum Genomes truth sets and high-confidence regions. Large files are gitignored and must be downloaded separately; see data/README.md for instructions.

Output

hap.py produces the following files (using the output prefix as base name):

| File | Contents |
| --- | --- |
| .summary.csv | Precision, recall, and F1 score |
| .extended.csv | Extended statistics |
| .metrics.json.gz | Detailed metrics |
| .runinfo.json | Run metadata and parameters |
| .vcf.gz / .vcf.gz.tbi | Annotated comparison VCF with index |
| .roc.all.csv.gz | ROC data for all variants |
| .roc.Locations.INDEL.csv.gz | ROC data for INDELs |
| .roc.Locations.INDEL.PASS.csv.gz | ROC data for INDELs (PASS only) |
| .roc.Locations.SNP.csv.gz | ROC data for SNPs |
| .roc.Locations.SNP.PASS.csv.gz | ROC data for SNPs (PASS only) |
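
The summary file can be read with the standard csv module. The sketch below assumes the column names Type, Filter, METRIC.Precision, and METRIC.Recall, which match typical hap.py output but should be checked against your own file; the function name read_summary is hypothetical. It recomputes F1 as the harmonic mean of precision and recall rather than trusting a stored value.

```python
import csv


def read_summary(path):
    """Return one dict per row of a hap.py-style .summary.csv file."""
    rows = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Column names are assumed; adjust if your file differs.
            precision = float(row["METRIC.Precision"])
            recall = float(row["METRIC.Recall"])
            total = precision + recall
            f1 = 2 * precision * recall / total if total else 0.0
            rows.append({
                "type": row["Type"],
                "filter": row["Filter"],
                "precision": precision,
                "recall": recall,
                "f1": f1,
            })
    return rows
```

With the WGS example above, the file would be /path/to/output/NA12877_vs_pipeline.summary.csv.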

Documentation

See docs/Guide_to_run_benchmarking.md for a step-by-step walkthrough.
