A lightweight CLI wrapper for running hap.py in Docker. Built for lab teams that need to run benchmarking periodically when sequencing technology or protocols change.
hap.py (Haplotype Comparison Tool) was created by Peter Krusche at Illumina.
- Python >= 3.9
- Docker installed and running
- The `pkrusche/hap.py` Docker image pulled (`docker pull pkrusche/hap.py`)

Install from source:

    git clone https://github.com/MedGenOL/happy-cli.git
    cd happy-cli
    pip3 install -e .

If `pip3` is not found, install it first with `python3 -m ensurepip --user` or your system's package manager.
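
Before installing, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch, not part of happy-cli: the function name `check_prerequisites` is hypothetical, and it only checks the Python version and whether a `docker` executable is on `PATH`. It does not verify that the Docker daemon is running or that the image has been pulled.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 9)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Compare the running interpreter against the minimum version from the README.
    if sys.version_info < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("docker") is None:
        problems.append("docker executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```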

    happy \
      data/PlatinumGenomesIllumina/vcf/NA12877.vcf.gz \
      /path/to/your_pipeline_output.vcf.gz \
      -r /path/to/hg38/genome.fa \
      -f data/ConfidentRegions/ConfidentRegions.bed \
      -o /path/to/output/NA12877_vs_pipeline

For exome benchmarking, add `-T` with your capture kit target regions BED. Use `--engine vcfeval` and `--pass-only` for best results:

    happy \
      data/PlatinumGenomesIllumina/vcf/NA12877.vcf.gz \
      /path/to/your_exome_output.vcf.gz \
      -r /path/to/hg38/genome.fa \
      -f data/ConfidentRegions/ConfidentRegions.bed \
      -T /path/to/exome_capture_targets.bed \
      -o /path/to/output/NA12877_vs_pipeline \
      --engine vcfeval \
      --pass-only

The `-f` flag defines where the truth set is reliable (confident regions). The `-T` flag restricts analysis to your exome capture footprint. hap.py intersects the two internally, so there is no need to pre-intersect them with bedtools.
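
To make the effect of combining `-f` and `-T` concrete, here is a minimal sketch of interval intersection on a single chromosome. It illustrates the concept only and is not hap.py's implementation; the function name and the coordinates are made up for the example. Intervals are half-open `(start, end)` pairs, as in BED, and each input list is assumed to be sorted and non-overlapping.

```python
def intersect_intervals(a, b):
    """Intersect two sorted lists of non-overlapping half-open (start, end) intervals."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        # The overlap of the two current intervals, if any.
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


# Illustrative coordinates only, not taken from any real BED file.
confident = [(100, 500), (800, 1200)]
targets = [(400, 900), (1100, 1300)]
print(intersect_intervals(confident, targets))
# → [(400, 500), (800, 900), (1100, 1200)]
```

Only variants falling inside the resulting regions would count toward the benchmark, which is why a separate bedtools step is unnecessary.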

Add `-bg` to run in the background. Output is logged to `happy_YYYYMMDD_HHMMSS.log` in the current directory:

    happy ... -bg

All paths are normal host paths; the tool handles Docker volume mounting automatically. Use `--dry-run` to preview the Docker command without executing it.

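
For readers curious what automatic volume mounting involves, the sketch below shows one way host paths could be translated into `docker run -v` arguments. It is an illustration under stated assumptions, not happy-cli's actual code: the function names, the `/mnt/volN` container mount points, and the `--rm` flag are all hypothetical. Use `--dry-run` to see the command the tool really builds.

```python
from pathlib import PurePosixPath
import shlex


def plan_mounts(host_paths):
    """Assign each distinct parent directory a container mount point.

    Returns (mounts, translated): mounts maps host directory -> container
    directory, translated maps each host path -> its path inside the container.
    """
    mounts = {}
    translated = {}
    for raw in host_paths:
        path = PurePosixPath(raw)
        host_dir = str(path.parent)
        if host_dir not in mounts:
            # Hypothetical naming scheme for container-side directories.
            mounts[host_dir] = f"/mnt/vol{len(mounts)}"
        translated[raw] = f"{mounts[host_dir]}/{path.name}"
    return mounts, translated


def docker_argv(mounts, image="pkrusche/hap.py"):
    """Build the leading part of a docker run command with one -v flag per mount."""
    argv = ["docker", "run", "--rm"]
    for host_dir, container_dir in mounts.items():
        argv += ["-v", f"{host_dir}:{container_dir}"]
    argv.append(image)
    return argv


# Example host paths, chosen only for illustration.
mounts, translated = plan_mounts(
    ["/data/truth/NA12877.vcf.gz", "/data/ref/genome.fa", "/data/truth/regions.bed"]
)
print(shlex.join(docker_argv(mounts)))
print(translated["/data/truth/regions.bed"])
```

Files that share a host directory reuse the same mount, so the two paths under `/data/truth` above map to a single `-v` flag.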
Run happy --help for all options.
The data/ directory includes curated Platinum Genomes truth sets and
high-confidence regions. Large files are gitignored and must be
downloaded — see data/README.md for instructions.
hap.py produces the following files (using the output prefix as base name):

| File | Contents |
|---|---|
| `.summary.csv` | Precision, recall, and F1 score |
| `.extended.csv` | Extended statistics |
| `.metrics.json.gz` | Detailed metrics |
| `.runinfo.json` | Run metadata and parameters |
| `.vcf.gz` / `.vcf.gz.tbi` | Annotated comparison VCF with index |
| `.roc.all.csv.gz` | ROC data for all variants |
| `.roc.Locations.INDEL.csv.gz` | ROC data for INDELs |
| `.roc.Locations.INDEL.PASS.csv.gz` | ROC data for INDELs (PASS only) |
| `.roc.Locations.SNP.csv.gz` | ROC data for SNPs |
| `.roc.Locations.SNP.PASS.csv.gz` | ROC data for SNPs (PASS only) |

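
As a reminder of how the headline metrics in the summary file relate to each other, the sketch below computes precision, recall, and F1 from true-positive, false-positive, and false-negative counts using the standard definitions. It is a simplification: hap.py tracks truth-side and query-side counts separately, which this example does not model, and the function name and input counts are illustrative only.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts (standard definitions)."""
    # Precision: fraction of called variants that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of truth variants that were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts, not results from any real benchmark.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# → precision=0.900 recall=0.750 f1=0.818
```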
See docs/Guide_to_run_benchmarking.md
for a step-by-step walkthrough.