happy-cli

A lightweight CLI wrapper for running hap.py in Docker, built for lab teams that need to re-run benchmarking whenever their sequencing technology or protocols change.

hap.py (Haplotype Comparison Tool) was created by Peter Krusche at Illumina.

Prerequisites

Python >= 3.9
Docker installed and running
The pkrusche/hap.py Docker image pulled (docker pull pkrusche/hap.py)
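
The prerequisites above can be checked before a first run. The following is a minimal sketch, not part of happy-cli: the function name `check_prerequisites` and its messages are hypothetical, and it relies only on `shutil.which` and the standard `docker image inspect` command.

```python
import shutil
import subprocess
import sys
from typing import Callable, List, Optional

IMAGE = "pkrusche/hap.py"


def check_prerequisites(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems: List[str] = []
    if sys.version_info < (3, 9):
        problems.append("Python >= 3.9 is required")
    docker = which("docker")
    if docker is None:
        problems.append("docker executable not found on PATH")
        return problems
    # `docker image inspect` exits non-zero when the image is absent
    # or when the Docker daemon is not running.
    result = subprocess.run(
        [docker, "image", "inspect", IMAGE],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode != 0:
        problems.append(f"image {IMAGE} not available; try: docker pull {IMAGE}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default is enough.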

Installation

git clone https://github.com/MedGenOL/happy-cli.git
cd happy-cli
pip3 install -e .

If pip3 is not found, install it first with python3 -m ensurepip --user or your system's package manager.

Usage

Whole Genome (WGS)

happy \
  data/PlatinumGenomesIllumina/vcf/NA12877.vcf.gz \
  /path/to/your_pipeline_output.vcf.gz \
  -r /path/to/hg38/genome.fa \
  -f data/ConfidentRegions/ConfidentRegions.bed \
  -o /path/to/output/NA12877_vs_pipeline

Exome (WES)

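For exome benchmarking, add -T with the target regions BED for your capture kit. The recommended settings are --engine vcfeval, which selects the vcfeval comparison engine, and --pass-only, which limits the comparison to PASS variants: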

happy \
  data/PlatinumGenomesIllumina/vcf/NA12877.vcf.gz \
  /path/to/your_exome_output.vcf.gz \
  -r /path/to/hg38/genome.fa \
  -f data/ConfidentRegions/ConfidentRegions.bed \
  -T /path/to/exome_capture_targets.bed \
  -o /path/to/output/NA12877_vs_pipeline \
  --engine vcfeval \
  --pass-only

The -f flag defines where the truth set is reliable (confident regions). The -T flag restricts analysis to your exome capture footprint. hap.py intersects the two internally, so there is no need to pre-intersect them with bedtools.
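
To make the effect of combining -f and -T concrete, here is a small sketch of what intersecting two BED region sets means. It is an illustration only and not how hap.py implements it; the function name `intersect_bed` and the example coordinates are made up.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def intersect_bed(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions covered by both a and b."""
    # Group the second set by chromosome so only matching chromosomes are compared.
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in b:
        by_chrom.setdefault(chrom, []).append((start, end))

    out: List[Interval] = []
    for chrom, start_a, end_a in a:
        for start_b, end_b in by_chrom.get(chrom, ()):
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep non-empty overlaps only
                out.append((chrom, start, end))
    return out


# Hypothetical coordinates, for illustration only.
confident = [("chr1", 100, 500), ("chr1", 800, 1000)]
targets = [("chr1", 400, 900), ("chr2", 0, 50)]
evaluated = intersect_bed(confident, targets)
print(evaluated)
```

Only positions inside both a confident region and a capture target remain, which is the footprint the exome comparison is scored on.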

Background mode

Add -bg to run in the background. Output is logged to happy_YYYYMMDD_HHMMSS.log in the current directory:

happy ... -bg
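
The log naming scheme and detached execution can be sketched in a few lines. This is an assumption about how such a mode could work, not the tool's actual code; `log_filename` and `run_in_background` are hypothetical names.

```python
import subprocess
from datetime import datetime
from typing import List


def log_filename(now: datetime) -> str:
    """Build a name of the form happy_YYYYMMDD_HHMMSS.log."""
    return now.strftime("happy_%Y%m%d_%H%M%S.log")


def run_in_background(cmd: List[str], log_path: str) -> int:
    """Start cmd detached from the terminal and return its process ID."""
    with open(log_path, "w") as log_file:
        process = subprocess.Popen(
            cmd,
            stdout=log_file,            # send standard output to the log
            stderr=subprocess.STDOUT,   # merge errors into the same log
            stdin=subprocess.DEVNULL,
            start_new_session=True,     # survive the parent shell on POSIX
        )
    return process.pid


print(log_filename(datetime(2024, 3, 5, 14, 7, 9)))
```

A caller would pass the full command list and `log_filename(datetime.now())` to `run_in_background`, then follow progress by tailing the log file.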

All paths are normal host paths; the tool handles Docker volume mounting automatically. Use --dry-run to preview the Docker command without executing it.
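
One way a wrapper can translate host paths into Docker volume mounts is sketched below. This is a guess at the general technique, not happy-cli's implementation: the helper names, the `/mnt/volN` mount points, and the `/opt/hap.py/bin/hap.py` entry point inside the image are all assumptions.

```python
import shlex
from pathlib import PurePosixPath
from typing import Dict, List


def map_path(host_path: str, mounts: Dict[str, str]) -> str:
    """Map an absolute host path to a container path, mounting its parent directory."""
    path = PurePosixPath(host_path)
    parent = str(path.parent)
    # Reuse the mount point if this directory was already registered.
    if parent not in mounts:
        mounts[parent] = f"/mnt/vol{len(mounts)}"
    return f"{mounts[parent]}/{path.name}"


def build_docker_command(
    truth: str, query: str, reference: str, output_prefix: str
) -> List[str]:
    """Assemble a docker run command with one -v flag per distinct directory."""
    mounts: Dict[str, str] = {}
    args = [
        map_path(truth, mounts),
        map_path(query, mounts),
        "-r", map_path(reference, mounts),
        "-o", map_path(output_prefix, mounts),
    ]
    cmd = ["docker", "run", "--rm"]
    for host_dir, container_dir in mounts.items():
        cmd += ["-v", f"{host_dir}:{container_dir}"]
    # The entry point path inside the image is an assumption.
    cmd += ["pkrusche/hap.py", "/opt/hap.py/bin/hap.py", *args]
    return cmd


cmd = build_docker_command(
    "/data/truth/NA12877.vcf.gz",
    "/data/runs/query.vcf.gz",
    "/ref/hg38/genome.fa",
    "/data/out/NA12877_vs_pipeline",
)
print(shlex.join(cmd))  # roughly what a dry run could display
```

Printing the command instead of running it mirrors the idea behind --dry-run: the mapping can be inspected before any container starts.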

Run happy --help for all options.

Data

The data/ directory includes curated Platinum Genomes truth sets and high-confidence regions. Large files are gitignored and must be downloaded separately; see data/README.md for instructions.

Output

hap.py produces the following files (using the output prefix as base name):

File	Contents
`.summary.csv`	Precision, recall, and F1 score
`.extended.csv`	Extended statistics
`.metrics.json.gz`	Detailed metrics
`.runinfo.json`	Run metadata and parameters
`.vcf.gz` / `.vcf.gz.tbi`	Annotated comparison VCF with index
`.roc.all.csv.gz`	ROC data for all variants
`.roc.Locations.INDEL.csv.gz`	ROC data for INDELs
`.roc.Locations.INDEL.PASS.csv.gz`	ROC data for INDELs (PASS only)
`.roc.Locations.SNP.csv.gz`	ROC data for SNPs
`.roc.Locations.SNP.PASS.csv.gz`	ROC data for SNPs (PASS only)
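
The summary file is plain CSV, so the headline metrics can be pulled out with the standard library. The sketch below assumes the column names Type, Filter, METRIC.Recall, METRIC.Precision, and METRIC.F1_Score; check them against the header of your own .summary.csv. The sample rows are placeholder values, not real results.

```python
import csv
import io
from typing import Dict, TextIO


def read_summary(handle: TextIO, filter_name: str = "PASS") -> Dict[str, Dict[str, float]]:
    """Return recall, precision, and F1 per variant type for one filter setting."""
    metrics: Dict[str, Dict[str, float]] = {}
    for row in csv.DictReader(handle):
        if row["Filter"] != filter_name:
            continue
        metrics[row["Type"]] = {
            "recall": float(row["METRIC.Recall"]),
            "precision": float(row["METRIC.Precision"]),
            "f1": float(row["METRIC.F1_Score"]),
        }
    return metrics


# Placeholder rows standing in for a real file; the numbers are made up.
sample = io.StringIO(
    "Type,Filter,METRIC.Recall,METRIC.Precision,METRIC.F1_Score\n"
    "INDEL,ALL,0.5,0.5,0.5\n"
    "INDEL,PASS,0.25,0.5,0.375\n"
    "SNP,ALL,0.5,0.5,0.5\n"
    "SNP,PASS,0.75,0.5,0.625\n"
)
summary = read_summary(sample)
for variant_type, values in summary.items():
    print(variant_type, values)
```

With a real run, replace the sample with `open("/path/to/output/NA12877_vs_pipeline.summary.csv", newline="")`, using the same prefix that was passed to -o.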

Documentation

See docs/Guide_to_run_benchmarking.md for a step-by-step walkthrough.
