MicroTrace is a lightweight R package for detecting SNP-based transmission clusters from pathogen genome distance matrices.
- 💻 Source code on GitHub: https://github.com/biosciences/MicroTrace
- 🧬 Live Report (GitHub Pages): View the interactive HTML output
- Reads a SNP distance matrix in CSV format
- Suggests a SNP threshold based on the distance distribution
- Performs hierarchical clustering (UPGMA)
- Merges optional sample metadata (e.g., ward, date)
- Outputs a cluster table, a dendrogram, and SNP distance plots
- Generates publication-ready HTML reports (via R Markdown)
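The clustering step can be illustrated with a small, dependency-free Python sketch. This is not MicroTrace's code, which is written in R; Python is used here only because the repository already ships a Python helper. The sketch assumes that UPGMA (average linkage) is applied to the raw SNP distances and that clusters are obtained by cutting the tree at the SNP threshold. In R, that would typically be written as `cutree(hclust(as.dist(m), method = "average"), h = threshold)`. The function name `upgma_clusters` and the toy matrix are hypothetical.

```python
from itertools import combinations


def upgma_clusters(names, dist, threshold):
    """Assign each sample to a cluster by cutting a UPGMA tree at `threshold`.

    names: sample IDs in matrix order; dist: square list of lists of SNP
    distances. Returns {sample_id: cluster_number}, numbered from 1 in order
    of first appearance in `names`. Illustrative sketch, not MicroTrace's API.
    """
    # Start with every sample in its own cluster (tuples of matrix indices).
    clusters = [(i,) for i in range(len(names))]

    def avg_dist(a, b):
        # UPGMA / average linkage: mean of all cross-cluster pairwise distances.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the closest pair of clusters under average linkage.
        (a, b), d = min(
            (((x, y), avg_dist(x, y)) for x, y in combinations(clusters, 2)),
            key=lambda pair_and_dist: pair_and_dist[1],
        )
        # Average-linkage merge heights never decrease, so once the closest
        # pair is above the threshold no later merge can fall below it.
        if d > threshold:
            break
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)

    # Number clusters in the order their first member appears in `names`.
    labels, cluster_ids = {}, {}
    for i, name in enumerate(names):
        members = next(c for c in clusters if i in c)
        cluster_ids.setdefault(members, len(cluster_ids) + 1)
        labels[name] = cluster_ids[members]
    return labels


# Hypothetical toy matrix: S1/S2 and S3/S4 are near-identical pairs.
names = ["S1", "S2", "S3", "S4"]
dist = [[0, 2, 30, 31], [2, 0, 29, 33], [30, 29, 0, 5], [31, 33, 5, 0]]
print(upgma_clusters(names, dist, threshold=10))  # {'S1': 1, 'S2': 1, 'S3': 2, 'S4': 2}
```

Because average-linkage merge heights are monotone, stopping at the first merge above the threshold yields the same partition as building the full dendrogram and cutting it at that height. The brute-force pair search is roughly cubic in the number of samples overall. That is fine for illustration, but much slower than `hclust` on large outbreaks.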
You need R (≥ 3.6) and the following R packages:
install.packages(c("ape", "ggplot2", "readr", "dplyr", "dendextend"))
MicroTrace/
├── MicroTrace.R # Main analysis script
├── convert_snp_dists_to_microtrace.py # Converts snp-dists output to MicroTrace CSV
├── test_microtrace.R # Unit tests
├── MicroTrace_Report.Rmd # R Markdown HTML report
├── data/
│ ├── sim_snp_dist.csv # Example SNP distance matrix
│ ├── metadata.csv # Sample metadata
│ └── intra_cluster_stats.csv
├── docs/
│ ├── example_dendrogram.png
│ ├── snp_distance_histogram.png
│ └── snp_distance_density.png
├── paper/
│ ├── paper.md # JOSS manuscript
│ └── paper.bib # References
├── DESCRIPTION # R project metadata
├── LICENSE # MIT license
└── README.md
If you have a core genome alignment (e.g., from Snippy), you can compute SNP distances using snp-dists:
conda install -c bioconda snp-dists
snp-dists core.full.aln > snp_dist.tsv
Then convert the tab-separated output into the CSV format that MicroTrace reads, using the provided Python script:
python tools/convert_snp_dists_to_microtrace.py -i snp_dist.tsv -o data/sim_snp_dist.csv
This produces a sim_snp_dist.csv file readable by MicroTrace.
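The conversion can be sketched in a few lines of Python. This is not the bundled script. It assumes that the snp-dists output is a tab-separated square matrix with sample IDs in the first row and first column. It also assumes, without verification, that MicroTrace expects a comma-separated matrix with the same header row, sample IDs in the first column, and an empty top-left cell. The function name `snp_dists_tsv_to_csv` is hypothetical.

```python
import csv
import io


def snp_dists_tsv_to_csv(tsv_text):
    """Convert a snp-dists style TSV matrix to a square CSV string.

    Hypothetical helper: the output layout is an assumption about what
    MicroTrace reads, not a documented format.
    """
    # Parse tab-separated rows, skipping any blank lines.
    rows = [r for r in csv.reader(io.StringIO(tsv_text), delimiter="\t") if r]
    header, body = rows[0], rows[1:]
    samples = header[1:]  # ignore whatever the top-left cell contains

    # Row labels must match the column labels, in the same order.
    if [r[0] for r in body] != samples:
        raise ValueError("row labels do not match column labels")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([""] + samples)  # empty top-left cell, then sample IDs
    for r in body:
        if len(r) != len(samples) + 1:
            raise ValueError(f"row {r[0]} has the wrong number of columns")
        writer.writerow([r[0]] + [int(x) for x in r[1:]])  # SNP counts are integers
    return out.getvalue()


example = "snp-dists\tS1\tS2\nS1\t0\t3\nS2\t3\t0\n"
print(snp_dists_tsv_to_csv(example))
```

Checking that row labels match column labels catches truncated or reordered matrices before they reach the clustering step.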
- Run the full pipeline: `source("MicroTrace.R")`
- Generate the HTML report: `rmarkdown::render("MicroTrace_Report.Rmd")`

Outputs:
- `cluster_assignments.csv`: sample-to-cluster table
- `intra_cluster_stats.csv`: summary of SNP distances within clusters
- `example_dendrogram.png`: dendrogram showing the SNP threshold cut
- `snp_distance_histogram.png`: histogram of all pairwise SNP distances
- `snp_distance_density.png`: density plot of SNP distances
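Per-cluster statistics such as those in intra_cluster_stats.csv can be derived from the cluster assignments and the distance matrix. The Python sketch below computes cluster size and the minimum, mean, and maximum within-cluster SNP distance. The statistic names and the function `intra_cluster_stats` are hypothetical and may not match the columns MicroTrace actually writes. Singleton clusters have no sample pairs, so their distance fields are `None`.

```python
from itertools import combinations
from statistics import mean


def intra_cluster_stats(names, dist, labels):
    """Summarise within-cluster SNP distances (illustrative sketch).

    names: sample IDs in matrix order; dist: square SNP distance matrix;
    labels: {sample_id: cluster_number}. Returns {cluster: stats_dict}.
    """
    # Collect matrix indices of the members of each cluster.
    members = {}
    for i, name in enumerate(names):
        members.setdefault(labels[name], []).append(i)

    stats = {}
    for cid, idx in sorted(members.items()):
        # Each unordered pair of members is counted once.
        pair_d = [dist[i][j] for i, j in combinations(idx, 2)]
        stats[cid] = {
            "n_samples": len(idx),
            "min_snp": min(pair_d) if pair_d else None,
            "mean_snp": mean(pair_d) if pair_d else None,
            "max_snp": max(pair_d) if pair_d else None,
        }
    return stats


# Hypothetical toy data: S1-S3 form one cluster, S4 is a singleton.
names = ["S1", "S2", "S3", "S4"]
dist = [[0, 2, 30, 31], [2, 0, 29, 33], [30, 29, 0, 5], [31, 33, 5, 0]]
labels = {"S1": 1, "S2": 1, "S3": 1, "S4": 2}
print(intra_cluster_stats(names, dist, labels))
```

A large gap between `min_snp` and `max_snp` inside one cluster can signal a chain of intermediate samples rather than a tight transmission group. That pattern is worth inspecting on the dendrogram.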
If you use this tool in your research, please cite:
Lai, K. (2025). MicroTrace: A lightweight R tool for SNP-based pathogen clustering in outbreak detection. arXiv: https://arxiv.org/abs/2507.08060 (submitted to Journal of Open Source Software). https://github.com/biosciences/MicroTrace
MIT License
Developed by Kaitao Lai, University of Sydney.