This repository contains all scripts and data tables used for the analysis of viruses in total metagenomes generated from upland peatland soils in the UK, as presented in "Ecosystem Health Shapes Viral Ecology in Peatland Soils" (Kosmopoulos et al., 2025).
All raw sequencing data are publicly available in the NCBI Sequence Read Archive (SRA) under BioProject accession PRJNA1203648. Whole assembled metagenomic contigs as well as high-quality (CheckM) prokaryotic metagenome-assembled genomes (MAGs) are available at the NCBI WGS under the same BioProject accession. All viral metagenome-assembled genomes (vMAGs) and prokaryotic MAGs (medium and high quality) are publicly available on figshare under the DOI 10.6084/m9.figshare.28143446.
Soil samples, their associated environmental parameters, and metadata were originally obtained and sequenced by Pallier et al. (2025).
Scripts contains Python and shell scripts used to conduct the analyses and run the tools mentioned in the bioinformatic workflow.
Data contains .RDS formatted data that can be loaded into R for analysis. These data include sample metadata as well as virus and host community summary statistics, virus and host genome information, virus and host genome counts, virus-host predictions, gene annotations, read mapping statistics, functional predictions, and more. It also includes the .shp shapefiles used to generate the map in Figure 1A.
Tables contains .csv and .tsv formatted data tables used in the analyses to generate the data files present in Data. It also contains the organized and summarized data that went into the Supplementary Tables.
Notebooks contains R Markdown notebooks with all the R code used for the analyses described here. Pre-formatted expected outputs are displayed. A table of contents for these notebooks is given below.
Plots contains saved plots (PNG format) that were generated by the notebooks above, organized into sub-directories by notebook.
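The .csv and .tsv files in Tables can also be inspected outside of R. The following is a minimal Python sketch, assuming only that each file's extension reflects its delimiter; the helper `read_table` is hypothetical and is not part of this repository.

```python
import csv
from pathlib import Path

# Map file extensions to delimiters (assumption: the extension matches the content).
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def read_table(path):
    """Read a .csv or .tsv file into a list of dicts keyed by the header row."""
    path = Path(path)
    delimiter = DELIMITERS.get(path.suffix.lower())
    if delimiter is None:
        raise ValueError(f"Unsupported table extension: {path.suffix!r}")
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))
```

Calling `read_table` on any file under Tables returns one dictionary per row, with all values kept as strings; type conversion is left to the caller.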
All required software packages, their versions, and dependencies are noted in each notebook listed in the table of contents below. For the analyses in the notebook Bioinformatic Workflow, software can be installed using Bioconda. The remaining analyses are performed in R (v4.4.0), and the required packages can be installed from the Bioconductor and CRAN repositories. Installing all required packages takes approximately 1–2 hours, and significantly less when using mamba for Bioconda packages.
To reproduce the analyses in the manuscript, either copy and paste the code cells listed in Bioinformatic Workflow into a terminal on a Linux operating system, or open the .Rmd files and run them directly in RStudio. The data and tables required by the .Rmd files are included in this repository.
On a "standard" desktop computer, the R-based analyses should take no more than one hour to run in total. Many of the commands and tools used in Bioinformatic Workflow require access to a server with more than 250 GB of RAM and more than 2 TB of disk space.
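Before starting the steps in Bioinformatic Workflow, it may help to confirm that a machine meets the memory and disk figures above. The following is a minimal sketch for Linux, not part of this repository: the function names are hypothetical, and it assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and that the working directory sits on the filesystem that will hold the outputs.

```python
import os
import shutil

# Thresholds from the text above; decimal units are an assumption.
MIN_RAM_BYTES = 250 * 10**9
MIN_DISK_BYTES = 2 * 10**12


def total_ram_bytes():
    """Return total physical memory in bytes (Linux, via sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def free_disk_bytes(path="."):
    """Return the free bytes on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free


def meets_requirements(ram_bytes, disk_bytes):
    """True only when both values strictly exceed the thresholds."""
    return ram_bytes > MIN_RAM_BYTES and disk_bytes > MIN_DISK_BYTES


if __name__ == "__main__":
    ram = total_ram_bytes()
    disk = free_disk_bytes(".")
    print(f"Total RAM: {ram / 1e9:.0f} GB")
    print(f"Free disk: {disk / 1e12:.2f} TB")
    if not meets_requirements(ram, disk):
        print("This machine is below the stated requirements for the workflow.")
```

The check only reports totals. Whether a shared server actually has that much memory available at run time depends on other jobs and on any scheduler limits.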
The total runtime to fully reproduce the analyses (from metagenome assembly, binning, virus identification, and annotations to clustering and read mapping) will depend heavily on the number of available CPU cores, but will generally take at least 3 full days for assembly alone. We recommend obtaining the pre-assembled metagenomes, MAGs, and vMAGs for analysis using the NCBI BioProject and figshare repository listed above.
- Bioinformatic Workflow
- Gather and organize data
- Soil Environmental Parameters (Extended Data Figure 1)
- Peatland Map and Virus/Host Diversity (Figure 1, Extended Data Figure 2)
- Virus Distribution (Figure 2)
- Virus Genome DESeq and Clustering (Extended Data Figure 3)
- Host Genome DESeq and Clustering (Extended Data Figure 4)
- Virus and Host Relative Abundance (Figure 3)
- Virus Genes (Figure 4)
- Virus-Host Abundance Relationships and Lysogenic Viruses (Figure 5, Extended Data Figure 5, Extended Data Figure 6)
James C. Kosmopoulos | kosmopoulos [at] wisc [dot] edu
PhD candidate | Microbiology Doctoral Training Program
Anantharaman lab | anantharamanlab.wisc.edu
Department of Bacteriology | University of Wisconsin - Madison