Snakemake pipelines for NGS-based methylation analysis and benchmarking.
The Python script and .ipynb files for reproducing the benchmarking results, including evaluation and statistical analysis, are in the benchmark folder.
The data visualization scripts are in the benchmark/figures folder.
The following is a simplified schematic diagram of MethFlow and MethCali:

Here is the way to calculate the RMSE and SpearmanR:
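As a rough illustration only, the two metrics can be sketched in plain Python using their standard textbook definitions: RMSE as the square root of the mean squared difference, and Spearman's correlation as the Pearson correlation of average ranks. The function names and example values below are hypothetical, and the exact procedure used for the benchmark may differ in detail.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every value tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one input is constant
    return cov / math.sqrt(var_x * var_y)


# Hypothetical methylation levels (percent) at four sites.
called = [10.0, 52.0, 78.0, 95.0]
truth = [12.0, 50.0, 80.0, 95.0]
print(rmse(called, truth), spearman_r(called, truth))
```

A lower RMSE means the called levels are closer to the reference in absolute terms, while a Spearman correlation near 1 means the sites are ordered consistently even if the absolute values are shifted.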

Please read the benchmark/README.md for more details.
While the Snakemake and Python scripts are compatible with most operating systems, many of the bioinformatics tools evaluated here only work on x64 Linux. Please use an x64 Linux server or container with at least 128 GB RAM and 8 CPU cores to run the pipelines.
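Before starting a long run, it may help to confirm that the host matches these requirements. The following is a minimal sketch using only the Python standard library; `check_host` and `total_ram_gb` are hypothetical helpers and not part of this repository.

```python
import os
import platform


def check_host(system, machine, cpu_cores, ram_gb, min_cores=8, min_ram_gb=128):
    """Return a list of problems; an empty list means the host qualifies."""
    problems = []
    if system != "Linux":
        problems.append(f"operating system is {system}, expected Linux")
    if machine not in ("x86_64", "AMD64"):
        problems.append(f"architecture is {machine}, expected x64")
    if cpu_cores < min_cores:
        problems.append(f"{cpu_cores} CPU cores found, need at least {min_cores}")
    if ram_gb < min_ram_gb:
        problems.append(f"{ram_gb:.0f} GB RAM found, need at least {min_ram_gb}")
    return problems


def total_ram_gb():
    """Total physical memory in GiB via POSIX sysconf; 0.0 if unavailable."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        return 0.0
    return pages * page_size / 1024 ** 3


if __name__ == "__main__":
    issues = check_host(
        platform.system(), platform.machine(), os.cpu_count() or 0, total_ram_gb()
    )
    print("\n".join(issues) if issues else "Host meets the stated requirements.")
```

A machine sold as 128 GB may report slightly less memory than that to the operating system, so the RAM check is best treated as a guide rather than a hard gate.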
Clone this repository with all submodules:
git clone https://github.com/YuanfengZhang/MethFlow --recurse-submodules
cd MethFlow
Build the Docker image from the provided Dockerfile:
docker build -t methflow:latest .
Use the utils/build_index_en.sh script inside the Docker container to build indices for the alignment tools. You need to provide your own reference genome file:
docker run -it --rm \
-v /path/to/your/reference:/ref \
-v /path/to/output:/output \
methflow:latest \
/opt/MethFlow/utils/build_index_en.sh \
-r /ref/genome.fa -o /output -t bwa-meme,bwa-meth,bismark-bowtie2 -@ 16
Note: The -t flag specifies which tools to build indices for. Supported tools include: bwa-meme, bwa-meth, bwa-mem, astair, batmeth2, bismark-bowtie2, bismark-hisat2, biscuit, bowtie2, bsbolt, bsmapz, fame, gatk, gem3, hisat2, hisat-3n, strobealign, whisper.
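Because the -t value is a plain comma-separated string, a typo in a tool name is easy to miss. The sketch below checks a -t value against the tool names listed above before the container is launched. `parse_tool_list` is a hypothetical helper, and the script itself may perform its own validation.

```python
# Tool names copied from the list of supported tools in this README.
SUPPORTED_TOOLS = {
    "bwa-meme", "bwa-meth", "bwa-mem", "astair", "batmeth2",
    "bismark-bowtie2", "bismark-hisat2", "biscuit", "bowtie2",
    "bsbolt", "bsmapz", "fame", "gatk", "gem3", "hisat2",
    "hisat-3n", "strobealign", "whisper",
}


def parse_tool_list(value):
    """Split a comma-separated -t value and reject names not in the list."""
    tools = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in tools if name not in SUPPORTED_TOOLS]
    if unknown:
        raise ValueError("unsupported tool(s): " + ", ".join(unknown))
    return tools


print(parse_tool_list("bwa-meme,bwa-meth,bismark-bowtie2"))
```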
Download additional required resources using the provided script:
bash utils/download.sh
This will download dbSNP and GENCODE annotation files to the resources/ directory.
Copy and modify the Docker runtime configuration file:
cp config/runtime_config_docker.yaml config/my_runtime_config.yaml
Edit config/my_runtime_config.yaml to set:
- input_dir: Path to your input FASTQ files
- Reference genome paths under the ref: section (update to match your genome indices)
- snp_vcf and dbsnp_file: Paths to these two files downloaded by utils/download.sh, if you want to use gatk-calibrate
- Any other tool-specific parameters
Create a sample sheet CSV file in the format described in utils/sample_sheet_parser.py, which also documents detailed usage. See utils/conda_trigger.csv for an example.
Execute the Snakemake pipeline using Docker:
docker run -it --rm \
-v /folder/of/config:/data \
-v /path/to/your/input:/input \
-v /path/to/output:/output \
methflow:latest \
snakemake --snakefile fq2bedgraph.smk \
--config sample_sheet=/data/sample_sheet.csv \
--cores 32 --use-conda --printshellcmds
Note: Mount your data directory to /data in the container and ensure your runtime config file paths are set accordingly. The /data folder should contain runtime_config.yaml and sample_sheet.csv.
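A quick check of the host folder before mounting it can catch a missing file early. This is a minimal sketch, assuming only the two file names mentioned in the note above; `missing_files` is a hypothetical helper and not part of this repository.

```python
from pathlib import Path

# File names the note above says the mounted /data folder should contain.
REQUIRED_FILES = ("runtime_config.yaml", "sample_sheet.csv")


def missing_files(config_dir):
    """Return the required file names that are absent from config_dir."""
    folder = Path(config_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    # Replace with the host folder that will be mounted to /data.
    absent = missing_files("/folder/of/config")
    print("Missing: " + ", ".join(absent) if absent else "All required files found.")
```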