YuanfengZhang/MethFlow

MethFlow

Snakemake pipelines for running and benchmarking NGS-based methylation analysis. The Python scripts and .ipynb notebooks that reproduce the benchmarking results, including evaluation and statistical analysis, are in the benchmark folder.

The scripts for data visualization are in the benchmark/figures folder.

The following is a simplified schematic diagram of MethFlow and MethCali: [Figure: schematic diagram]

RMSE and SpearmanR are calculated as follows: [Figures: RMSE formula, SpearmanR formula]

Please read the benchmark/README.md for more details.

Requirements

While the Snakemake and Python scripts are compatible with most operating systems, many of the bioinformatics tools evaluated here only work on x64 Linux. Please use an x64 Linux server or container with at least 128 GB of RAM and 8 CPU cores to run the pipelines.
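
A quick way to compare a host against these requirements is sketched below. It is not part of the repository: the function names are hypothetical, and the thresholds are simply the numbers from the paragraph above. The operating system usually reports slightly less memory than is physically installed, so treat a near miss on RAM as a hint rather than a hard failure.

```python
import os
import platform

MIN_CORES = 8      # from the Requirements paragraph
MIN_RAM_GB = 128   # from the Requirements paragraph


def meets_requirements(system, machine, cores, ram_gb):
    """Return True if the described host matches the stated requirements."""
    return (
        system == "Linux"
        and machine.lower() in ("x86_64", "amd64")
        and cores >= MIN_CORES
        and ram_gb >= MIN_RAM_GB
    )


def detect_host():
    """Best-effort detection of OS, architecture, core count and RAM (GiB)."""
    cores = os.cpu_count() or 0
    try:
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        ram_bytes = 0  # os.sysconf or these names are unavailable here
    return platform.system(), platform.machine(), cores, ram_bytes / 1024 ** 3


if __name__ == "__main__":
    host = detect_host()
    print(host, "OK" if meets_requirements(*host) else "below requirements")
```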

Installation

Step 1: Clone the Repository

Clone this repository with all submodules:

git clone https://github.com/YuanfengZhang/MethFlow --recurse-submodules
cd MethFlow

Step 2: Build the Docker Image

Build the Docker image from the provided Dockerfile:

docker build -t methflow:latest .

Step 3: Build Genome Index Files

Use the utils/build_index_en.sh script inside the Docker container to build indices for the alignment tools. You need to provide your own reference genome file:

docker run -it --rm \
    -v /path/to/your/reference:/ref \
    -v /path/to/output:/output \
    methflow:latest \
    /opt/MethFlow/utils/build_index_en.sh \
    -r /ref/genome.fa -o /output -t bwa-meme,bwa-meth,bismark-bowtie2 -@ 16

Note: The -t flag specifies which tools to build indices for. Supported tools include: bwa-meme, bwa-meth, bwa-mem, astair, batmeth2, bismark-bowtie2, bismark-hisat2, biscuit, bowtie2, bsbolt, bsmapz, fame, gatk, gem3, hisat2, hisat-3n, strobealign, whisper.
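
Since a typo in the -t value is easy to make, the list can be checked against the supported names before starting a long index build. The helper below is a hypothetical sketch, not part of utils/build_index_en.sh; it assumes the value is a comma-separated list as in the example command, and it strips stray spaces, which the real script may or may not tolerate.

```python
# Tool names copied from the note above.
SUPPORTED_TOOLS = {
    "bwa-meme", "bwa-meth", "bwa-mem", "astair", "batmeth2",
    "bismark-bowtie2", "bismark-hisat2", "biscuit", "bowtie2", "bsbolt",
    "bsmapz", "fame", "gatk", "gem3", "hisat2", "hisat-3n",
    "strobealign", "whisper",
}


def parse_tool_list(value):
    """Split a comma-separated -t value and reject unsupported names."""
    tools = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in tools if name not in SUPPORTED_TOOLS]
    if unknown:
        raise ValueError("unsupported tool(s): " + ", ".join(unknown))
    return tools


print(parse_tool_list("bwa-meme,bwa-meth,bismark-bowtie2"))
```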

Step 4: Download Required Resources

Download additional required resources using the provided script:

bash utils/download.sh

This will download dbSNP and GENCODE annotation files to the resources/ directory.

Step 5: Configure Runtime Parameters

Copy and modify the Docker runtime configuration file:

cp config/runtime_config_docker.yaml config/my_runtime_config.yaml

Edit config/my_runtime_config.yaml to set:

  • input_dir: Path to your input FASTQ files
  • Reference genome paths under the ref: section (update to match your genome indices)
  • snp_vcf and dbsnp_file: Paths to the files downloaded by utils/download.sh; set these if you want to use gatk-calibrate
  • Any other tool-specific parameters
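
The settings listed above can be sanity-checked before a run. The sketch below is hypothetical and not part of MethFlow: it assumes the YAML file has already been parsed into a dictionary (for example with PyYAML's yaml.safe_load), that input_dir, snp_vcf and dbsnp_file are top-level keys, and that ref is a mapping. The actual layout of config/runtime_config_docker.yaml may differ, and the example paths are made up.

```python
def check_runtime_config(config, need_gatk_calibrate=False):
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    if not config.get("input_dir"):
        problems.append("input_dir is not set")
    ref = config.get("ref")
    if not isinstance(ref, dict) or not ref:
        problems.append("ref section is missing or empty")
    if need_gatk_calibrate:
        # Only required when gatk-calibrate is used (see the list above).
        for key in ("snp_vcf", "dbsnp_file"):
            if not config.get(key):
                problems.append(key + " is required for gatk-calibrate")
    return problems


# Hypothetical parsed config; key names under "ref" are illustrative only.
example = {"input_dir": "/input", "ref": {"bwa-meth": "/output/bwa-meth"}}
print(check_runtime_config(example, need_gatk_calibrate=True))
```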

Step 6: Prepare Sample Sheet

Create a sample sheet CSV file in the format described in utils/sample_sheet_parser.py, which also documents detailed usage. See utils/conda_trigger.csv for an example.

Step 7: Run the Pipeline

Execute the Snakemake pipeline using Docker:

docker run -it --rm \
    -v /folder/of/config:/data \
    -v /path/to/your/input:/input \
    -v /path/to/output:/output \
    methflow:latest \
    snakemake --snakefile fq2bedgraph.smk \
        --config sample_sheet=/data/sample_sheet.csv \
        --cores 32 --use-conda --printshellcmds

Note: Mount your data directory to /data in the container. That folder should contain runtime_config.yaml and sample_sheet.csv, and the paths in your runtime config file must refer to locations as seen inside the container (for example /input and /output).
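
Before launching the container, it can help to confirm that the host folder you plan to mount at /data holds the two files named in the note. The helper below is a hypothetical sketch, not part of the repository; the path in the usage line is the placeholder from the command above.

```python
from pathlib import Path

# File names taken from the note above.
REQUIRED_FILES = ("runtime_config.yaml", "sample_sheet.csv")


def missing_data_files(data_dir):
    """Return the required file names that are absent from data_dir."""
    folder = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


# An empty list means both files are present.
print(missing_data_files("/folder/of/config"))
```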
