A Snakemake workflow for the processing of short read RNA-Seq data in bacteria. This workflow can be used in combination with subsequent workflows for follow-up analyses. For example, differential expression analysis can be performed using snakemake-bacterial-rnaseq-deseq.
The usage of this workflow is described in the Snakemake Workflow Catalog.
Detailed information about input data and workflow configuration can also be found in the config/README.md.
If you use this workflow in a paper, please give credit to the authors by citing the URL of this repository or its DOI.
This is a best-practice workflow for processing short-read sequencing data in bacteria. It is built with Snakemake and consists of the following steps:
- Obtain genome database in fasta and gff format (python, NCBI Datasets)
  - Using automatic download from NCBI with a RefSeq ID
  - Using user-supplied files
- Check quality of input sequencing data (FastQC)
- Cut adapters and filter by length and/or sequencing quality score (fastp)
- Identify unique molecular identifier (UMI, UMI-tools)
- Map reads to the reference genome (STAR aligner)
- Sort and index aligned RNA-Seq data (Samtools)
- Deduplicate reads by unique molecular identifier (UMI, UMI-tools)
- Generate CPM-normalized coverage files (deepTools)
- Quantify biotype features (featureCounts)
- Generate summary report for all processing steps (MultiQC)
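To make the CPM normalization in the coverage step concrete: counts per million (CPM) divides the number of reads in a bin by the total number of mapped reads, expressed in millions. The following is a minimal sketch with made-up numbers. The function name `cpm` is hypothetical and is not part of this workflow or of deepTools.

```bash
#!/usr/bin/env bash
# Counts-per-million scaling (illustrative sketch, numbers are made up).
# CPM = reads in bin / (total mapped reads / 1,000,000)
cpm() {
  local bin_reads="$1"
  local mapped_reads="$2"
  awk -v c="$bin_reads" -v t="$mapped_reads" \
    'BEGIN { printf "%.2f\n", c / (t / 1000000) }'
}

# The same raw count gives a lower CPM in a more deeply sequenced sample.
cpm 50 2000000   # prints 25.00
cpm 50 4000000   # prints 12.50
```

Scaling by library size in this way is what makes coverage tracks comparable between samples that were sequenced to different depths.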
Figure 1: Directed acyclic graph (DAG) of the current workflow steps.
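As a rough orientation, the steps listed above could be written as individual command-line calls for a single sample. This is only a sketch and not the workflow's actual rules: the file paths, sample name, UMI pattern, filter thresholds, the featureCounts feature type and attribute, and the helper functions `run` and `process_sample` are all assumptions made for illustration. The reference download step is left out, and a prebuilt STAR index and a GFF annotation are assumed to exist already. By default the commands are only printed, not executed.

```bash
#!/usr/bin/env bash
# Illustrative per-sample command sketch; all paths and parameters are hypothetical.

# Print each command by default; set DRY_RUN=0 to execute it instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

process_sample() {
  local s="$1"
  local out="results"
  local ref="reference"   # assumed to hold genome.gff and a STAR index

  # Quality control of the raw reads (FastQC needs an existing output directory)
  run mkdir -p "${out}/fastqc"
  run fastqc "data/${s}.fastq.gz" --outdir "${out}/fastqc"
  # Adapter trimming plus length and quality filtering
  run fastp -i "data/${s}.fastq.gz" -o "${out}/${s}.trimmed.fastq.gz" \
    --length_required 15 --qualified_quality_phred 20
  # Move the UMI from the read sequence into the read name
  run umi_tools extract --bc-pattern=NNNNNNNN \
    --stdin="${out}/${s}.trimmed.fastq.gz" --stdout="${out}/${s}.umi.fastq.gz"
  # Map reads to the reference genome
  run STAR --runThreadN 2 --genomeDir "${ref}/star_index" \
    --readFilesIn "${out}/${s}.umi.fastq.gz" --readFilesCommand zcat \
    --outSAMtype BAM Unsorted --outFileNamePrefix "${out}/${s}."
  # Sort and index the alignments
  run samtools sort -o "${out}/${s}.sorted.bam" "${out}/${s}.Aligned.out.bam"
  run samtools index "${out}/${s}.sorted.bam"
  # Remove PCR duplicates based on UMI and mapping position
  run umi_tools dedup --stdin="${out}/${s}.sorted.bam" --stdout="${out}/${s}.dedup.bam"
  run samtools index "${out}/${s}.dedup.bam"
  # CPM-normalized coverage track
  run bamCoverage --bam "${out}/${s}.dedup.bam" \
    --outFileName "${out}/${s}.bw" --normalizeUsing CPM
  # Count reads per annotated feature
  run featureCounts -a "${ref}/genome.gff" -o "${out}/${s}.counts.txt" \
    -t gene -g ID "${out}/${s}.dedup.bam"
  # Collect all reports into one summary
  run multiqc "${out}" --outdir "${out}/multiqc"
}

# Prints the command sequence for a hypothetical sample named "sample1".
process_sample sample1
```

In the workflow itself, Snakemake derives this ordering from the input and output files of each rule, so the commands never need to be chained by hand.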
To run the workflow from the command line, change the working directory.

    cd path/to/snakemake-workflow-name

Adjust options in the default config file config/config.yml.
Before running the complete workflow, you can perform a dry run using:

    snakemake --dry-run

To run the workflow with test files using conda:

    snakemake --cores 2 --sdm conda --directory .test

To run the workflow with apptainer:

    snakemake --cores 2 --sdm conda apptainer --directory .test

- Dr. Rina Ahmed-Begrich
  - Affiliation: Max-Planck-Unit for the Science of Pathogens (MPUSP), Berlin, Germany
  - ORCID profile: https://orcid.org/0000-0002-0656-1795
  - GitHub page: https://github.com/rabioinf
- Dr. Michael Jahn
  - Affiliation: Max-Planck-Unit for the Science of Pathogens (MPUSP), Berlin, Germany
  - ORCID profile: https://orcid.org/0000-0002-3913-153X
  - GitHub page: https://github.com/m-jahn
Visit the MPUSP GitHub page at https://github.com/MPUSP for more information on this workflow and other projects.
- Essential tools are linked in the top section of this document
