MPUSP/snakemake-bacterial-rnaseq-processing

A Snakemake workflow for the processing of short-read RNA-Seq data in bacteria. It can be combined with downstream workflows for follow-up analyses. For example, differential expression analysis can be performed using snakemake-bacterial-rnaseq-deseq.

Usage

The usage of this workflow is described in the Snakemake Workflow Catalog.

Detailed information about input data and workflow configuration can also be found in the config/README.md.

If you use this workflow in a paper, please give credit to the authors by citing the URL of this repository or its DOI.

Workflow overview

This is a best-practice workflow for the processing of short-read RNA-Seq data in bacteria. It is built using Snakemake and consists of the following steps:

  1. Obtain genome database in FASTA and GFF format (Python, NCBI Datasets)
    1. Using automatic download from NCBI with a RefSeq ID
    2. Using user-supplied files
  2. Check quality of input sequencing data (FastQC)
  3. Cut adapters and filter by length and/or sequencing quality score (fastp)
  4. Identify unique molecular identifiers (UMIs) in the reads (UMI-tools)
  5. Map reads to the reference genome (STAR aligner)
  6. Sort and index aligned RNA-Seq data (Samtools)
  7. Deduplicate reads by UMI (UMI-tools)
  8. Generate CPM-normalized coverage files (deepTools)
  9. Quantify biotype features (featureCounts)
  10. Generate summary report for all processing steps (MultiQC)
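
The CPM (counts per million) normalization in step 8 can be illustrated with a minimal Python sketch. This is not the deepTools implementation; it only shows the arithmetic, assuming per-bin read counts and the total number of mapped reads are already known. The function name `cpm_normalize` and the example numbers are hypothetical.

```python
def cpm_normalize(bin_counts, total_mapped_reads):
    """Scale raw per-bin read counts to counts per million mapped reads.

    Hypothetical helper for illustration only; the workflow itself
    delegates this step to deepTools.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    # One scaling factor per sample: reads per bin divided by
    # the number of mapped reads expressed in millions.
    scale = 1_000_000 / total_mapped_reads
    return [count * scale for count in bin_counts]


# Made-up example: three bins from a sample with 2 million mapped reads.
raw_counts = [4, 10, 0]
normalized = cpm_normalize(raw_counts, total_mapped_reads=2_000_000)
print(normalized)  # [2.0, 5.0, 0.0]
```

Because every bin of a sample is multiplied by the same factor, CPM normalization makes coverage tracks comparable between samples of different sequencing depth without changing the shape of an individual track.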

Figure 1: Directed acyclic graph (DAG) of the current workflow steps.
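
The dependency structure implied by the numbered steps can be sketched with Python's standard `graphlib` module. The step labels and the edges between them are assumptions inferred from the order of the list above; they are not the actual rule names or dependencies defined in the workflow.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps it is ASSUMED to
# depend on. Labels are hypothetical and do not match Snakemake rule names.
dependencies = {
    "get_genome": set(),
    "fastqc": set(),
    "fastp": set(),
    "umi_identify": {"fastp"},
    "star_mapping": {"get_genome", "umi_identify"},
    "samtools_sort_index": {"star_mapping"},
    "umi_dedup": {"samtools_sort_index"},
    "deeptools_coverage": {"umi_dedup"},
    "featurecounts": {"get_genome", "umi_dedup"},
    "multiqc": {"fastqc", "fastp", "star_mapping", "umi_dedup", "featurecounts"},
}

# static_order() yields one valid execution order, every step appearing
# after all of its dependencies. It raises CycleError if the graph is
# not acyclic.
order = list(TopologicalSorter(dependencies).static_order())
for step in order:
    print(step)
```

Snakemake derives the real graph from the input and output files of its rules; for an installed workflow, `snakemake --dag` prints that graph in Graphviz dot format.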

Deployment options

To run the workflow from the command line, change the working directory to the workflow directory:

cd path/to/snakemake-workflow-name

Adjust options in the default config file config/config.yml. Before running the complete workflow, you can perform a dry run using:

snakemake --dry-run

To run the workflow with test files using conda:

snakemake --cores 2 --sdm conda --directory .test

To run the workflow with test files using conda in combination with apptainer:

snakemake --cores 2 --sdm conda apptainer --directory .test

Authors

Visit the MPUSP GitHub page at https://github.com/MPUSP for more info on this workflow and other projects.

References

  • Essential tools are linked in the top section of this document
