Snakemake workflow: bacterial-rnaseq-processing

A Snakemake workflow for processing short-read RNA-Seq data from bacteria. It can be combined with downstream workflows for follow-up analyses. For example, differential expression analysis can be performed using snakemake-bacterial-rnaseq-deseq.

Usage

The usage of this workflow is described in the Snakemake Workflow Catalog.

Detailed information about input data and workflow configuration can also be found in the config/README.md.

If you use this workflow in a paper, please give credit to the authors by citing the URL of this repository or its DOI.

Workflow overview

This is a best-practice workflow for processing short-read sequencing data from bacteria. It is built using Snakemake and consists of the following steps:

1. Obtain genome database in FASTA and GFF format (Python, NCBI Datasets)
   1. Using automatic download from NCBI with a RefSeq ID
   2. Using user-supplied files
2. Check quality of input sequencing data (FastQC)
3. Cut adapters and filter by length and/or sequencing quality score (fastp)
4. Identify unique molecular identifiers, UMIs (UMI-tools)
5. Map reads to the reference genome (STAR aligner)
6. Sort and index aligned RNA-Seq data (Samtools)
7. Deduplicate reads by unique molecular identifier, UMI (UMI-tools)
8. Generate CPM-normalized coverage files (deepTools)
9. Quantify biotype features (featureCounts)
10. Generate summary report for all processing steps (MultiQC)
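
The coverage normalization step above scales read counts to counts per million (CPM): the count in each bin is divided by the total number of mapped reads and multiplied by one million. The following is a minimal sketch of that arithmetic only, not the deepTools implementation. The function name `cpm_normalize` and the input values are hypothetical and chosen for illustration.

```python
def cpm_normalize(bin_counts, total_mapped_reads):
    """Scale raw per-bin read counts to counts per million (CPM).

    Hypothetical helper for illustration; deepTools performs this
    internally when CPM normalization is requested.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1_000_000 / total_mapped_reads
    return [count * scale for count in bin_counts]


# Made-up read counts for four genomic bins and a made-up library size.
raw_counts = [10, 30, 0, 60]
cpm = cpm_normalize(raw_counts, total_mapped_reads=100)
print(cpm)
```

Because every sample is scaled to the same nominal library size, coverage tracks from samples with different sequencing depths become directly comparable.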

Figure 1: Directed acyclic graph (DAG) of the current workflow steps.
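
To illustrate what such a DAG encodes, the steps listed above can be sketched as a dependency graph and sorted into a valid execution order. This is only an illustration using Python's standard library: the step names are hypothetical labels, and the dependency edges are assumptions inferred from the order of the list, not taken from the workflow's rules.

```python
from graphlib import TopologicalSorter

# Each key is a step; its set holds the steps assumed to finish before it.
# Names and edges are illustrative assumptions, not the actual Snakemake rules.
dependencies = {
    "get_genome": set(),
    "fastqc": set(),
    "fastp": set(),
    "umi_extract": {"fastp"},
    "star_align": {"get_genome", "umi_extract"},
    "samtools_sort_index": {"star_align"},
    "umi_dedup": {"samtools_sort_index"},
    "deeptools_coverage": {"umi_dedup"},
    "featurecounts": {"get_genome", "umi_dedup"},
    "multiqc": {
        "fastqc",
        "fastp",
        "star_align",
        "umi_dedup",
        "deeptools_coverage",
        "featurecounts",
    },
}

# static_order() yields every step after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())
for position, step in enumerate(order, start=1):
    print(f"{position:2d}. {step}")
```

Snakemake resolves its own DAG from the input and output files declared in each rule, so the real graph may differ from this sketch.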

Deployment options

To run the workflow from the command line, change the working directory to the workflow's root directory:

cd path/to/snakemake-workflow-name

Adjust options in the default config file config/config.yml. Before running the complete workflow, you can perform a dry run using:

snakemake --dry-run

To run the workflow with test files using conda:

snakemake --cores 2 --sdm conda --directory .test

To run the workflow with test files using conda together with apptainer:

snakemake --cores 2 --sdm conda apptainer --directory .test
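
When the workflow is launched from a script rather than typed by hand, the commands above can be assembled programmatically. The sketch below is one possible way to do this. The helper `build_snakemake_command` is hypothetical and not part of Snakemake; it only uses the flags shown in this section (`--dry-run`, `--cores`, `--sdm`, `--directory`).

```python
import shlex


def build_snakemake_command(cores=None, sdm=(), directory=None, dry_run=False):
    """Assemble a snakemake command line as a list of arguments.

    Hypothetical helper for illustration; it builds the command
    but does not execute it.
    """
    command = ["snakemake"]
    if dry_run:
        command.append("--dry-run")
    if cores is not None:
        command += ["--cores", str(cores)]
    if sdm:
        # One or more software deployment methods, e.g. conda, apptainer.
        command += ["--sdm", *sdm]
    if directory is not None:
        command += ["--directory", directory]
    return command


dry_run_cmd = build_snakemake_command(dry_run=True)
conda_cmd = build_snakemake_command(cores=2, sdm=["conda"], directory=".test")
apptainer_cmd = build_snakemake_command(
    cores=2, sdm=["conda", "apptainer"], directory=".test"
)

for cmd in (dry_run_cmd, conda_cmd, apptainer_cmd):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the workflow's root directory, assuming Snakemake is installed and available on the `PATH`.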

Authors

- Dr. Rina Ahmed-Begrich
  - Affiliation: Max-Planck-Unit for the Science of Pathogens (MPUSP), Berlin, Germany
  - ORCID profile: https://orcid.org/0000-0002-0656-1795
  - GitHub page: https://github.com/rabioinf
- Dr. Michael Jahn
  - Affiliation: Max-Planck-Unit for the Science of Pathogens (MPUSP), Berlin, Germany
  - ORCID profile: https://orcid.org/0000-0002-3913-153X
  - GitHub page: https://github.com/m-jahn

Visit the MPUSP GitHub page at https://github.com/MPUSP for more information on this workflow and other projects.

References

Essential tools are linked in the top section of this document.
