seqwell/nextflow-pairwise-alignment

nextflow-pairwise-alignment


This Nextflow pipeline performs pairwise alignments to support clonal identification analysis. It aligns paired-end FASTQ reads against multiple reference sequences using BWA-MEM, extracts alignment scores, and identifies the best alignment for each sample.

Pipeline Overview

The pipeline creates a Cartesian product of all FASTQ pairs and reference sequences, so every sample is aligned against every reference. The final output includes alignment scores for each sample-reference combination and a summary of the best alignments.
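
The all-against-all pairing can be illustrated with a minimal Python sketch. The sample and reference names are taken from the test outputs listed under Expected Outputs. The `pair_all` helper is hypothetical and is not part of the pipeline, which builds these combinations inside Nextflow.

```python
from itertools import product

def pair_all(samples, references):
    """Return every (sample, reference) combination, one per alignment job."""
    return list(product(samples, references))

samples = ["EP_1002_A01", "EP_1002_A02"]
references = ["pBR322", "pUC19"]

for sample, ref in pair_all(samples, references):
    # Mirrors the "<sample>_vs_<reference>" naming seen in the output files.
    print(f"{sample}_vs_{ref}")
```

With the 4 samples and 3 references in the test data, the same pairing gives 12 alignments, which matches the 12 score files in the expected output.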

Pairwise Alignment Workflow

The pipeline processes paired-end sequencing data through the following key steps:

  1. BWA_INDEX: Indexes all reference FASTA files for BWA alignment
  2. BWA_ALIGN: Aligns each FASTQ pair against all indexed references using BWA-MEM
  3. EXTRACT_ALIGNMENT_SCORES: Extracts alignment scores from BAM files
  4. BEST_ALIGNMENTS: Identifies the best alignment for each sample based on alignment scores
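
Steps 3 and 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline's own code: it assumes the score of interest is the `AS:i` tag that BWA-MEM writes into each SAM/BAM record, and it assumes "best" means the highest mean score per sample. The README does not state the actual criterion. The function names and the toy record are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def alignment_score(sam_line):
    """Return the integer AS:i value from one SAM record, or None if absent."""
    # Optional tags start at the 12th tab-separated column of a SAM record.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("AS:i:"):
            return int(field[5:])
    return None

def best_reference(scores_by_pair):
    """Map each sample to the reference with the highest mean score.

    scores_by_pair: {(sample, reference): [score, ...]}
    Ranking by mean score is an assumption made for this sketch.
    """
    per_sample = defaultdict(dict)
    for (sample, ref), scores in scores_by_pair.items():
        if scores:
            per_sample[sample][ref] = mean(scores)
    return {s: max(refs, key=refs.get) for s, refs in per_sample.items()}

# Toy record with made-up values, for illustration only.
record = "read1\t99\tpUC19\t1\t60\t10M\t=\t50\t59\tACGTACGTAC\tFFFFFFFFFF\tNM:i:0\tAS:i:10"
print(alignment_score(record))
```

Passing `best_reference` a dictionary keyed by (sample, reference) returns one winning reference per sample, which is the shape of result a per-sample summary needs.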

[Figure: Pairwise Alignment]

Dependencies

This pipeline requires installation of:

  • Nextflow: Workflow management system
  • Docker: Containerization platform for running pipeline processes

Docker Containers

All Docker containers used in this pipeline are expected to be publicly available and are specified in the respective module files:

  • BWA_INDEX: seqwell/fq_assemble:v1.0
  • BWA_ALIGN: seqwell/fq_assemble:v1.0
  • EXTRACT_ALIGNMENT_SCORES: seqwell/fq_assemble:v1.0
  • BEST_ALIGNMENTS: seqwell/python:v2.0

How to Run the Pipeline

Required Parameters

The pipeline requires the following parameters:

--fastq

Path to a directory containing paired-end FASTQ files. The pipeline automatically detects files matching the pattern *_R{1,2}_001.fastq.gz. This can be either a local absolute path or an AWS S3 URI. If using an S3 URI, ensure your AWS credentials are properly configured in the nextflow.config file.
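
To show which files the pattern `*_R{1,2}_001.fastq.gz` selects, here is a minimal Python sketch that groups file names into R1/R2 pairs. The file names and the `pair_fastqs` helper are hypothetical. Skipping a sample that lacks a mate is a choice made for this sketch, not documented pipeline behaviour.

```python
import re
from collections import defaultdict

# Regex equivalent of the glob *_R{1,2}_001.fastq.gz; the captured prefix is the sample ID.
FASTQ_RE = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])_001\.fastq\.gz$")

def pair_fastqs(filenames):
    """Group file names into {sample: (R1, R2)}, skipping incomplete pairs."""
    found = defaultdict(dict)
    for name in filenames:
        match = FASTQ_RE.match(name)
        if match:
            found[match["sample"]][match["read"]] = name
    return {
        sample: (reads["1"], reads["2"])
        for sample, reads in sorted(found.items())
        if "1" in reads and "2" in reads
    }

files = [
    "EP_1002_A01_R1_001.fastq.gz",
    "EP_1002_A01_R2_001.fastq.gz",
    "EP_1002_A02_R1_001.fastq.gz",  # no R2 mate, so this one is skipped
    "notes.txt",                    # does not match the pattern
]
print(pair_fastqs(files))
```

Files that do not follow this naming scheme are not matched by the pattern, so it may be worth checking names before a run.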

--ref

Path to a directory containing reference FASTA files (*.fa). Each FASTA file will be indexed and used as an alignment reference. This can be either a local absolute path or an AWS S3 URI. If using an S3 URI, ensure your AWS credentials are properly configured in the nextflow.config file.

--output

Path to the output directory where results will be saved. This can be either a local absolute path or an AWS S3 URI. If using an S3 URI, ensure your AWS credentials are properly configured in the nextflow.config file.

--run_id

A unique identifier for the sequencing run being analysed.

Profiles

Profiles can be selected with the -profile option at the command line. Common profiles include:

  • docker: Run the pipeline using Docker containers (default)
  • test: Run the pipeline using Docker containers with all parameters set to their defaults

Example Commands

Basic Execution

A minimal execution might look like:

nextflow run \
    main.nf \
    --fastq "${PWD}/path/to/fastq/directory" \
    --ref "${PWD}/path/to/references" \
    --run_id "test" \
    --output "pairwise_alignment_out" \
    -resume -bg

Running Test Data

The pipeline can be run using test data with:

nextflow run \
    main.nf \
    --fastq "${PWD}/tests/fastq" \
    --ref "${PWD}/tests/ref" \
    --run_id "test" \
    --output "pairwise_alignment_out" \
    -resume -bg

Expected Outputs

├── pairwise_alignment_out
│   ├── alignment_scores
│   │   ├── EP_1002_A01_vs_pBR322_scores.tsv
│   │   ├── EP_1002_A01_vs_pUC19_scores.tsv
│   │   ├── EP_1002_A01_vs_seqWell_DelwithpUCIDT-KanGoldenGate+_scores.tsv
│   │   ├── EP_1002_A02_vs_pBR322_scores.tsv
│   │   ├── EP_1002_A02_vs_pUC19_scores.tsv
│   │   ├── EP_1002_A02_vs_seqWell_DelwithpUCIDT-KanGoldenGate+_scores.tsv
│   │   ├── EP_1002_A03_vs_pBR322_scores.tsv
│   │   ├── EP_1002_A03_vs_pUC19_scores.tsv
│   │   ├── EP_1002_A03_vs_seqWell_DelwithpUCIDT-KanGoldenGate+_scores.tsv
│   │   ├── EP_1002_A05_vs_pBR322_scores.tsv
│   │   ├── EP_1002_A05_vs_pUC19_scores.tsv
│   │   └── EP_1002_A05_vs_seqWell_DelwithpUCIDT-KanGoldenGate+_scores.tsv
│   ├── best_alignments
│   │   ├── test_best_alignments_summary.csv
│   │   └── test_best_reference_per_sample.csv
│   └── bwa_alignments
│       ├── EP_1002_A01_vs_pBR322.bam
│       ├── EP_1002_A01_vs_pBR322.bam.bai
│       ├── EP_1002_A01_vs_pUC19.bam
│       ├── EP_1002_A01_vs_pUC19.bam.bai
│       ├── EP_1002_A01_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam
│       ├── EP_1002_A01_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam.bai
│       ├── EP_1002_A02_vs_pBR322.bam
│       ├── EP_1002_A02_vs_pBR322.bam.bai
│       ├── EP_1002_A02_vs_pUC19.bam
│       ├── EP_1002_A02_vs_pUC19.bam.bai
│       ├── EP_1002_A02_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam
│       ├── EP_1002_A02_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam.bai
│       ├── EP_1002_A03_vs_pBR322.bam
│       ├── EP_1002_A03_vs_pBR322.bam.bai
│       ├── EP_1002_A03_vs_pUC19.bam
│       ├── EP_1002_A03_vs_pUC19.bam.bai
│       ├── EP_1002_A03_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam
│       ├── EP_1002_A03_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam.bai
│       ├── EP_1002_A05_vs_pBR322.bam
│       ├── EP_1002_A05_vs_pBR322.bam.bai
│       ├── EP_1002_A05_vs_pUC19.bam
│       ├── EP_1002_A05_vs_pUC19.bam.bai
│       ├── EP_1002_A05_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam
│       └── EP_1002_A05_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam.bai
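
The files above follow a `<sample>_vs_<reference>` naming scheme. For downstream collation, a score file name can be split back into its two parts with a short Python sketch like the one below. The `split_scores_name` helper is hypothetical, and it assumes that sample IDs never contain the string `_vs_`.

```python
def split_scores_name(filename):
    """Split '<sample>_vs_<reference>_scores.tsv' into (sample, reference)."""
    suffix = "_scores.tsv"
    if not filename.endswith(suffix):
        raise ValueError(f"not a scores file: {filename}")
    stem = filename[: -len(suffix)]
    # partition splits at the first '_vs_'; assumes the sample ID does not contain it.
    sample, sep, reference = stem.partition("_vs_")
    if not sep:
        raise ValueError(f"missing '_vs_' in: {filename}")
    return sample, reference

print(split_scores_name("EP_1002_A01_vs_seqWell_DelwithpUCIDT-KanGoldenGate+_scores.tsv"))
```

Splitting at the first `_vs_` keeps reference names that contain underscores, hyphens, or `+` intact.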
