This Nextflow pipeline performs pairwise alignments to support clonal identification analysis. It aligns paired-end FASTQ reads against multiple reference sequences using BWA-MEM, extracts alignment scores, and identifies the best alignment for each sample.
The pipeline forms the Cartesian product of all FASTQ pairs and reference sequences, so every sample is aligned against every reference. The final output includes alignment scores for each sample-reference combination and a summary of the best alignments.
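As a rough illustration of the Cartesian product described above, the following Python sketch enumerates every sample-reference combination. It is not the pipeline's own code (which is written in Nextflow); the function name `alignment_jobs` is hypothetical, and the sample and reference names are taken from the example output listing in this document.

```python
from itertools import product


def alignment_jobs(samples, references):
    """Return one (sample, reference, output_prefix) tuple per combination."""
    # The "<sample>_vs_<reference>" prefix mirrors the output file names
    # shown in this README; the function itself is illustrative only.
    return [(s, r, f"{s}_vs_{r}") for s, r in product(samples, references)]


samples = ["EP_1002_A01", "EP_1002_A02", "EP_1002_A03", "EP_1002_A05"]
references = ["pBR322", "pUC19", "seqWell_DelwithpUCIDT-KanGoldenGate+"]

jobs = alignment_jobs(samples, references)
print(len(jobs))  # 4 samples x 3 references = 12 combinations
```

The number of alignments grows as the product of the two input sizes, so adding one reference adds one alignment per sample.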
The pipeline processes paired-end sequencing data through the following key steps:
- BWA_INDEX: Indexes all reference FASTA files for BWA alignment
- BWA_ALIGN: Aligns each FASTQ pair against all indexed references using BWA-MEM
- EXTRACT_ALIGNMENT_SCORES: Extracts alignment scores from BAM files
- BEST_ALIGNMENTS: Identifies the best alignment for each sample based on alignment scores
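The last two steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the modules' actual implementation: this README does not say which metric the pipeline uses, so the sketch assumes the score is the mean of the `AS:i` tag that BWA-MEM writes for each alignment, and it reads SAM text lines (for example from `samtools view`) rather than BAM directly. The names `alignment_scores`, `best_reference`, and `sam_record` are hypothetical.

```python
from statistics import mean

UNMAPPED = 0x4  # SAM flag bit set for unmapped reads


def alignment_scores(sam_lines):
    """Collect AS:i values from SAM records, skipping headers and unmapped reads."""
    scores = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or int(fields[1]) & UNMAPPED:
            continue
        for tag in fields[11:]:  # optional fields start at column 12
            if tag.startswith("AS:i:"):
                scores.append(int(tag[5:]))
                break
    return scores


def best_reference(scores_by_ref):
    """Return (reference, mean score) for the highest mean AS (assumed metric)."""
    means = {ref: (mean(s) if s else 0.0) for ref, s in scores_by_ref.items()}
    best = max(means, key=means.get)
    return best, means[best]


def sam_record(name, flag, score):
    # Hypothetical helper: a minimal 11-column SAM record plus an AS tag.
    cols = [name, str(flag), "ref", "1", "60", "4M", "*", "0", "0", "ACGT", "IIII"]
    return "\t".join(cols + [f"AS:i:{score}"])


scores_by_ref = {
    "pUC19": alignment_scores(["@HD\tVN:1.6", sam_record("r1", 0, 150), sam_record("r2", 16, 140)]),
    "pBR322": alignment_scores([sam_record("r1", 0, 60), sam_record("r2", 4, 0)]),
}
result = best_reference(scores_by_ref)
print(result)
```

A different summary statistic (sum, median, or fraction of reads mapped) would slot into `best_reference` without changing the extraction step.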
This pipeline requires installation of:
- Nextflow: Workflow management system
- Docker: Containerization platform for running pipeline processes
All Docker containers used in this pipeline are expected to be publicly available and are specified in the respective module files:
- BWA_INDEX: seqwell/fq_assemble:v1.0
- BWA_ALIGN: seqwell/fq_assemble:v1.0
- EXTRACT_ALIGNMENT_SCORES: seqwell/fq_assemble:v1.0
- BEST_ALIGNMENTS: seqwell/python:v2.0
The pipeline requires the following parameters:
- --fastq: Path to a directory containing paired-end FASTQ files. The pipeline automatically detects files matching the pattern *_R{1,2}_001.fastq.gz. This can be either a local absolute path or an AWS S3 URI. If using an S3 URI, ensure your AWS credentials are properly configured in the nextflow.config file.
- --ref: Path to a directory containing reference FASTA files (*.fa). Each FASTA file is indexed and used as an alignment reference. This can be either a local absolute path or an AWS S3 URI. If using an S3 URI, ensure your AWS credentials are properly configured in the nextflow.config file.
- --output: Path to the output directory where results are saved. This can be either a local absolute path or an AWS S3 URI. If using an S3 URI, ensure your AWS credentials are properly configured in the nextflow.config file.
- --run_id: A unique identifier for the sequencing run being analysed.
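To illustrate how the *_R{1,2}_001.fastq.gz pattern groups files into pairs, here is a small Python sketch. It is an approximation of the matching logic, not the pipeline's Nextflow code. The function name `pair_fastqs` is hypothetical, the example file names are assumed from the sample IDs in the output listing, and dropping samples that lack one mate is a choice made for this sketch rather than documented pipeline behaviour.

```python
import re
from pathlib import Path

# Python equivalent of the glob *_R{1,2}_001.fastq.gz, capturing the sample ID.
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])_001\.fastq\.gz$")


def pair_fastqs(filenames):
    """Group FASTQ files into {sample: (R1, R2)}, keeping complete pairs only."""
    reads = {}
    for name in sorted(filenames):
        match = PAIR_PATTERN.match(Path(name).name)
        if match is None:
            continue  # not a FASTQ file following the expected naming scheme
        reads.setdefault(match["sample"], {})[match["read"]] = name
    # Assumption of this sketch: samples missing R1 or R2 are skipped.
    return {s: (r["1"], r["2"]) for s, r in reads.items() if "1" in r and "2" in r}


files = [
    "EP_1002_A01_R1_001.fastq.gz",
    "EP_1002_A01_R2_001.fastq.gz",
    "EP_1002_A02_R1_001.fastq.gz",  # R2 missing, so this sample is skipped
    "notes.txt",
]
pairs = pair_fastqs(files)
print(pairs)
```

The text before `_R1_001.fastq.gz` becomes the sample ID, which is why it reappears in output names such as EP_1002_A01_vs_pUC19.bam.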
Profiles can be selected with the -profile option at the command line. Common profiles include:
- docker: Run the pipeline using Docker containers (default)
- test: Run the pipeline using Docker containers with parameters set to their defaults
A minimal execution might look like:
nextflow run \
main.nf \
--fastq "${PWD}/path/to/fastq/directory" \
--ref "${PWD}/path/to/references" \
--run_id "test" \
--output "pairwise_alignment_out" \
-resume -bg
The pipeline can be run using test data with:
nextflow run \
main.nf \
--fastq "${PWD}/tests/fastq" \
--ref "${PWD}/tests/ref" \
--run_id "test" \
--output "pairwise_alignment_out" \
-resume -bg
Expected output directory structure:
├── pairwise_alignment_out
│   ├── alignment_scores
│   │   ├── EP_1002_A01_vs_pBR322_scores.tsv
│   │   ├── EP_1002_A01_vs_pUC19_scores.tsv
│   │   ├── EP_1002_A01_vs_seqWell_DelwithpUCIDT-KanGoldenGate+_scores.tsv
│   │   ├── EP_1002_A02_vs_pBR322_scores.tsv
│   │   ├── EP_1002_A02_vs_pUC19_scores.tsv
│   │   ├── EP_1002_A02_vs_seqWell_DelwithpUCIDT-KanGoldenGate+_scores.tsv
│   │   ├── EP_1002_A03_vs_pBR322_scores.tsv
│   │   ├── EP_1002_A03_vs_pUC19_scores.tsv
│   │   ├── EP_1002_A03_vs_seqWell_DelwithpUCIDT-KanGoldenGate+_scores.tsv
│   │   ├── EP_1002_A05_vs_pBR322_scores.tsv
│   │   ├── EP_1002_A05_vs_pUC19_scores.tsv
│   │   └── EP_1002_A05_vs_seqWell_DelwithpUCIDT-KanGoldenGate+_scores.tsv
│   ├── best_alignments
│   │   ├── test_best_alignments_summary.csv
│   │   └── test_best_reference_per_sample.csv
│   └── bwa_alignments
│       ├── EP_1002_A01_vs_pBR322.bam
│       ├── EP_1002_A01_vs_pBR322.bam.bai
│       ├── EP_1002_A01_vs_pUC19.bam
│       ├── EP_1002_A01_vs_pUC19.bam.bai
│       ├── EP_1002_A01_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam
│       ├── EP_1002_A01_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam.bai
│       ├── EP_1002_A02_vs_pBR322.bam
│       ├── EP_1002_A02_vs_pBR322.bam.bai
│       ├── EP_1002_A02_vs_pUC19.bam
│       ├── EP_1002_A02_vs_pUC19.bam.bai
│       ├── EP_1002_A02_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam
│       ├── EP_1002_A02_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam.bai
│       ├── EP_1002_A03_vs_pBR322.bam
│       ├── EP_1002_A03_vs_pBR322.bam.bai
│       ├── EP_1002_A03_vs_pUC19.bam
│       ├── EP_1002_A03_vs_pUC19.bam.bai
│       ├── EP_1002_A03_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam
│       ├── EP_1002_A03_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam.bai
│       ├── EP_1002_A05_vs_pBR322.bam
│       ├── EP_1002_A05_vs_pBR322.bam.bai
│       ├── EP_1002_A05_vs_pUC19.bam
│       ├── EP_1002_A05_vs_pUC19.bam.bai
│       ├── EP_1002_A05_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam
│       └── EP_1002_A05_vs_seqWell_DelwithpUCIDT-KanGoldenGate+.bam.bai
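Because the output names follow a `<sample>_vs_<reference>` scheme, a run can be checked for completeness by comparing the score files against the expected combinations. The Python sketch below is one possible way to do that and is not part of the pipeline. The helper names `parse_score_file` and `missing_combinations` are hypothetical, and the sketch assumes sample IDs never contain the substring `_vs_`.

```python
from itertools import product

SUFFIX = "_scores.tsv"


def parse_score_file(filename):
    """Split '<sample>_vs_<reference>_scores.tsv' into (sample, reference)."""
    stem = filename[: -len(SUFFIX)]
    # Assumption: the first "_vs_" separates sample from reference.
    sample, reference = stem.split("_vs_", 1)
    return sample, reference


def missing_combinations(filenames, samples, references):
    """Return the expected (sample, reference) pairs that have no score file."""
    found = {
        parse_score_file(f)
        for f in filenames
        if f.endswith(SUFFIX) and "_vs_" in f
    }
    return sorted(set(product(samples, references)) - found)


score_files = [
    "EP_1002_A01_vs_pBR322_scores.tsv",
    "EP_1002_A01_vs_pUC19_scores.tsv",
    "EP_1002_A02_vs_pBR322_scores.tsv",
]
missing = missing_combinations(
    score_files,
    samples=["EP_1002_A01", "EP_1002_A02"],
    references=["pBR322", "pUC19"],
)
print(missing)  # the one combination without a score file
```

In practice the file list would come from listing the alignment_scores directory, for example with `os.listdir`.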
