Description
Add support for long-read sequencing data (ONT and PacBio) within BactScout to extend its functionality beyond Illumina short reads.
This feature would introduce long-read–specific quality metrics and integrate appropriate third-party tools for read-level QC, taxonomic profiling, and contamination detection.
Use Case
Many bacterial genomics projects now incorporate long-read sequencing for hybrid assemblies, plasmid reconstruction, and structural variant analysis.
Currently, BactScout supports only paired-end Illumina data, which limits its use in mixed or long-read–only projects.
Integrating long-read QC would allow users to apply the same standardized framework for both sequencing types, maintaining consistency in quality reporting and enabling broader adoption across laboratories.
Proposed Solution
- Implement detection of long-read input formats (FASTQ or FASTA, gzipped or uncompressed).
- Integrate long-read QC tools such as:
  - NanoPlot / NanoStat for basic read length and quality distributions.
  - Filtlong for quality- and length-based filtering metrics.
- Define long-read–specific QC thresholds (mean read length, read N50, mean read quality score, contamination percentage).
- Generate per-sample summaries compatible with existing BactScout outputs (CSV + summary reports).
- Extend configuration and reporting modules to handle both short- and long-read modes seamlessly.
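The following is a rough sketch of how the input detection and read-level thresholds listed above might fit together. It is a minimal Python example that assumes Python as the implementation language. Every name in it (`open_reads`, `detect_format`, `iter_reads`, `summarise`, `LongReadThresholds`) is hypothetical and is not existing BactScout code. The threshold defaults are placeholders, not agreed values. The sketch detects gzip from its magic bytes rather than from the file extension, distinguishes FASTQ from FASTA by the first record, and computes read count, total bases, mean read length, read N50, and a mean read quality averaged in error-probability space. Contamination percentage would come from a separate taxonomic profiler, so it is left out here.

```python
import gzip
import math
import statistics
from dataclasses import dataclass


def open_reads(path):
    """Open FASTQ/FASTA as text; gzip is detected by magic bytes, not extension."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "r")


def detect_format(path):
    """Return 'fastq' or 'fasta' based on the first non-empty line."""
    with open_reads(path) as fh:
        for line in fh:
            if line.strip():
                if line.startswith("@"):
                    return "fastq"
                if line.startswith(">"):
                    return "fasta"
                break
    raise ValueError(f"Unrecognised read format: {path}")


def iter_reads(path):
    """Yield (sequence, quality) pairs; quality is None for FASTA input.

    Assumes unwrapped 4-line FASTQ records (an assumption, not a spec guarantee).
    """
    fmt = detect_format(path)
    with open_reads(path) as fh:
        if fmt == "fastq":
            while fh.readline():          # header line; "" means end of file
                seq = fh.readline().strip()
                fh.readline()             # '+' separator line
                qual = fh.readline().strip()
                yield seq, qual
        else:
            parts = None                  # None until the first '>' header
            for line in fh:
                line = line.strip()
                if line.startswith(">"):
                    if parts is not None:
                        yield "".join(parts), None
                    parts = []
                elif line and parts is not None:
                    parts.append(line)    # FASTA sequences may be wrapped
            if parts is not None:
                yield "".join(parts), None


def mean_read_quality(qual):
    """Mean Phred+33 quality of one read, averaged via error probabilities."""
    probs = [10 ** (-(ord(c) - 33) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


@dataclass
class LongReadThresholds:
    # Hypothetical placeholder defaults, not values defined by BactScout.
    min_mean_length: float = 1000
    min_n50: int = 5000
    min_mean_quality: float = 10.0


def summarise(path, thresholds=None):
    """Compute per-sample long-read metrics and a PASS/FAIL status."""
    if thresholds is None:
        thresholds = LongReadThresholds()
    lengths, quals = [], []
    for seq, qual in iter_reads(path):
        lengths.append(len(seq))
        if qual:
            quals.append(mean_read_quality(qual))
    if not lengths:
        raise ValueError(f"No reads found in {path}")
    summary = {
        "n_reads": len(lengths),
        "total_bases": sum(lengths),
        "mean_read_length": statistics.mean(lengths),
        "read_n50": n50(lengths),
        # FASTA carries no qualities, so the quality check is skipped for it.
        "mean_read_quality": statistics.mean(quals) if quals else None,
    }
    checks = [
        summary["mean_read_length"] >= thresholds.min_mean_length,
        summary["read_n50"] >= thresholds.min_n50,
    ]
    if summary["mean_read_quality"] is not None:
        checks.append(summary["mean_read_quality"] >= thresholds.min_mean_quality)
    summary["status"] = "PASS" if all(checks) else "FAIL"
    return summary
```

Calling `summarise("sample.fastq.gz")` returns a flat dict that could become one row of the per-sample CSV. The quality figures may differ slightly from those reported by NanoPlot/NanoStat, depending on how those tools average per-read quality, so in practice the integrated tools' outputs would likely be the source of truth.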