[FEATURE] Long read? #13

@happykhan

Description

Add support for long-read sequencing data (ONT and PacBio) within BactScout to extend its functionality beyond Illumina short reads.
This feature would introduce long-read–specific quality metrics and integrate appropriate third-party tools for read-level QC, taxonomic profiling, and contamination detection.

Use Case

Many bacterial genomics projects now incorporate long-read sequencing for hybrid assemblies, plasmid reconstruction, and structural variant analysis.
BactScout currently supports only paired-end Illumina data, which limits its use in mixed or long-read–only projects.

Integrating long-read QC would allow users to apply the same standardized framework for both sequencing types, maintaining consistency in quality reporting and enabling broader adoption across laboratories.

Proposed Solution

  • Implement detection of long-read input formats (FASTQ or FASTA, gzipped or uncompressed).
  • Integrate long-read QC tools such as:
    • NanoPlot / NanoStat for basic read length and quality distributions.
    • Filtlong for quality and length filtering metrics.
  • Define long-read–specific QC thresholds (mean read length, N50, quality score, contamination percentage).
  • Generate per-sample summaries compatible with existing BactScout outputs (CSV + summary reports).
  • Extend configuration and reporting modules to handle both short- and long-read modes.
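
As a rough illustration of the first and third bullets, the sketch below detects the input format (FASTQ or FASTA, gzipped or not) and computes mean read length and N50 in plain Python. It is not BactScout code: the function names, the `summarise` output keys, the `PASSED`/`FAILED` strings, and the default thresholds are all hypothetical placeholders, and it assumes 4-line FASTQ records. In practice these metrics would more likely come from NanoStat or a similar tool.

```python
import gzip


def open_text(path):
    """Open a file as text, detecting gzip by its magic bytes rather than its extension."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def detect_format(path):
    """Return 'fastq' or 'fasta' based on the first character of the file."""
    with open_text(path) as fh:
        first = fh.read(1)
    if first == "@":
        return "fastq"
    if first == ">":
        return "fasta"
    raise ValueError(f"{path}: not recognised as FASTQ or FASTA")


def read_lengths(path):
    """Yield the length of every read in a FASTQ or FASTA file."""
    fmt = detect_format(path)
    with open_text(path) as fh:
        if fmt == "fastq":
            # Assumption: strict 4-line records (sequence is the second line).
            for i, line in enumerate(fh):
                if i % 4 == 1:
                    yield len(line.strip())
        else:
            length = None
            for line in fh:
                if line.startswith(">"):
                    if length is not None:
                        yield length
                    length = 0
                else:
                    length += len(line.strip())
            if length is not None:
                yield length


def n50(lengths):
    """Smallest length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def summarise(path, min_mean_length=1000, min_n50=1000):
    """Per-sample summary; threshold defaults are placeholders, not recommendations."""
    lengths = list(read_lengths(path))
    if not lengths:
        raise ValueError(f"{path}: no reads found")
    mean_length = sum(lengths) / len(lengths)
    read_n50 = n50(lengths)
    passed = mean_length >= min_mean_length and read_n50 >= min_n50
    return {
        "format": detect_format(path),
        "reads": len(lengths),
        "mean_read_length": mean_length,
        "n50": read_n50,
        "status": "PASSED" if passed else "FAILED",
    }
```

A dictionary like the one returned by `summarise` could be written as one row of a per-sample CSV; quality-score and contamination thresholds are left out here because they depend on which external tools are adopted.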

Metadata

Labels

enhancement (New feature or request)
