Releases: ghruproject/bactscout

BactScout v1.2.0

05 Nov 18:07

Release date: 2025-11-05

This is a breaking release that introduces canonical coverage field names across BactScout's outputs, rounds memory metrics, and updates documentation and the CLI. Please read the upgrade notes carefully before updating production pipelines.

Quick summary

  • Version: v1.2.0
  • Nature: Breaking change (intentional)
  • Primary goals:
    • Rename and standardize coverage-related output fields
    • Round resource memory metrics to integers for cleaner output
    • Add bactscout version CLI command
    • Update QC and scaling documentation to reflect current behavior
    • Remove generated test output from repository tracking
    • Change the default thresholds in the config (they remain user-configurable)
    • Improve the documentation substantially; see https://ghruproject.github.io/bactscout/

Highlights

  • Canonical coverage fields

    • Canonicalized coverage field names are now used everywhere in the codebase, outputs, and docs. Notably:
      • coverage_estimate_sylph — Sylph-derived coverage estimate (previously part of the coverage_estimate family)
      • coverage_estimate_qualibact — calculated coverage (reads / expected genome size) and its status coverage_estimate_qualibact_status
    • These canonical names are the single source of truth for outputs (CSV/JSON) and header ordering utilities.
  • Memory metric formatting

    • resource_memory_avg_mb and resource_memory_peak_mb are rounded to integers before being written to summary CSV/JSON files.
  • CLI

    • bactscout version added: prints the string from bactscout/__version__.py (now 1.2.0).
  • Documentation and guides

    • docs/guide/quality-control.md rewritten to reflect the two-tier WARN/FAIL QC logic implemented in code (see bactscout/thread.py). It includes an explicit note about what “x‑fold” coverage means and how both Sylph and qualibact estimates are combined to derive final pass/warning/fail results.
    • docs/guide/scaling.md added: practical guidance for running BactScout at scale and a Nextflow example walkthrough (processes collect_sample and final_summary).
    • Per-sample README content consolidated into the canonical docs/usage/output-format.md.
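Because the coverage columns were renamed and the memory metrics are now integers, downstream pipelines may want a quick sanity check on a summary file before consuming it. The following is a minimal sketch, not part of BactScout: the column names come from the notes above, while `check_summary_row`, the `sample_id` column, and all example values are hypothetical.

```python
import csv
import io

# Canonical coverage columns named in the v1.2.0 notes.
CANONICAL_COVERAGE_FIELDS = (
    "coverage_estimate_sylph",
    "coverage_estimate_qualibact",
    "coverage_estimate_qualibact_status",
)
# Memory metrics that v1.2.0 writes as rounded integers.
MEMORY_FIELDS = ("resource_memory_avg_mb", "resource_memory_peak_mb")


def check_summary_row(row):
    """Return a list of problems found in one summary row (dict of strings)."""
    problems = []
    for name in CANONICAL_COVERAGE_FIELDS + MEMORY_FIELDS:
        if name not in row:
            problems.append(f"missing column: {name}")
    for name in MEMORY_FIELDS:
        if name not in row:
            continue
        value = row.get(name) or ""
        # Accept only plain non-negative integers such as "512".
        if not value.isdigit():
            problems.append(f"{name} is not an integer: {value!r}")
    return problems


# Hypothetical summary content; sample_id and all values are made up.
example_csv = io.StringIO(
    "sample_id,coverage_estimate_sylph,coverage_estimate_qualibact,"
    "coverage_estimate_qualibact_status,resource_memory_avg_mb,"
    "resource_memory_peak_mb\n"
    "sample_001,42.1,40.7,PASSED,512,1024\n"
)
for row in csv.DictReader(example_csv):
    print(check_summary_row(row))  # an empty list means no problems
```

A non-empty result on files produced by an older release is a sign that the consumer still expects pre-1.2.0 column names.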

Detailed changelog

The following files were updated or added as part of this release (representative list):

  • bactscout/thread.py

    • All coverage-related keys replaced with canonical names.
    • Memory metrics rounded before writeout.
    • Final PASS/WARNING/FAIL logic aligned with documentation (two-tier thresholds; critical vs non-critical metrics).
  • bactscout/util.py

    • CSV header ordering and formatting helpers updated to include new canonical coverage keys.
  • bactscout.py

    • New version subcommand.
  • bactscout/__version__.py

    • Updated to __version__ = "1.2.0".
  • docs/guide/quality-control.md

    • Rewritten to match logic in bactscout/thread.py (WARN/FAIL thresholds, coverage handling, metric definitions and guidance).
  • docs/guide/scaling.md (new)

    • Guidance for multi-sample and HPC/Nextflow deployments; notes about I/O, resource monitoring, and process-level responsibilities.
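The two-tier WARN/FAIL logic with critical and non-critical metrics mentioned for bactscout/thread.py can be illustrated roughly as follows. This is a sketch under stated assumptions, not the code in thread.py: the threshold values, the function names, and the rule that a FAIL on a non-critical metric is downgraded to WARNING are all hypothetical. The authoritative rules are in docs/guide/quality-control.md.

```python
SEVERITY = {"PASS": 0, "WARNING": 1, "FAIL": 2}


def two_tier_status(value, warn_threshold, fail_threshold):
    """Classify a higher-is-better metric against two thresholds."""
    if value < fail_threshold:
        return "FAIL"
    if value < warn_threshold:
        return "WARNING"
    return "PASS"


def final_status(metric_statuses, critical):
    """Combine per-metric statuses into one result.

    Assumption (not confirmed by the release notes): a FAIL on a
    non-critical metric is downgraded to WARNING, and the final status
    is the worst status that remains.
    """
    worst = "PASS"
    for name, status in metric_statuses.items():
        if status == "FAIL" and name not in critical:
            status = "WARNING"
        if SEVERITY[status] > SEVERITY[worst]:
            worst = status
    return worst


# Hypothetical thresholds: warn below 30x, fail below 20x.
statuses = {
    "coverage_estimate_sylph": two_tier_status(45.0, 30, 20),
    "coverage_estimate_qualibact": two_tier_status(25.0, 30, 20),
}
print(statuses)
print(final_status(statuses, critical=set(statuses)))  # WARNING
```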

Full Changelog: v1.1.2...v1.2.0

BactScout v1.1.2

30 Oct 18:17
5a1234b

🎉 What's Changed

🐛 Bug Fixes

  • Fix MLST ST detection failing for empty/invalid values - Resolved issue where MLST sequence typing would fail when encountering empty or invalid ST values in stringMLST output, ensuring robust handling of edge cases (#9)
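The kind of defensive parsing this fix describes can be sketched as below. This is an illustration only, not the actual patch from #9: `parse_sequence_type` is a hypothetical helper, and treating `0` as "no valid ST" is an assumption.

```python
def parse_sequence_type(raw):
    """Return the ST as an int, or None when the value is empty or invalid."""
    if raw is None:
        return None
    text = str(raw).strip()
    # Reject empty strings and anything that is not plain ASCII digits.
    if not (text.isascii() and text.isdigit()):
        return None
    st = int(text)
    # Assumption: an ST of 0 means no sequence type was assigned.
    return st if st > 0 else None


for value in ["131", " 11 ", "", "NA", None, "0"]:
    print(repr(value), "->", parse_sequence_type(value))
```

Returning `None` instead of raising lets the caller record a missing ST and continue with the remaining QC steps.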

📚 Documentation Improvements

  • Comprehensive docstring updates - Updated all function docstrings in thread.py to accurately reflect current implementation
    • Removed references to deprecated QC metrics (insert size, filtering status, quality trends)
    • Added detailed parameter descriptions with types and defaults
    • Documented status logic (PASSED/WARNING/FAILED) for all QC handlers
    • Enhanced workflow documentation with step-by-step processing details
    • Improved error handling and edge case documentation

🔧 Configuration Updates

  • Adapter detection enhancement - Added adapter_overrep_threshold configuration parameter for more granular control over adapter contamination detection
  • GC content evaluation - Added gc_fail_percentage parameter for improved GC content range validation
  • Removed deprecated quality_end_drop_threshold configuration
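When carrying an existing configuration forward, it may help to flag the removed key and any newly added keys that are absent. The sketch below assumes the configuration has already been loaded into a flat dictionary; `review_config` is a hypothetical helper and the example values are made up. Only the three key names come from the notes above.

```python
# Key removed in v1.1.2 according to the notes above.
REMOVED_KEYS = {"quality_end_drop_threshold"}
# Keys added in v1.1.2 according to the notes above.
ADDED_KEYS = {"adapter_overrep_threshold", "gc_fail_percentage"}


def review_config(config):
    """Return (stale, missing): removed keys still set, added keys not set."""
    present = set(config)
    stale = sorted(REMOVED_KEYS & present)
    missing = sorted(ADDED_KEYS - present)
    return stale, missing


# Hypothetical pre-1.1.2 configuration; the values are illustrative only.
old_config = {"quality_end_drop_threshold": 5, "gc_fail_percentage": 10}
stale, missing = review_config(old_config)
print("remove:", stale)
print("consider adding:", missing)
```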

🧹 Code Quality

  • Improved function documentation consistency across all QC evaluation handlers
  • Enhanced status message clarity for better debugging and reporting
  • Better backward compatibility notes for legacy configuration parameters

🧪 Testing

  • Added comprehensive test coverage for genome download functionality
  • Enhanced integration tests for sample data collection pipeline
  • Optimized CI workflows by removing memory-intensive tests

Full Changelog: v1.0.0...v1.1.2

BactScout v1.0.0

27 Oct 15:00

Release Highlights

✨ Features

  • Complete MLST analysis pipeline for bacterial genomic classification
  • Multi-threaded processing for improved performance
  • Sylph-based species identification
  • FastP quality control integration
  • Comprehensive configuration system

🧪 Testing & Quality

  • 98 comprehensive pytest tests (100% passing)
  • 16 CLI integration tests
  • 56 fastp data extraction tests
  • 13 stringmlst module tests
  • Full code coverage reporting with Codecov integration

🚀 DevOps & Automation

  • GitHub Actions CI/CD pipeline with Pixi
  • Automated linting and code quality checks

📦 Deployment

  • Pre-configured settings for common bacterial species
  • Database support for: Acinetobacter baumannii, Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa, Salmonella enterica

🐛 Bug Fixes

  • Fixed fastp metrics extraction for read length calculations
  • Fixed field name validation in fastp result handling

This is a stable, production-ready release suitable for genomic analysis workflows.