biggeoff/aviti-workflow

AVITI Workflow Repository

Version 1.0.1

Complete workflow automation for AVITI sequencer data processing in Nonacus's clinical genomic testing pipelines (GALEAS). This repository provides end-to-end automation from FASTQ file detection through pipeline execution, result download, and quality control reporting.

What is AVITI?

AVITI is a next-generation sequencer from Element Biosciences that provides an alternative platform to Illumina for clinical genomic testing. This repository contains specialized workflows for:

  • GALEAS-Hereditary-Plus: Germline variant detection and validation
  • GALEAS-Bladder: Bladder cancer mutation analysis
  • GALEAS-Tumor-HRD: Tumor HRD assessment with CNV analysis

Repository Structure

aviti-workflow/
├── core/                          # Main workflow automation scripts
│   ├── samplesheet_generator_seqera_tw.py    # Samplesheet generation & pipeline launch
│   ├── download_and_postprocess_s3.py        # Result download & SharePoint upload
│   ├── copy_and_rename_sample_files.sh       # Sample file management
│   ├── test_samplesheet_generator_seqera_tw.py   # Test suite (73 tests)
│   └── test_download_and_postprocess_s3.py       # Postprocess tests
│
├── vcf_comparison/                # AVITI vs Illumina validation
│   ├── compare_vcfs_virtual_panel_discordant_report_Aviti.py
│   └── README.md
│
├── cnv_analysis/                  # Copy number variation analysis
│   ├── cnvkit_pon_aviti.sh       # Build Panel of Normals for AVITI
│   ├── samtools_reindex_bam.sh   # BAM indexing utility
│   └── cnvkit_pon_SOP.md         # CNVkit procedure documentation
│
├── docs/                          # Documentation
│   ├── README_samplesheet_generator.md
│   ├── README_download_and_postprocess_s3.md
│   └── README_TESTING_SAMPLESHEET_GENERATOR.md
│
├── sample_samplesheets/           # Example samplesheets
│   ├── AVITI01 (Hereditary - Odoo 118)
│   ├── AVITI02 (Hereditary - Odoo 14967)
│   ├── AVITI04 (Bladder - Odoo 15015)
│   └── AVITI17 (Tumor - Odoo 256)
│
└── test_data/                     # Test reports
    ├── aviti-galeas-launch-unit-test-report-v1.0.0.html
    └── aviti-post-unit-test-report-v1.0.0.html

Quick Start

Prerequisites

# Required environment variables
export TOWER_API_ENDPOINT="https://staging-tower.nonacus.com/api"
export TOWER_ACCESS_TOKEN="your_token_here"

# AWS credentials configured
aws configure
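
A small preflight check can catch a missing Tower variable before a launch is attempted. The sketch below is illustrative only: the function name `missing_env_vars` is hypothetical and not part of the repository; only the two variable names come from the block above.

```python
import os

# Variable names taken from the Prerequisites block above.
REQUIRED_VARS = ("TOWER_API_ENDPOINT", "TOWER_ACCESS_TOKEN")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; passing a dict makes the
    check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current shell environment; an empty list means both variables are set.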

Generate Samplesheet and Launch Pipeline

cd ~/IGL_apps/aviti-workflow/core

# Hereditary service (most common)
python samplesheet_generator_seqera_tw.py \
    -seq aviti \
    -i samples.txt \
    -rn 20250724 \
    -panel GALEAS-Hereditary-Plus-1966 \
    -pipeline GALEAS-Hereditary-Plus-1966 \
    -launch y \
    -outfolder AVITI_HP_Run1 \
    -ot 118

# Bladder service
python samplesheet_generator_seqera_tw.py \
    -seq aviti \
    -i samples.txt \
    -rn 20250808 \
    -panel GALEAS-Bladder-Release-250701 \
    -pipeline GALEAS-Bladder-Release-250701 \
    -launch y \
    -outfolder AVITI_Bladder_Run \
    -ot 15015

# Tumor service with AVITI-specific PoN
python samplesheet_generator_seqera_tw.py \
    -seq aviti \
    -i samples.txt \
    -rn 20251216 \
    -panel GALEAS-Tumor-Release-250701 \
    -pipeline GALEAS-Tumor-Release-250701_AVITI \
    -launch y \
    -outfolder AVITI_Tumor_Run \
    -ot 256

Download Results and Upload to SharePoint

# Hereditary service - downloads VCF, BAM, coverage, MultiQC
python download_and_postprocess_s3.py \
    -i s3://bucket/hereditary/AVITI_HP_Run1/ \
    -s hereditary \
    -ot 118 \
    --upload

# Bladder service - includes Odoo HCRM integration
python download_and_postprocess_s3.py \
    -i s3://nonacus-research-eu-west-2/1821-galeasbladder/AVITI04/ \
    -s bladder \
    -ot 15015 \
    --fetch-from-odoo \
    --upload

Core Workflows

1. Samplesheet Generation

The samplesheet generator is the central orchestrator for AVITI workflows:

Key Features:

  • AVITI-specific FASTQ detection (supports lane-based naming: sample_L1_R1.fastq.gz)
  • Flexible run name matching (searches pattern anywhere in folder name)
  • 11-point validation suite for data integrity
  • Panel-to-pipeline compatibility checking
  • Dry-run mode for safe testing
  • Downsampling support for validation runs

AVITI S3 Structure:

s3://ncs-aviti/AV252104/<run_folder>/Samples/<sample_id>/
├── <sample_id>_L1_R1.fastq.gz
├── <sample_id>_L1_R2.fastq.gz
├── <sample_id>_L2_R1.fastq.gz
└── <sample_id>_L2_R2.fastq.gz

Validation Includes:

  • FASTQ file existence in S3
  • Proper R1/R2 pairing
  • Duplicate sample detection
  • Run name pattern matching
  • Panel BED file verification
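
The R1/R2 pairing check for lane-based names can be sketched as follows. This is a minimal illustration that assumes file names follow the `<sample_id>_L<lane>_R<read>.fastq.gz` layout shown above; `pair_fastqs` is a hypothetical helper, not the function used in `samplesheet_generator_seqera_tw.py`.

```python
import re
from collections import defaultdict

# Matches <sample_id>_L<lane>_R<read>.fastq.gz, as in the S3 layout above.
FASTQ_RE = re.compile(r"^(?P<sample>.+)_L(?P<lane>\d+)_R(?P<read>[12])\.fastq\.gz$")


def pair_fastqs(filenames):
    """Group FASTQ names by (sample, lane) and return complete (R1, R2) pairs.

    Raises ValueError if a name does not follow the expected pattern or if
    a lane is missing one of its mates.
    """
    lanes = defaultdict(dict)
    for name in filenames:
        match = FASTQ_RE.match(name)
        if match is None:
            raise ValueError(f"Unexpected FASTQ name: {name}")
        key = (match["sample"], int(match["lane"]))
        lanes[key]["R" + match["read"]] = name

    pairs = []
    for key in sorted(lanes):
        reads = lanes[key]
        if set(reads) != {"R1", "R2"}:
            raise ValueError(f"Sample {key[0]} lane {key[1]} is missing a mate")
        pairs.append((reads["R1"], reads["R2"]))
    return pairs
```

Pairs are returned sorted by sample and lane, so a two-lane sample yields its L1 pair followed by its L2 pair.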

2. Download and Postprocess

Automated result retrieval with service-specific handling:

Hereditary Service:

  • Downloads: VCF, BAM, coverage files, MultiQC reports
  • SharePoint path: 1.2.16 AVITI Sequencing runs/[run_folder]/
  • Sample ID remapping (removes _int_AVITI, _rpt50, _ds10M suffixes)
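
The sample ID remapping can be pictured as repeated suffix stripping. The sketch below assumes the three suffixes listed above are removed only from the end of the ID and may be stacked; `remap_sample_id` is a hypothetical name, and the real script may handle further variants.

```python
# Suffixes listed above; the real script may recognise others.
SUFFIXES = ("_int_AVITI", "_rpt50", "_ds10M")


def remap_sample_id(sample_id):
    """Strip known trailing suffixes, repeating until none remain."""
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if sample_id.endswith(suffix):
                sample_id = sample_id[: -len(suffix)]
                stripped = True
    return sample_id
```

For example, `remap_sample_id("L1008-C0155-AD1_int_AVITI_ds10M")` returns `"L1008-C0155-AD1"`, and an ID without any of these suffixes is returned unchanged.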

Bladder Service:

  • Downloads: QCI reports (PDFs)
  • SharePoint path: 2.2.10 AVITI Sequencing runs/[Account]/[Result]/[SampleID]/
  • Odoo HCRM integration for sample metadata
  • Automatic account mapping

Features:

  • Presigned S3 URLs for large BAM files
  • Comprehensive logging (console + file)
  • Batch processing with progress tracking
  • Automatic directory structure creation

3. Sample File Management

Helper script for copying and renaming AVITI results between S3 locations:

bash copy_and_rename_sample_files.sh

Use it to create test subsets or to reorganize results so that they can be correlated with Odoo tickets.

VCF Comparison

Compare AVITI and Illumina sequencing results for validation:

cd ~/IGL_apps/aviti-workflow/vcf_comparison

python compare_vcfs_virtual_panel_discordant_report_Aviti.py \
    --folder1 /path/to/aviti/vcfs \
    --folder2 /path/to/illumina/vcfs \
    --output_dir /path/to/output \
    --bed_file panel.bed

Output Files:

  • *_unique_AVITI.vcf - Variants only in AVITI
  • *_unique_Illumina.vcf - Variants only in Illumina
  • *_common.vcf - Shared variants
  • *_discordant_alts_to_be_checked.txt - Discordant ALT calls
  • Global CSV summary with TP/FP/FN metrics
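
The relationship between these outputs can be sketched with set operations. This is an illustration, not the comparison script's code: it assumes each variant is reduced to a `(chrom, pos, ref, alt)` tuple and that Illumina is treated as the baseline, so TP are shared calls, FP are AVITI-only calls, and FN are Illumina-only calls. `compare_calls` is a hypothetical name.

```python
def compare_calls(aviti, illumina):
    """Compare two collections of (chrom, pos, ref, alt) variant tuples.

    Illumina is treated as the baseline (an assumption), so:
    TP = shared calls, FP = AVITI-only calls, FN = Illumina-only calls.
    """
    aviti, illumina = set(aviti), set(illumina)
    common = aviti & illumina
    unique_aviti = aviti - illumina
    unique_illumina = illumina - aviti

    # Discordant ALTs: same chrom, pos and REF on both platforms, but the
    # ALT alleles differ, so the site appears in both "unique" sets.
    discordant_sites = {v[:3] for v in unique_aviti} & {v[:3] for v in unique_illumina}

    return {
        "common": common,
        "unique_aviti": unique_aviti,
        "unique_illumina": unique_illumina,
        "discordant_sites": discordant_sites,
        "tp": len(common),
        "fp": len(unique_aviti),
        "fn": len(unique_illumina),
    }
```

Under this definition a discordant site is counted once as FP and once as FN, which is why such sites are worth reviewing separately.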

Use Cases:

  • Cross-platform validation
  • Sensitivity/specificity analysis
  • Virtual panel filtering
  • Quality control metrics

See vcf_comparison/README.md for detailed usage.

CNV Analysis

Build AVITI-specific Panel of Normals (PoN) for CNVkit:

cd ~/IGL_apps/aviti-workflow/cnv_analysis

bash cnvkit_pon_aviti.sh

Workflow:

  1. Generate accessible regions (reference genome minus blacklisted regions)
  2. Create targets and antitargets from BED files
  3. Compute coverage for each normal BAM
  4. Aggregate into reference .cnn file

Inputs:

  • Reference: hg38.fa
  • Blacklist: hg38-blacklist.v2.bed
  • RefFlat: refFlat.hg38.txt
  • Target BED: Nonacus_GALEAS_Tumor_1911_covered.bed
  • BAM directory: AVITI tumor normal samples

Output:

  • pon_reference.cnn - Ready for tumor CNV calling
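
The four workflow steps map onto CNVkit's `access`, `target`, `antitarget`, `coverage`, and `reference` subcommands. The sketch below only builds the command lines so they can be inspected; it is not the content of `cnvkit_pon_aviti.sh`. The intermediate file names (`access.bed`, `targets.bed`, `antitargets.bed`) and the function name `build_pon_commands` are assumptions, and in practice the commands would run inside the CNVkit container.

```python
from pathlib import Path


def build_pon_commands(
    bams,
    fasta="hg38.fa",
    blacklist="hg38-blacklist.v2.bed",
    refflat="refFlat.hg38.txt",
    target_bed="Nonacus_GALEAS_Tumor_1911_covered.bed",
    output="pon_reference.cnn",
):
    """Return the CNVkit command lines for the four workflow steps, in order."""
    commands = [
        # 1. Accessible regions: genome minus blacklisted regions.
        ["cnvkit.py", "access", fasta, "-x", blacklist, "-o", "access.bed"],
        # 2. Targets and antitargets from the panel BED file.
        ["cnvkit.py", "target", target_bed, "--annotate", refflat, "-o", "targets.bed"],
        ["cnvkit.py", "antitarget", "targets.bed", "-g", "access.bed", "-o", "antitargets.bed"],
    ]

    # 3. Target and antitarget coverage for each normal BAM.
    coverage_files = []
    for bam in bams:
        stem = Path(bam).stem
        for kind, bed in (("target", "targets.bed"), ("antitarget", "antitargets.bed")):
            cnn = f"{stem}.{kind}coverage.cnn"
            commands.append(["cnvkit.py", "coverage", bam, bed, "-o", cnn])
            coverage_files.append(cnn)

    # 4. Aggregate all coverage files into the reference .cnn file.
    commands.append(["cnvkit.py", "reference", *coverage_files, "-f", fasta, "-o", output])
    return commands
```

Each returned list can be passed to `subprocess.run` once the paths have been checked. Keeping command construction separate from execution makes a dry run straightforward.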

Docker-based: Uses etal/cnvkit container for reproducibility.

See cnv_analysis/cnvkit_pon_SOP.md for detailed procedure.

Dependencies

Required External Tools

# AWS CLI (for S3 interaction)
aws --version

# Seqera Tower CLI v0.10.1 (workflow management)
tw --version

# Docker (for CNVkit)
docker --version

# VCF processing (for comparison workflows)
bcftools --version
bgzip --version
tabix --version

# BAM processing
samtools --version

Python Packages

Standard library dependencies:

  • subprocess, argparse, csv, os, datetime, json, logging

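External (implied via CLI):

  • AWS access goes through the aws CLI. Note that the CLI is built on botocore and does not install boto3, so install boto3 separately if a script imports it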

Reference Data

CNVkit PoN:

  • AVITI-specific: s3://nonacus-development-eu-west-2-reference/GRCh38/v0/cnvkit_pon/aviti/pon_reference.cnn

Panel Files:

  • s3://nonacus-research-eu-west-2/<project>/panels/

Configuration

AVITI-Specific Parameters

Sequencer Base Path:

"aviti": "s3://ncs-aviti/AV252104"

Run Name Patterns:

  • Date format: YYYYMMDD (e.g., 20250724)
  • Instrument ID: AVITI01, AVITI02, AVITI04, etc.
  • Full example: AV252104_AVITI01_20250724_Eval_Run_1
  • Flexible matching: Pattern can appear anywhere in folder name (case-insensitive)
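
The flexible matching rule amounts to a case-insensitive substring search that must yield exactly one folder. The sketch below is a minimal illustration with a hypothetical function name (`find_run_folder`); the folder listing would come from S3 in the real script.

```python
def find_run_folder(pattern, folders):
    """Return the single folder whose name contains pattern (case-insensitive).

    Raises ValueError when zero or several folders match, since the run
    cannot be resolved unambiguously in either case.
    """
    needle = pattern.lower()
    matches = [folder for folder in folders if needle in folder.lower()]
    if len(matches) != 1:
        raise ValueError(
            f"Could not find unique run folder matching pattern {pattern!r}: "
            f"{len(matches)} matches"
        )
    return matches[0]
```

With this rule, `20250724` and `aviti01` both resolve to `AV252104_AVITI01_20250724_Eval_Run_1`, whereas a pattern shared by several folders, such as `AV252104`, is rejected as ambiguous.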

Pipeline-Specific:

  • Tumor pipeline: GALEAS-Tumor-Release-250701_AVITI
  • Uses AVITI PoN: pon_reference.cnn
  • Parameter: save_realign_bam: false (default for AVITI)

SharePoint Integration

Hereditary:

Base: S. Clinical Services TRUE/1. Hereditary/1.2 - Clinical/
Path: 1.2.16 AVITI Sequencing runs/[run_folder]/

Bladder:

Base: S. Clinical Services TRUE/2. Bladder/2.2 - Clinical/
Path: 2.2.10 AVITI Sequencing runs/[Account]/[Result]/[SampleID]/
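
The two path templates differ only in their root and in how many trailing components they take. The sketch below shows one way to assemble them; `sharepoint_folder` is a hypothetical helper, and the account and result values in the usage note are made-up placeholders.

```python
# Base and path joined, as listed above for each service.
SHAREPOINT_ROOTS = {
    "hereditary": "S. Clinical Services TRUE/1. Hereditary/1.2 - Clinical/"
                  "1.2.16 AVITI Sequencing runs",
    "bladder": "S. Clinical Services TRUE/2. Bladder/2.2 - Clinical/"
               "2.2.10 AVITI Sequencing runs",
}

# Hereditary takes [run_folder]; bladder takes [Account]/[Result]/[SampleID].
EXPECTED_PARTS = {"hereditary": 1, "bladder": 3}


def sharepoint_folder(service, *parts):
    """Build the SharePoint folder path for a service from its components."""
    if service not in SHAREPOINT_ROOTS:
        raise ValueError(f"Unknown service: {service}")
    if len(parts) != EXPECTED_PARTS[service]:
        raise ValueError(
            f"{service} expects {EXPECTED_PARTS[service]} path component(s), got {len(parts)}"
        )
    return "/".join([SHAREPOINT_ROOTS[service], *parts]) + "/"
```

For instance, `sharepoint_folder("hereditary", "AVITI_HP_Run1")` yields the hereditary run folder path, and `sharepoint_folder("bladder", "ExampleAccount", "ExampleResult", "S1")` yields a three-level bladder path.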

Common Commands

Validation and Testing

Dry-run (no execution):

python samplesheet_generator_seqera_tw.py \
    -seq aviti -i samples.txt -rn 20250724 \
    -panel GALEAS-Hereditary-Plus-1966 \
    -pipeline GALEAS-Hereditary-Plus-1966 \
    --dry-run

Downsampled validation:

# 5M reads only
python samplesheet_generator_seqera_tw.py \
    -seq aviti -i samples.txt -rn 20250724 \
    -panel GALEAS-Hereditary-Plus-1966 \
    -pipeline GALEAS-Hereditary-Plus-1966 \
    -ds y -dsn 5M \
    -launch y -outfolder AVITI_5M_Test

# Both original and downsampled
python samplesheet_generator_seqera_tw.py \
    -seq aviti -i samples.txt -rn 20250724 \
    -panel GALEAS-Hereditary-Plus-1966 \
    -pipeline GALEAS-Hereditary-Plus-1966 \
    -ds y -sds y -dsn 10M \
    -launch y -outfolder AVITI_Full_and_10M

Skip S3 validation (faster):

python samplesheet_generator_seqera_tw.py \
    -seq aviti -i samples.txt -rn 20250724 \
    -panel GALEAS-Hereditary-Plus-1966 \
    --skip-s3-check

Launch from Existing Samplesheet

python samplesheet_generator_seqera_tw.py \
    -sh sample_samplesheets/AVITI01_hereditary.csv \
    -pipeline GALEAS-Hereditary-Plus-1966 \
    -launch y \
    -outfolder AVITI_Rerun

List Available Options

# List all panels
python samplesheet_generator_seqera_tw.py --list-panels

# List all pipelines
python samplesheet_generator_seqera_tw.py --list-pipelines

Examples

Example 1: AVITI01 Hereditary Run (Odoo 118)

# samples.txt contains:
# L1008-C0155-AD1
# L1008-C0155-AD2
# L1008-C0155-AD3

python samplesheet_generator_seqera_tw.py \
    -seq aviti \
    -i samples.txt \
    -rn 20250724 \
    -panel GALEAS-Hereditary-Plus-1966 \
    -pipeline GALEAS-Hereditary-Plus-1966 \
    -launch y \
    -outfolder AVITI01_Eval_Run_1 \
    -ot 118

# After pipeline completes, download results
python download_and_postprocess_s3.py \
    -i s3://dna-nexus-sequencing/service_improvement/AVITI01_Eval_Run_1/ \
    -s hereditary \
    -ot 118 \
    --upload

Example 2: AVITI04 Bladder Run (Odoo 15015)

python samplesheet_generator_seqera_tw.py \
    -seq aviti \
    -i bladder_samples.txt \
    -rn 20250808 \
    -panel GALEAS-Bladder-Release-250701 \
    -pipeline GALEAS-Bladder-Release-250701 \
    -launch y \
    -outfolder AVITI04_Eval_Run_4 \
    -ot 15015

# Download with Odoo integration
python download_and_postprocess_s3.py \
    -i s3://nonacus-research-eu-west-2/1821-galeasbladder/AVITI04_Eval_Run_4/ \
    -s bladder \
    -ot 15015 \
    --fetch-from-odoo \
    --upload

Example 3: AVITI17 Tumor Run (Odoo 256)

python samplesheet_generator_seqera_tw.py \
    -seq aviti \
    -i tumor_samples.txt \
    -rn 20251216 \
    -panel GALEAS-Tumor-Release-250701 \
    -pipeline GALEAS-Tumor-Release-250701_AVITI \
    -launch y \
    -outfolder AVITI17_Tumor_Eval_Run_1 \
    -ot 256

Example 4: VCF Comparison (AVITI vs Illumina)

cd vcf_comparison

# Compare AVITI02 hereditary results against Illumina baseline
python compare_vcfs_virtual_panel_discordant_report_Aviti.py \
    --folder1 /data/aviti02_hereditary_vcfs/ \
    --folder2 /data/illumina_nextseq57_vcfs/ \
    --output_dir /data/aviti02_comparison/ \
    --bed_file /reference/GALEAS_Hereditary_Plus_1966.bed

# Review discordant calls
less /data/aviti02_comparison/*_discordant_alts_to_be_checked.txt

Testing

The repository includes unit and integration tests for the samplesheet generator and the download/postprocess script.

Test Setup

Install test dependencies:

pip install -r test_requirements.txt

This installs:

  • pytest (testing framework)
  • pytest-cov (coverage reporting)
  • pytest-mock (mocking utilities)
  • pytest-xdist (parallel execution)
  • pytest-html (HTML reports)

Running Tests

Run all tests:

pytest test_*.py -v

Run specific test suite:

# Samplesheet generator (73 tests)
pytest test_samplesheet_generator_seqera_tw.py -v

# Download and postprocess (54 tests)
pytest test_download_and_postprocess_s3.py -v

Run only unit tests:

pytest -v -m unit

Run with coverage report:

pytest test_*.py --cov=core --cov-report=html

Generate HTML test report:

pytest test_*.py --html=test_report.html --self-contained-html

Test Results

Current test suite results:

  • 127 total tests
  • 126 tests passing (99.2%)
  • 1 test skipped (AWS integration test)
  • Test execution time: <1 second

Test Reports

Pre-generated HTML test reports in test_data/:

  • aviti-galeas-launch-unit-test-report-v1.0.0.html
  • aviti-post-unit-test-report-v1.0.0.html

Sample Data

Example samplesheets in sample_samplesheets/:

  • AVITI01: 10 hereditary samples (Odoo 118)
  • AVITI02: 82 hereditary samples (Odoo 14967) with 10-sample test subsets
  • AVITI04: 82 bladder samples (Odoo 15015)
  • AVITI17: 31 tumor samples (Odoo 256)

Documentation

Detailed documentation in docs/:

  1. README_samplesheet_generator.md

    • Complete parameter reference
    • Validation rules
    • Panel/pipeline shortcuts
    • Advanced usage
  2. README_download_and_postprocess_s3.md

    • Service-specific workflows
    • SharePoint integration
    • Odoo HCRM usage
    • Logging configuration
  3. README_TESTING_SAMPLESHEET_GENERATOR.md

    • Test suite overview
    • Coverage analysis
    • Running tests
    • Future enhancements

Troubleshooting

AVITI Run Folder Not Found

Error: "Could not find unique run folder matching pattern"

Solution:

  • Try shorter pattern: Just date (e.g., 20250724) or instrument ID (e.g., AVITI01)
  • Verify folder exists: aws s3 ls s3://ncs-aviti/AV252104/
  • Pattern is case-insensitive and matches anywhere in folder name

S3 FASTQ Files Not Found

Error: "FASTQ files not found in S3 for sample X"

Solutions:

  • Check S3 path structure: Should be Samples/<sample_id>/<sample_id>_L*_R*.fastq.gz
  • Use --skip-s3-check to skip validation (faster, but less safe)
  • Verify sample ID matches folder name exactly

Panel-Pipeline Mismatch Warning

Warning: "Panel and pipeline names do not match"

Action:

  • Review panel and pipeline compatibility
  • Confirm intended configuration
  • Type "yes" to proceed if correct

AWS Credentials Not Configured

Error: "Unable to locate credentials"

Solution:

aws configure
# Enter AWS Access Key ID
# Enter AWS Secret Access Key
# Enter region: eu-west-2

Tower Token Not Set

Error: "TOWER_ACCESS_TOKEN environment variable not set"

Solution:

export TOWER_ACCESS_TOKEN="your_token_here"
# Add to ~/.bashrc for persistence

SharePoint Upload Fails

Error: SharePoint connection timeout or permission denied

Solutions:

  • Verify network connectivity to SharePoint
  • Check SharePoint credentials in environment
  • Ensure proper folder permissions
  • Review upload path in script configuration

Version History

  • v1.9.0 (2026-01-20): Latest samplesheet generator with enhanced AVITI support
  • v1.5.9 (2025-12): Added --nf-config-file for additional Nextflow configs
  • v1.5.8 (2025-12): Fixed PoN parameters for AVITI tumor pipeline

Contributing

Follow the process described in BIX12 SOP.

Requirements:

  1. Create branches for changes
  2. Get 1 reviewer approval before merging to main
  3. Update version history in script headers
  4. Run test suite before committing

Support

For issues or questions:

  • Review documentation in docs/
  • Check sample samplesheets for examples
  • Consult test reports for expected behavior
  • Contact bioinformatics team for complex issues

License

Internal use only - Nonacus Ltd.
