Version 1.0.1
Complete workflow automation for AVITI sequencer data processing in Nonacus's clinical genomic testing pipelines (GALEAS). This repository provides end-to-end automation from FASTQ file detection through pipeline execution, result download, and quality control reporting.
- What is AVITI?
- Repository Structure
- Quick Start
- Core Workflows
- VCF Comparison
- CNV Analysis
- Dependencies
- Configuration
- Common Commands
- Examples
- Testing
- Documentation
AVITI is a next-generation sequencer from Element Biosciences that provides an alternative platform to Illumina for clinical genomic testing. This repository contains specialized workflows for:
- GALEAS-Hereditary-Plus: Germline variant detection and validation
- GALEAS-Bladder: Bladder cancer mutation analysis
- GALEAS-Tumor-HRD: Tumor HRD assessment with CNV analysis
aviti-workflow/
├── core/ # Main workflow automation scripts
│ ├── samplesheet_generator_seqera_tw.py # Samplesheet generation & pipeline launch
│ ├── download_and_postprocess_s3.py # Result download & SharePoint upload
│ ├── copy_and_rename_sample_files.sh # Sample file management
│ ├── test_samplesheet_generator_seqera_tw.py # Test suite (73 tests)
│ └── test_download_and_postprocess_s3.py # Postprocess tests
│
├── vcf_comparison/ # AVITI vs Illumina validation
│ ├── compare_vcfs_virtual_panel_discordant_report_Aviti.py
│ └── README.md
│
├── cnv_analysis/ # Copy number variation analysis
│ ├── cnvkit_pon_aviti.sh # Build Panel of Normals for AVITI
│ ├── samtools_reindex_bam.sh # BAM indexing utility
│ └── cnvkit_pon_SOP.md # CNVkit procedure documentation
│
├── docs/ # Documentation
│ ├── README_samplesheet_generator.md
│ ├── README_download_and_postprocess_s3.md
│ └── README_TESTING_SAMPLESHEET_GENERATOR.md
│
├── sample_samplesheets/ # Example samplesheets
│ ├── AVITI01 (Hereditary - Odoo 118)
│ ├── AVITI02 (Hereditary - Odoo 14967)
│ ├── AVITI04 (Bladder - Odoo 15015)
│ └── AVITI17 (Tumor - Odoo 256)
│
└── test_data/ # Test reports
  ├── aviti-galeas-launch-unit-test-report-v1.0.0.html
  └── aviti-post-unit-test-report-v1.0.0.html
# Required environment variables
export TOWER_API_ENDPOINT="https://staging-tower.nonacus.com/api"
export TOWER_ACCESS_TOKEN="your_token_here"
# AWS credentials configured
aws configure
cd ~/IGL_apps/aviti-workflow/core
# Hereditary service (most common)
python samplesheet_generator_seqera_tw.py \
-seq aviti \
-i samples.txt \
-rn 20250724 \
-panel GALEAS-Hereditary-Plus-1966 \
-pipeline GALEAS-Hereditary-Plus-1966 \
-launch y \
-outfolder AVITI_HP_Run1 \
-ot 118
# Bladder service
python samplesheet_generator_seqera_tw.py \
-seq aviti \
-i samples.txt \
-rn 20250808 \
-panel GALEAS-Bladder-Release-250701 \
-pipeline GALEAS-Bladder-Release-250701 \
-launch y \
-outfolder AVITI_Bladder_Run \
-ot 15015
# Tumor service with AVITI-specific PoN
python samplesheet_generator_seqera_tw.py \
-seq aviti \
-i samples.txt \
-rn 20251216 \
-panel GALEAS-Tumor-Release-250701 \
-pipeline GALEAS-Tumor-Release-250701_AVITI \
-launch y \
-outfolder AVITI_Tumor_Run \
-ot 256
# Hereditary service - downloads VCF, BAM, coverage, MultiQC
python download_and_postprocess_s3.py \
-i s3://bucket/hereditary/AVITI_HP_Run1/ \
-s hereditary \
-ot 118 \
--upload
# Bladder service - includes Odoo HCRM integration
python download_and_postprocess_s3.py \
-i s3://nonacus-research-eu-west-2/1821-galeasbladder/AVITI04/ \
-s bladder \
-ot 15015 \
--fetch-from-odoo \
--upload
The samplesheet generator is the central orchestrator for AVITI workflows:
Key Features:
- AVITI-specific FASTQ detection (supports lane-based naming: sample_L1_R1.fastq.gz)
- Flexible run name matching (searches pattern anywhere in folder name)
- 11-point validation suite for data integrity
- Panel-to-pipeline compatibility checking
- Dry-run mode for safe testing
- Downsampling support for validation runs
AVITI S3 Structure:
s3://ncs-aviti/AV252104/<run_folder>/Samples/<sample_id>/
├── <sample_id>_L1_R1.fastq.gz
├── <sample_id>_L1_R2.fastq.gz
├── <sample_id>_L2_R1.fastq.gz
└── <sample_id>_L2_R2.fastq.gz
Validation Includes:
- FASTQ file existence in S3
- Proper R1/R2 pairing
- Duplicate sample detection
- Run name pattern matching
- Panel BED file verification
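The pairing and duplicate checks listed above can be sketched as follows. This is a minimal illustration, not the generator's actual code: the function names and the regular expression are hypothetical, and it assumes file names follow the lane-based pattern <sample_id>_L<lane>_R<1|2>.fastq.gz shown in the S3 structure.

```python
import re
from collections import defaultdict

# Assumed lane-based naming: <sample_id>_L<lane>_R<1|2>.fastq.gz
FASTQ_RE = re.compile(r"^(?P<sample>.+)_L(?P<lane>\d+)_R(?P<read>[12])\.fastq\.gz$")


def check_fastq_pairs(sample_id, filenames):
    """Return a list of problems found in one sample's FASTQ file names."""
    reads_by_lane = defaultdict(set)
    problems = []
    for name in filenames:
        match = FASTQ_RE.match(name)
        if match is None or match.group("sample") != sample_id:
            problems.append(f"unexpected file name: {name}")
            continue
        reads_by_lane[int(match.group("lane"))].add(match.group("read"))
    if not reads_by_lane:
        problems.append(f"no FASTQ files found for {sample_id}")
    # Every lane must have both R1 and R2.
    for lane, reads in sorted(reads_by_lane.items()):
        for missing in sorted({"1", "2"} - reads):
            problems.append(f"lane {lane} is missing R{missing}")
    return problems


def find_duplicates(sample_ids):
    """Return sample IDs that appear more than once, in first-repeat order."""
    seen, dupes = set(), []
    for sample_id in sample_ids:
        if sample_id in seen and sample_id not in dupes:
            dupes.append(sample_id)
        seen.add(sample_id)
    return dupes
```

In practice the file names would come from an S3 listing of Samples/<sample_id>/; here they are passed in as plain strings so the check can be exercised without AWS access.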
Automated result retrieval with service-specific handling:
Hereditary Service:
- Downloads: VCF, BAM, coverage files, MultiQC reports
- SharePoint path: 1.2.16 AVITI Sequencing runs/[run_folder]/
- Sample ID remapping (removes _int_AVITI, _rpt50, _ds10M suffixes)
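As an illustration of the sample ID remapping, the sketch below strips the three suffixes named above from the end of an ID. It is an assumption that the suffixes are matched literally and only at the end of the ID; the real script may use patterns (for example, other downsampling depths). The name remap_sample_id is hypothetical.

```python
# Suffixes taken from the list above; treating them as literal strings is an assumption.
SUFFIXES = ("_int_AVITI", "_rpt50", "_ds10M")


def remap_sample_id(sample_id, suffixes=SUFFIXES):
    """Strip known trailing suffixes repeatedly until none remain."""
    stripped = True
    while stripped:
        stripped = False
        for suffix in suffixes:
            # Never strip the whole ID down to an empty string.
            if sample_id.endswith(suffix) and len(sample_id) > len(suffix):
                sample_id = sample_id[: -len(suffix)]
                stripped = True
    return sample_id


print(remap_sample_id("L1008-C0155-AD1_rpt50_ds10M"))  # L1008-C0155-AD1
```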
Bladder Service:
- Downloads: QCI reports (PDFs)
- SharePoint path: 2.2.10 AVITI Sequencing runs/[Account]/[Result]/[SampleID]/
- Odoo HCRM integration for sample metadata
- Automatic account mapping
Features:
- Presigned S3 URLs for large BAM files
- Comprehensive logging (console + file)
- Batch processing with progress tracking
- Automatic directory structure creation
Helper script for copying and renaming AVITI results between S3 locations:
bash copy_and_rename_sample_files.sh
Used for creating test subsets or reorganizing results with Odoo ticket correlation.
Compare AVITI and Illumina sequencing results for validation:
cd ~/IGL_apps/aviti-workflow/vcf_comparison
python compare_vcfs_virtual_panel_discordant_report_Aviti.py \
--folder1 /path/to/aviti/vcfs \
--folder2 /path/to/illumina/vcfs \
--output_dir /path/to/output \
--bed_file panel.bed
Output Files:
- *_unique_AVITI.vcf: Variants only in AVITI
- *_unique_Illumina.vcf: Variants only in Illumina
- *_common.vcf: Shared variants
- *_discordant_alts_to_be_checked.txt: Discordant ALT calls
- Global CSV summary with TP/FP/FN metrics
Use Cases:
- Cross-platform validation
- Sensitivity/specificity analysis
- Virtual panel filtering
- Quality control metrics
See vcf_comparison/README.md for detailed usage.
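The four output categories can be understood as a site-level set comparison. The sketch below is a simplified illustration, not the comparison script itself: it keys variants on (CHROM, POS, REF), ignores genotypes and BED filtering, and assumes the Illumina calls are treated as the baseline when counting TP/FP/FN. All function names are hypothetical.

```python
def read_variants(vcf_lines):
    """Map (chrom, pos, ref) -> set of ALT alleles from uncompressed VCF lines."""
    variants = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        variants.setdefault((chrom, int(pos), ref), set()).update(alt.split(","))
    return variants


def compare(aviti, illumina):
    """Classify each site as common, discordant, or unique to one platform."""
    result = {"common": [], "discordant": [], "unique_aviti": [], "unique_illumina": []}
    for site in sorted(set(aviti) | set(illumina)):
        if site not in illumina:
            result["unique_aviti"].append(site)
        elif site not in aviti:
            result["unique_illumina"].append(site)
        elif aviti[site] == illumina[site]:
            result["common"].append(site)
        else:
            result["discordant"].append(site)  # same site, different ALT alleles
    return result


def summary(result):
    """Counts with Illumina as the assumed baseline."""
    return {
        "TP": len(result["common"]),
        "FP": len(result["unique_aviti"]),
        "FN": len(result["unique_illumina"]),
    }
```

Discordant sites are kept out of the TP/FP/FN counts here because they need manual review; how the real script counts them is not stated in this README.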
Build AVITI-specific Panel of Normals (PoN) for CNVkit:
cd ~/IGL_apps/aviti-workflow/cnv_analysis
bash cnvkit_pon_aviti.sh
Workflow:
- Generate accessible regions (genome - blacklist)
- Create targets and antitargets from BED files
- Compute coverage for each normal BAM
- Aggregate into reference .cnn file
Inputs:
- Reference: hg38.fa
- Blacklist: hg38-blacklist.v2.bed
- RefFlat: refFlat.hg38.txt
- Target BED: Nonacus_GALEAS_Tumor_1911_covered.bed
- BAM directory: AVITI tumor normal samples
Output:
- pon_reference.cnn: Ready for tumor CNV calling
Docker-based: Uses etal/cnvkit container for reproducibility.
See cnv_analysis/cnvkit_pon_SOP.md for detailed procedure.
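For orientation, the four workflow steps map onto standard CNVkit subcommands (access, target, antitarget, coverage, reference). The sketch below only builds the command lines and does not run them. It is not a copy of cnvkit_pon_aviti.sh: the intermediate file names (access.bed, targets.bed, antitargets.bed) and the function name pon_commands are hypothetical, and the real script wraps the calls in Docker.

```python
from pathlib import Path


def pon_commands(bams, fasta="hg38.fa", blacklist="hg38-blacklist.v2.bed",
                 refflat="refFlat.hg38.txt",
                 target_bed="Nonacus_GALEAS_Tumor_1911_covered.bed",
                 out="pon_reference.cnn"):
    """Return the CNVkit command lines for building a Panel of Normals."""
    cmds = [
        # 1. Accessible regions: genome minus blacklist
        ["cnvkit.py", "access", fasta, "-x", blacklist, "-o", "access.bed"],
        # 2. Targets (annotated with gene names) and antitargets
        ["cnvkit.py", "target", target_bed, "--annotate", refflat, "-o", "targets.bed"],
        ["cnvkit.py", "antitarget", "targets.bed", "-g", "access.bed", "-o", "antitargets.bed"],
    ]
    coverage_files = []
    # 3. Target and antitarget coverage for each normal BAM
    for bam in bams:
        stem = Path(bam).stem
        for kind in ("target", "antitarget"):
            cnn = f"{stem}.{kind}coverage.cnn"
            cmds.append(["cnvkit.py", "coverage", bam, f"{kind}s.bed", "-o", cnn])
            coverage_files.append(cnn)
    # 4. Aggregate all coverage files into the reference
    cmds.append(["cnvkit.py", "reference", *coverage_files, "-f", fasta, "-o", out])
    return cmds


for cmd in pon_commands(["normal1.bam", "normal2.bam"]):
    print(" ".join(cmd))
```

Returning the commands as lists keeps the plan easy to inspect or log before anything is executed; each list could then be passed to subprocess.run inside the container.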
# AWS CLI (for S3 interaction)
aws --version
# Seqera Tower CLI v0.10.1 (workflow management)
tw --version
# Docker (for CNVkit)
docker --version
# VCF processing (for comparison workflows)
bcftools --version
bgzip --version
tabix --version
# BAM processing
samtools --version
Standard library dependencies:
- subprocess, argparse, csv, os, datetime, json, logging
External (implied via CLI):
- boto3 (AWS SDK, via aws CLI)
CNVkit PoN:
- AVITI-specific: s3://nonacus-development-eu-west-2-reference/GRCh38/v0/cnvkit_pon/aviti/pon_reference.cnn
Panel Files:
- s3://nonacus-research-eu-west-2/<project>/panels/
Sequencer Base Path:
"aviti": "s3://ncs-aviti/AV252104"
Run Name Patterns:
- Date format: YYYYMMDD (e.g., 20250724)
- Instrument ID: AVITI01, AVITI02, AVITI04, etc.
- Full example: AV252104_AVITI01_20250724_Eval_Run_1
- Flexible matching: Pattern can appear anywhere in folder name (case-insensitive)
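The flexible matching rule amounts to a case-insensitive substring search that must yield exactly one folder. A minimal sketch follows, assuming the folder names have already been listed from the sequencer base path; the function name find_run_folder is hypothetical and the error text mirrors the message quoted in the troubleshooting notes.

```python
def find_run_folder(pattern, folders):
    """Return the single folder whose name contains pattern (case-insensitive)."""
    needle = pattern.lower()
    matches = [folder for folder in folders if needle in folder.lower()]
    if len(matches) != 1:
        raise ValueError(
            f"Could not find unique run folder matching pattern {pattern!r} "
            f"({len(matches)} matches)"
        )
    return matches[0]


folders = ["AV252104_AVITI01_20250724_Eval_Run_1"]
print(find_run_folder("aviti01", folders))  # AV252104_AVITI01_20250724_Eval_Run_1
```

A pattern shared by several folders, such as the instrument prefix alone, raises the error, which is why a date or instrument ID is the more reliable choice.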
Pipeline-Specific:
- Tumor pipeline: GALEAS-Tumor-Release-250701_AVITI
- Uses AVITI PoN: pon_reference.cnn
- Parameter: save_realign_bam: false (default for AVITI)
Hereditary:
Base: S. Clinical Services TRUE/1. Hereditary/1.2 - Clinical/
Path: 1.2.16 AVITI Sequencing runs/[run_folder]/
Bladder:
Base: S. Clinical Services TRUE/2. Bladder/2.2 - Clinical/
Path: 2.2.10 AVITI Sequencing runs/[Account]/[Result]/[SampleID]/
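To make the two path templates concrete, the sketch below assembles a SharePoint destination from the base and path components listed above. It is illustrative only: the function name sharepoint_path and the keyword arguments are hypothetical, and the account, result, and sample values in the usage line are placeholders.

```python
from pathlib import PurePosixPath

# (base, runs folder) per service, copied from the configuration above
SHAREPOINT = {
    "hereditary": ("S. Clinical Services TRUE/1. Hereditary/1.2 - Clinical",
                   "1.2.16 AVITI Sequencing runs"),
    "bladder": ("S. Clinical Services TRUE/2. Bladder/2.2 - Clinical",
                "2.2.10 AVITI Sequencing runs"),
}


def sharepoint_path(service, *, run_folder=None, account=None, result=None, sample_id=None):
    """Build the upload path for one service; raise if a component is missing."""
    base, runs = SHAREPOINT[service]
    if service == "hereditary":
        parts = [run_folder]
    else:
        parts = [account, result, sample_id]
    if any(part is None for part in parts):
        raise ValueError(f"missing path component for {service} upload")
    return str(PurePosixPath(base, runs, *parts))


print(sharepoint_path("hereditary", run_folder="AVITI_HP_Run1"))
# S. Clinical Services TRUE/1. Hereditary/1.2 - Clinical/1.2.16 AVITI Sequencing runs/AVITI_HP_Run1
```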
Dry-run (no execution):
python samplesheet_generator_seqera_tw.py \
-seq aviti -i samples.txt -rn 20250724 \
-panel GALEAS-Hereditary-Plus-1966 \
-pipeline GALEAS-Hereditary-Plus-1966 \
--dry-run
Downsampled validation:
# 5M reads only
python samplesheet_generator_seqera_tw.py \
-seq aviti -i samples.txt -rn 20250724 \
-panel GALEAS-Hereditary-Plus-1966 \
-pipeline GALEAS-Hereditary-Plus-1966 \
-ds y -dsn 5M \
-launch y -outfolder AVITI_5M_Test
# Both original and downsampled
python samplesheet_generator_seqera_tw.py \
-seq aviti -i samples.txt -rn 20250724 \
-panel GALEAS-Hereditary-Plus-1966 \
-pipeline GALEAS-Hereditary-Plus-1966 \
-ds y -sds y -dsn 10M \
-launch y -outfolder AVITI_Full_and_10M
Skip S3 validation (faster):
python samplesheet_generator_seqera_tw.py \
-seq aviti -i samples.txt -rn 20250724 \
-panel GALEAS-Hereditary-Plus-1966 \
--skip-s3-check
python samplesheet_generator_seqera_tw.py \
-sh sample_samplesheets/AVITI01_hereditary.csv \
-pipeline GALEAS-Hereditary-Plus-1966 \
-launch y \
-outfolder AVITI_Rerun
# List all panels
python samplesheet_generator_seqera_tw.py --list-panels
# List all pipelines
python samplesheet_generator_seqera_tw.py --list-pipelines
# samples.txt contains:
# L1008-C0155-AD1
# L1008-C0155-AD2
# L1008-C0155-AD3
python samplesheet_generator_seqera_tw.py \
-seq aviti \
-i samples.txt \
-rn 20250724 \
-panel GALEAS-Hereditary-Plus-1966 \
-pipeline GALEAS-Hereditary-Plus-1966 \
-launch y \
-outfolder AVITI01_Eval_Run_1 \
-ot 118
# After pipeline completes, download results
python download_and_postprocess_s3.py \
-i s3://dna-nexus-sequencing/service_improvement/AVITI01_Eval_Run_1/ \
-s hereditary \
-ot 118 \
--upload
python samplesheet_generator_seqera_tw.py \
-seq aviti \
-i bladder_samples.txt \
-rn 20250808 \
-panel GALEAS-Bladder-Release-250701 \
-pipeline GALEAS-Bladder-Release-250701 \
-launch y \
-outfolder AVITI04_Eval_Run_4 \
-ot 15015
# Download with Odoo integration
python download_and_postprocess_s3.py \
-i s3://nonacus-research-eu-west-2/1821-galeasbladder/AVITI04_Eval_Run_4/ \
-s bladder \
-ot 15015 \
--fetch-from-odoo \
--upload
python samplesheet_generator_seqera_tw.py \
-seq aviti \
-i tumor_samples.txt \
-rn 20251216 \
-panel GALEAS-Tumor-Release-250701 \
-pipeline GALEAS-Tumor-Release-250701_AVITI \
-launch y \
-outfolder AVITI17_Tumor_Eval_Run_1 \
-ot 256
cd vcf_comparison
# Compare AVITI02 hereditary results against Illumina baseline
python compare_vcfs_virtual_panel_discordant_report_Aviti.py \
--folder1 /data/aviti02_hereditary_vcfs/ \
--folder2 /data/illumina_nextseq57_vcfs/ \
--output_dir /data/aviti02_comparison/ \
--bed_file /reference/GALEAS_Hereditary_Plus_1966.bed
# Review discordant calls
less /data/aviti02_comparison/*_discordant_alts_to_be_checked.txt
The repository includes comprehensive unit and integration tests for all core workflows.
Install test dependencies:
pip install -r test_requirements.txt
This installs:
- pytest (testing framework)
- pytest-cov (coverage reporting)
- pytest-mock (mocking utilities)
- pytest-xdist (parallel execution)
- pytest-html (HTML reports)
Run all tests:
pytest test_*.py -v
Run specific test suite:
# Samplesheet generator (73 tests)
pytest test_samplesheet_generator_seqera_tw.py -v
# Download and postprocess (54 tests)
pytest test_download_and_postprocess_s3.py -v
Run only unit tests:
pytest -v -m unit
Run with coverage report:
pytest test_*.py --cov=core --cov-report=html
Generate HTML test report:
pytest test_*.py --html=test_report.html --self-contained-html
Current test suite coverage:
- 127 total tests
- 126 tests passing (99.2%)
- 1 test skipped (AWS integration test)
- Test execution time: <1 second
Pre-generated HTML test reports in test_data/:
- aviti-galeas-launch-unit-test-report-v1.0.0.html
- aviti-post-unit-test-report-v1.0.0.html
Example samplesheets in sample_samplesheets/:
- AVITI01: 10 hereditary samples (Odoo 118)
- AVITI02: 82 hereditary samples (Odoo 14967) with 10-sample test subsets
- AVITI04: 82 bladder samples (Odoo 15015)
- AVITI17: 31 tumor samples (Odoo 256)
Detailed documentation in docs/:
- README_samplesheet_generator.md
  - Complete parameter reference
  - Validation rules
  - Panel/pipeline shortcuts
  - Advanced usage
- README_download_and_postprocess_s3.md
  - Service-specific workflows
  - SharePoint integration
  - Odoo HCRM usage
  - Logging configuration
- README_TESTING_SAMPLESHEET_GENERATOR.md
  - Test suite overview
  - Coverage analysis
  - Running tests
  - Future enhancements
Error: "Could not find unique run folder matching pattern"
Solution:
- Try shorter pattern: Just date (e.g., 20250724) or instrument ID (e.g., AVITI01)
- Verify folder exists: aws s3 ls s3://ncs-aviti/AV252104/
- Pattern is case-insensitive and matches anywhere in folder name
Error: "FASTQ files not found in S3 for sample X"
Solutions:
- Check S3 path structure: Should be Samples/<sample_id>/<sample_id>_L*_R*.fastq.gz
- Use --skip-s3-check to skip validation (faster, but less safe)
- Verify sample ID matches folder name exactly
Warning: "Panel and pipeline names do not match"
Action:
- Review panel and pipeline compatibility
- Confirm intended configuration
- Type "yes" to proceed if correct
Error: "Unable to locate credentials"
Solution:
aws configure
# Enter AWS Access Key ID
# Enter AWS Secret Access Key
# Enter region: eu-west-2
Error: "TOWER_ACCESS_TOKEN environment variable not set"
Solution:
export TOWER_ACCESS_TOKEN="your_token_here"
# Add to ~/.bashrc for persistence
Error: SharePoint connection timeout or permission denied
Solutions:
- Verify network connectivity to SharePoint
- Check SharePoint credentials in environment
- Ensure proper folder permissions
- Review upload path in script configuration
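Several of the errors above (missing Tower token, missing CLI tools) can be caught before a launch with a small pre-flight check. The sketch below is a hypothetical helper, not part of the repository; it checks only the two environment variables and two CLI tools named in this README and does not verify that AWS credentials are valid.

```python
import os
import shutil

REQUIRED_ENV = ("TOWER_API_ENDPOINT", "TOWER_ACCESS_TOKEN")
REQUIRED_TOOLS = ("aws", "tw")


def preflight(env=None, which=shutil.which):
    """Return a list of problems; an empty list means the basics are in place."""
    env = os.environ if env is None else env
    problems = [
        f"{name} environment variable not set"
        for name in REQUIRED_ENV
        if not env.get(name)
    ]
    problems += [
        f"{tool} not found on PATH"
        for tool in REQUIRED_TOOLS
        if which(tool) is None
    ]
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(f"Error: {problem}")
```

The env and which parameters exist so the check can be tested with a plain dictionary and a stub lookup instead of the real environment.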
- v1.9.0 (2026-01-20): Latest samplesheet generator with enhanced AVITI support
- v1.5.9 (2025-12): Added --nf-config-file for additional Nextflow configs
- v1.5.8 (2025-12): Fixed PoN parameters for AVITI tumor pipeline
Follow the process described in BIX12 SOP.
Requirements:
- Create branches for changes
- Get 1 reviewer approval before merging to main
- Update version history in script headers
- Run test suite before committing
For issues or questions:
- Review documentation in docs/
- Check sample samplesheets for examples
- Consult test reports for expected behavior
- Contact bioinformatics team for complex issues
Internal use only - Nonacus Ltd.