Add configuration system, documentation, VAMB compatibility layer, and test suite #9
Draft
Co-authored-by: linsalrob <836231+linsalrob@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve pipeline documentation and configuration setup" to "Add configuration system, documentation, and VAMB compatibility layer" on Jan 28, 2026
Owner
@copilot can you design a test suite for this code?
Contributor
Author
I've designed and implemented a comprehensive test suite for the pipeline improvements. Commit: 3e9201b
Test Suite Overview
61 test cases across 3 test suites, all passing ✅
1. Bash Helper Tests (
Copilot AI changed the title from "Add configuration system, documentation, and VAMB compatibility layer" to "Add configuration system, documentation, VAMB compatibility layer, and test suite" on Jan 28, 2026
Owner
@copilot update this branch to match main
… copilot/improve-pipeline-documentation Co-authored-by: linsalrob <836231+linsalrob@users.noreply.github.com>
Contributor
Author
Branch updated to match main. Commit: a86bed5
Merged 29 commits from main including:
All 61 tests still passing ✅ after merge.
Pipeline Improvements Complete ✅
This PR implements comprehensive improvements to the atavide_lite pipeline to make it easier to run, debug, and reproduce across HPC systems, without redesigning the architecture.
Phase 1: Documentation + Configuration ✅
config/paths.env.example - Comprehensive configuration template
config/samples.tsv.example - Sample sheet examples
docs/directory_contract.md - Input/output docs for each step
docs/known_good_versions.md - Software version tracking
docs/compat.md - Compatibility issue documentation
docs/dev_notes.md - Developer guidance and code standards
README.md - Quick start guide and documentation links
Phase 2: Shared Helper Library ✅
lib/common.sh - Reusable Bash helper functions
lib/README.md - Usage documentation and migration guide
.gitignore - Allow lib/ directory
pawsey_shortread/fastp_enhanced.slurm - Example using helpers
Phase 3: VAMB Compatibility ✅
bin/vamb_create_fasta_clusters.py - Canonical, version-independent script
bin/vamb_create_fasta.py - Add warnings and version detection
docs/compat.md
Phase 4: Quality Gates ✅
docs/dev_notes.md
Phase 5: Test Suite ✅
tests/test_common.sh - Test all lib/common.sh functions (25 tests)
tests/test_vamb_create_fasta_clusters.py - Test VAMB script (17 tests)
tests/test_config_files.sh - Validate configuration files (19 tests)
tests/run_tests.sh - Main test runner
tests/README.md - Test suite documentation
.github/workflows/test.yml - GitHub Actions CI
Phase 6: Sync with Main ✅
Summary ✅
IMPROVEMENTS_SUMMARY.md - Complete implementation overview
Test Suite Coverage
Total: 61 test cases across 3 test suites
Bash Helper Functions (25 tests)
VAMB Script (17 tests)
Configuration Files (19 tests)
All tests passing ✅ (verified after merge with main)
Key Benefits
Files Created (19 new files + 3 modified)
See IMPROVEMENTS_SUMMARY.md and tests/README.md for complete details.
Original prompt
Copilot Instructions — atavide_lite
You are GitHub Copilot working in the atavide_lite repository. Your goal is to make the pipeline easier to run, easier to debug, and more reproducible across HPC systems, without converting it to Snakemake and without large architectural rewrites.
Guiding principles
Phase 1 — Documentation + “front door” config (highest priority)
A) Add a single canonical configuration format
Create a new directory:
config/
paths.env.example: shell KEY=VALUE entries that can be source'd
samples.tsv.example: example samplesheet for paired and single-end runs
Requirements for paths.env.example:
Requirements for samples.tsv.example:
B) Add “directory contract” documentation
Create docs/directory_contract.md documenting, for each step:
Keep it concise but explicit.
C) Add “known-good versions” documentation
Create docs/known_good_versions.md with:
Include a short note:
D) Update README to point to the above
Update README.md:
docs/directory_contract.md
docs/known_good_versions.md
Phase 2 — Standardize script interfaces (medium priority)
A) Add a shared Bash helper: lib/common.sh
Create lib/common.sh and ensure cluster scripts can source it.
It must provide:
die(): print error to stderr and exit non-zero
log(): timestamped logging
require_cmd(): assert a command exists in PATH
require_file(): assert a file exists and is readable
require_dir(): assert a directory exists (or create it if appropriate)
check_nonempty(): assert output exists and size > 0
load_config(): source config/paths.env if present, else error with guidance
Do not introduce external dependencies beyond POSIX/Bash.
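The helper list above could be sketched roughly as follows. This is not the lib/common.sh that the PR actually ships: the message wording, the optional "create" argument to require_dir, and the optional path argument to load_config are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of lib/common.sh; the real file in the PR may differ.

# die MESSAGE...: print an error to stderr and exit non-zero.
die() {
    printf 'ERROR: %s\n' "$*" >&2
    exit 1
}

# log MESSAGE...: print a timestamped message to stderr.
log() {
    printf '[%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >&2
}

# require_cmd NAME: assert that a command exists in PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || die "required command not found in PATH: $1"
}

# require_file PATH: assert that a file exists and is readable.
require_file() {
    { [ -f "$1" ] && [ -r "$1" ]; } || die "required file missing or unreadable: $1"
}

# require_dir PATH [create]: assert that a directory exists. Passing "create"
# as the second argument (an assumed convention) creates it instead of failing.
require_dir() {
    if [ ! -d "$1" ]; then
        [ "${2:-}" = "create" ] || die "required directory missing: $1"
        mkdir -p "$1" || die "could not create directory: $1"
    fi
}

# check_nonempty PATH: assert that an output exists and has size > 0.
check_nonempty() {
    [ -s "$1" ] || die "output missing or empty: $1"
}

# load_config [PATH]: source the config file (default config/paths.env),
# or stop with guidance on how to create it.
load_config() {
    local cfg="${1:-config/paths.env}"
    [ -r "$cfg" ] || die "config not found: $cfg (copy config/paths.env.example to config/paths.env and edit it)"
    # shellcheck disable=SC1090
    . "$cfg"
}
```

A cluster script would source this file once near the top and then call, for example, `load_config` followed by `require_cmd fastp` before doing any work. Because `die` calls `exit`, a failed check stops the whole job rather than only the current function.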
B) Add consistent safe Bash practices in scripts
Where appropriate:
set -euo pipefail
trap '...' ERR)
find ... -print0 + while IFS= read -r -d '' patterns for filenames
Do not rewrite every script at once; start with 1–2 representative ones.
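As an illustration of how the three practices named above fit together in one script, here is a minimal sketch. The function name count_fastq, the *.fastq.gz pattern, and the temporary demo directory are assumptions for the example and are not taken from the pipeline's scripts.

```bash
#!/usr/bin/env bash
# Sketch only: count_fastq and the demo directory are illustrative assumptions.

# Exit on errors, on unset variables, and on failures anywhere in a pipeline.
set -euo pipefail

# Report the failing line and command on stderr before the script exits.
trap 'echo "ERROR: line $LINENO: $BASH_COMMAND" >&2' ERR

# count_fastq DIR: count *.fastq.gz files under DIR. Entries are
# NUL-delimited, so names containing spaces or newlines are handled safely.
count_fastq() {
    local dir="$1" n=0 f
    while IFS= read -r -d '' f; do
        n=$((n + 1))
    done < <(find "$dir" -type f -name '*.fastq.gz' -print0)
    printf '%s\n' "$n"
}

# Demo: two matching files with a space in their names, one non-matching file.
demo=$(mktemp -d)
touch "$demo/sample one_R1.fastq.gz" "$demo/sample one_R2.fastq.gz" "$demo/notes.txt"
count_fastq "$demo"   # prints 2
rm -rf "$demo"
```

One caveat: Bash does not propagate an ERR trap into functions unless errtrace is enabled (`set -E`), so a script that relies on the trap inside functions may want `set -Eeuo pipefail` instead.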
Phase 3 — Fix/contain VAMB script drift (high priority because it breaks runs)
A) Establish a single canonical VAMB helper script
Identify the two similar scripts (e.g., vamb_create_fasta.py and vamb_create_fasta_clusters.py).
Actions:
B) Add a compatibility layer for VAMB version differences
In the canonical script:
import vamb; vamb.__version__ (preferred if available)
python -c "import vamb; print(vamb.__version__)" in wrappers
minsize vs removed/renamed args)
Add a short doc:
docs/compat.md capturi...
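The wrapper-side version check mentioned above might look roughly like this. The helper names get_vamb_version and version_major are hypothetical, and the sketch assumes the version prints as a dotted string; any value it cannot parse, including a failed import, is treated as unknown rather than guessed at.

```bash
#!/usr/bin/env bash
# Sketch only: get_vamb_version and version_major are hypothetical helper names.

# get_vamb_version: print the installed VAMB version, or "unknown" when
# python is missing, the import fails, or the attribute is not present.
get_vamb_version() {
    python -c 'import vamb; print(vamb.__version__)' 2>/dev/null || echo "unknown"
}

# version_major VERSION: print the leading numeric component ("4.1.3" -> "4"),
# or an empty line when VERSION does not start with a plain number.
version_major() {
    local v="${1%%.*}"
    case "$v" in
        ''|*[!0-9]*) printf '\n' ;;
        *) printf '%s\n' "$v" ;;
    esac
}

ver=$(get_vamb_version)
major=$(version_major "$ver")
if [ -z "$major" ]; then
    echo "WARNING: could not detect VAMB version (got: $ver)" >&2
else
    echo "Detected VAMB major version: $major" >&2
fi
```

A wrapper could branch on `$major` to choose which helper script or arguments to use, and fall back to a warning plus the canonical script when the version is unknown.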