This is a KBase module generated by the KBase Software Development Kit (SDK).
You will need to have the SDK installed to use this module. Learn more about the SDK and how to use it.
You can also learn more about the apps implemented in this module from its catalog page or its spec file.
KBDatalakeApps provides a comprehensive genome analysis pipeline that integrates KBase data with the Biology Experimental Reference Data Lake (BERDL). Given one or more genomes, the module runs a multi-stage pipeline that performs annotation, pangenome analysis, metabolic model reconstruction, and phenotype simulation, and produces a unified SQLite database and an interactive HTML viewer as output.
Takes a list of Genome or GenomeSet workspace references and runs the full BERDL pipeline:
- Genome Pipeline - Exports genomes, runs ANI analysis via skani, and assigns genomes to BERDL pangenome clades
- Annotation Pipeline - Runs RAST, KOfam, Bakta, and PSORTb annotations in parallel
- Pangenome Pipeline - Clusters proteins with MMseqs2 and generates pangenome member annotations
- Modeling Pipeline - Reconstructs metabolic models with ModelSEEDpy, runs phenotype simulations and gapfilling
- Table Generation - Assembles all results into a SQLite database with genome, feature, and annotation tables
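The stage sequence above can be sketched as a simple driver loop. This is not the module's implementation. The stage keys, the `run_pipeline` and `run_annotations` helpers, and the stub annotator are all hypothetical. The four annotators are modeled as thread-pool jobs only to illustrate "in parallel"; the real module calls the RAST_SDK, kb_kofam, kb_bakta, and kb_psortb services.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import AbstractSet, Callable, Dict, List

# Stage order as listed above; the string keys are hypothetical labels.
STAGE_ORDER = ["genome", "annotation", "pangenome", "modeling", "tables"]

# Annotation tools that the Annotation Pipeline runs side by side.
ANNOTATORS = ["RAST", "KOfam", "Bakta", "PSORTb"]


def run_annotations(
    genome_refs: List[str], annotate: Callable[[str, List[str]], str]
) -> Dict[str, str]:
    """Submit one job per annotator and wait for all of them to finish."""
    with ThreadPoolExecutor(max_workers=len(ANNOTATORS)) as pool:
        futures = {tool: pool.submit(annotate, tool, genome_refs) for tool in ANNOTATORS}
        # .result() blocks until the job is done and re-raises any worker exception.
        return {tool: fut.result() for tool, fut in futures.items()}


def run_pipeline(
    genome_refs: List[str],
    stages: Dict[str, Callable[[List[str]], object]],
    skip: AbstractSet[str] = frozenset(),
) -> List[str]:
    """Run the stages in order, skipping any named in `skip`; return what ran."""
    ran = []
    for name in STAGE_ORDER:
        if name in skip:
            continue
        stages[name](genome_refs)
        ran.append(name)
    return ran


if __name__ == "__main__":
    # Stub annotator standing in for the real KBase annotation services.
    def fake_annotate(tool: str, refs: List[str]) -> str:
        return f"{tool}: {len(refs)} genome(s)"

    stages = {name: (lambda refs: None) for name in STAGE_ORDER}
    stages["annotation"] = lambda refs: print(run_annotations(refs, fake_annotate))
    # "12345/6/7" is a placeholder workspace reference (workspace/object/version).
    print(run_pipeline(["12345/6/7"], stages, skip={"modeling"}))
```

Running the file prints the four stub annotation results and then `['genome', 'annotation', 'pangenome', 'tables']`, because `modeling` was skipped.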
Parameters:
- input_refs - Workspace references to Genome or GenomeSet objects
- suffix - Optional suffix for generated table names
- save_models - Whether to save generated metabolic models to the workspace
- Skip/export flags - Control which pipeline stages run and which data products are exported
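A parameter set for the app might look like the dictionary in the sketch below, paired with a small normalization helper. The defaults, the `normalize_params` helper, and the `skip_modeling` flag name are hypothetical; the module's spec file is the authority on real parameter names and defaults. Coercing `save_models` to a boolean assumes the common KBase convention of passing checkbox values as 0/1 integers.

```python
from typing import Any, Dict

# Hypothetical defaults for the optional parameters.
DEFAULTS: Dict[str, Any] = {"suffix": "", "save_models": 0}


def normalize_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Check input_refs and fill in optional parameters (illustrative only)."""
    refs = params.get("input_refs")
    if not isinstance(refs, list) or not refs or not all(isinstance(r, str) for r in refs):
        raise ValueError("input_refs must be a non-empty list of workspace reference strings")
    merged = {**DEFAULTS, **params}
    # Assumed convention: checkbox parameters arrive as 0/1 integers.
    merged["save_models"] = bool(merged["save_models"])
    return merged


if __name__ == "__main__":
    example = {
        "input_refs": ["12345/6/7"],  # placeholder Genome or GenomeSet reference
        "suffix": "_run1",
        "save_models": 1,
        "skip_modeling": 1,  # hypothetical skip flag; see the spec file for real names
    }
    print(normalize_params(example))
```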
Output: An interactive HTML table viewer with the assembled database, plus downloadable data products.
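The downloaded SQLite database can be inspected with Python's standard `sqlite3` module. The sketch below makes no assumptions about the module's table names: it lists whatever tables the file contains along with their row counts. The `table_row_counts` helper is illustrative, not part of KBDatalakeApps.

```python
import sqlite3
from contextlib import closing
from typing import Dict


def table_row_counts(db_path: str) -> Dict[str, int]:
    """Return {table_name: row_count} for every table in a SQLite file."""
    # closing() ensures the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(db_path)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier, since generated table names may include a suffix.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

For example, `table_row_counts("datalake_output.db")` (a placeholder file name) gives a quick overview of the genome, feature, and annotation tables before opening them in other tools. Note that `sqlite3.connect` creates an empty file if the path does not exist, so check the path first.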
This module relies on several external tools and libraries:
- ModelSEEDpy - Metabolic model reconstruction
- cobrakbase - COBRApy/KBase integration
- KBUtilLib - Shared KBase utilities (workspace ops, genome parsing, model utilities)
- ModelSEED Database - Biochemistry reference data
- skani - Fast ANI calculation
- MMseqs2 - Protein sequence clustering
- KBase annotation services: RAST_SDK, kb_bakta, kb_psortb, kb_kofam
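MMseqs2 clustering (for example via `mmseqs easy-cluster`) writes a `<prefix>_cluster.tsv` file whose two tab-separated columns are the cluster representative and a member sequence. The sketch below turns that file into gene families and records which genomes each family spans. It assumes protein IDs encode their genome as `genome|protein`, a hypothetical convention chosen only for illustration, and it is not necessarily how KBDatalakeApps builds its pangenome tables.

```python
import csv
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


def read_clusters(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group member IDs under their representative from a <prefix>_cluster.tsv."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        representative, member = row[0], row[1]
        clusters[representative].append(member)
    return dict(clusters)


def genomes_per_cluster(
    clusters: Dict[str, List[str]], genome_of: Callable[[str], str]
) -> Dict[str, Set[str]]:
    """Map each cluster representative to the set of genomes its members come from."""
    return {rep: {genome_of(m) for m in members} for rep, members in clusters.items()}


if __name__ == "__main__":
    # Hypothetical IDs of the form "genome|protein".
    demo = ["gA|p1\tgA|p1", "gA|p1\tgB|p7", "gA|p2\tgA|p2"]
    clusters = read_clusters(demo)
    print(genomes_per_cluster(clusters, lambda pid: pid.split("|")[0]))
```

For a real run, pass an open file handle instead of the demo list, e.g. `with open(path, newline="") as fh: clusters = read_clusters(fh)`.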
Add your KBase developer token to test_local/test.cfg and run the following:
$ make
$ kb-sdk test
After making any additional changes to this repo, run kb-sdk test again to verify that everything still works.
To use this code in another SDK module, call kb-sdk install KBDatalakeApps in the other module's root directory.
You may find the answers to your questions in our FAQ or Troubleshooting Guide.