Apps designed for generating insights into user data in KBase using KBase Datalake

kbaseapps/KBDatalakeApps

KBDatalakeApps

This is a KBase module generated by the KBase Software Development Kit (SDK).

You will need to have the SDK installed to use this module. Learn more about the SDK and how to use it.

You can also learn more about the apps implemented in this module from its catalog page or its spec file.

Overview

KBDatalakeApps provides a genome analysis pipeline that integrates KBase data with the Biology Experimental Reference Data Lake (BERDL). Given one or more genomes, the module runs a multi-stage pipeline covering annotation, pangenome analysis, metabolic model reconstruction, and phenotype simulation. The output is a unified SQLite database and an interactive HTML viewer.

Apps

build_genome_datalake_tables

Takes a list of Genome or GenomeSet workspace references and runs the full BERDL pipeline:

  1. Genome Pipeline - Exports genomes, runs ANI analysis via skani, and assigns genomes to BERDL pangenome clades
  2. Annotation Pipeline - Runs RAST, KOfam, Bakta, and PSORTb annotations in parallel
  3. Pangenome Pipeline - Clusters proteins with MMseqs2 and generates pangenome member annotations
  4. Modeling Pipeline - Reconstructs metabolic models with ModelSEEDpy, then runs phenotype simulations and gapfilling
  5. Table Generation - Assembles all results into a SQLite database with genome, feature, and annotation tables
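
The ordering of the five stages, combined with the skip flags mentioned under Parameters, can be pictured with a small sketch. This is not the module's implementation: the stage keys, the `run_stages` function, and the handler mapping are hypothetical names chosen for illustration, and the real module may sequence or parallelize its work differently.

```python
from typing import Callable, Dict, Iterable, List

# Stage order follows the numbered list above; the key names are hypothetical.
PIPELINE_STAGES = ("genome", "annotation", "pangenome", "modeling", "table_generation")


def run_stages(handlers: Dict[str, Callable[[], None]],
               skip: Iterable[str] = ()) -> List[str]:
    """Run each stage handler in order, omitting skipped stages.

    Returns the names of the stages that actually ran.
    """
    skipped = set(skip)
    unknown = skipped - set(PIPELINE_STAGES)
    if unknown:
        raise ValueError(f"unknown stage(s): {sorted(unknown)}")

    completed = []
    for stage in PIPELINE_STAGES:
        if stage in skipped:
            continue
        handlers[stage]()  # each handler performs one stage's work
        completed.append(stage)
    return completed
```

Keeping the stage order in a single tuple means a skip flag only removes a stage from the run and never reorders the remaining ones.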

Parameters:

  • input_refs - Workspace references to Genome or GenomeSet objects
  • suffix - Optional suffix for generated table names
  • save_models - Whether to save generated metabolic models to the workspace
  • Skip/export flags for controlling which pipeline stages run and what data products are exported
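
As a rough illustration of the parameters above, the sketch below assembles a parameter dictionary and checks that each reference looks like a KBase workspace reference (`workspace/object` or `workspace/object/version`). The helper `build_params` is hypothetical, and the value types are assumptions: `save_models` is encoded as 0/1 here, which is a common convention in KBase specs, but the spec file is the authority on the real types. The skip/export flags are omitted because their names are not listed in this README.

```python
import re

# Matches "workspace/object" or "workspace/object/version".
_REF_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+(/\d+)?$")


def build_params(input_refs, suffix="", save_models=False):
    """Assemble a parameter dict for build_genome_datalake_tables (illustrative only)."""
    if not input_refs:
        raise ValueError("input_refs must contain at least one reference")
    bad = [ref for ref in input_refs if not _REF_PATTERN.match(ref)]
    if bad:
        raise ValueError(f"malformed workspace references: {bad}")
    return {
        "input_refs": list(input_refs),
        "suffix": suffix,
        # Assumption: the flag is passed as an integer (1 = save, 0 = do not save).
        "save_models": 1 if save_models else 0,
    }
```

For example, `build_params(["12345/6/7"], suffix="_run1", save_models=True)` yields a dictionary with one reference, the given suffix, and `save_models` set to 1.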

Output: An interactive HTML table viewer with the assembled database, plus downloadable data products.
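
Because the assembled database is a SQLite file, a downloaded copy can be inspected with Python's standard `sqlite3` module. The sketch below lists every table with its row count; it deliberately discovers table names from `sqlite_master` rather than assuming any, since the schema is not documented here. The function name `list_tables` and the file path in the usage note are hypothetical.

```python
import sqlite3


def list_tables(db_path):
    """Return a {table_name: row_count} mapping for a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        names = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        counts = {}
        for (name,) in names:
            # Quote the identifier so unusual table names do not break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

A call such as `list_tables("datalake.db")` (file name assumed) gives a quick overview of which genome, feature, and annotation tables were generated and how large they are.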

Dependencies

This module relies on several external tools and libraries:

  • ModelSEEDpy - Metabolic model reconstruction
  • cobrakbase - COBRApy/KBase integration
  • KBUtilLib - Shared KBase utilities (workspace ops, genome parsing, model utilities)
  • ModelSEED Database - Biochemistry reference data
  • skani - Fast ANI calculation
  • MMseqs2 - Protein sequence clustering
  • KBase annotation services: RAST_SDK, kb_bakta, kb_psortb, kb_kofam

Setup and test

Add your KBase developer token to test_local/test.cfg and run the following:

$ make
$ kb-sdk test

After making any additional changes to this repo, run kb-sdk test again to verify that everything still works.
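
A missing token is an easy mistake to make before running the tests, so a small pre-flight check can help. The sketch below assumes that test_local/test.cfg uses plain `key=value` lines and that the token is stored under the key `test_token`; both are assumptions to verify against your generated config file. The helper names are hypothetical.

```python
def read_test_cfg(text):
    """Parse simple key=value lines, ignoring blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def has_token(text, key="test_token"):
    """Return True if the config text defines a non-empty value for `key`."""
    return bool(read_test_cfg(text).get(key))
```

Typical use would be to read the file contents, for example `has_token(open("test_local/test.cfg").read())`, and stop early with a clear message if it returns False.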

Installation from another module

To use this code in another SDK module, run kb-sdk install KBDatalakeApps in the other module's root directory.

Help

You may find the answers to your questions in our FAQ or Troubleshooting Guide.
