MakrisJr/GEMS

GEMS: Genomic & Experimental Metabolic Suite

GEMS is an end-to-end platform for fungal metabolic model reconstruction, ML-driven growth-condition optimisation, and geometry-aware optimisation design using polytope sampling. It combines a ModelSEED-based genome-scale model (GEM) pipeline, a trained multi-target regressor, and a convex-geometry analysis layer for four industrial fungal strains.


Table of Contents

  1. Overview
  2. Architecture
  3. Project Structure
  4. Backend
  5. Frontend
  6. GEM Pipeline (src/ + scripts/)
  7. Experimental Analysis: Polytope Module
  8. Data
  9. Quick Start
  10. CLI Usage
  11. API Reference
  12. Supported Fungal Strains

Overview

GEMS has three integrated components:

| Component | Purpose |
| --- | --- |
| GEM Pipeline | Protein FASTA → draft metabolic model → gapfill → FBA analysis → validation |
| ML Recommender | Historical growth data (online learning) → train Random Forest / XGBoost / LightGBM → recommend optimal media conditions |
| Experimental / Geometry Aware | Fungal GEM + scenario generator → polytope sampling → geometry features → surrogate ML → industrial ranking |

All three components are accessible through a single Streamlit UI and a FastAPI backend.


Architecture

┌────────────────────────────────────────────────────────────────────┐
│                  Streamlit UI  (frontend_app.py)                   │
│  ┌───────────────┐   ┌──────────────────────┐   ┌───────────────┐  │
│  │  GEM Pipeline │   │   ML Recommender     │   │  Experimental │  │
│  │  Tab          │   │   Tab                │   │  Analysis Tab │  │
│  └───────┬───────┘   └──────────┬───────────┘   └───────┬───────┘  │
└──────────┼──────────────────────┼───────────────────────┼──────────┘
           │ HTTP (REST)          │ Direct Python         │ Direct Python
           ▼                      ▼                       ▼
┌─────────────────────┐   ┌────────────────────────┐  ┌─────────────────────────┐
│  FastAPI Backend    │   │  ML Backend (backend/) │  │  Experimental/          │
│  (backend/main.py)  │   │  model_trainer.py      │  │  Polytopes/             │
│                     │   │  recommender.py        │  │  dataset_builder.py     │
│  POST /run          │   │  retrainer.py          │  │  train_model.py         │
│  POST /run/custom   │   │  data_ingestion.py     │  │  postprocess_scores.py  │
│  GET  /health       │   └────────────────────────┘  └─────────────────────────┘
└──────────┬──────────┘
           │ subprocess
           ▼
┌─────────────────────────────────────────────────┐
│              GEM Pipeline (scripts/ + src/)     │
│  run_mvp_pipeline.py  →  analyze_mvp.py         │
│  → validate_mvp.py                              │
└─────────────────────────────────────────────────┘

Project Structure

GEMS/
├── frontend_app.py            # Streamlit UI: GEM Pipeline + ML Recommender + Experimental tabs
├── requirements.txt           # Python dependencies
├── installation.txt           # Step-by-step setup and pipeline walkthrough
├── USAGE.md                   # Detailed usage examples
├── ARCHITECTURE.md            # In-depth architecture notes
│
├── backend/                   # ML recommender + API orchestration
│   ├── main.py                # FastAPI app: /run, /run/custom, /health
│   ├── pipeline_runner.py     # PipelineRunner: orchestrates MVP pipeline steps
│   ├── config.py              # Paths, feature/target columns, model hyperparams
│   ├── data_loader.py         # Load / save combined training dataset
│   ├── feature_engineering.py # Encoders, scalers, sample weight computation
│   ├── model_trainer.py       # Train RF / XGBoost / LightGBM; CV; persistence
│   ├── recommender.py         # Generate Exploit + Explore condition recommendations
│   ├── retrainer.py           # Adaptive retraining with round tracking
│   ├── data_ingestion.py      # Validate and ingest new wet-lab CSV results
│   ├── lab_exporter.py        # Export recommendations to Excel lab sheets
│   └── __init__.py
│
├── scripts/                   # CLI entry points for the GEM pipeline
│   ├── run_mvp_pipeline.py    # Step 1: build draft model, gapfill, COBRA inspect
│   ├── analyze_mvp.py         # Steps 2–4: theoretical / preset / custom analysis
│   ├── validate_mvp.py        # Step 5: FBA, dead-ends, FVA, gene essentiality
│   ├── build_draft_model.py   # Standalone draft-model builder
│   ├── gapfill_and_export_model.py
│   ├── inspect_with_cobra.py
│   ├── screen_media.py
│   ├── diagnose_exchange_space.py
│   ├── debug_growth.py
│   ├── run_oracle_growth.py
│   ├── screen_oracle_medium.py
│   ├── benchmark_bio2.py
│   ├── inspect_oracle_condition.py
│   ├── first_modelseed_step.py
│   ├── prepare_input.py
│   └── compare_template_runs.py
│
├── src/                       # Core GEM pipeline library
│   ├── paths.py               # Canonical path constants (PROJECT_ROOT, MODELS_DIR, …)
│   ├── reconstruction.py      # MSBuilder draft-model construction
│   ├── template_loader.py     # Load built-in or local ModelSEED templates
│   ├── gapfill.py             # Best-effort minimal gapfilling
│   ├── export_model.py        # SBML / JSON model export helpers
│   ├── cobra_loader.py        # Load COBRA model from directory
│   ├── cobra_inspect.py       # FBA, exchange table, baseline optimization
│   ├── cobra_outputs.py       # Save COBRA inspection outputs
│   ├── cobra_debug.py         # Debug utilities for COBRA models
│   ├── mvp_analysis.py        # Theoretical / preset / custom condition analysis
│   ├── mvp_outputs.py         # Save all MVP analysis outputs + plots
│   ├── validation.py          # Dead-end analysis, exchange FVA, gene essentiality
│   ├── validation_outputs.py  # Save validation dashboard and summary files
│   ├── media_screen.py        # First-pass media screening
│   ├── media_outputs.py       # Save media screen outputs
│   ├── exchange_diagnostics.py
│   ├── exchange_diagnostic_outputs.py
│   ├── oracle_growth.py       # Oracle growth check
│   ├── oracle_medium.py       # Oracle-derived debug media
│   ├── oracle_medium_outputs.py
│   ├── bio2_benchmark.py      # Benchmark bio2 reaction rate
│   ├── bio2_benchmark_outputs.py
│   ├── modelseed_step.py      # ModelSEED first-pass step helpers
│   ├── input_parser.py        # Detect protein FASTA / genome FASTA / accession input
│   ├── model_io.py            # Save model summary text/JSON
│   ├── plot_utils.py          # Ranked bar chart plotting helpers
│   ├── report_utils.py        # Plain-text report builders
│   ├── logging_utils.py       # Configured logger factory
│   └── __init__.py
│
├── Experimental/              # Geometry-aware fermentation optimisation (polytope module)
│   ├── README.md              # Experimental module documentation
│   ├── A_oryzae_optimized.xml # Aspergillus oryzae GEM (SBML) used for simulations
│   ├── scenarios_fungi.json   # Fermentation scenario definitions (nutrients, T, pH, mixing)
│   ├── scenarios.json         # Additional scenarios (standard exchange reaction names)
│   ├── dataset_builder.py     # Main engine: FBA → FVA → PolyRound → polytope sampling → features
│   ├── scenario_generator_adaptive.py  # Adaptive explore/exploit scenario generator
│   ├── train_model.py         # Train Random Forest surrogate on dataset.csv
│   ├── rank_scenarios.py      # Rank scenarios by predicted overall_rank_score
│   ├── postprocess_scores.py  # Compute economic, morphology, meatiness, industrial scores
│   ├── rank_scenarios_industrial.py   # Rank by industrial_score
│   ├── feature_importance.py  # Feature importance from trained surrogate model
│   ├── top_region_summary.py  # Summarise top-performing scenario region (medians, ranges)
│   ├── plot_pareto.py         # Pareto plot: growth vs byproduct burden
│   ├── plot_industrial_tradeoff.py    # Industrial score vs growth scatter + trend line
│   ├── plot_geometry_vs_growth.py     # 3-panel: geometry/byproduct/validation plots
│   ├── reactions.py           # Search model reactions by keyword
│   ├── test_fungal_model.py   # Verify GEM loading and biomass reaction
│   ├── ml_pipeline.py         # ML pipeline utility
│   └── Results A_oryzae/      # Pre-computed results for Aspergillus oryzae
│       ├── dataset.csv                            # Raw FBA + geometry dataset
│       ├── dataset_postprocessed.csv              # With industrial scores
│       ├── model.pkl                              # Trained surrogate model
│       ├── feature_importances.csv
│       ├── predicted_ranked_scenarios.csv
│       ├── predicted_ranked_scenarios_industrial.csv
│       ├── top_region_summary.txt
│       ├── pareto_growth_vs_byproduct.png
│       └── plot_industrial_tradeoff.png
│
├── polytopes/                 # Mirror of Experimental/ (identical content)
│
├── data/
│   ├── synthetic_fungal_growth_dataset.csv   # 2,000-row synthetic training set
│   ├── intermediate/          # Combined dataset, encoded features (auto-generated)
│   ├── models/                # GEM model output directories + ML model checkpoints
│   └── raw/uploads/           # Uploaded protein FASTA files
│
├── config/
│   └── media_library.yml      # Named media definitions for screening
│
├── ModelSEEDDatabase/         # Local copy of the ModelSEED reference database
│   ├── Templates/
│   │   ├── Fungi/Fungi.json   # Fungal reconstruction template (local source)
│   │   ├── Core/              # Core template
│   │   └── …                  # GramNeg, GramPos, Human, Plant, etc.
│   ├── Biochemistry/          # Compounds, reactions, aliases, structures
│   └── Annotations/           # Complexes, Roles
│
└── docs/                      # Pipeline diagrams and template comparison reports

Backend

The backend/ package contains two distinct responsibilities:

1. FastAPI Pipeline API (main.py + pipeline_runner.py)

| Endpoint | Method | Description |
| --- | --- | --- |
| /run | POST | Upload a .faa file; run the 4-step MVP pipeline; return model_id + step status |
| /run/custom | POST | Run an optional custom-condition analysis on an existing model |
| /health | GET | Liveness check |

PipelineRunner (in pipeline_runner.py) orchestrates:

  1. run_mvp_pipeline.py: build draft model
  2. analyze_mvp.py --mode theoretical
  3. analyze_mvp.py --mode preset
  4. validate_mvp.py --mode theoretical_upper_bound

Each step runs as a child subprocess. If a step exits with a non-zero return code, the pipeline stops and returns partial results.
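
The stop-on-first-failure behaviour can be sketched as follows. This is a minimal illustration, not the actual PipelineRunner code: the helper name run_steps and the result layout are assumptions, modelled on the name / returncode / stdout / stderr fields of the /run response, and the demo commands are stand-ins for the real scripts.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, argv) pairs in order; stop at the first non-zero return code.

    Hypothetical helper for illustration, not the real PipelineRunner API.
    """
    results = []
    for name, argv in steps:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append({
            "name": name,
            "returncode": proc.returncode,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
        })
        if proc.returncode != 0:
            break  # later steps are skipped; earlier results are kept
    all_succeeded = len(results) == len(steps) and all(
        r["returncode"] == 0 for r in results
    )
    return {"steps": results, "all_succeeded": all_succeeded}


# Stand-in commands so the sketch runs anywhere; the real pipeline would
# invoke run_mvp_pipeline.py, analyze_mvp.py and validate_mvp.py instead.
demo = [
    ("build", [sys.executable, "-c", "print('draft model built')"]),
    ("analyze", [sys.executable, "-c", "import sys; sys.exit(2)"]),
    ("validate", [sys.executable, "-c", "print('never reached')"]),
]
report = run_steps(demo)
```

Here the second stand-in step fails, so report holds two step records and the third command is never started.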

2. ML Recommender Backend

| Module | Responsibility |
| --- | --- |
| config.py | Feature columns, target names, model hyperparameters, directory paths |
| data_loader.py | Load/save the combined (synthetic + real) training CSV |
| feature_engineering.py | Label-encode categoricals, min-max scale numerics, compute sample weights |
| model_trainer.py | Cross-validate Random Forest / XGBoost / LightGBM; select best; persist with joblib |
| recommender.py | Sample 2,000 candidate conditions; predict all targets; return top-N exploit + explore |
| retrainer.py | Adaptive retraining loop with round tracking (retrain_log.json) |
| data_ingestion.py | Validate lab CSV schema; rename columns; recompute composite score; append to combined dataset |
| lab_exporter.py | Render recommendations into an Excel workbook for the wet lab |
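
The sample, predict, and rank loop described for recommender.py might look roughly like the sketch below. Everything named here is hypothetical: the two-feature search space, the toy predict function standing in for the trained regressor, and in particular the explore rule (random picks from the non-top candidates), since this README does not say how GEMS chooses its explore conditions.

```python
import random


def recommend(predict, sample_candidate, n_candidates=2000, top_n=5, seed=0):
    """Return (exploit, explore) lists of (candidate, predicted_score) pairs."""
    rng = random.Random(seed)
    candidates = [sample_candidate(rng) for _ in range(n_candidates)]
    scored = sorted(
        ((c, predict(c)) for c in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    exploit = scored[:top_n]                      # highest predicted scores
    explore = rng.sample(scored[top_n:], top_n)   # assumed rule: random picks from the rest
    return exploit, explore


def sample_candidate(rng):
    # Hypothetical search space with two numeric features.
    return {"pH": rng.uniform(3.0, 8.0), "temperature": rng.uniform(20.0, 40.0)}


def predict(candidate):
    # Toy score peaking at pH 5.5 and 30 degrees C; not a trained model.
    return -((candidate["pH"] - 5.5) ** 2) - ((candidate["temperature"] - 30.0) / 10.0) ** 2


exploit, explore = recommend(predict, sample_candidate)
```

Keeping the random generator seeded makes a recommendation round reproducible, which is convenient when the same list has to be exported to a lab sheet later.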

Frontend

frontend_app.py is a Streamlit single-page application with three top-level tabs:

🧫 GEM Pipeline Tab

  • Upload & Run: upload a .faa file, choose template (Core / Fungal), toggle RAST, click ▶ Run Pipeline
  • View Results: model selector dropdown; six sub-tabs:
    1. Draft Model: mvp_summary.json metrics card + mode comparison plot
    2. Theoretical Upper Bound: FBA benchmark plot, condition table, JSON summary
    3. Preset Conditions: ranked bar chart, conditions table, text summary
    4. Custom Condition: run and display a user-defined media condition
    5. Validation: dashboard image, FBA status, dead-end metabolites, exchange FVA, gene essentiality
    6. Full Pipeline Files: all 12 intermediate file outputs in pipeline order

🤖 ML Recommender Tab

  • Train: train all 3 model types × 4 targets; display CV R² scores
  • Recommendations: select strain, get top-N exploit + explore conditions; download Excel lab sheet
  • Upload & Retrain: upload a filled lab results CSV, ingest, retrain with updated data

🔬 Experimental Analysis Tab (Polytope Module)

Visualises results from the geometry-aware fermentation optimisation pipeline in Experimental/:

  • Results Overview: display pre-computed Results A_oryzae/ outputs
  • Scenario Rankings: tabular view of predicted_ranked_scenarios.csv and predicted_ranked_scenarios_industrial.csv
  • Pareto Analysis: Pareto plot image (growth vs byproduct burden)
  • Industrial Tradeoff: industrial score vs growth scatter with trend line
  • Geometry vs Growth: 3-panel figure: feasible space log-volume / byproduct pressure / ML validation
  • Feature Importances: bar chart of which variables drive the surrogate model score
  • Top Region Summary: median and range of the top-performing scenario cluster

GEM Pipeline

The MVP pipeline runs in a fixed order via scripts/run_mvp_pipeline.py:

Protein FASTA (.faa)
        │
        ▼
  MSGenome.from_fasta()         - load features
        │
        ▼
  MSBuilder.build_metabolic_model()   - draft reconstruction
        │  (template: Core builtin  OR  Fungi local)
        ▼
  gapfill_model_minimally()     - best-effort gapfill on bio1
        │
        ▼
  save_model_sbml_if_possible() - export model.xml (SBML) or model.json
        │
        ▼
  load_cobra_model()            - load via COBRApy
  run_baseline_optimization()   - FBA baseline
  get_exchange_table()          - exchange metabolite fluxes
        │
        ▼
  save_mvp_summary()            - mvp_summary.json / .txt

Analysis steps (run after step 1):

| Script | Mode | Output |
| --- | --- | --- |
| analyze_mvp.py | theoretical | theoretical_upper_bound.{json,txt,png,csv} |
| analyze_mvp.py | preset | preset_conditions.{json,csv,txt,png} |
| analyze_mvp.py | custom | custom_condition_NAME.{json,txt,png} |
| validate_mvp.py | theoretical_upper_bound | validation dashboard, dead-end CSV, FVA CSV, gene essentiality CSV |

Templates

| Label | --template-name | --template-source | File |
| --- | --- | --- | --- |
| Core Template (built-in) | template_core | builtin | modelseedpy built-in |
| Fungal Template (local) | fungi | local | ModelSEEDDatabase/Templates/Fungi/Fungi.json |

Experimental Analysis: Polytope Module

Located in Experimental/ (and mirrored in polytopes/), this module implements a geometry-aware, biologically grounded optimisation framework for fermentation design.

Concept

Instead of optimising a single metabolic solution, this system:

  1. Explores the full feasible metabolic flux space (solution polytope) for a fungal GEM
  2. Extracts geometric and biological features from that space
  3. Trains a surrogate ML model that learns how environmental conditions shape performance
  4. Identifies robust and efficient operating regions for fermentation

Pipeline

scenarios_fungi.json          - fermentation scenario definitions
        │
        ▼
  dataset_builder.py
  ├── cobra.io.read_sbml_model(A_oryzae_optimized.xml)
  ├── apply_model_specific_medium(scenario)
  ├── model.optimize()                  - FBA
  ├── flux_variability_analysis()       - FVA range
  ├── polyround_preprocess()            - convert SBML → polytope (Ax ≤ b)
  ├── PolytopeSampler.sample_from_polytope()  - MCMC interior sampling
  ├── back_transform()                  - recover flux vectors
  └── extract geometry features         - log-volume, anisotropy, flux_std
        │
        ▼
  dataset.csv                           - FBA + geometry features per scenario
        │
        ▼
  train_model.py                        - Random Forest surrogate on overall_rank_score
        │
        ▼
  rank_scenarios.py                     - predicted_ranked_scenarios.csv
        │
        ▼
  postprocess_scores.py
  ├── economic scores (substrate + mixing cost / yield)
  ├── morphology score (growth × mixing × pH penalty)
  └── meatiness score (growth + biomass + morphology − byproducts)
        │
        ▼
  dataset_postprocessed.csv             - enhanced with industrial scores
        │
        ▼
  rank_scenarios_industrial.py          - predicted_ranked_scenarios_industrial.csv

Scripts Reference

| Script | Description | Key Output |
| --- | --- | --- |
| dataset_builder.py | Main engine: FBA + FVA + polytope sampling + feature extraction | results/dataset.csv |
| scenario_generator_adaptive.py | Generate uniform explore + local exploit scenarios | scenarios.json |
| train_model.py | Train RandomForest surrogate; evaluate R² / MAE | results/model.pkl |
| rank_scenarios.py | Apply trained model; rank by predicted score | results/predicted_ranked_scenarios.csv |
| postprocess_scores.py | Compute economic, morphology, meatiness, industrial scores | results/dataset_postprocessed.csv |
| rank_scenarios_industrial.py | Rank by composite industrial score | results/predicted_ranked_scenarios_industrial.csv |
| feature_importance.py | Extract and display feature importances | results/feature_importances.csv |
| top_region_summary.py | Summarise top-performing scenario cluster (medians, ranges) | results/top_region_summary.txt |
| plot_pareto.py | Pareto view: growth vs total byproducts | results/pareto_growth_vs_byproduct.png |
| plot_industrial_tradeoff.py | Industrial score vs growth with trend line | results/plot_industrial_tradeoff.png |
| plot_geometry_vs_growth.py | 3-panel: geometry / byproduct pressure / ML validation | results/final_3panel_figure.png |
| test_fungal_model.py | Verify GEM loads and biomass reaction exists | - |
| reactions.py | Search GEM reactions by keyword | - |

Geometry Features Extracted

| Feature | Description |
| --- | --- |
| log_volume | Log of polytope volume (sum of log eigenvalues of the flux covariance matrix) |
| anisotropy_log | Log of max eigenvalue / median eigenvalue; measures directionality of the flux space |
| flux_std | Mean standard deviation of flux samples; measures overall variability |
| biomass_flux_mean | Mean biomass flux across polytope samples |
| biomass_std | Standard deviation of biomass flux |
| fva_range | Mean FVA (min→max) range across all reactions |
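
As a rough illustration of the definitions above, the sketch below computes four of the features with the standard library only. It assumes the eigenvalues of the flux covariance matrix have already been computed elsewhere (for example with a linear-algebra library) and are strictly positive; the function name, the argument layout, and the use of the population standard deviation are assumptions, not the code in dataset_builder.py.

```python
import math
import statistics


def geometry_features(eigenvalues, flux_samples, fva_ranges):
    """Compute geometry features from precomputed inputs (illustrative only).

    eigenvalues  : positive eigenvalues of the flux covariance matrix
    flux_samples : list of flux vectors, one list per polytope sample
    fva_ranges   : list of (fva_min, fva_max) pairs, one per reaction
    """
    # Sum of log eigenvalues, used as a proxy for the log-volume.
    log_volume = sum(math.log(ev) for ev in eigenvalues)
    # Log ratio of the largest eigenvalue to the median eigenvalue.
    anisotropy_log = math.log(max(eigenvalues) / statistics.median(eigenvalues))
    # Standard deviation per reaction (column), then averaged over reactions.
    per_reaction_std = [statistics.pstdev(column) for column in zip(*flux_samples)]
    flux_std = statistics.fmean(per_reaction_std)
    # Mean width of the FVA interval across reactions.
    fva_range = statistics.fmean([hi - lo for lo, hi in fva_ranges])
    return {
        "log_volume": log_volume,
        "anisotropy_log": anisotropy_log,
        "flux_std": flux_std,
        "fva_range": fva_range,
    }


# Made-up numbers purely to exercise the function.
features = geometry_features(
    eigenvalues=[4.0, 1.0, 0.25],
    flux_samples=[[1.0, 0.0], [3.0, 0.0]],
    fva_ranges=[(0.0, 2.0), (-1.0, 1.0)],
)
```

Near-zero eigenvalues would need clipping or filtering before taking the log; that handling is left out here.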

Industrial Score Components

| Component | Weight | Description |
| --- | --- | --- |
| Growth (FBA) | 0.25 | Raw FBA biomass rate |
| Biomass flux mean | 0.15 | Average biomass across polytope samples |
| Biomass yield | 0.15 | Growth / glucose_uptake |
| Economic score | 0.20 | 1 − (substrate cost + mixing cost) / yield |
| Morphology score | 0.15 | Growth × mixing_fibrousness − pH penalty |
| Meatiness score | 0.10 | Growth + biomass + morphology − byproducts |
| Byproduct penalty | −0.20 | Total byproduct excretion |
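
One way to read the table is as a weighted sum of the listed components. The sketch below follows that reading; it is not the code in postprocess_scores.py. Several points are assumptions: that the components are combined linearly without further normalisation, that "yield" in the economic score means the biomass yield, and that "biomass" in the meatiness score means the biomass flux mean. All argument names are illustrative.

```python
# Weights copied from the table; the byproduct term carries a negative weight.
WEIGHTS = {
    "growth": 0.25,
    "biomass_flux_mean": 0.15,
    "biomass_yield": 0.15,
    "economic": 0.20,
    "morphology": 0.15,
    "meatiness": 0.10,
    "byproducts": -0.20,
}


def industrial_score(growth, biomass_flux_mean, glucose_uptake, substrate_cost,
                     mixing_cost, mixing_fibrousness, ph_penalty, byproducts):
    """Return (score, components) using the formulas listed in the table."""
    biomass_yield = growth / glucose_uptake
    # Assumption: "yield" here is the biomass yield computed above.
    economic = 1.0 - (substrate_cost + mixing_cost) / biomass_yield
    morphology = growth * mixing_fibrousness - ph_penalty
    # Assumption: "biomass" here is the mean biomass flux over samples.
    meatiness = growth + biomass_flux_mean + morphology - byproducts
    components = {
        "growth": growth,
        "biomass_flux_mean": biomass_flux_mean,
        "biomass_yield": biomass_yield,
        "economic": economic,
        "morphology": morphology,
        "meatiness": meatiness,
        "byproducts": byproducts,
    }
    score = sum(WEIGHTS[name] * value for name, value in components.items())
    return score, components


# Made-up inputs purely to exercise the function.
score, components = industrial_score(
    growth=0.5, biomass_flux_mean=0.4, glucose_uptake=10.0,
    substrate_cost=0.01, mixing_cost=0.01, mixing_fibrousness=0.8,
    ph_penalty=0.1, byproducts=0.2,
)
```

The six positive weights sum to 1.0, so the byproduct term acts as a pure penalty on top of a convex combination.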

Running the Experimental Pipeline

cd GEMS/Experimental

# 1. Generate the scenarios
python scenario_generator_adaptive.py

# 2. Build the dataset (requires PolyRound + PolytopeSampler)
python dataset_builder.py

# 3. Train the surrogate model
python train_model.py

# 4. Rank scenarios by overall score
python rank_scenarios.py

# 5. Add industrial scoring layer
python postprocess_scores.py
python rank_scenarios_industrial.py

# 6. Analyse and visualise
python feature_importance.py
python top_region_summary.py
python plot_pareto.py
python plot_industrial_tradeoff.py
python plot_geometry_vs_growth.py

Data

| File | Description |
| --- | --- |
| data/synthetic_fungal_growth_dataset.csv | 2,000 synthetic growth experiments across 4 strains; features include carbon source, nitrogen source, pH, temperature, RPM, inoculum size, nutrient concentrations |
| data/intermediate/combined_dataset.csv | Merged synthetic + real uploaded data (auto-generated after ingest) |
| data/intermediate/features.pkl | Fitted encoder/scaler pipeline (auto-generated after training) |
| data/models/ | One directory per GEM model run; one directory per ML training run (run_YYYYMMDD_HHMMSS/) |
| Experimental/Results A_oryzae/dataset.csv | FBA + geometry features for A. oryzae simulated scenarios |
| Experimental/Results A_oryzae/dataset_postprocessed.csv | Enhanced dataset with industrial scores |
| Experimental/Results A_oryzae/model.pkl | Trained Random Forest surrogate for A. oryzae |

Quick Start

# 1. Install dependencies
pip install -r GEMS/requirements.txt
pip install modelseedpy cobra fastapi uvicorn

# 2. Start the API server (from the GEMS/ directory)
cd GEMS
uvicorn backend.main:app --reload --port 8000

# 3. Start the Streamlit UI (separate terminal, from GEMS/ directory)
cd GEMS
streamlit run frontend_app.py

Navigate to http://localhost:8501 to access the UI.
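
Before opening the UI, it can help to confirm that the API server from step 2 is reachable. The sketch below uses only the standard library and assumes the server listens on port 8000 as in the command above and that /health answers with HTTP 200 when healthy. The response body is not documented here, so it is ignored, and the helper name is_alive is illustrative.

```python
import urllib.error
import urllib.request


def is_alive(base_url, timeout=2.0):
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


# Example call once uvicorn is running:
#   is_alive("http://localhost:8000")
```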


CLI Usage

# Build a draft fungal model using the local Fungi template
python GEMS/scripts/run_mvp_pipeline.py \
  --input ncbi_dataset/data/GCA_000182925.2/protein.faa \
  --model-id fungi_test \
  --use-rast \
  --template-name fungi \
  --template-source local

# Run theoretical upper bound analysis
python GEMS/scripts/analyze_mvp.py \
  --model-dir GEMS/data/models/fungi_test \
  --mode theoretical

# Run preset conditions
python GEMS/scripts/analyze_mvp.py \
  --model-dir GEMS/data/models/fungi_test \
  --mode preset

# Run validation
python GEMS/scripts/validate_mvp.py \
  --model-dir GEMS/data/models/fungi_test \
  --mode theoretical_upper_bound \
  --biomass-reaction bio2

API Reference

POST /run

Upload a protein FASTA and run the full 4-step MVP pipeline.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| file | .faa upload | required | Protein FASTA file |
| use_rast | bool | false | Annotate with RAST |
| template_name | string | template_core | template_core or fungi |
| template_source | string | builtin | builtin or local |

Response: model_id, steps[] (name, returncode, stdout, stderr), all_succeeded
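
A client can use these fields to find out where a run stopped. The sketch below works on an already-decoded JSON body; the payload is made up for illustration, and the helper name first_failure is not part of GEMS.

```python
import json


def first_failure(response):
    """Return the first step record with a non-zero returncode, or None."""
    for step in response.get("steps", []):
        if step.get("returncode") != 0:
            return step
    return None


# Made-up response body using the documented field names; values are not
# real pipeline output.
raw = json.dumps({
    "model_id": "fungi_test",
    "steps": [
        {"name": "run_mvp_pipeline", "returncode": 0, "stdout": "ok", "stderr": ""},
        {"name": "analyze_theoretical", "returncode": 1, "stdout": "", "stderr": "boom"},
    ],
    "all_succeeded": False,
})
response = json.loads(raw)
failed = first_failure(response)
if failed is not None:
    print(f"{response['model_id']}: step {failed['name']!r} failed: {failed['stderr']}")
```

Because the pipeline stops at the first failing step, the failing record, if any, is expected to be the last entry in steps.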

POST /run/custom

Run a single custom-condition analysis on an existing model.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| model_id | string | required | Existing model directory name |
| condition_name | string | required | Output filename stem |
| preset_seed | string | rich_debug_medium | Starting preset |
| metabolite_ids | string | optional | Comma-separated metabolite IDs |

Supported Fungal Strains

Works with any SBML file and any protein FASTA file.
