Gaia Benchmarking

Benchmarking scripts for Gaia, a genomic context-aware platform for protein sequence annotation.

Installation

git clone https://github.com/TattaBio/gaia-benchmark.git
cd gaia-benchmark
pip install -r requirements.txt

Benchmark Directories

`benchmark_sequence/`

Sequence similarity search benchmark on the OG_prot90 dataset. Uses BLASTp results as ground truth to evaluate recall@k performance.
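As a rough illustration of the metric, recall@k can be computed per query as the fraction of ground-truth hits found among the top k retrieved IDs, then averaged over queries. The sketch below is not taken from the benchmark scripts: the function name `recall_at_k`, the dictionary layout, and the IDs are hypothetical, and the scripts in this directory may define or aggregate recall differently.

```python
from typing import Dict, List, Set


def recall_at_k(
    retrieved: Dict[str, List[str]],
    ground_truth: Dict[str, Set[str]],
    k: int,
) -> float:
    """Mean per-query recall@k.

    retrieved:    query ID -> ranked list of retrieved IDs (best first)
    ground_truth: query ID -> set of IDs counted as true hits
                  (for example BLASTp hits; the exact criteria are an assumption)
    """
    scores = []
    for query, truth in ground_truth.items():
        if not truth:
            continue  # queries without ground-truth hits are skipped
        top_k = set(retrieved.get(query, [])[:k])
        scores.append(len(top_k & truth) / len(truth))
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with made-up IDs
retrieved = {"q1": ["a", "b", "c", "d"], "q2": ["x", "y", "z"]}
ground_truth = {"q1": {"a", "c", "e"}, "q2": {"y"}}
print(recall_at_k(retrieved, ground_truth, k=2))
```

Some evaluations divide by `min(k, len(truth))` instead of `len(truth)` so that a query with more than k true hits can still reach a recall of 1; which convention applies here should be checked against the scripts.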

`benchmark_context/`

Genomic context retrieval sensitivity benchmark. Recall measures how many genes with similar genomic context are retrieved within the top K results; proteins in the context count as matching at >50% sequence identity and >50% sequence coverage. Uses the OG_prot90 dataset.
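The identity and coverage criterion can be sketched as a simple filter over pairwise alignments between context proteins. This is a minimal illustration only: the `Alignment` dataclass, its field names, and the helper functions are hypothetical, and the number of context proteins that must match for two genes to count as having similar context is not stated here, so the sketch stops at counting matches.

```python
from dataclasses import dataclass
from typing import List

MIN_IDENTITY = 50.0  # percent, from the >50% sequence identity criterion
MIN_COVERAGE = 50.0  # percent, from the >50% sequence coverage criterion


@dataclass
class Alignment:
    """One pairwise alignment between two context proteins (hypothetical layout)."""
    query_protein: str
    target_protein: str
    identity: float  # percent sequence identity
    coverage: float  # percent sequence coverage


def is_context_match(aln: Alignment) -> bool:
    """True if the alignment strictly exceeds both thresholds."""
    return aln.identity > MIN_IDENTITY and aln.coverage > MIN_COVERAGE


def count_matching_context_proteins(alignments: List[Alignment]) -> int:
    """Number of distinct query-side context proteins with at least one match."""
    return len({a.query_protein for a in alignments if is_context_match(a)})


# Made-up alignments for illustration
alignments = [
    Alignment("p1", "t1", identity=72.0, coverage=90.0),  # passes both thresholds
    Alignment("p2", "t2", identity=50.0, coverage=95.0),  # identity is not > 50
    Alignment("p3", "t3", identity=64.0, coverage=41.0),  # coverage too low
]
print(count_matching_context_proteins(alignments))
```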

`benchmark_structure/`

Protein structure similarity search benchmark. Evaluates retrieval of proteins with similar structures using the SCOPe-40 test dataset.

`benchmark_bac_arch/`

Benchmark for remote homology matching between functional homologs of bacterial (E. coli K-12) and archaeal (S. acidocaldarius DSM 639) proteins. Uses the bac_arch_bigene dataset from DGEB.

Dataset Preparation

`prepare_data/`

Scripts for sequence embedding and setting up vector search with Qdrant.
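For readers unfamiliar with vector search, the sketch below shows the underlying idea in plain Python: each protein is represented by an embedding vector, and a query returns the k stored vectors most similar to it. This is a brute-force stand-in, not the Qdrant API and not the code in `prepare_data/`; the function names, the toy vectors, and the choice of cosine similarity are all assumptions made for illustration. A vector database such as Qdrant serves the same kind of query from an index rather than scanning every vector.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: List[float], index: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k (protein ID, similarity) pairs most similar to the query."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" with made-up protein IDs
index = {
    "prot_a": [1.0, 0.0, 0.0],
    "prot_b": [0.9, 0.1, 0.0],
    "prot_c": [0.0, 1.0, 0.0],
}
hits = top_k([1.0, 0.0, 0.0], index, k=2)
print([pid for pid, _ in hits])
```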

Citation

@article{jha2024gaia,
  title={Gaia: An AI-enabled Genomic Context-Aware Platform for Protein Sequence Annotation},
  author={Jha, Nishant and Kravitz, Joshua and West-Roberts, Jacob and Camargo, Antonio and Roux, Simon and Cornman, Andre and Hwang, Yunha},
  journal={bioRxiv},
  year={2024},
  publisher={Cold Spring Harbor Laboratory},
  doi={10.1101/2024.11.19.624387}
}
