Hi @scopello, thanks for the really cool work on this mixed-modality gLM. It has made us very curious whether it could be useful in our research context.
We are preparing a new release of GlobDB (https://globdb.org/home) with >300,000 species-representative microbial genomes. We think it should be feasible to use this dataset to re-train gLM2. We see some advantages in at least trying this, not least because GlobDB collects a lot of microbial diversity from different sources. Hopefully it might also interest you as a test of the wider applicability or limitations of the model.
From the clear descriptions you gave in the data pre-processing section of the paper, we think we can take care of the multi-modal data setup. However, after looking through your repo, it is not clear to me how one would go about the training process, even though a lot of the functions and classes are there. Would it be possible for you to also provide the code or scripts that you used when initially training gLM2?
Thank you for your time.