
Main #258

Merged

bputzeys merged 23 commits into release from main on Aug 8, 2025


@maxiallard
Contributor

This pull request integrates new, larger Geneformer models, updates the documentation, and improves test coverage for returning gene outputs. It also updates the CI workflows and adds contributing guidelines. The most significant changes, grouped by theme, are:

Geneformer Model Updates and Integration

  • Updated all references to Geneformer models throughout the codebase, CI workflows, and tests to use the new larger models (e.g., gf-12L-40M-i2048, gf-12L-38M-i4096, etc.), replacing previous model variants. This includes download_all.py, test parameterizations, and configuration logic. [1] [2] [3] [4] [5] [6] [7] [8] [9]

  • Added a new test in test_geneformer_model.py to verify that Geneformer can return gene outputs alongside embeddings when requested, improving test coverage for this feature.
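The new model names appear to encode each model's architecture, since commits in this PR rename the older models to use model size instead of the number of pretraining cells. The sketch below decodes such a name, assuming the pattern `gf-<layers>L-<params>M-i<input size>` with an optional trailing variant tag. Reading `i2048`/`i4096` as the maximum input length in tokens, and the `parse_geneformer_name` helper itself, are assumptions for illustration, not part of the package's API.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed naming pattern: gf-<layers>L-<params>M-i<input size>[-<variant>]
_NAME_RE = re.compile(r"^gf-(\d+)L-(\d+)M-i(\d+)(?:-(.+))?$")


@dataclass(frozen=True)
class GeneformerName:
    layers: int            # number of transformer layers, e.g. "12L"
    params_millions: int   # approximate parameter count in millions, e.g. "38M"
    input_size: int        # assumed: maximum input length in tokens, e.g. "i4096"
    variant: Optional[str] # assumed: optional tag such as a fine-tuning label


def parse_geneformer_name(name: str) -> GeneformerName:
    """Hypothetical helper: split a model name into its assumed components."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised Geneformer model name: {name!r}")
    layers, params, input_size, variant = match.groups()
    return GeneformerName(int(layers), int(params), int(input_size), variant)


for model_name in ["gf-12L-40M-i2048", "gf-12L-38M-i4096"]:
    print(model_name, parse_geneformer_name(model_name))
```

Using a strict regex means a misspelled name fails loudly instead of silently producing wrong fields.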

Documentation and Community

  • Added a new CONTRIBUTING.md file outlining guidelines for contributing, support expectations, and PR procedures. Also linked these guidelines in the README.md for better community onboarding. [1] [2]

  • Updated the README.md and documentation (docs/index.md) to announce the integration of new, larger Geneformer models, highlight the new comparison notebook, and update installation instructions for Python version consistency. [1] [2] [3]

  • Added the new Geneformer-Series-Comparison.ipynb notebook to the list of example notebooks in both the README.md and documentation, providing users with a direct example of model scaling comparisons. [1] [2]

Continuous Integration and Testing

  • Updated GitHub Actions workflows (main.yml, release.yml) to use the new Geneformer model names for both execution and fine-tuning steps, ensuring CI remains in sync with the latest model suite. [1] [2]

  • Removed the Geneformer-Series-Comparison.ipynb notebook from the list of notebooks to be deleted during certain CI steps, ensuring it remains available for users and tests.

scGPT Model Testing

  • Added tests for scGPT to verify that gene outputs are correctly returned alongside embeddings for both "cell" and "cls" embedding modes, ensuring consistency with Geneformer test coverage.
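These tests check that, when gene outputs are requested in the "cell" and "cls" modes, the result is a tuple that also carries the list of context-window genes for each embedding. A minimal sketch of that check follows. It uses a hypothetical stand-in, `fake_get_embeddings`; its signature, the `mode` argument, and the toy embedding values are assumptions. Only the `output_genes` option name comes from this PR.

```python
from typing import List, Sequence, Tuple, Union

Embedding = List[float]


def fake_get_embeddings(
    cells: Sequence[Sequence[str]],
    mode: str = "cell",
    output_genes: bool = False,
) -> Union[List[Embedding], Tuple[List[Embedding], List[List[str]]]]:
    """Hypothetical stand-in for a model's embedding call.

    Each cell is given as the list of gene names in its context window.
    """
    if mode not in ("cell", "cls"):
        raise ValueError(f"this sketch only covers 'cell' and 'cls', got {mode!r}")
    # Toy embedding: a 2-d vector derived from the number of genes in the cell.
    embeddings = [[float(len(genes)), 0.0] for genes in cells]
    if output_genes:
        # Return the genes alongside the embeddings, one list per cell.
        return embeddings, [list(genes) for genes in cells]
    return embeddings


def check_genes_returned(cells: Sequence[Sequence[str]], mode: str) -> None:
    result = fake_get_embeddings(cells, mode=mode, output_genes=True)
    assert isinstance(result, tuple) and len(result) == 2
    embeddings, genes = result
    # One gene list per embedding, in the same order as the input cells.
    assert len(embeddings) == len(genes) == len(cells)
    assert genes == [list(c) for c in cells]


cells = [["GENE_A", "GENE_B"], ["GENE_C"]]
for mode in ("cell", "cls"):
    check_genes_returned(cells, mode)
```

Running the same check over both modes mirrors how the PR describes scGPT coverage being made consistent with Geneformer's.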

mattwoodx and others added 22 commits July 7, 2025 17:17
…ng (#242)

* Fix typo to include regression as an option for HyenaDNA fine-tuning

* Upgrade datasets version to fix loading issue

* fixup! Upgrade datasets version to fix loading issue
Correct the accumulation of all hidden states
`anndata` 0.12 allowed in `pyproject.toml`.
Remove flash attention and its error message as we never used it
* Add new Geneformer V2 models
Named them v3 because models added earlier were already named v2. Hashes are currently available for only one of the models

* Add notebook comparing Geneformer model scalings

* Rename older Geneformer models to use model size instead of the number of pretraining cells

* Update model names to use the number of parameters, matching the newer models
Keep download names unchanged so that older installs of the package still work

* Update README

* Fix test download names

* fixup! Fix test download names

* fixup! fixup! Fix test download names

* Models updated on Azure, so separate download names are no longer needed

* Make test naming up to date with model name changes

* fixup! Make test naming up to date with model name changes

* Do not run the comparison notebook as it is too large

* Remove accelerator from Geneformer config

* Update README and docs

* Add a "What's new" section to the main README
* Make output attentions for Geneformer a list to account for different sizes

Add an output_genes option to scGPT and Geneformer: when set to True for the cls and cell embedding modes, the model returns a tuple that also contains the list of genes from the context window for each embedding

* Add testing for returning genes

* fixup! Add testing for returning genes

* Remove shape from list print

* Update model names to match changes in main for new tests
* Add CZI fine-tuned Geneformer

* Add CZI Geneformer to model card

* fixup! Add CZI fine-tuned Geneformer
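One commit above makes Geneformer's output attentions a list "to account for different sizes". The sketch below illustrates why a list helps, assuming the size differences come from inputs with different context lengths: attention maps of shape (heads, n, n) with different n cannot be stacked into one NumPy array, but a Python list keeps each one intact. The head count, context lengths, and random scores are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads = 4
# Hypothetical context lengths: each input keeps a different number of genes.
context_lengths = [5, 3, 7]

# One (heads, n, n) attention map per input; rows are normalised to sum to 1,
# as a softmax over keys would produce.
attentions = []
for n in context_lengths:
    scores = rng.random((n_heads, n, n))
    attentions.append(scores / scores.sum(axis=-1, keepdims=True))

# Stacking ragged maps into a single array fails because the shapes differ.
try:
    np.stack(attentions)
except ValueError as err:
    print("cannot stack ragged attention maps:", err)

# Kept as a list, each entry retains its own shape.
print([a.shape for a in attentions])
```

Padding every map to the largest n would also allow stacking, but it wastes memory and mixes padded positions into the output, so a list is the simpler representation here.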
@maxiallard requested review from bputzeys and mattwoodx and removed the review request for bputzeys on August 6, 2025 09:36
@bputzeys merged commit 250e225 into release on Aug 8, 2025
10 checks passed
6 participants