…ng (#242)
* Update typo to include regression as an option for HyenaDNA fine tuning
* Upgrade datasets version to fix loading issue
* fixup! Upgrade datasets version to fix loading issue
All hidden states correct accumulation
`anndata` 0.12 allowed in `pyproject.toml`.
Remove flash attention and its error message as we never used it
* Addition of new geneformer v2 models
  Called them v3 because we already had new models added a while ago which we named v2. Hashes currently for one of the models
* Add notebook comparing Geneformer scalings
* Rename older geneformer models to have model size instead of number of pretraining cells
* Update model names to be number of parameters to match newer model
  Keep names the same for downloads to ensure older installs of the package still work
* Update readme
* Fix test download names
* fixup! Fix test download names
* fixup! fixup! Fix test download names
* Models updated on azure so no need for download names
* Make test naming up to date with model name changes
* fixup! Make test naming up to date with model name changes
* Do not run the comparison notebook as it is too large
* Remove accelerator from Geneformer config
* Update readme and docs
* Add what's new section to main readme
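The renamed models follow a pattern such as `gf-12L-40M-i2048`. A minimal sketch of how such a name could be split into its parts is shown here. The commit above confirms only that the `M` segment is the number of parameters; reading `12L` as the layer count and `i2048` as the input size is an assumption, and `parse_model_name` is a hypothetical helper, not part of the package.

```python
import re
from typing import NamedTuple


class GeneformerName(NamedTuple):
    layers: int           # assumed meaning of the "12L" segment
    params_millions: int  # "40M": parameter count, per the commit message
    input_size: int       # assumed meaning of the "i2048" segment


# Hypothetical pattern inferred from the names quoted in this pull request.
_NAME_PATTERN = re.compile(r"^gf-(\d+)L-(\d+)M-i(\d+)$")


def parse_model_name(name: str) -> GeneformerName:
    """Split a name like 'gf-12L-40M-i2048' into its numeric parts."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised model name: {name!r}")
    layers, params, input_size = (int(group) for group in match.groups())
    return GeneformerName(layers, params, input_size)


print(parse_model_name("gf-12L-40M-i2048"))
```

Names that do not follow this pattern raise a `ValueError` rather than being guessed at.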
* Make output attentions for Geneformer a list to account for different sizes
  Add output_genes option to scGPT and Geneformer which when set to true for cls and cell it will return a tuple which contains the list of genes from the context window for each embedding
* Add testing for returning genes
* fixup! Add testing for returning genes
* Remove shape from list print
* Update model names to match changes in main for new tests
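The `output_genes` behaviour described in this commit can be pictured with a small stand-in. This is only a sketch of the return shape: `get_embeddings_sketch`, its `context_size` parameter, and the toy "embedding" are hypothetical and do not reproduce the scGPT or Geneformer implementation.

```python
from typing import List, Sequence, Tuple, Union

Embedding = List[float]


def get_embeddings_sketch(
    cells: Sequence[Sequence[str]],
    context_size: int = 4,
    output_genes: bool = False,
) -> Union[List[Embedding], Tuple[List[Embedding], List[List[str]]]]:
    """Hypothetical stand-in: one toy embedding per cell, optionally with genes."""
    # Keep only the genes that fit in the (assumed) context window.
    genes_per_cell = [list(cell[:context_size]) for cell in cells]
    # Toy embedding: a single number per cell, not a real model output.
    embeddings = [[float(len(genes))] for genes in genes_per_cell]
    if output_genes:
        # With the flag set, return a tuple: embeddings plus the gene lists.
        return embeddings, genes_per_cell
    return embeddings


cells = [["A", "B", "C"], ["D", "E", "F", "G", "H"]]
embeddings, genes = get_embeddings_sketch(cells, output_genes=True)
print(genes)
```

Because the gene lists can differ in length from cell to cell, they are returned as a list of lists rather than a fixed-shape array.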
* Add CZI fine-tuned Geneformer
* Add CZI Geneformer to model card
* fixup! Add CZI fine-tuned Geneformer
fix for issue 243
…don't affect downloads
updated downloader
mattwoodx approved these changes on Aug 6, 2025
This pull request introduces several important updates focused on integrating new, larger Geneformer models, updating documentation, and improving test coverage for gene output functionality. The changes also include updates to CI workflows and the addition of contributing guidelines. Below are the most significant changes grouped by theme:

Geneformer Model Updates and Integration
* Updated all references to Geneformer models throughout the codebase, CI workflows, and tests to use the new larger models (e.g., `gf-12L-40M-i2048`, `gf-12L-38M-i4096`, etc.), replacing previous model variants. This includes `download_all.py`, test parameterizations, and configuration logic. [1] [2] [3] [4] [5] [6] [7] [8] [9]
* Added a new test in `test_geneformer_model.py` to verify that Geneformer can return gene outputs alongside embeddings when requested, improving test coverage for this feature.

Documentation and Community
* Added a new `CONTRIBUTING.md` file outlining guidelines for contributing, support expectations, and PR procedures. Also linked these guidelines in the `README.md` for better community onboarding. [1] [2]
* Updated the `README.md` and documentation (`docs/index.md`) to announce the integration of new, larger Geneformer models, highlight the new comparison notebook, and update installation instructions for Python version consistency. [1] [2] [3]
* Added the new `Geneformer-Series-Comparison.ipynb` notebook to the list of example notebooks in both the `README.md` and documentation, providing users with a direct example of model scaling comparisons. [1] [2]

Continuous Integration and Testing
* Updated GitHub Actions workflows (`main.yml`, `release.yml`) to use the new Geneformer model names for both execution and fine-tuning steps, ensuring CI remains in sync with the latest model suite. [1] [2]
* Removed the `Geneformer-Series-Comparison.ipynb` notebook from the list of notebooks to be deleted during certain CI steps, ensuring it remains available for users and tests.

scGPT Model Testing