Main by maxiallard · Pull Request #258 · helicalAI/helical

maxiallard · 2025-08-06T09:36:31Z

This pull request introduces several important updates focused on integrating new, larger Geneformer models, updating documentation, and improving test coverage for gene output functionality. The changes also include updates to CI workflows and the addition of contributing guidelines. Below are the most significant changes grouped by theme:

Geneformer Model Updates and Integration

Updated all references to Geneformer models throughout the codebase, CI workflows, and tests to use the new larger models (e.g., gf-12L-40M-i2048, gf-12L-38M-i4096, etc.), replacing previous model variants. This includes download_all.py, test parameterizations, and configuration logic.
Added a new test in test_geneformer_model.py to verify that Geneformer can return gene outputs alongside embeddings when requested, improving test coverage for this feature.
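The model names above appear to follow a fixed pattern. This sketch assumes that a name such as `gf-12L-38M-i4096` reads as 12 transformer layers, roughly 38M parameters and an input size of 4096 tokens, optionally followed by a trailing tag for variants. The parameter-count reading matches the commit note about renaming models to their number of parameters; the rest is an assumption. `parse_geneformer_name` and `GeneformerName` are hypothetical helpers, not part of helical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern (not an official helical parser):
# "gf-<layers>L-<params>M-i<input size>", optionally followed by "-<suffix>".
_NAME_RE = re.compile(r"^gf-(\d+)L-(\d+)M-i(\d+)(?:-(.+))?$")


@dataclass(frozen=True)
class GeneformerName:
    layers: int            # number of transformer layers ("12L")
    params_millions: int   # approximate parameter count in millions ("38M")
    input_size: int        # assumed maximum input length in tokens ("i4096")
    suffix: Optional[str]  # trailing variant tag, if any


def parse_geneformer_name(name: str) -> GeneformerName:
    """Split a Geneformer model name into its assumed components."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised Geneformer model name: {name!r}")
    layers, params, input_size, suffix = match.groups()
    return GeneformerName(int(layers), int(params), int(input_size), suffix)


# The two example names listed in this PR.
for model_name in ("gf-12L-40M-i2048", "gf-12L-38M-i4096"):
    print(model_name, parse_geneformer_name(model_name))
```

Raising on unknown names, rather than guessing, keeps a typo in a config from silently selecting the wrong model size.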

Documentation and Community

Added a new CONTRIBUTING.md file outlining guidelines for contributing, support expectations, and PR procedures. Also linked these guidelines from the README.md for better community onboarding.
Updated the README.md and documentation (docs/index.md) to announce the integration of new, larger Geneformer models, highlight the new comparison notebook, and update installation instructions for Python version consistency.
Added the new Geneformer-Series-Comparison.ipynb notebook to the list of example notebooks in both the README.md and documentation, providing users with a direct example of model scaling comparisons.

Continuous Integration and Testing

Updated GitHub Actions workflows (main.yml, release.yml) to use the new Geneformer model names for both execution and fine-tuning steps, ensuring CI remains in sync with the latest model suite.
Removed the Geneformer-Series-Comparison.ipynb notebook from the list of notebooks to be deleted during certain CI steps, ensuring it remains available for users and tests.

scGPT Model Testing

Added tests for scGPT to verify that gene outputs are correctly returned alongside embeddings for both "cell" and "cls" embedding modes, ensuring consistency with Geneformer test coverage.

* Addition of new geneformer v2 models
  Called them v3 because we already had new models added a while ago which we named v2. Hashes currently for one of the models
* Add notebook comparing Geneformer scalings
* Rename older geneformer models to have model size instead of number of pretraining cells
* Update model names to be number of parameters to match newer model
  Keep names the same for downloads to ensure older installs of the package still work
* Update readme
* Fix test download names
* fixup! Fix test download names
* fixup! fixup! Fix test download names
* Models updated on azure so no need for download names
* Make test naming up to date with model name changes
* fixup! Make test naming up to date with model name changes
* Do not run the comparison notebook as it is too large
* Remove accelerator from Geneformer config
* Update readme and docs
* Add what's new section to main readme

* Make output attentions for Geneformer a list to account for different sizes
  Add an output_genes option to scGPT and Geneformer: when it is set to true for the cls and cell embedding modes, the model returns a tuple that also contains the list of genes from the context window for each embedding
* Add testing for returning genes
* fixup! Add testing for returning genes
* Remove shape from list print
* Update model names to match changes in main for new tests
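The output_genes behaviour described above can be sketched as follows. `pool_embeddings` and its arguments are hypothetical, and the assumptions that token 0 is a CLS token and that "cell" mode mean-pools the gene tokens are illustrative rather than taken from the helical source. The point is the return contract: a plain list of embeddings by default, and a tuple of (embeddings, genes per context window) when `output_genes=True`.

```python
from typing import List, Sequence, Tuple, Union

Vector = List[float]


def pool_embeddings(
    token_embeddings: Sequence[Sequence[Vector]],
    token_genes: Sequence[Sequence[str]],
    emb_mode: str = "cell",
    output_genes: bool = False,
) -> Union[List[Vector], Tuple[List[Vector], List[List[str]]]]:
    """Hypothetical pooling step illustrating the output_genes contract.

    token_embeddings[i][j] is the embedding of token j in cell i. Token 0 is
    assumed to be a CLS token; token_genes[i] names the remaining tokens,
    i.e. the genes in that cell's context window.
    """
    if emb_mode not in ("cell", "cls"):
        raise ValueError(f"unsupported emb_mode: {emb_mode!r}")

    embeddings: List[Vector] = []
    for tokens in token_embeddings:
        if emb_mode == "cls":
            # Use the CLS token as the cell representation.
            embeddings.append(list(tokens[0]))
        else:
            # Mean-pool the gene tokens (everything after the CLS token).
            genes_only = tokens[1:]
            if not genes_only:
                raise ValueError("cell mode needs at least one gene token")
            dim = len(genes_only[0])
            embeddings.append(
                [sum(tok[d] for tok in genes_only) / len(genes_only) for d in range(dim)]
            )

    if output_genes:
        # Return the genes seen in each context window alongside the embeddings.
        return embeddings, [list(genes) for genes in token_genes]
    return embeddings


# One cell: a CLS token followed by two gene tokens.
tokens = [[[1.0, 1.0], [2.0, 0.0], [4.0, 2.0]]]
genes = [["GENE_A", "GENE_B"]]
emb, context_genes = pool_embeddings(tokens, genes, emb_mode="cell", output_genes=True)
```

Returning a tuple only when the flag is set keeps existing callers, which expect a bare list of embeddings, working unchanged.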

mattwoodx and others added 22 commits July 7, 2025 17:17

Update typo to include regression as an option for HyenaDNA fine tuning (#242)

0a3d870

* Update typo to include regression as an option for HyenaDNA fine tuning
* Upgrade datasets version to fix loading issue
* fixup! Upgrade datasets version to fix loading issue

Update the helix model files in the case that all hidden states should be accumulated

1815c7c

Merge pull request #245 from helicalAI/emb-layer-helix

66b9a02

All hidden states correct accumulation
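A minimal sketch of accumulating all hidden states, assuming the common Hugging Face convention in which the collected states are the input embedding followed by each layer's output (len(layers) + 1 entries). `run_layers` is a hypothetical helper, not the Helix model code, and the PR does not say what the original accumulation bug was.

```python
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[List[float]], List[float]]


def run_layers(
    hidden: List[float],
    layers: Sequence[Layer],
    output_hidden_states: bool = False,
) -> Tuple[List[float], List[List[float]]]:
    """Apply layers in order, optionally collecting every intermediate state.

    Assumes each layer returns a new list rather than mutating its input;
    otherwise the recorded states would all alias the same object.
    """
    all_hidden: List[List[float]] = []
    for layer in layers:
        if output_hidden_states:
            # Record the state entering this layer before it is replaced.
            all_hidden.append(hidden)
        hidden = layer(hidden)
    if output_hidden_states:
        # Append the final output so the last layer's state is not dropped.
        all_hidden.append(hidden)
    return hidden, all_hidden


double = lambda h: [2 * x for x in h]
add_one = lambda h: [x + 1 for x in h]
final, states = run_layers([1.0], [double, add_one], output_hidden_states=True)
```

Recording the state before each layer and once more after the loop is what yields one entry per layer plus the embedding; appending only inside the loop would silently lose the last layer.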

updated downloader

1444916

anndata 0.12 allowed in pyproject.toml.

3a58ea4

Merge pull request #248 from helicalAI/anndata-up

b42e814

`anndata` 0.12 allowed in `pyproject.toml`.

Remove flash attention and its error message as we never used it

3895f82

Merge pull request #250 from helicalAI/remove-flash-attn-for-scgpt

e12eb30

Remove flash attention and its error message as we never used it

Update pyproject.toml

4962d85

fix for issue 243

55e055a

fix for issue 243

8b95bf1

CZI Fine Tuned Geneformer Integration (#257)

ed48d8e

* Add CZI fine-tuned Geneformer
* Add CZI Geneformer to model card
* fixup! Add CZI fine-tuned Geneformer

removed commented out code

6c0fd21

Add contributions section in readme (#256)

4cbecfc

Update python to 3.11.13 dependency

26f915a

Update helical/models/uce/uce_dataset.py

c2371f8

Merge pull request #255 from izumiando/release

ddd3aec

fix for issue 243

removed hash values

c3bcb15

Pull using link download instead of boto3 to ensure aws environments don't affect downloads

8da312d
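The switch from boto3 to a link download can be sketched with the standard library alone. boto3 resolves credentials, profiles and region from environment variables and ~/.aws configuration files, so its behaviour depends on the host's AWS setup; a plain HTTP(S) GET consults none of that. `download_via_link` is a hypothetical helper, not helical's downloader, and the URL passed to it is up to the caller.

```python
import shutil
import urllib.request
from pathlib import Path


def download_via_link(url: str, destination: Path, timeout: float = 60.0) -> Path:
    """Fetch a file from a plain URL and stream it to disk (hypothetical helper).

    No AWS SDK is involved, so environment variables such as AWS_PROFILE or
    credentials in ~/.aws have no effect on the download.
    """
    destination.parent.mkdir(parents=True, exist_ok=True)
    tmp = destination.with_name(destination.name + ".part")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)  # stream without loading into memory
    except BaseException:
        tmp.unlink(missing_ok=True)  # don't leave a half-written file behind
        raise
    tmp.replace(destination)  # expose the file under its final name only when complete
    return destination
```

Writing to a `.part` file and renaming it at the end means an interrupted download never leaves a truncated file under the final name.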

Merge pull request #246 from helicalAI/new_downloader

ada9266

updated downloader

maxiallard requested review from bputzeys and mattwoodx and removed request for bputzeys August 6, 2025 09:36

mattwoodx approved these changes Aug 6, 2025

Remove GF comparison notebook on release action

a02138f

bputzeys merged commit 250e225 into release Aug 8, 2025
10 checks passed

bputzeys merged 23 commits into release from main

maxiallard commented Aug 6, 2025

6 participants