Merging New Features by maxiallard · Pull Request #240 · helicalAI/helical

maxiallard · 2025-06-18T11:21:15Z

This pull request introduces several updates to improve functionality, usability, and maintainability across multiple modules in the codebase. The most significant changes include adding support for attention outputs in embedding methods, improving installation instructions, and refining configuration and processing logic.

Enhancements to Embedding Methods:

Added an output_attentions parameter to embedding methods in Geneformer and scGPT models, enabling the return of attention maps for analysis. This includes updates to the get_embs function in geneformer_utils.py and _encode function in scgpt/model_dir/model.py to handle attention outputs. [1] [2] [3] [4]
Updated the TransformerEncoder import in scGPT to use a custom implementation for attention weight support.
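
To illustrate what an attention output looks like, here is a minimal pure-Python sketch of per-head attention maps. It is not the Geneformer or scGPT code from this PR. The function names `attention_maps` and `average_heads` and the toy inputs are assumptions made for illustration. The sketch returns one map per head and only averages on request, which mirrors the commit note about returning all heads for scGPT.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_maps(queries, keys):
    """Scaled dot-product attention weights for each head.

    queries, keys: nested lists shaped [heads][seq_len][head_dim].
    Returns nested lists shaped [heads][seq_len][seq_len].
    """
    maps = []
    for q_head, k_head in zip(queries, keys):
        scale = math.sqrt(len(q_head[0]))
        head_map = [
            softmax([sum(a * b for a, b in zip(q, k)) / scale for k in k_head])
            for q in q_head
        ]
        maps.append(head_map)
    return maps


def average_heads(maps):
    """Collapse per-head maps into one [seq_len][seq_len] map (loses per-head detail)."""
    n_heads = len(maps)
    seq_len = len(maps[0])
    return [
        [sum(maps[h][i][j] for h in range(n_heads)) / n_heads for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Toy input (hypothetical values): 2 heads, 3 tokens, head_dim 2.
queries = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]],
]
keys = queries

per_head = attention_maps(queries, keys)  # one map per head
averaged = average_heads(per_head)        # single map, heads averaged
```

Keeping the per-head maps lets downstream analysis compare heads. The cost is memory that grows with the number of heads and with the square of the sequence length.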

Installation and Documentation Improvements:

Improved installation instructions in README.md, including a detailed guide for resolving issues with mamba-ssm installation using .whl files.
Updated the evo_2/README.md file to use HTTPS for cloning the vortex repository instead of SSH.

Configuration and Defaults:

Changed the default embedding mode in scGPTConfig from "cls" to "cell" for better usability. [1] [2]
Removed the precision field from transcriptformer_config.yaml to simplify configuration.
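
A changed default affects any caller that relied on it implicitly. The sketch below uses a hypothetical stand-in dataclass, `SketchConfig`, not the real `scGPTConfig`, whose other fields and constructor are not shown in this PR. It illustrates pinning the previous mode explicitly.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a model config; not the real scGPTConfig."""

    # Default after this PR is "cell"; it was "cls" before.
    emb_mode: str = "cell"


# Callers that pass nothing now get "cell".
default_cfg = SketchConfig()

# Callers that need the old behaviour can pin it explicitly.
pinned_cfg = SketchConfig(emb_mode="cls")
```

Setting the mode explicitly in downstream code keeps results reproducible across releases regardless of what the default is.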

Code Cleanup and Refinements:

Refactored error and warning messages in Geneformer to improve readability and maintain consistency.
Replaced padding="max_length" with padding="longest" in the process_data method of the helix_mrna model for more efficient tokenization.
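
The efficiency gain from `padding="longest"` can be seen in a small pure-Python sketch. This is not the helix_mrna tokenizer code. The helper `pad_batch`, the pad id, and the toy batch are assumptions for illustration, and truncation is simplified to a plain slice.

```python
def pad_batch(batch, strategy, model_max_length, pad_id=0):
    """Pad token-id lists to a common length.

    strategy="max_length": pad every sequence to model_max_length.
    strategy="longest":    pad only to the longest sequence in this batch.
    """
    if strategy == "max_length":
        target = model_max_length
    elif strategy == "longest":
        target = min(max(len(seq) for seq in batch), model_max_length)
    else:
        raise ValueError(f"unknown padding strategy: {strategy!r}")
    # Truncate anything longer than the target, then right-pad the rest.
    return [seq[:target] + [pad_id] * (target - len(seq[:target])) for seq in batch]


# Toy batch (hypothetical token ids) with a model limit of 8 tokens.
batch = [[5, 6, 7], [8, 9], [4]]

padded_max = pad_batch(batch, "max_length", model_max_length=8)
padded_longest = pad_batch(batch, "longest", model_max_length=8)

tokens_max = sum(len(seq) for seq in padded_max)          # 3 sequences * 8 = 24
tokens_longest = sum(len(seq) for seq in padded_longest)  # 3 sequences * 3 = 9
```

With short inputs and a large model limit, padding to the longest sequence in the batch processes far fewer pad tokens, which is where the saving comes from.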

Miscellaneous:

Added new hash values for embedding-related files in constants/hash_values.py.

These changes collectively enhance the functionality, usability, and maintainability of the codebase, particularly in embedding workflows and installation processes.

mattwoodx and others added 17 commits May 7, 2025 09:36

Transcriptformer lightning removal (#232)

6823de8

* Remove lightning from transcriptformer
* Correct docstring
* Make test device GPU
* Remove device choice from config as flex attention is only GPU compatible
* fixup! Remove device choice from config as flex attention is only GPU compatible

Update README.md

39a5e12

Update README.md

e518e6c

Reduce compiled block size in transcriptformer. (#234)

f198280

Change default emb_mode to cell

802a410

Merge pull request #235 from helicalAI/change-default-emb-mode

a1ca262

Change default emb mode to cell

Remove torch.compile in flex attention for transcriptformer. (#236)

e633266

Update Vortex GitHub link in Evo2 README (#237)

e06b8c9

Stop helix-mrna from padding to model max length (#238)

be96082

* Stop helix-mrna from padding to max length and rather pad to max length of inputs
* fixup! Stop helix-mrna from padding to max length and rather pad to max length of inputs

added attention_maps for scgpt and geneformer

ad246f7

Merge branch 'main' into attention_maps

604eb04

returning all heads not averaging them for scgpt.

c1fbfaa

cleanup

978c9ee

Try and reduce memory usage of Geneformer

182a467

Fix conversion to numpy

22b2860

Merge pull request #239 from helicalAI/attention_maps

3506dce

Attention maps

Update pyproject.toml

2e55225

bputzeys approved these changes Jun 18, 2025

Update pyproject.toml

de62056

bputzeys merged commit ad78f9a into release Jun 18, 2025
14 of 15 checks passed

bputzeys merged 18 commits into release from main

4 participants
