Merged

Commits (55)
f44f38a
first test
maxiallard Nov 13, 2025
49526e8
package integration for tahoe1x
maxiallard Nov 13, 2025
6d15af9
adding run file
maxiallard Nov 13, 2025
d53577e
code reduction
maxiallard Nov 13, 2025
b5bbcb2
cleaning up separation for dataloader
maxiallard Nov 13, 2025
1afa425
added attn outputs
maxiallard Nov 13, 2025
a0adcb5
tested attention maps with torch
maxiallard Nov 13, 2025
917add0
added readme info
maxiallard Nov 13, 2025
8943c06
changed versions in pyproject.toml
maxiallard Nov 13, 2025
7f85335
fixing flash_attn
maxiallard Nov 13, 2025
7585827
removing more files
maxiallard Nov 13, 2025
29f05a4
adding tests
maxiallard Nov 13, 2025
cc4c54d
new testing install
maxiallard Nov 13, 2025
863b05e
new testing install
maxiallard Nov 13, 2025
8b61eca
new testing install
maxiallard Nov 13, 2025
79e209e
new testing install
maxiallard Nov 13, 2025
27254ad
fixing versions for CI
maxiallard Nov 13, 2025
063f6eb
fixing versions for CI
maxiallard Nov 13, 2025
336a0fe
fixing versions for CI
maxiallard Nov 13, 2025
45a3372
uninstalling torch before CI
maxiallard Nov 14, 2025
8d39d06
uninstalling torch before CI
maxiallard Nov 14, 2025
da2d050
uninstalling torch before CI
maxiallard Nov 14, 2025
31a1755
installing wheels directly
maxiallard Nov 14, 2025
ab4f258
fixed tests
maxiallard Nov 14, 2025
2968598
checking versions
maxiallard Nov 14, 2025
1faf52a
adjusted scikit-misc version
maxiallard Nov 14, 2025
1760f53
removed sqlite
maxiallard Nov 18, 2025
9a5635d
merged main
maxiallard Nov 18, 2025
e77910d
updated workflow
maxiallard Nov 18, 2025
cf3e8cd
added torchvision and installing tahoe reqs
maxiallard Nov 18, 2025
9fe6a83
Merge branch 'main' into tahoe1x
maxiallard Nov 21, 2025
df65b2f
added notebook
maxiallard Nov 21, 2025
0c82a47
adding tahoe to readme
maxiallard Nov 21, 2025
fe3e979
returning gene embeddings
maxiallard Nov 21, 2025
12da332
returning gene embeddings
maxiallard Nov 21, 2025
ed59d33
returning gene embeddings
maxiallard Nov 21, 2025
008b73d
fixing test
maxiallard Nov 21, 2025
f2787c6
decoder added
maxiallard Nov 21, 2025
d94f4cf
decoder added
maxiallard Nov 21, 2025
14025e9
taking out unnecessary code
maxiallard Nov 21, 2025
7c339e5
removed s3 download
maxiallard Nov 27, 2025
07601ae
changing logger name
maxiallard Nov 27, 2025
96bd9a1
updated testing tahoe
maxiallard Nov 27, 2025
2968407
fixing logger
maxiallard Nov 27, 2025
8e2e1bf
tahoe
raschedh Dec 2, 2025
2d3f7a3
tahoe docs
raschedh Dec 2, 2025
10ac0f1
updated imports
raschedh Dec 2, 2025
4661871
docs
raschedh Dec 2, 2025
cce2a8e
Merge branch 'main' into tahoe1x
raschedh Dec 2, 2025
9f6f212
added minimal llm foundry
raschedh Dec 3, 2025
0a84779
coverage file
raschedh Dec 3, 2025
7ad158c
Replace notebooks in docs with symlinks
bputzeys Dec 8, 2025
33be088
Add license text to each mosaicml files
bputzeys Dec 8, 2025
66ab960
Merge pull request #290 from helicalAI/tahoe1x
maxiallard Dec 9, 2025
7ae8033
Update pyproject.toml
bputzeys Dec 9, 2025
3 changes: 3 additions & 0 deletions .coveragerc
@@ -0,0 +1,3 @@
[run]
omit =
    */minimal_llm_foundry/*
8 changes: 6 additions & 2 deletions .github/workflows/main.yml
@@ -19,8 +19,8 @@ jobs:

- name: Install dependencies
run: |
pip install -r requirements-dev.txt
pip install .[mamba-ssm]
pip install -r requirements-dev.txt

# First download before tests as they make use of the downloaded files
- name: Download all files
@@ -133,6 +133,10 @@ jobs:
run: |
python examples/run_models/run_c2s.py

- name: Execute Tahoe
run: |
python examples/run_models/run_tahoe.py ++device="cuda"

notebooks:
needs: tests
runs-on: self-hosted
@@ -150,7 +154,7 @@ jobs:
# because jobs may not be run in the same order, we need to install the dependencies again
- name: Install helical
run: |
pip install .[mamba-ssm]
pip install --no-cache-dir .[mamba-ssm]

- name: Reduce datasets to speedup checks
run: |
4 changes: 4 additions & 0 deletions .github/workflows/release.yml
@@ -151,6 +151,10 @@ jobs:
run: |
python examples/run_models/run_c2s.py

- name: Execute Tahoe
run: |
python examples/run_models/run_tahoe.py ++device="cuda"

notebooks:
needs: tests
runs-on: self-hosted
32 changes: 32 additions & 0 deletions README.md
@@ -35,6 +35,9 @@ Let’s build the most exciting AI-for-Bio community together!

## What's new?

### Tahoe-x1
We have integrated the Tahoe-x1 foundation model for single-cell RNA-seq data. This transformer-based model can extract both cell and gene embeddings from raw count data and supports attention weight extraction for interpretability. Try it out with our [comprehensive tutorial notebook](./examples/notebooks/Tahoe-x1-Tutorial.ipynb)!

### Cell2Sentence-Scale
We have integrated the new Cell2Sentence-Scale models which use cell sentences as input and are based on the Gemma language model architecture (2B and 27B models available in quantised versions too). You can use this model for embeddings and perturbation prediction. Follow our notebook tutorial [here](./examples/notebooks/Cell2Sen-Tutorial.ipynb).

@@ -67,6 +70,12 @@ To install the latest pip release of our Helical package, you can run the comman
pip install helical
```

***Note***
On some architectures, pip installs a CPU-only build of PyTorch rather than the CUDA-compiled one. To install Helical with GPU support, run the command below (or install PyTorch with CUDA first and then install Helical), replacing `XXX` with your CUDA version (e.g. `128` for CUDA 12.8):
```
pip install helical --extra-index-url https://download.pytorch.org/whl/cuXXX
```
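The `cuXXX` suffix is simply the CUDA version with the dot removed. As a quick sanity check, here is a small Python sketch that builds the extra-index URL from a CUDA version string; the helper name `cuda_wheel_index` is ours for illustration, not part of Helical or pip:

```python
def cuda_wheel_index(cuda_version: str) -> str:
    """Build the PyTorch extra-index URL for a given CUDA version.

    e.g. "12.8" -> "https://download.pytorch.org/whl/cu128"
    """
    # Take major and minor components and drop the dot.
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"

print(cuda_wheel_index("12.8"))  # https://download.pytorch.org/whl/cu128
```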

To install the latest Helical package, you can run the command below:
```
pip install --upgrade git+https://github.com/helicalAI/helical.git
@@ -78,6 +87,15 @@ git clone https://github.com/helicalAI/helical.git
pip install .
```


### Flash Attention Support
To enable Flash Attention (required by some models), run the command below:
```
pip install flash-attn --no-build-isolation
```
**Important:** Make sure that your PyTorch CUDA version matches your system CUDA version, especially when using flash-attn.

### Mamba-SSM Model Installation
[Optional] To install mamba-ssm and causal-conv1d use the command below:
```
pip install helical[mamba-ssm]
@@ -86,6 +104,14 @@ or in case you're installing from the Helical repo cloned locally:
```
pip install .[mamba-ssm]
```
### Evo2 Model Installation
To install Evo2 specifically, follow the instructions in the [evo-2 model card](helical/models/evo_2/README.md).

### Tahoe-X1 Model Installation
To install Tahoe-X1, run the following after installing Helical:
```
pip install helical[tahoe]
```
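For orientation, the options that `TahoeConfig` exposes (as exercised by this PR's test suite) can be sketched as a plain-dict stand-in. `TAHOE_DEFAULTS` and `make_config` below are illustrative names, not part of the Helical API; the real class is `helical.models.tahoe.TahoeConfig`:

```python
# Hypothetical stand-in mirroring TahoeConfig's defaults as documented by
# the test suite in this PR -- not the real Helical class.
TAHOE_DEFAULTS = {
    "model_size": "70m",
    "batch_size": 8,
    "emb_mode": "cell",    # "cell" or "gene" embeddings
    "attn_impl": "flash",  # "flash", "torch", or "triton"
    "device": "cpu",
    "max_length": 2048,
    "num_workers": 8,
    "prefetch_factor": 48,
    "hf_repo_id": "tahoebio/Tahoe-x1",
}

def make_config(**overrides) -> dict:
    """Mimic TahoeConfig(**kwargs): start from defaults, apply overrides."""
    unknown = set(overrides) - set(TAHOE_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown config keys: {unknown}")
    return {**TAHOE_DEFAULTS, **overrides}

cfg = make_config(model_size="1b", batch_size=32, emb_mode="gene")
print(cfg["model_size"], cfg["batch_size"])  # 1b 32
```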

## Notes on the installation:
- Make sure your machine has GPU(s) and CUDA installed; this is currently a requirement for the mamba-ssm and causal-conv1d packages.
@@ -114,6 +140,7 @@ apptainer shell --nv --fakeroot singularity/helical/
- [scGPT](https://helical.readthedocs.io/en/latest/model_cards/scgpt/)
- [Universal Cell Embedding (UCE)](https://helical.readthedocs.io/en/latest/model_cards/uce/)
- [TranscriptFormer](https://helical.readthedocs.io/en/latest/model_cards/transcriptformer/)
- [Tahoe-x1](https://helical.readthedocs.io/en/latest/model_cards/tahoe/)

### DNA models:
- [HyenaDNA](https://helical.readthedocs.io/en/latest/model_cards/hyena_dna/)
@@ -145,6 +172,7 @@ Within the `examples/notebooks` folder, open the notebook of your choice. We rec
|[Cell-Gene-Cls-embedding-generation.ipynb](./examples/notebooks/Cell-Gene-Cls-embedding-generation.ipynb)|A notebook explaining the different embedding modes of single cell RNA models.|[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/helicalAI/helical/blob/main/examples/notebooks/Cell-Gene-Cls-embedding-generation.ipynb) |
|[Geneformer-Series-Comparison.ipynb](./examples/notebooks/Geneformer-Series-Comparison.ipynb)|A zero shot comparison between Geneformer model scaling on drug perturbation prediction|[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/helicalAI/helical/blob/main/examples/notebooks/Geneformer-Series-Comparison.ipynb) |
|[Cell2Sen-Tutorial.ipynb](./examples/notebooks/Cell2Sen-Tutorial.ipynb)|An example tutorial of how to use cell2sen models for embeddings and perturbation predictions.|[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/helicalAI/helical/blob/main/examples/notebooks/Cell2Sen-Tutorial.ipynb) |
|[Tahoe-x1-Tutorial.ipynb](./examples/notebooks/Tahoe-x1-Tutorial.ipynb)|A comprehensive tutorial on using the Tahoe-x1 model for extracting cell and gene embeddings, with attention visualization.|[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/helicalAI/helical/blob/main/examples/notebooks/Tahoe-x1-Tutorial.ipynb) |


## Stuck somewhere ? Other ideas ?
@@ -176,6 +204,9 @@ A lot of our models have been published by talented authors developing these exc
- [TranscriptFormer](https://github.com/czi-ai/transcriptformer)
- [HyenaDNA](https://github.com/HazyResearch/hyena-dna)
- [Cell2Sen](https://github.com/vandijklab/cell2sentence)
- [Tahoe-X1](https://github.com/tahoebio/tahoe-x1)
- [llm-foundry](https://github.com/mosaicml/llm-foundry)
- [composer](https://github.com/mosaicml/composer)
- [anndata](https://github.com/scverse/anndata)
- [scanpy](https://github.com/scverse/scanpy)
- [transformers](https://github.com/huggingface/transformers)
@@ -199,6 +230,7 @@ You can find the Licenses for each model implementation in the model repositorie
- [HyenaDNA](https://github.com/helicalAI/helical/blob/release/helical/models/hyena_dna/LICENSE)
- [Evo2](https://github.com/helicalAI/helical/blob/release/helical/models/evo_2/LICENSE)
- [Cell2Sen](https://github.com/helicalAI/helical/blob/release/helical/models/c2s/LICENSE)
- [Tahoe-X1](https://github.com/helicalAI/helical/blob/release/helical/models/tahoe/LICENSE)

## Citation

Empty file added ci/tests/test_tahoe/__init__.py
92 changes: 92 additions & 0 deletions ci/tests/test_tahoe/test_tahoe_config.py
@@ -0,0 +1,92 @@
import pytest
from helical.models.tahoe import TahoeConfig


class TestTahoeConfig:
    """Test suite for TahoeConfig class."""

    def test_default_config(self):
        """Test that default configuration is created correctly."""
        config = TahoeConfig()

        assert config.config["model_size"] == "70m"
        assert config.config["batch_size"] == 8
        assert config.config["emb_mode"] == "cell"
        assert config.config["attn_impl"] == "flash"
        assert config.config["device"] == "cpu"
        assert config.config["max_length"] == 2048
        assert config.config["num_workers"] == 8
        assert config.config["prefetch_factor"] == 48

    def test_custom_model_size(self):
        """Test configuration with custom model size."""
        config = TahoeConfig(model_size="1b")
        assert config.config["model_size"] == "1b"

    def test_custom_batch_size(self):
        """Test configuration with custom batch size."""
        config = TahoeConfig(batch_size=32)
        assert config.config["batch_size"] == 32

    def test_custom_emb_mode(self):
        """Test configuration with custom embedding mode."""
        config = TahoeConfig(emb_mode="gene")
        assert config.config["emb_mode"] == "gene"

    def test_custom_attn_impl(self):
        """Test configuration with custom attention implementation."""
        config = TahoeConfig(attn_impl="torch")
        assert config.config["attn_impl"] == "torch"

    def test_custom_device(self):
        """Test configuration with custom device."""
        config = TahoeConfig(device="cpu")
        assert config.config["device"] == "cpu"

    def test_custom_max_length(self):
        """Test configuration with custom max length."""
        config = TahoeConfig(max_length=5000)
        assert config.config["max_length"] == 5000

    def test_multiple_custom_parameters(self):
        """Test configuration with multiple custom parameters."""
        config = TahoeConfig(
            model_size="1b",
            batch_size=16,
            emb_mode="gene",
            attn_impl="torch",
            device="cpu",
            max_length=8000,
            num_workers=4,
        )

        assert config.config["model_size"] == "1b"
        assert config.config["batch_size"] == 16
        assert config.config["emb_mode"] == "gene"
        assert config.config["attn_impl"] == "torch"
        assert config.config["device"] == "cpu"
        assert config.config["max_length"] == 8000
        assert config.config["num_workers"] == 4

    @pytest.mark.parametrize("emb_mode", ["cell", "gene"])
    def test_valid_emb_modes(self, emb_mode):
        """Test that valid embedding modes are accepted."""
        config = TahoeConfig(emb_mode=emb_mode)
        assert config.config["emb_mode"] == emb_mode

    @pytest.mark.parametrize("attn_impl", ["flash", "torch", "triton"])
    def test_valid_attn_impl(self, attn_impl):
        """Test that valid attention implementations are accepted."""
        config = TahoeConfig(attn_impl=attn_impl)
        assert config.config["attn_impl"] == attn_impl

    def test_hf_repo_id(self):
        """Test that HuggingFace repository ID is set correctly."""
        config = TahoeConfig()
        assert config.config["hf_repo_id"] == "tahoebio/Tahoe-x1"

    def test_config_mutability(self):
        """Test that config values can be modified after creation."""
        config = TahoeConfig(batch_size=10)
        config.config["batch_size"] = 20
        assert config.config["batch_size"] == 20