Merged

Commits (55)
f44f38a
first test
maxiallard Nov 13, 2025
49526e8
package integration for tahoe1x
maxiallard Nov 13, 2025
6d15af9
adding run file
maxiallard Nov 13, 2025
d53577e
code reduction
maxiallard Nov 13, 2025
b5bbcb2
cleaning up separation for dataloader
maxiallard Nov 13, 2025
1afa425
added attn outputs
maxiallard Nov 13, 2025
a0adcb5
tested attention maps with torch
maxiallard Nov 13, 2025
917add0
added readme info
maxiallard Nov 13, 2025
8943c06
changed versions in pyproject.toml
maxiallard Nov 13, 2025
7f85335
fixing flash_attn
maxiallard Nov 13, 2025
7585827
removing more files
maxiallard Nov 13, 2025
29f05a4
adding tests
maxiallard Nov 13, 2025
cc4c54d
new testing install
maxiallard Nov 13, 2025
863b05e
new testing install
maxiallard Nov 13, 2025
8b61eca
new testing install
maxiallard Nov 13, 2025
79e209e
new testing install
maxiallard Nov 13, 2025
27254ad
fixing versions for CI
maxiallard Nov 13, 2025
063f6eb
fixing versions for CI
maxiallard Nov 13, 2025
336a0fe
fixing versions for CI
maxiallard Nov 13, 2025
45a3372
uninstalling torch before CI
maxiallard Nov 14, 2025
8d39d06
uninstalling torch before CI
maxiallard Nov 14, 2025
da2d050
uninstalling torch before CI
maxiallard Nov 14, 2025
31a1755
installing wheels directly
maxiallard Nov 14, 2025
ab4f258
fixed tests
maxiallard Nov 14, 2025
2968598
checking versions
maxiallard Nov 14, 2025
1faf52a
adjusted scikit-misc version
maxiallard Nov 14, 2025
1760f53
removed sqlite
maxiallard Nov 18, 2025
9a5635d
merged main
maxiallard Nov 18, 2025
e77910d
updated workflow
maxiallard Nov 18, 2025
cf3e8cd
added torchvision and installing tahoe reqs
maxiallard Nov 18, 2025
9fe6a83
Merge branch 'main' into tahoe1x
maxiallard Nov 21, 2025
df65b2f
added notebook
maxiallard Nov 21, 2025
0c82a47
adding tahoe to readme
maxiallard Nov 21, 2025
fe3e979
returning gene embeddings
maxiallard Nov 21, 2025
12da332
returning gene embeddings
maxiallard Nov 21, 2025
ed59d33
returning gene embeddings
maxiallard Nov 21, 2025
008b73d
fixing test
maxiallard Nov 21, 2025
f2787c6
decoder added
maxiallard Nov 21, 2025
d94f4cf
decoder added
maxiallard Nov 21, 2025
14025e9
taking out unnecessary code
maxiallard Nov 21, 2025
7c339e5
removed s3 download
maxiallard Nov 27, 2025
07601ae
changing logger name
maxiallard Nov 27, 2025
96bd9a1
updated testing tahoe
maxiallard Nov 27, 2025
2968407
fixing logger
maxiallard Nov 27, 2025
8e2e1bf
tahoe
raschedh Dec 2, 2025
2d3f7a3
tahoe docs
raschedh Dec 2, 2025
10ac0f1
updated imports
raschedh Dec 2, 2025
4661871
docs
raschedh Dec 2, 2025
cce2a8e
Merge branch 'main' into tahoe1x
raschedh Dec 2, 2025
9f6f212
added minimal llm foundry
raschedh Dec 3, 2025
0a84779
coverage file
raschedh Dec 3, 2025
7ad158c
Replace notebooks in docs with symlinks
bputzeys Dec 8, 2025
33be088
Add license text to each mosaicml files
bputzeys Dec 8, 2025
66ab960
Merge pull request #290 from helicalAI/tahoe1x
maxiallard Dec 9, 2025
7ae8033
Update pyproject.toml
bputzeys Dec 9, 2025
3 changes: 3 additions & 0 deletions .coveragerc
@@ -0,0 +1,3 @@
[run]
omit =
    */minimal_llm_foundry/*
8 changes: 6 additions & 2 deletions .github/workflows/main.yml
@@ -19,8 +19,8 @@ jobs:

- name: Install dependencies
run: |
pip install -r requirements-dev.txt
pip install .[mamba-ssm]
pip install -r requirements-dev.txt

# First download before tests as they make use of the downloaded files
- name: Download all files
@@ -133,6 +133,10 @@ jobs:
run: |
python examples/run_models/run_c2s.py

- name: Execute Tahoe
run: |
python examples/run_models/run_tahoe.py ++device="cuda"

notebooks:
needs: tests
runs-on: self-hosted
@@ -150,7 +154,7 @@ jobs:
# because jobs may not be run in the same order, we need to install the dependencies again
- name: Install helical
run: |
pip install .[mamba-ssm]
pip install --no-cache-dir .[mamba-ssm]

- name: Reduce datasets to speedup checks
run: |
4 changes: 4 additions & 0 deletions .github/workflows/release.yml
@@ -151,6 +151,10 @@ jobs:
run: |
python examples/run_models/run_c2s.py

- name: Execute Tahoe
run: |
python examples/run_models/run_tahoe.py ++device="cuda"

notebooks:
needs: tests
runs-on: self-hosted
32 changes: 32 additions & 0 deletions README.md
@@ -35,6 +35,9 @@ Let’s build the most exciting AI-for-Bio community together!

## What's new?

### Tahoe-x1
We have integrated the Tahoe-x1 foundation model for single-cell RNA-seq data. This transformer-based model can extract both cell and gene embeddings from raw count data and supports attention weight extraction for interpretability. Try it out with our [comprehensive tutorial notebook](./examples/notebooks/Tahoe-x1-Tutorial.ipynb)!

### Cell2Sentence-Scale
We have integrated the new Cell2Sentence-Scale models which use cell sentences as input and are based on the Gemma language model architecture (2B and 27B models available in quantised versions too). You can use this model for embeddings and perturbation prediction. Follow our notebook tutorial [here](./examples/notebooks/Cell2Sen-Tutorial.ipynb).

@@ -67,6 +70,12 @@ To install the latest pip release of our Helical package, you can run the comman
pip install helical
```

***Note***
On some architectures, pip installs a CPU-only build of PyTorch rather than the CUDA-compiled one. To install Helical with GPU support, run the command below (or install PyTorch with CUDA first and then install Helical), replacing `XXX` with your CUDA version (e.g. `128` for CUDA 12.8):
```
pip install helical --extra-index-url https://download.pytorch.org/whl/cuXXX
```
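The `cuXXX` suffix is simply the CUDA version with the dot removed. As a quick sanity check, here is a small Python sketch that builds the extra-index URL from a CUDA version string; the helper name `cuda_wheel_index` is ours for illustration, not part of Helical or pip:

```python
def cuda_wheel_index(cuda_version: str) -> str:
    """Build the PyTorch extra-index URL for a given CUDA version.

    e.g. "12.8" -> "https://download.pytorch.org/whl/cu128"
    """
    # Take major and minor components and drop the dot.
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"

print(cuda_wheel_index("12.8"))  # https://download.pytorch.org/whl/cu128
```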

To install the latest Helical package, you can run the command below:
```
pip install --upgrade git+https://github.com/helicalAI/helical.git
@@ -78,6 +87,15 @@ git clone https://github.com/helicalAI/helical.git
pip install .
```


### Flash Attention Support
To enable Flash Attention (required by some models), run the command below:
```
pip install flash-attn --no-build-isolation
```
**Important:** Make sure that your PyTorch CUDA version matches your system CUDA version, especially when using flash-attn.

### Mamba-SSM Model Installation
[Optional] To install mamba-ssm and causal-conv1d use the command below:
```
pip install helical[mamba-ssm]
@@ -86,6 +104,14 @@ or in case you're installing from the Helical repo cloned locally:
```
pip install .[mamba-ssm]
```
### Evo2 Model Installation
To install Evo2 specifically, follow the instructions in the [evo-2 model card](helical/models/evo_2/README.md).

### Tahoe-X1 Model Installation
To install Tahoe-X1, run the following after installing Helical:
```
pip install helical[tahoe]
```
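For orientation, the options that `TahoeConfig` exposes (as exercised by this PR's test suite) can be sketched as a plain-dict stand-in. `TAHOE_DEFAULTS` and `make_config` below are illustrative names, not part of the Helical API; the real class is `helical.models.tahoe.TahoeConfig`:

```python
# Hypothetical stand-in mirroring TahoeConfig's defaults as documented by
# the test suite in this PR -- not the real Helical class.
TAHOE_DEFAULTS = {
    "model_size": "70m",
    "batch_size": 8,
    "emb_mode": "cell",    # "cell" or "gene" embeddings
    "attn_impl": "flash",  # "flash", "torch", or "triton"
    "device": "cpu",
    "max_length": 2048,
    "num_workers": 8,
    "prefetch_factor": 48,
    "hf_repo_id": "tahoebio/Tahoe-x1",
}

def make_config(**overrides) -> dict:
    """Mimic TahoeConfig(**kwargs): start from defaults, apply overrides."""
    unknown = set(overrides) - set(TAHOE_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown config keys: {unknown}")
    return {**TAHOE_DEFAULTS, **overrides}

cfg = make_config(model_size="1b", batch_size=32, emb_mode="gene")
print(cfg["model_size"], cfg["batch_size"])  # 1b 32
```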

## Notes on the installation:
- Make sure your machine has GPU(s) and CUDA installed; this is currently a requirement for the mamba-ssm and causal-conv1d packages.
@@ -114,6 +140,7 @@ apptainer shell --nv --fakeroot singularity/helical/
- [scGPT](https://helical.readthedocs.io/en/latest/model_cards/scgpt/)
- [Universal Cell Embedding (UCE)](https://helical.readthedocs.io/en/latest/model_cards/uce/)
- [TranscriptFormer](https://helical.readthedocs.io/en/latest/model_cards/transcriptformer/)
- [Tahoe-x1](https://helical.readthedocs.io/en/latest/model_cards/tahoe/)

### DNA models:
- [HyenaDNA](https://helical.readthedocs.io/en/latest/model_cards/hyena_dna/)
@@ -145,6 +172,7 @@ Within the `examples/notebooks` folder, open the notebook of your choice. We rec
|[Cell-Gene-Cls-embedding-generation.ipynb](./examples/notebooks/Cell-Gene-Cls-embedding-generation.ipynb)|A notebook explaining the different embedding modes of single cell RNA models.|[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/helicalAI/helical/blob/main/examples/notebooks/Cell-Gene-Cls-embedding-generation.ipynb) |
|[Geneformer-Series-Comparison.ipynb](./examples/notebooks/Geneformer-Series-Comparison.ipynb)|A zero shot comparison between Geneformer model scaling on drug perturbation prediction|[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/helicalAI/helical/blob/main/examples/notebooks/Geneformer-Series-Comparison.ipynb) |
|[Cell2Sen-Tutorial.ipynb](./examples/notebooks/Cell2Sen-Tutorial.ipynb)|An example tutorial of how to use cell2sen models for embeddings and perturbation predictions.|[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/helicalAI/helical/blob/main/examples/notebooks/Cell2Sen-Tutorial.ipynb) |
|[Tahoe-x1-Tutorial.ipynb](./examples/notebooks/Tahoe-x1-Tutorial.ipynb)|A comprehensive tutorial on using the Tahoe-x1 model for extracting cell and gene embeddings, with attention visualization.|[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/helicalAI/helical/blob/main/examples/notebooks/Tahoe-x1-Tutorial.ipynb) |


## Stuck somewhere ? Other ideas ?
@@ -176,6 +204,9 @@ A lot of our models have been published by talented authors developing these exc
- [TranscriptFormer](https://github.com/czi-ai/transcriptformer)
- [HyenaDNA](https://github.com/HazyResearch/hyena-dna)
- [Cell2Sen](https://github.com/vandijklab/cell2sentence)
- [Tahoe-X1](https://github.com/tahoebio/tahoe-x1)
- [llm-foundry](https://github.com/mosaicml/llm-foundry)
- [composer](https://github.com/mosaicml/composer)
- [anndata](https://github.com/scverse/anndata)
- [scanpy](https://github.com/scverse/scanpy)
- [transformers](https://github.com/huggingface/transformers)
@@ -199,6 +230,7 @@ You can find the Licenses for each model implementation in the model repositorie
- [HyenaDNA](https://github.com/helicalAI/helical/blob/release/helical/models/hyena_dna/LICENSE)
- [Evo2](https://github.com/helicalAI/helical/blob/release/helical/models/evo_2/LICENSE)
- [Cell2Sen](https://github.com/helicalAI/helical/blob/release/helical/models/c2s/LICENSE)
- [Tahoe-X1](https://github.com/helicalAI/helical/blob/release/helical/models/tahoe/LICENSE)

## Citation

Empty file added ci/tests/test_tahoe/__init__.py
92 changes: 92 additions & 0 deletions ci/tests/test_tahoe/test_tahoe_config.py
@@ -0,0 +1,92 @@
import pytest
from helical.models.tahoe import TahoeConfig


class TestTahoeConfig:
    """Test suite for TahoeConfig class."""

    def test_default_config(self):
        """Test that default configuration is created correctly."""
        config = TahoeConfig()

        assert config.config["model_size"] == "70m"
        assert config.config["batch_size"] == 8
        assert config.config["emb_mode"] == "cell"
        assert config.config["attn_impl"] == "flash"
        assert config.config["device"] == "cpu"
        assert config.config["max_length"] == 2048
        assert config.config["num_workers"] == 8
        assert config.config["prefetch_factor"] == 48

    def test_custom_model_size(self):
        """Test configuration with custom model size."""
        config = TahoeConfig(model_size="1b")
        assert config.config["model_size"] == "1b"

    def test_custom_batch_size(self):
        """Test configuration with custom batch size."""
        config = TahoeConfig(batch_size=32)
        assert config.config["batch_size"] == 32

    def test_custom_emb_mode(self):
        """Test configuration with custom embedding mode."""
        config = TahoeConfig(emb_mode="gene")
        assert config.config["emb_mode"] == "gene"

    def test_custom_attn_impl(self):
        """Test configuration with custom attention implementation."""
        config = TahoeConfig(attn_impl="torch")
        assert config.config["attn_impl"] == "torch"

    def test_custom_device(self):
        """Test configuration with custom device."""
        config = TahoeConfig(device="cpu")
        assert config.config["device"] == "cpu"

    def test_custom_max_length(self):
        """Test configuration with custom max length."""
        config = TahoeConfig(max_length=5000)
        assert config.config["max_length"] == 5000

    def test_multiple_custom_parameters(self):
        """Test configuration with multiple custom parameters."""
        config = TahoeConfig(
            model_size="1b",
            batch_size=16,
            emb_mode="gene",
            attn_impl="torch",
            device="cpu",
            max_length=8000,
            num_workers=4,
        )

        assert config.config["model_size"] == "1b"
        assert config.config["batch_size"] == 16
        assert config.config["emb_mode"] == "gene"
        assert config.config["attn_impl"] == "torch"
        assert config.config["device"] == "cpu"
        assert config.config["max_length"] == 8000
        assert config.config["num_workers"] == 4

    @pytest.mark.parametrize("emb_mode", ["cell", "gene"])
    def test_valid_emb_modes(self, emb_mode):
        """Test that valid embedding modes are accepted."""
        config = TahoeConfig(emb_mode=emb_mode)
        assert config.config["emb_mode"] == emb_mode

    @pytest.mark.parametrize("attn_impl", ["flash", "torch", "triton"])
    def test_valid_attn_impl(self, attn_impl):
        """Test that valid attention implementations are accepted."""
        config = TahoeConfig(attn_impl=attn_impl)
        assert config.config["attn_impl"] == attn_impl

    def test_hf_repo_id(self):
        """Test that HuggingFace repository ID is set correctly."""
        config = TahoeConfig()
        assert config.config["hf_repo_id"] == "tahoebio/Tahoe-x1"

    def test_config_mutability(self):
        """Test that config values can be modified after creation."""
        config = TahoeConfig(batch_size=10)
        config.config["batch_size"] = 20
        assert config.config["batch_size"] == 20