Hey, I went through the eval suite carefully. It's a solid set of zero-shot tasks covering VEP, sequence recovery, perturbation sensitivity, and long-context retrieval. One notable gap: there's no chromatin accessibility eval, despite the model being trained on eukaryotic regulatory sequence where chromatin state is a primary functional signal.
A natural addition would be a zero-shot ATAC-seq peak discrimination task: given two sequences from the same genomic region, one from an open chromatin peak (ENCODE ATAC-seq narrowPeak) and one from flanking closed chromatin, does the model assign higher log-likelihood to the open one? This follows the same pairwise discrimination pattern already used in the perturbation tasks (mean(LL(real) > LL(perturbed))), so it slots cleanly into the existing eval structure.
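To make the metric concrete, here is a rough sketch of how the score could be computed once per-sequence log-likelihoods are available. The function name `pairwise_accuracy` and its argument names are my own placeholders, not anything in the current repo, and the model scoring step is deliberately left out.

```python
def pairwise_accuracy(ll_open, ll_closed):
    """Fraction of pairs where the open-chromatin sequence scores strictly higher.

    ll_open[i] and ll_closed[i] are the model log-likelihoods for the open and
    closed sequence of pair i. Both names are hypothetical placeholders.
    """
    if len(ll_open) != len(ll_closed):
        raise ValueError("ll_open and ll_closed must have the same length")
    if len(ll_open) == 0:
        raise ValueError("need at least one pair")
    # Ties count as failures, matching a strict LL(open) > LL(closed) comparison.
    wins = sum(1 for o, c in zip(ll_open, ll_closed) if o > c)
    return wins / len(ll_open)
```

A score near 0.5 would mean the model cannot tell open from closed chromatin, which is the same chance baseline as the perturbation tasks.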
A minimal implementation would use ENCODE ATAC-seq peak calls (e.g. GM12878 or K562, already publicly available) as positive examples, with GC-content-matched flanking regions as negatives. Window size could mirror the VEP setup (8 kb centered on the peak summit).
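A sketch of the pair construction, under a few assumptions I should flag: the summit is taken from the narrowPeak `peak` column (the 0-based offset from chromStart), "8 kb" is read as 8000 bp, and the GC tolerance of 0.02 is an arbitrary starting value. All function names here are hypothetical.

```python
def summit_window(chrom_start, summit_offset, window=8000):
    """Return (start, end) of a window centered on the peak summit.

    summit_offset is the narrowPeak 'peak' column: the summit position
    relative to chromStart. The start is clipped at 0; clipping at the
    chromosome end is not handled in this sketch.
    """
    summit = chrom_start + summit_offset
    start = max(0, summit - window // 2)
    return start, start + window


def gc_content(seq):
    """GC fraction over unambiguous bases (A, C, G, T), ignoring N."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def pick_gc_matched(positive, candidates, tol=0.02):
    """Pick the candidate flank closest in GC content to the positive.

    Returns None if no candidate is within tol, so the pair can be dropped
    rather than paired with a poorly matched negative.
    """
    target = gc_content(positive)
    best = min(candidates, key=lambda s: abs(gc_content(s) - target), default=None)
    if best is None or abs(gc_content(best) - target) > tol:
        return None
    return best
```

Candidate flanks would also need to be filtered so they do not overlap any other called peak; that step depends on how the peak file is loaded, so I have left it out here.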
I work on ATAC-seq pipelines and would be happy to put together a PR for evaluation/atac_eval.py if this direction looks right to the team.