[Eval] Add chromatin accessibility benchmark to the zero-shot eval suite #18

Hey, I went through the eval suite carefully. It's a solid set of zero-shot tasks covering VEP, sequence recovery, perturbation sensitivity, and long-context retrieval. One notable gap: there's no chromatin accessibility eval, despite the model being trained on eukaryotic regulatory sequence where chromatin state is a primary functional signal.

A natural addition would be a zero-shot ATAC-seq peak discrimination task: given two sequences from the same genomic region, one from an open-chromatin peak (ENCODE ATAC-seq narrowPeak) and one from flanking closed chromatin, does the model assign higher log-likelihood to the open one? This follows the same pairwise discrimination pattern already used in the perturbation tasks (mean(LL(real) > LL(perturbed))), so it slots cleanly into the existing eval structure.
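
To make the metric concrete, here is a rough sketch of the pairwise score, i.e. the fraction of (open, closed) pairs where the open sequence gets the higher log-likelihood. The names `pairwise_accuracy` and `toy_log_likelihood` are hypothetical and not taken from the repo; the toy scorer is only a stand-in so the snippet runs without the model.

```python
from typing import Callable, Iterable, Tuple


def pairwise_accuracy(
    pairs: Iterable[Tuple[str, str]],
    log_likelihood: Callable[[str], float],
) -> float:
    """Fraction of (open_seq, closed_seq) pairs with LL(open) > LL(closed)."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (open, closed) pair")
    # Strict comparison, mirroring mean(LL(real) > LL(perturbed)); ties count as misses.
    wins = sum(log_likelihood(open_seq) > log_likelihood(closed_seq)
               for open_seq, closed_seq in pairs)
    return wins / len(pairs)


def toy_log_likelihood(seq: str) -> float:
    # Placeholder scorer for demonstration only (assumption): counts G/C bases.
    # A real run would return the model's sequence log-likelihood instead.
    return float(seq.count("G") + seq.count("C"))
```

In the actual eval, `log_likelihood` would wrap whatever scoring call the perturbation tasks already use, so the only new piece is how the pairs are built.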
A minimal implementation would use ENCODE ATAC-seq peak calls (e.g. GM12878 or K562, already publicly available) as positive examples, with GC-content-matched flanking regions as negatives. Window size could mirror the VEP setup (8 kb centered on the peak summit).
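
As a sketch of the data prep, assuming standard 10-column ENCODE narrowPeak lines (where column 10 is the summit offset from chromStart, or -1 if no summit was called), the windowing and GC matching could look roughly like this. The helper names are hypothetical, and the code does not clamp windows at chromosome ends or fetch sequence from a reference genome.

```python
from typing import Iterable, Tuple


def summit_window(narrowpeak_line: str, window: int = 8000) -> Tuple[str, int, int]:
    """Return (chrom, start, end) of a fixed-size window centered on the peak summit."""
    fields = narrowpeak_line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    offset = int(fields[9])  # summit offset relative to chromStart; -1 means not called
    summit = start + offset if offset >= 0 else (start + end) // 2
    left = max(0, summit - window // 2)  # clamp at chromosome start only
    return chrom, left, left + window


def gc_fraction(seq: str) -> float:
    """GC fraction over unambiguous bases; 0.0 if the sequence has none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def closest_gc_match(target_seq: str, candidates: Iterable[str]) -> str:
    """Pick the candidate flanking sequence whose GC fraction is closest to the target."""
    target = gc_fraction(target_seq)
    return min(candidates, key=lambda s: abs(gc_fraction(s) - target))
```

Nearest-GC selection is just one simple way to do the matching; binned sampling would also work, and candidate negatives would still need to be filtered against the peak set so they do not overlap open chromatin.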

I work on ATAC-seq pipelines and would be happy to put together a PR for evaluation/atac_eval.py if this direction looks right to the team.
