Cross benchmark contamination

Hi,
Thank you very much for this work.
I am trying to assess contamination between different benchmarks, for example between the MMLU training set and the MMLU-Pro validation set (and vice versa). However, different benchmarks can have different formats, i.e., different keys, and we can only set a single global `text_keys` argument. As a result, we run into this [assertion error](https://github.com/ntunlp/LLMSanitize/blob/8fb443778a791f4f7e7998c513b74fc6ec5902f6/llmsanitize/base_contamination_checker.py#L78).
Is there currently any way to set different text keys for each benchmark?
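One possible workaround is to remap each benchmark's fields onto one shared key before running the checker, so that a single global `text_keys` value fits both datasets. The following is a minimal sketch that assumes both benchmarks are already loaded as lists of dicts. The field names (`question` and `choices` for MMLU, `question` and `options` for MMLU-Pro) should be checked against the actual dataset schemas. `FIELD_MAP`, `SHARED_KEY`, and `normalize` are hypothetical names and not part of LLMSanitize. The sketch also does not show how the normalized rows would then be passed to LLMSanitize.

```python
from typing import Dict, Iterable, List

# Hypothetical per-benchmark field mapping (assumed field names; verify against each dataset).
FIELD_MAP: Dict[str, List[str]] = {
    "mmlu": ["question", "choices"],
    "mmlu_pro": ["question", "options"],
}

# Hypothetical shared key that both benchmarks are mapped onto.
SHARED_KEY = "text"


def to_text(value) -> str:
    """Flatten a field value (a string or a list of strings) into one string."""
    if isinstance(value, (list, tuple)):
        return " ".join(str(v) for v in value)
    return str(value)


def normalize(rows: Iterable[dict], benchmark: str) -> List[dict]:
    """Concatenate the configured fields of each row into a single SHARED_KEY field."""
    keys = FIELD_MAP[benchmark]
    out = []
    for row in rows:
        missing = [k for k in keys if k not in row]
        if missing:
            raise KeyError(f"{benchmark} row is missing keys: {missing}")
        out.append({SHARED_KEY: " ".join(to_text(row[k]) for k in keys)})
    return out


if __name__ == "__main__":
    mmlu_rows = [{"question": "What is 2+2?", "choices": ["3", "4"], "answer": 1}]
    print(normalize(mmlu_rows, "mmlu"))  # [{'text': 'What is 2+2? 3 4'}]
```

Once both benchmarks expose the same `text` field, the assertion about mismatched keys should no longer apply. The trade-off is that the concatenation format (here, space-joined question and options) is an arbitrary choice that may affect overlap-based contamination scores.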

Cross benchmark contamination #9