Bug report in otb project #77
Open
Description
In UnderThinkingBench, some items have no "source_dataset" key in their metadata. They have a "source" key instead, with the value "aime" or "hmmt", and appear to belong to the UnderThinkingBench-Math subset.
Where UnderThinkingBench rows are dispatched:
Lines 39 to 40 in 264c047
elif subset == "underthinking-bench" or "underthinking" in subset:
    acc = eval_underthink(row)
The error (a KeyError, because these items have no "source_dataset" key) is raised here:
RAM/projects/otb/evals/underthink_eval.py
Lines 33 to 34 in 264c047
def eval_underthink(row, find_last_box: bool = False) -> float:
    puzzle = json.loads(row["metadata"])["source_dataset"]
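To see how many rows are affected before changing any evaluator, the metadata can be scanned for which key each row uses. This is a minimal sketch, assuming each row is a dict whose "metadata" field is a JSON string (as `eval_underthink` reads it); `audit_metadata` and the sample rows are hypothetical, not part of the otb repo.

```python
import json
from collections import Counter


def audit_metadata(rows):
    """Count rows by the key that identifies their origin.

    Hypothetical helper: assumes each row is a dict whose "metadata" field
    is a JSON string. Rows without "source_dataset" are the ones that would
    raise KeyError inside eval_underthink.
    """
    counts = Counter()
    for row in rows:
        metadata = json.loads(row["metadata"])
        if "source_dataset" in metadata:
            counts["source_dataset"] += 1
        elif "source" in metadata:
            # Tally each math source separately, e.g. "source:aime".
            counts[f"source:{metadata['source']}"] += 1
        else:
            counts["unknown"] += 1
    return counts


# Hypothetical sample rows; real rows come from the UnderThinkingBench data.
sample = [
    {"metadata": json.dumps({"source_dataset": "some_puzzle"})},
    {"metadata": json.dumps({"source": "aime"})},
    {"metadata": json.dumps({"source": "hmmt"})},
]
print(audit_metadata(sample))
```

Any nonzero `source:*` or `unknown` count means `eval_underthink` will fail on that split as it is currently written.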
Possible solution in eval.py:
import json

# ...
elif subset == "underthinking-bench" or "underthinking" in subset:
    metadata = json.loads(row["metadata"])
    # Math items (AIME/HMMT) carry "source" instead of "source_dataset",
    # so route them to the math evaluator; .get() avoids a KeyError.
    if metadata.get("source") in ("aime", "hmmt"):
        acc = eval_math(row, tokenizer, model_name)
    else:
        acc = eval_underthink(row)
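The same routing can be pulled into a small function and tested in isolation. This is a sketch, not the repo's code: `route_underthinking_row` and `MATH_SOURCES` are hypothetical names, and the two evaluators are passed in as callables standing in for the repo's `eval_math` and `eval_underthink` so the example runs on its own.

```python
import json
from typing import Any, Callable, Mapping

# Hypothetical constant: the "source" values reported in this issue.
MATH_SOURCES = frozenset({"aime", "hmmt"})


def route_underthinking_row(
    row: Mapping[str, Any],
    math_eval: Callable[[Mapping[str, Any]], float],
    puzzle_eval: Callable[[Mapping[str, Any]], float],
) -> float:
    """Send math items to math_eval and all other items to puzzle_eval.

    math_eval and puzzle_eval are placeholders for the repo's eval_math
    and eval_underthink; they are injected so this sketch is self-contained.
    """
    metadata = json.loads(row["metadata"])
    # .get() returns None for puzzle items, which have no "source" key.
    if metadata.get("source") in MATH_SOURCES:
        return math_eval(row)
    return puzzle_eval(row)


# Usage with stub evaluators (1.0 marks the math path, 0.0 the puzzle path).
math_row = {"metadata": json.dumps({"source": "aime"})}
puzzle_row = {"metadata": json.dumps({"source_dataset": "some_puzzle"})}
print(route_underthinking_row(math_row, lambda r: 1.0, lambda r: 0.0))
print(route_underthinking_row(puzzle_row, lambda r: 1.0, lambda r: 0.0))
```

Because `eval_math` in the proposed fix also takes `tokenizer` and `model_name`, a caller could bind those with a lambda such as `lambda r: eval_math(r, tokenizer, model_name)`. Branching on the row's own metadata, rather than on the subset name, keeps both kinds of items working inside the single `underthinking` subset.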