Hello,
1. In the paper, DeepSeek-R1's call accuracy

2. But I got 37.9 when I ran the code: python3 EVAL/eval_T/0_call_acc.py --source LLM_generated/Bench_T_general_purpose/output_DeepSeek-R1.jsonl --target output/deepseek --GPUs 0,1,2,3
Does this replication process look correct to you?
Please reply when you can.
Thanks