Replicate the evaluations from the paper.

Hello,

1. In the paper, DeepSeek-R1's Call Accuracy is reported as shown below:
<img width="902" height="400" alt="Image" src="https://github.com/user-attachments/assets/0ba48a5a-81c9-419b-8a30-618033f1323d" />

2. However, I got 37.9 when I ran this command: `python3 EVAL/eval_T/0_call_acc.py --source LLM_generated/Bench_T_general_purpose/output_DeepSeek-R1.jsonl --target output/deepseek --GPUs 0,1,2,3`

Does this replication process look correct to you?
If not, please let me know what I should change.

Thanks

