Hi, thank you for the great work!
I evaluated this checkpoint [KernelGen-LM-8B](https://huggingface.co/AscendKernelGen/KernelGen-LM-8B?utm_source=chatgpt.com) with `kernel_only_mode: False` and `kernel_only_mode: True` (`n_sample: 10` in both cases), but I could not reproduce the results in the paper. Could you please help me with that?
Overall Statistics Summary (`kernel_only_mode: False`)

| op_level | total_samples | parse_pass_rate | compile_pass_rate | precision_pass_rate |
|---|---|---|---|---|
| 1 | 370 | 93.51% | 20.00% | 2.97% |

These results correspond to a PASS@10 execution rate of 21%, while the numbers reported in Figure 5 of the paper are significantly better: there, Qwen3-8B achieves a PASS@1 of approximately 32%.
When I set `kernel_only_mode: True`, I obtained the following results:

Overall Statistics Summary (`kernel_only_mode: True`)

| op_level | total_samples | parse_pass_rate | compile_pass_rate | precision_pass_rate |
|---|---|---|---|---|
| 1 | 370 | 95.95% | 48.38% | 12.97% |

These results correspond to a PASS@10 execution rate of 46%.
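For reference, a minimal sketch of the standard unbiased pass@k estimator (Chen et al., 2021), assuming that is how PASS@k is defined in the paper and in the evaluation script; this is not taken from the KernelGen code. With `n = 10` samples per task, PASS@1 reduces to the fraction of passing samples per task, averaged over tasks, while PASS@10 counts a task as solved if any of its 10 samples passes. The function names and per-task counts below are hypothetical, not real results.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task with n samples, c of which pass."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing sample.
        return 1.0
    # 1 - P(all k drawn samples fail), drawing k of n samples without replacement.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average pass@k over tasks, each evaluated with the same n samples."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


if __name__ == "__main__":
    # Hypothetical number of passing samples (out of n=10) for five tasks.
    counts = [0, 0, 3, 10, 1]
    print(f"pass@1  = {mean_pass_at_k(counts, n=10, k=1):.2%}")   # 28.00%
    print(f"pass@10 = {mean_pass_at_k(counts, n=10, k=10):.2%}")  # 60.00%
```

Under this definition, a PASS@1 number is only comparable to a per-sample pass rate, not to PASS@10, which is always at least as high.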