Reproducing the Results: #1

@mgholamikn

Hi, thank you for the great work!

I evaluated this checkpoint, [KernelGen-LM-8B](https://huggingface.co/AscendKernelGen/KernelGen-LM-8B?utm_source=chatgpt.com), with both `kernel_only_mode: False` and `kernel_only_mode: True`, using `n_sample: 10`, and I could not reproduce the results in the paper. Could you please help me with that?

Overall Statistics Summary (kernel_only_mode: False)

| op_level | total_samples | parse_pass_rate | compile_pass_rate | precision_pass_rate |
| --- | --- | --- | --- | --- |
| 1 | 370 | 93.51% | 20.00% | 2.97% |

These results correspond to a PASS@10 execution rate of 21%, whereas the numbers reported in Figure 5 are significantly better: there, Qwen3-8B achieves a PASS@1 of approximately 32%.

When I set kernel_only_mode: True, I obtained the following results.

Overall Statistics Summary (kernel_only_mode: True)

| op_level | total_samples | parse_pass_rate | compile_pass_rate | precision_pass_rate |
| --- | --- | --- | --- | --- |
| 1 | 370 | 95.95% | 48.38% | 12.97% |

These results correspond to a PASS@10 execution rate of 46%.
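
For reference, here is how I understand PASS@k, in case my conversion from the tables differs from what the paper uses. This is a minimal sketch of the standard unbiased estimator (1 - C(n-c, k) / C(n, k), as popularized by the HumanEval evaluation), not code taken from this repository; the function names `pass_at_k` and `mean_pass_at_k` and the per-task counts in the usage example are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: samples generated for the task
    c: samples among them that passed
    k: budget being evaluated (1 <= k <= n)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(pass_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over tasks, given the number of passing samples per task."""
    return sum(pass_at_k(n, c, k) for c in pass_counts) / len(pass_counts)


if __name__ == "__main__":
    # Hypothetical per-task pass counts with n = 10 samples each.
    counts = [0, 1, 3, 0]
    print(mean_pass_at_k(counts, n=10, k=1))
    print(mean_pass_at_k(counts, n=10, k=10))
```

With `k = n = 10`, a task counts as solved if any of its 10 samples passes, while with `k = 1` the estimate reduces to the per-sample pass fraction `c / n`. A per-sample rate and a PASS@10 rate are therefore different quantities, so it would help to know which one Figure 5 reports.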
