Skip to content

uw-math-ai/lean-error-correction

Repository files navigation

Learning to Repair Lean Proofs from Compiler Feedback

Evan Wang, Simon Chess, Daniel Lee, Siyuan Ge, Ajit Mallavarapu, Vasily Ilin

arXiv HF Paper Dataset: APRIL

TL;DR

Existing Lean datasets contain correct proofs. Models learn error correction with RL, that's expensive. We release a dataset of 260k erroneous Lean proofs, the compiler feedback, error explanation, proof repair reasoning trace, and the corrected proof.

Dataset

Single-shot repair results

Model Baseline Fine-tuned (APRIL)
Goedel-Prover-V2-8B 15.5% 34.6%
Kimina-Prover-8B 11.1% 31.9%
Qwen3-4B-Instruct-2507 1.1% 27.4%

Links

Citation

@article{wang2026repairlean,
  title  = {Learning to Repair Lean Proofs from Compiler Feedback},
  author = {Wang, Evan and Chess, Simon and Lee, Daniel and Ge, Siyuan and Mallavarapu, Ajit and Ilin, Vasily},
  journal= {arXiv preprint arXiv:2602.02990},
  year   = {2026},
  doi    = {10.48550/arXiv.2602.02990},
  url    = {https://arxiv.org/abs/2602.02990}
}

Running

First, run the following commands

pip install -r requirements.txt
pip install -e .

For the full process, run these. Alternatively, you can run them in a slightly different order

python -m scripts.filter_files && \
python -m scripts.build_theorem_cache && \
python -m scripts.create_theorem_pairs && \
python -m scripts.expand_theorem_pairs && \
python -m scripts.create_incorrect_proofs && \
python -m scripts.annotate_proofs && \
python -m scripts.generate_explanations && \
python -m scripts.build_sft_dataset

Categories

Initial Check

For filtering into src_pass.jsonl (check to make sure no errors)

python -m scripts.filter_files

Error Generation

For generating errors by replacing theorems

python -m scripts.build_theorem_cache && \
python -m scripts.create_theorem_pairs && \
python -m scripts.expand_theorem_pairs && \
python -m scripts.create_incorrect_proofs && \
python -m scripts.annotate_proofs

TODO (alternative ways to generate errors)

Explanation

python -m scripts.generate_explanations && \
python -m scripts.build_sft_dataset

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors