Preliminaries

This project is a prototype framework for extracting training data for machine-learning-based proof assistants. Specifically, it extracts maximally rich geometric data from constructive mathematical proofs written in Lean.

For additional context, refer to main.pdf.

Getting Started

Clone this repository.
Initialize a Lean 4 project within the cloned repository and ensure Python is installed.
Modify your lakefile to include any required packages and Lean modules, ensuring they can be imported in ~/Foo/Test.lean.

Lean.foo

The module Export.lean was originally programmed by Mario Carneiro (see Lean4Export and Lean3 Export Format).

Minimal modifications were made to Main.lean, including an I/O operation at the end to write the output of Carneiro’s export process to a .txt file.

How to Use

Write your Lean program (i.e., constructive mathematical theorem) in ~/Foo/Test.lean, ensuring it compiles.
Alternatively, import all necessary Lean modules you'd like to export.
Execute the following command in a terminal within the same directory as Main.lean:
```
lake exe foo Foo.Test -- <full name of your Lean program>
```
This will export the fully elaborated Lean program to Export.txt.
To export all imported modules in ./Foo/Test.lean, use:
```
lake exe foo Foo.Test
```

This process allows exporting entire GitHub repositories, modules, etc.

Python.main

This component processes Lean exports into structured data.

How to Use

Once Export.txt has been generated using the Lean.foo process, run:

python main.py Export.txt

Upon successful execution, this Python script generates graph.csv in the same directory.

Output Structure

A set of declarations forming a directed acyclic graph (DAG).
An edge d → d' exists if d' is referenced within d.
Dependencies are labeled based on their position in the structure.

Important Notes

Computational Perspective: The function computed by main.py is an isomorphism—it acts as an interpreter for Lean.
Data Representation: The exported data provides a compiler-invariant representation of fully elaborated Lean programs—constructive proofs of theorems. While the output format (graph.csv) represents labeled directed edges, alternative representations (e.g., sets of vertices and edges) can be derived using the functions in base.py.
Machine Learning Relevance: The exported data serves as the rawest form of positive training data for ML-based proof assistants.

Visualization (Optional)

To visualize the exported digraph, use Cosmograph:

Upload graph.csv to Cosmograph.
The image Nat.gcd_self.png in this repo is an example visualization.

Future Directions

A deeper understanding of the syntactic structure of each declaration will facilitate better insights into the exported digraph, enabling more effective transformations into training data.

This project leverages the LeanDojo (LeanDojo GitHub), LeanCopilot (LeanCopilot GitHub), and ReProver (ReProver GitHub) frameworks.

The PDF main.pdf is a preprint of a paper I am writing on this project. Look there for specifics.

Summary

This framework provides an alternative approach to Lean data extraction by focusing on fully elaborated declarations and dependency structures. The extracted data can be directly utilized for machine learning applications, particularly in proof assistant research.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
Example Exports		Example Exports
Foo		Foo
.DS_Store		.DS_Store
LICENSE		LICENSE
Limitations — LeanDojo 2.2.0 documentation.pdf		Limitations — LeanDojo 2.2.0 documentation.pdf
Main.lean		Main.lean
Nat.gcd_self.png		Nat.gcd_self.png
README.md		README.md
base.py		base.py
lakefile.lean		lakefile.lean
lean-toolchain		lean-toolchain
main.pdf		main.pdf
main.py		main.py
references.bib		references.bib

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Preliminaries

Getting Started

Lean.foo

How to Use

Python.main

How to Use

Output Structure

Important Notes

Visualization (Optional)

Future Directions

Summary

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Preliminaries

Getting Started

Lean.foo

How to Use

Python.main

How to Use

Output Structure

Important Notes

Visualization (Optional)

Future Directions

Summary

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages