Skip to content

lean-dojo/BRIDGE

Repository files navigation

BRIDGE: Building Representations In Domain Guided Program Synthesis

Large language models can often write plausible code, but verified synthesis requires more than a final implementation. The code must type-check, terminate, match the intended behavior, support meaningful specifications, and expose theorem statements that are useful for proof. BRIDGE studies this as a structured reasoning problem across three connected domains:

  • Code: executable Lean implementations.
  • Specifications: contracts, predicates, and behavioral properties.
  • Proofs: implementation-relevant theorem statements and proof attempts.

BRIDGE guides models through domain-aligned intermediate representations before asking for the final formal output. In the code domain, functional reasoning in an OCaml/Haskell style often produces program structure that is closer to Lean's total, immutable, recursive style than direct natural-language-to-Lean generation.

What is here

  • datasets/bridge178.jsonl: 178 algorithmic tasks with Lean signatures and oracle tests.
  • prompts/: direct, functional, imperative, specification, theorem, and proof prompts.
  • scripts/: prompt rendering, OpenAI-compatible generation, code extraction, and repair.
  • BRIDGE.lean, BRIDGE/, lakefile.toml, lean-toolchain: the Lean/Lake project skeleton.
  • external_benchmarks/: adapters for running BRIDGE prompts on VERINA, CLEVER, and DafnyBench.
  • modal/: Modal/vLLM endpoint template for open-weight model runs.
  • MODELS_AND_DATA.md: models, data sources, and the SFT side experiment note.

Setup

Install the Python runner dependencies:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Install elan (Lean's version manager), then initialize the Lean project:

lake update
lake exe cache get
lake build

The project pins Lean in lean-toolchain and pulls Mathlib through lakefile.toml.

Choose an inference backend

BRIDGE uses OpenAI-compatible chat completion endpoints.

For the OpenAI API:

export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
export MODEL="gpt-4o"

For Modal/vLLM:

pip install modal
modal setup
modal deploy modal/modal_vllm_template.py

export OPENAI_BASE_URL="https://YOUR-WORKSPACE--bridge-vllm-endpoint-serve.modal.run/v1"
export OPENAI_API_KEY="EMPTY"
export MODEL="Goedel-LM/Goedel-Code-Prover-8B"

Quickstart

Render BRIDGE-178 prompts:

python scripts/render_prompts.py \
  --dataset datasets/bridge178.jsonl \
  --prompt prompts/code_domain/functional_bridge.md \
  --output rendered_prompts/bridge178_functional.jsonl \
  --max-tests 3

Generate candidates:

python scripts/openai_compatible_generate.py \
  --prompts rendered_prompts/bridge178_functional.jsonl \
  --model "$MODEL" \
  --output results/bridge178_functional_generations.jsonl \
  --temperature 0.7 \
  --max-tokens 4096

Extract the final Lean blocks:

python scripts/extract_fenced_code.py \
  --input results/bridge178_functional_generations.jsonl \
  --output results/bridge178_functional_extracted.jsonl \
  --language Lean

For direct and imperative baselines, swap the prompt file:

  • prompts/code_domain/direct_lean.md
  • prompts/code_domain/imperative_bridge.md

Evaluation

The code-domain metric is Lean executable correctness. A candidate passes when it elaborates in Lean, satisfies Lean's termination checks, and passes the task oracle tests. The checker interface is described in scripts/evaluate_lean_outputs.md.

External benchmark runs use the official benchmark repositories and checkers:

  • VERINA: external_benchmarks/VERINA.md
  • CLEVER: external_benchmarks/CLEVER.md
  • DafnyBench: external_benchmarks/DAFNYBENCH.md

Models and SFT

The main experiments use Claude Sonnet 4, DeepSeek Coder V2, DeepSeek R1, Llama-3.1-70B, and Qwen2.5-Coder-7B. See MODELS_AND_DATA.md for model and dataset details.

The SFT experiment is an additional side story. We will release the SFT dataset and Hugging Face checkpoint soon.

Citation

@misc{george2025bridge,
  title = {BRIDGE: Building Representations In Domain Guided Program Synthesis},
  author = {George, Robert Joseph and Eisenach, Carson and Ghai, Udaya and Perrault-Joncas, Dominique and Anandkumar, Anima and Foster, Dean},
  year = {2025},
  eprint = {2511.21104},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  doi = {10.48550/arXiv.2511.21104},
  url = {https://arxiv.org/abs/2511.21104}
}

About

BRIDGE is a framework for program verification and synthesis in Lean.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors