Large language models can often write plausible code, but verified synthesis requires more than a final implementation. The code must type-check, terminate, match the intended behavior, support meaningful specifications, and expose theorem statements that are useful for proof. BRIDGE studies this as a structured reasoning problem across three connected domains:
- Code: executable Lean implementations.
- Specifications: contracts, predicates, and behavioral properties.
- Proofs: implementation-relevant theorem statements and proof attempts.
BRIDGE guides models through domain-aligned intermediate representations before asking for the final formal output. In the code domain, functional reasoning in an OCaml/Haskell style often produces program structure that is closer to Lean's total, immutable, recursive style than direct natural-language-to-Lean generation.
datasets/bridge178.jsonl: 178 algorithmic tasks with Lean signatures and oracle tests.prompts/: direct, functional, imperative, specification, theorem, and proof prompts.scripts/: prompt rendering, OpenAI-compatible generation, code extraction, and repair.BRIDGE.lean,BRIDGE/,lakefile.toml,lean-toolchain: the Lean/Lake project skeleton.external_benchmarks/: adapters for running BRIDGE prompts on VERINA, CLEVER, and DafnyBench.modal/: Modal/vLLM endpoint template for open-weight model runs.MODELS_AND_DATA.md: models, data sources, and the SFT side experiment note.
Install the Python runner dependencies:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txtInstall elan (Lean's version manager), then initialize the Lean project:
lake update
lake exe cache get
lake buildThe project pins Lean in lean-toolchain and pulls Mathlib through
lakefile.toml.
BRIDGE uses OpenAI-compatible chat completion endpoints.
For the OpenAI API:
export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
export MODEL="gpt-4o"For Modal/vLLM:
pip install modal
modal setup
modal deploy modal/modal_vllm_template.py
export OPENAI_BASE_URL="https://YOUR-WORKSPACE--bridge-vllm-endpoint-serve.modal.run/v1"
export OPENAI_API_KEY="EMPTY"
export MODEL="Goedel-LM/Goedel-Code-Prover-8B"Render BRIDGE-178 prompts:
python scripts/render_prompts.py \
--dataset datasets/bridge178.jsonl \
--prompt prompts/code_domain/functional_bridge.md \
--output rendered_prompts/bridge178_functional.jsonl \
--max-tests 3Generate candidates:
python scripts/openai_compatible_generate.py \
--prompts rendered_prompts/bridge178_functional.jsonl \
--model "$MODEL" \
--output results/bridge178_functional_generations.jsonl \
--temperature 0.7 \
--max-tokens 4096Extract the final Lean blocks:
python scripts/extract_fenced_code.py \
--input results/bridge178_functional_generations.jsonl \
--output results/bridge178_functional_extracted.jsonl \
--language LeanFor direct and imperative baselines, swap the prompt file:
prompts/code_domain/direct_lean.mdprompts/code_domain/imperative_bridge.md
The code-domain metric is Lean executable correctness. A candidate passes when
it elaborates in Lean, satisfies Lean's termination checks, and passes the task
oracle tests. The checker interface is described in
scripts/evaluate_lean_outputs.md.
External benchmark runs use the official benchmark repositories and checkers:
- VERINA:
external_benchmarks/VERINA.md - CLEVER:
external_benchmarks/CLEVER.md - DafnyBench:
external_benchmarks/DAFNYBENCH.md
The main experiments use Claude Sonnet 4, DeepSeek Coder V2, DeepSeek R1,
Llama-3.1-70B, and Qwen2.5-Coder-7B. See MODELS_AND_DATA.md for model and
dataset details.
The SFT experiment is an additional side story. We will release the SFT dataset and Hugging Face checkpoint soon.
@misc{george2025bridge,
title = {BRIDGE: Building Representations In Domain Guided Program Synthesis},
author = {George, Robert Joseph and Eisenach, Carson and Ghai, Udaya and Perrault-Joncas, Dominique and Anandkumar, Anima and Foster, Dean},
year = {2025},
eprint = {2511.21104},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
doi = {10.48550/arXiv.2511.21104},
url = {https://arxiv.org/abs/2511.21104}
}