BRIDGE: Building Representations In Domain Guided Program Synthesis

Large language models can often write plausible code, but verified synthesis requires more than a final implementation. The code must type-check, terminate, match the intended behavior, support meaningful specifications, and expose theorem statements that are useful for proof. BRIDGE studies this as a structured reasoning problem across three connected domains:

Code: executable Lean implementations.
Specifications: contracts, predicates, and behavioral properties.
Proofs: implementation-relevant theorem statements and proof attempts.

BRIDGE guides models through domain-aligned intermediate representations before asking for the final formal output. In the code domain, functional reasoning in an OCaml/Haskell style often produces program structure that is closer to Lean's total, immutable, recursive style than direct natural-language-to-Lean generation.

What is here

datasets/bridge178.jsonl: 178 algorithmic tasks with Lean signatures and oracle tests.
prompts/: direct, functional, imperative, specification, theorem, and proof prompts.
scripts/: prompt rendering, OpenAI-compatible generation, code extraction, and repair.
BRIDGE.lean, BRIDGE/, lakefile.toml, lean-toolchain: the Lean/Lake project skeleton.
external_benchmarks/: adapters for running BRIDGE prompts on VERINA, CLEVER, and DafnyBench.
modal/: Modal/vLLM endpoint template for open-weight model runs.
MODELS_AND_DATA.md: models, data sources, and the SFT side experiment note.

Setup

Install the Python runner dependencies:

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Install elan (Lean's version manager), then initialize the Lean project:

lake update
lake exe cache get
lake build

The project pins Lean in lean-toolchain and pulls Mathlib through lakefile.toml.

Choose an inference backend

BRIDGE uses OpenAI-compatible chat completion endpoints.

For the OpenAI API:

export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
export MODEL="gpt-4o"

For Modal/vLLM:

pip install modal
modal setup
modal deploy modal/modal_vllm_template.py

export OPENAI_BASE_URL="https://YOUR-WORKSPACE--bridge-vllm-endpoint-serve.modal.run/v1"
export OPENAI_API_KEY="EMPTY"
export MODEL="Goedel-LM/Goedel-Code-Prover-8B"

Quickstart

Render BRIDGE-178 prompts:

python scripts/render_prompts.py \
  --dataset datasets/bridge178.jsonl \
  --prompt prompts/code_domain/functional_bridge.md \
  --output rendered_prompts/bridge178_functional.jsonl \
  --max-tests 3

Generate candidates:

python scripts/openai_compatible_generate.py \
  --prompts rendered_prompts/bridge178_functional.jsonl \
  --model "$MODEL" \
  --output results/bridge178_functional_generations.jsonl \
  --temperature 0.7 \
  --max-tokens 4096

Extract the final Lean blocks:

python scripts/extract_fenced_code.py \
  --input results/bridge178_functional_generations.jsonl \
  --output results/bridge178_functional_extracted.jsonl \
  --language Lean

For direct and imperative baselines, swap the prompt file:

prompts/code_domain/direct_lean.md
prompts/code_domain/imperative_bridge.md

Evaluation

The code-domain metric is Lean executable correctness. A candidate passes when it elaborates in Lean, satisfies Lean's termination checks, and passes the task oracle tests. The checker interface is described in scripts/evaluate_lean_outputs.md.

External benchmark runs use the official benchmark repositories and checkers:

VERINA: external_benchmarks/VERINA.md
CLEVER: external_benchmarks/CLEVER.md
DafnyBench: external_benchmarks/DAFNYBENCH.md

Models and SFT

The main experiments use Claude Sonnet 4, DeepSeek Coder V2, DeepSeek R1, Llama-3.1-70B, and Qwen2.5-Coder-7B. See MODELS_AND_DATA.md for model and dataset details.

The SFT experiment is an additional side story. We will release the SFT dataset and Hugging Face checkpoint soon.

Citation

@misc{george2025bridge,
  title = {BRIDGE: Building Representations In Domain Guided Program Synthesis},
  author = {George, Robert Joseph and Eisenach, Carson and Ghai, Udaya and Perrault-Joncas, Dominique and Anandkumar, Anima and Foster, Dean},
  year = {2025},
  eprint = {2511.21104},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  doi = {10.48550/arXiv.2511.21104},
  url = {https://arxiv.org/abs/2511.21104}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BRIDGE: Building Representations In Domain Guided Program Synthesis

What is here

Setup

Choose an inference backend

Quickstart

Evaluation

Models and SFT

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
BRIDGE		BRIDGE
configs		configs
datasets		datasets
docs		docs
external_benchmarks		external_benchmarks
modal		modal
prompts		prompts
results		results
scripts		scripts
.gitignore		.gitignore
BRIDGE.lean		BRIDGE.lean
LICENSE		LICENSE
MODELS_AND_DATA.md		MODELS_AND_DATA.md
README.md		README.md
RUNNING.md		RUNNING.md
lakefile.toml		lakefile.toml
lean-toolchain		lean-toolchain
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

BRIDGE: Building Representations In Domain Guided Program Synthesis

What is here

Setup

Choose an inference backend

Quickstart

Evaluation

Models and SFT

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages