This GitHub Action provides an AI-powered code review for Pull Requests in Lean 4 projects, with a strong focus on detecting misformalization issues. It supports multiple LLM providers (Gemini, Anthropic Claude, OpenAI GPT) and analyzes code changes against formal specifications and project dependencies through a multi-agent pipeline.
- 5-Agent Review Pipeline:
  - Mechanical Pre-Checks: Deterministic scanning for escape hatches (`sorry`, `axiom`, `native_decide`, `opaque`, `implemented_by`, `sorryAx`) in both newly introduced and pre-existing code, with comment- and string-awareness (including Lean 4 nested block comments).
  - Specification Analyst (Agent A): Reads external PDFs and math papers (native multimodal across Gemini, Anthropic, and OpenAI) to extract a "Formalization Checklist" — a mapping from paper results to the mathematical content that any correct formalization must preserve. Receives the repository structure and dependency graph for awareness of existing formalizations.
  - Triage Agent: Groups changed files into review clusters based on the dependency graph and type signatures, prioritizing tightly coupled files for joint review. Produces a review strategy and key hypotheses per cluster that guide the per-file reviewers.
  - Code Reviewer (Agent B): Evaluates each Lean file's diff and full content against the spec checklist, repository context, and Lean 4 best practices. Writes a structured analysis (what the code does mathematically, risk assessment, spec mapping) before producing findings. Runs in parallel across files (up to 5 concurrent workers) with cluster-level type signatures, review strategy, and key hypotheses.
  - Cross-File Analysis Agent: Analyzes composition chains, type flow across files, axiom/escape-hatch impact propagation, and external dependency correctness.
  - Lead Synthesizer (Agent C): Aggregates per-file reviews (both formatted and structured data) into a prioritized, deduplicated executive summary.
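The mechanical pre-check is deterministic string processing rather than an LLM call. Below is a minimal sketch of such a scanner (the function and names are hypothetical, not the action's actual code) that skips Lean's nested `/- ... -/` block comments, `--` line comments, and string literals before matching escape-hatch identifiers:

```python
import re

ESCAPE_HATCHES = ("sorry", "axiom", "native_decide", "opaque", "implemented_by", "sorryAx")

def scan_escape_hatches(source: str) -> list[tuple[int, str]]:
    """Return (line_number, token) hits outside comments and string literals."""
    hits: list[tuple[int, str]] = []
    depth = 0          # nesting level of /- ... -/ block comments
    in_string = False
    i, line = 0, 1
    while i < len(source):
        two = source[i:i + 2]
        ch = source[i]
        if ch == "\n":
            line += 1
            i += 1
        elif in_string:
            if ch == "\\":       # skip escaped character inside the string
                i += 2
            elif ch == '"':
                in_string = False
                i += 1
            else:
                i += 1
        elif depth > 0:          # inside a (possibly nested) block comment
            if two == "/-":
                depth += 1
                i += 2
            elif two == "-/":
                depth -= 1
                i += 2
            else:
                i += 1
        elif two == "/-":
            depth = 1
            i += 2
        elif two == "--":        # line comment: skip to end of line
            nl = source.find("\n", i)
            i = len(source) if nl == -1 else nl
        elif ch == '"':
            in_string = True
            i += 1
        else:
            m = re.match(r"[A-Za-z_][A-Za-z0-9_]*", source[i:])
            if m:
                if m.group(0) in ESCAPE_HATCHES:
                    hits.append((line, m.group(0)))
                i += m.end()
            else:
                i += 1
    return hits
```

Matching on whole identifiers (rather than substrings) avoids false positives such as `sorryAx`-like names embedded in longer identifiers.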
- Transitive Dependency Discovery: Uses `lake exe graph --json` with BFS traversal (configurable depth, default 2) to find both direct and transitive dependencies. Asymmetric depth: dependencies (what we import) go to depth 2; dependents (what imports us) stay at depth 1.
- Lean Toolchain Extraction: Post-build extraction of axiom dependencies (`#print axioms`), `sorry`/`admit` locations, and compiler diagnostics for all changed files, plus lightweight `sorry`/`admit` scanning for overflow files.
- Tiered Context Management: Full file content for up to 50 files (configurable via `CONTEXT_LIMIT`), with type-signature-only summaries for overflow files. Depth-1 dependencies are prioritized over depth-2 in the full-context tier, and the cap is enforced against the total tier size, so very large PRs gracefully demote excess files to summaries rather than blowing the prompt budget. Per-file reviewers additionally filter sibling changed files out of `REPO_CONTEXT` (each changed file gets its own review pass; cross-file coupling is surfaced by the dedicated cross-file analyzer and the triage-produced cluster signatures). A per-call character budget (`MAX_PROMPT_CHARS`, default ~2.5M chars ≈ 830K tokens) defensively trims dependency content if an individual review's assembled prompt would still exceed the model's context limit.
- Context Completeness Guarantees: External reference fetch errors, Lean toolchain extraction failures, and content cache failures are surfaced as warnings in the review output. External references fall back to inline delivery if caching fails.
- Agent Deliberation: Agents think before reviewing. Extended thinking (configurable token budget) gives the model internal reasoning time. The FileReview and CrossFileAnalysis schemas include an `analysis` field where the model writes its step-by-step reasoning before committing to findings. Triage produces per-cluster review strategies and testable hypotheses that flow to the per-file reviewers.
- Multi-Provider Support: Supports Gemini, Anthropic (Claude), and OpenAI (GPT) as interchangeable backends via a provider abstraction layer. Each provider handles structured output, PDF content, caching, and extended thinking according to its native API.
- Structured Output: All agents produce Pydantic-validated JSON responses. Line-level annotations are posted via the GitHub Review API using the modern `line`/`side` parameters.
- Per-Agent Model Selection: Each pipeline stage can use a different model via CLI flags (`--spec-model`, `--review-model`, `--cross-file-model`, `--synthesis-model`).
- Adaptive Pipeline: Single-file PRs skip triage, cross-file analysis, and synthesis (the per-file review is the output, with a deterministic downstream-impact note from the dependency graph). Two-file PRs skip triage but get cross-file analysis.
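The asymmetric-depth discovery amounts to two plain BFS traversals over the module graph: one forward over imports, one over the reversed edges. A sketch under assumed data shapes (the real `discover_files.py` parses `lake exe graph --json`, whose exact schema is not reproduced here):

```python
from collections import deque

def bfs_related(edges: dict[str, list[str]], start: set[str], depth: int) -> set[str]:
    """Breadth-first traversal of an adjacency map, up to `depth` hops from `start`."""
    seen = set(start)
    frontier = deque((mod, 0) for mod in start)
    while frontier:
        mod, d = frontier.popleft()
        if d == depth:          # budget exhausted on this path
            continue
        for nxt in edges.get(mod, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen - start         # related modules only, not the changed set itself

def discover(imports: dict[str, list[str]], changed: set[str]) -> set[str]:
    # Asymmetric depth: what we import goes to depth 2; what imports us stays at depth 1.
    dependents: dict[str, list[str]] = {}
    for mod, deps in imports.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(mod)
    return bfs_related(imports, changed, 2) | bfs_related(dependents, changed, 1)
```

With `A → B → C → D` and `Y → X → A`, a change to `A` pulls in `B` and `C` (imports, depth 2) plus `X` (dependent, depth 1), but neither `D` nor `Y`.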
- Checkout & Environment Setup: Fetches full Git history, sets up Python 3.13 and Lean/Lake via `lean-action`.
- Build: Builds the Lean project with `lake build` (with optional linting).
- Discover Related Files: Identifies changed `.lean` files, then uses the Lake dependency graph for BFS-based transitive dependency and dependent discovery. Splits results into full-context and summary-context tiers.
- Extract Lean Toolchain Info: Runs `#print axioms` per declaration, scans for `sorry`/`admit`, and captures compiler diagnostics for changed files. Performs lightweight `sorry`/`admit` scanning on summary-context overflow files. Operates within a configurable time budget (default 300s).
- Run Multi-Agent Review Pipeline:
- Pre-checks (deterministic): Scans diffs for escape hatches with nested block comment and string literal awareness.
- Agent A (spec analysis): Reads external PDFs/papers with repository structure context, produces a formalization checklist.
- Triage: Groups files into review clusters using dependency graph and type signatures. Produces review strategies and key hypotheses per cluster.
- Agent B (per-file review, parallel): Writes a step-by-step analysis of each file, then derives findings from the analysis. Receives cluster review strategy, key hypotheses, and type signatures of related files.
- Cross-File Agent: Traces composition chains, type-flow, and axiom propagation, then reports issues grounded in that analysis.
- Synthesis: Aggregates structured review data (finding counts, verdicts, violated checklist items) and formatted reviews into an executive summary.
- Post Review: Publishes or updates an AI review comment on the PR, with collapsible per-file details grouped by cluster. Line-level annotations are posted as GitHub Review comments where findings map to diff lines.
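Line-level annotations go through GitHub's standard "create a review for a pull request" REST endpoint, whose comment objects take `path`, `line`, and `side`. A hedged sketch (the helper names are hypothetical; the endpoint and field names are the documented API):

```python
import json
import urllib.request

API = "https://api.github.com/repos/{repo}/pulls/{pr}/reviews"

def build_review_payload(summary: str, findings: list[dict]) -> dict:
    """One review submission; inline comments use the modern line/side fields."""
    return {
        "event": "COMMENT",
        "body": summary,
        "comments": [
            {"path": f["path"], "line": f["line"], "side": "RIGHT", "body": f["message"]}
            for f in findings
        ],
    }

def post_review(repo: str, pr: int, token: str, payload: dict) -> None:
    req = urllib.request.Request(
        API.format(repo=repo, pr=pr),
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )
    urllib.request.urlopen(req)  # raises urllib.error.HTTPError on failure
```

`side: "RIGHT"` anchors a comment to the new version of the line, which is what a finding on added or modified code needs; findings that do not map to a diff line belong in the summary body instead.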
This is a composite action. It supports both automatic review on PR open and on-demand review via `/review` comments, ideally combined in a single workflow.
Create a workflow file at `.github/workflows/ai-review.yml`:
```yaml
name: PR Review

on:
  pull_request:
    types: [opened]
  issue_comment:
    types: [created]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.event.issue.number }}
  cancel-in-progress: true

jobs:
  review:
    if: >-
      (
        github.event_name == 'pull_request' &&
        (
          github.event.pull_request.author_association == 'OWNER' ||
          github.event.pull_request.author_association == 'MEMBER' ||
          github.event.pull_request.author_association == 'COLLABORATOR'
        )
      ) ||
      (
        github.event_name == 'issue_comment' &&
        github.event.issue.pull_request &&
        startsWith(github.event.comment.body, '/review') &&
        (
          github.event.comment.author_association == 'OWNER' ||
          github.event.comment.author_association == 'MEMBER' ||
          github.event.comment.author_association == 'COLLABORATOR'
        )
      )
    runs-on: ubuntu-latest
    timeout-minutes: 90
    permissions:
      contents: read
      pull-requests: write
    steps:
      - name: Extract arguments from comment
        id: get_args
        if: github.event_name == 'issue_comment'
        env:
          COMMENT_BODY: ${{ github.event.comment.body }}
        run: |
          # Generate a random EOF marker for the GITHUB_OUTPUT heredoc.
          # This variable is passed to awk, which handles all output writing.
          EOF=$(openssl rand -hex 8)
          awk -v eof="$EOF" -v gh_out="$GITHUB_OUTPUT" '
            BEGIN {
              ext = ""
              repo = ""
              com = ""
              section = ""
            }
            /^External:/ { section = "ext"; next }
            /^Internal:/ { section = "repo"; next }
            /^Comments:/ { section = "com"; next }
            {
              gsub(/\r/, "", $0);
              gsub(/^[ \t]+|[ \t]+$/, "", $0);
              if ($0 == "" || $0 == "/review") { next }
              if (section == "ext") {
                sub(/^- +/, "");
                if (ext != "") { ext = ext "," $0 } else { ext = $0 }
              }
              else if (section == "repo") {
                sub(/^- +/, "");
                if (repo != "") { repo = repo "," $0 } else { repo = $0 }
              }
              else if (section == "com") {
                if (com != "") { com = com "\n" $0 } else { com = $0 }
              }
            }
            END {
              printf "external_refs=%s\n", ext >> gh_out
              printf "repo_context_refs=%s\n", repo >> gh_out
              printf "additional_comments<<%s\n", eof >> gh_out
              printf "%s\n", com >> gh_out
              printf "%s\n", eof >> gh_out
            }
          ' <<< "$COMMENT_BODY"
        shell: bash
      - uses: alexanderlhicks/lean-review-workflow@main
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          api_key: ${{ secrets.LLM_API_KEY }}
          provider: anthropic      # or: gemini, openai
          model: claude-opus-4-7   # or: gemini-3.1-pro-preview, gpt-5.4
          pr_number: ${{ github.event.issue.number || github.event.pull_request.number }}
          external_refs: "${{ steps.get_args.outputs.external_refs }}"
          repo_context_refs: "${{ steps.get_args.outputs.repo_context_refs }}"
          additional_comments: "${{ steps.get_args.outputs.additional_comments }}"
```

Key features of this workflow:
- Concurrency control prevents duplicate reviews on the same PR.
- Access control restricts triggers to owners, members, and collaborators only (prevents abuse on public repos).
- Combined triggers handle both auto-review on PR open and on-demand `/review` comments.
- Multiline comment parsing supports structured sections with bullet points.
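For reference, the awk parser in the workflow step is equivalent to this Python sketch (function name hypothetical; it mirrors the same section markers and stripping rules):

```python
def parse_review_comment(body: str) -> dict[str, str]:
    """Split a /review comment body into External/Internal/Comments sections."""
    ext: list[str] = []
    repo: list[str] = []
    com: list[str] = []
    section: list[str] | None = None
    for raw in body.splitlines():
        line = raw.replace("\r", "").strip()
        if line in ("", "/review"):
            continue
        if line == "External:":
            section = ext
        elif line == "Internal:":
            section = repo
        elif line == "Comments:":
            section = com
        elif section is not None:
            # Bullet markers are stripped from the URL/path sections only.
            section.append(line if section is com else line.lstrip("- ").strip())
    return {
        "external_refs": ",".join(ext),
        "repo_context_refs": ",".join(repo),
        "additional_comments": "\n".join(com),
    }
```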
How developers use it:
Automatic review runs on PR open. For re-review with context, leave a comment:
```
/review
External:
- https://arxiv.org/pdf/2301.12345.pdf
- https://eprint.iacr.org/2024/001.pdf
Internal:
- docs/spec.md
Comments:
Check that the definition of the ring homomorphism in Section 4
matches the formalization, especially the commutativity hypothesis.
```
```yaml
name: AI Code Review for Lean PRs

on:
  pull_request:
    types: [opened, synchronize]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  ai_review_lean:
    if: >-
      github.event.pull_request.author_association == 'OWNER' ||
      github.event.pull_request.author_association == 'MEMBER' ||
      github.event.pull_request.author_association == 'COLLABORATOR'
    runs-on: ubuntu-latest
    timeout-minutes: 90
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: alexanderlhicks/lean-review-workflow@main
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          api_key: ${{ secrets.LLM_API_KEY }}
          provider: anthropic      # or: gemini, openai
          model: claude-opus-4-7   # or: gemini-3.1-pro-preview, gpt-5.4
          pr_number: ${{ github.event.pull_request.number }}
```

| Input | Required | Default | Description |
|---|---|---|---|
| `github_token` | Yes | — | GitHub token for API calls |
| `api_key` | Yes | — | API key for the LLM provider |
| `provider` | No | `gemini` | LLM provider: `gemini`, `anthropic`, or `openai` |
| `model` | Yes | — | Model name (e.g., `claude-opus-4-7`, `gemini-3.1-pro-preview`, `gpt-5.4`) |
| `pr_number` | Yes | — | The pull request number |
| `external_refs` | No | `""` | Comma-separated URLs to external documents (PDFs, HTML, raw source) |
| `repo_context_refs` | No | `""` | Comma-separated paths to additional internal context files/directories |
| `additional_comments` | No | `""` | Extra focus instructions for the AI reviewer |
| `lint` | No | `false` | Whether to run the Lean linter |
| `dependency_depth` | No | `2` | Depth of transitive dependency traversal (1 = direct only, 2 = imports of imports) |
| `thinking_budget` | No | `10240` | Thinking token budget for deep-analysis agents. Anthropic: mapped to `output_config.effort` (low/medium/high) on adaptive-thinking models (Opus 4.7/4.6, Sonnet 4.6, Mythos); ignored with a warning on other models. OpenAI: mapped to `reasoning.effort` on reasoning-capable models (o1/o3/o4/gpt-5). Gemini 3: mapped to `ThinkingConfig.thinking_level`. |
| `spec_model` | No | `model` | Model override for the Specification Analyst agent |
| `triage_model` | No | `model` | Model override for the Triage agent |
| `review_model` | No | `model` | Model override for the per-file Code Reviewer agent |
| `cross_file_model` | No | `model` | Model override for the Cross-File Analysis agent |
| `synthesis_model` | No | `model` | Model override for the Synthesis agent |
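The `thinking_budget` mapping described in the table can be pictured as a single dispatch function. This is an illustrative sketch only: the function name, model-matching rules, and budget-to-effort thresholds are assumptions, while the returned config key names follow the providers' APIs as described above:

```python
def map_thinking_budget(provider: str, model: str, budget: int) -> dict:
    """Translate one token budget into provider-native thinking settings (sketch)."""
    # Illustrative thresholds: small budgets -> low effort, large -> high.
    effort = "low" if budget <= 4096 else "medium" if budget <= 16384 else "high"
    if provider == "gemini":
        return {"thinking_config": {"thinking_level": effort}}
    if provider == "anthropic":
        adaptive = any(m in model for m in ("opus-4", "sonnet-4", "mythos"))
        if not adaptive:
            return {}  # ignored (with a warning) on non-adaptive models
        return {"output_config": {"effort": effort}}
    if provider == "openai":
        reasoning = any(model.startswith(p) for p in ("o1", "o3", "o4", "gpt-5"))
        return {"reasoning": {"effort": effort}} if reasoning else {}
    raise ValueError(f"unknown provider: {provider}")
```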
| Feature | Gemini | Anthropic (Claude) | OpenAI (GPT) |
|---|---|---|---|
| Structured output | Native schema | Tool-use pattern | Native schema (Responses API `text_format`) |
| Native PDF | Yes | Yes (document blocks) | Yes (`input_file` via Responses API) |
| Extended thinking | `ThinkingConfig.thinking_level` (Gemini 3 only) | Adaptive `output_config.effort` (Opus 4.7/4.6, Sonnet 4.6, Mythos only); ignored with warning on other models | `reasoning.effort` on reasoning models (o1/o3/o4/gpt-5); ignored with warning on non-reasoning models |
| Content caching | Server-side (TTL) | Per-request (ephemeral) | Automatic |
```
lean-review-workflow/
  action.yml                 # GitHub Actions composite action definition
  review.py                  # Main review orchestration (multi-agent pipeline)
  llm_provider.py            # LLM provider abstraction (Gemini, Anthropic, OpenAI)
  discover_files.py          # Dependency discovery via lake graph (BFS)
  lean_info_extractor.py     # Lean toolchain data extraction (axioms, sorry, diagnostics)
  lean_utils.py              # Shared utilities (module names, comment parsing, file cache)
  requirements.txt           # Python dependencies
  prompts/
    analyze_spec.md          # Agent A: specification analysis prompt
    triage.md                # Triage agent: file clustering prompt
    review_file.md           # Agent B: per-file review (no spec)
    review_code_with_spec.md # Agent B: per-file review (with spec checklist)
    cross_file_analysis.md   # Cross-file analysis prompt
    synthesize_summary.md    # Synthesis agent: executive summary prompt
    lean4_checklist.md       # Lean 4 best practices checklist (injected into Agent B)
    verdict_rules.md         # Hard verdict rules (injected into Agent B and Synthesis)
  tests/
    test_review.py
    test_llm_provider.py
    test_discover_files.py
    test_lean_info_extractor.py
    test_lean_utils.py
```
The intelligence and behavior of the AI reviewer are governed by Markdown prompt templates in the `prompts/` directory. Each template uses `{{PLACEHOLDER}}` syntax for dynamic content injection at runtime.
`analyze_spec.md` (Agent A — Specification Analyst):
`{{EXTERNAL_CONTEXT}}`, `{{FILE_DIFFS}}`, `{{REPO_STRUCTURE}}`, `{{DEPENDENCY_GRAPH}}`
`triage.md` (Triage Agent):
`{{DEPENDENCY_GRAPH}}`, `{{ALL_DIFFS}}`, `{{CHANGED_FILE_SIGNATURES}}`, `{{SPEC_CHECKLIST}}`, `{{ADDITIONAL_COMMENTS}}`
`review_file.md` / `review_code_with_spec.md` (Agent B — Code Reviewer):
`{{REPO_CONTEXT}}`, `{{FILE_PATH}}`, `{{FULL_CONTENT}}`, `{{FILE_DIFF}}`, `{{SPEC_CHECKLIST}}`, `{{ADDITIONAL_COMMENTS}}`, `{{CLUSTER_CONTEXT}}`, `{{LEAN4_CHECKLIST}}`, `{{VERDICT_RULES}}`
`cross_file_analysis.md` (Cross-File Agent):
`{{SPEC_CHECKLIST}}`, `{{PRE_CHECK_FINDINGS}}`, `{{ALL_DIFFS}}`, `{{ALL_CHANGED_CONTENTS}}`, `{{DEPENDENCY_CONTEXT}}`, `{{ADDITIONAL_COMMENTS}}`
`synthesize_summary.md` (Synthesis Agent):
`{{SPEC_CHECKLIST}}`, `{{PRE_CHECK_FINDINGS}}`, `{{CROSS_FILE_ANALYSIS}}`, `{{PER_FILE_REVIEWS}}`, `{{STRUCTURED_REVIEWS}}`, `{{VERDICT_RULES}}`
`lean4_checklist.md` and `verdict_rules.md` are static content injected into the Agent B and Synthesis prompts. They contain the Lean 4 best-practices checklist and the hard verdict rules, respectively.
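Placeholder injection of this kind is plain string substitution. A minimal sketch (the helper is hypothetical, not the action's code); failing loudly on a missing value keeps a typo in a template from silently leaving `{{...}}` in a prompt:

```python
import re

def render_prompt(template: str, values: dict[str, str]) -> str:
    """Fill every {{PLACEHOLDER}} slot; unknown placeholders raise KeyError."""
    def sub(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"missing template value: {key}")
        return values[key]
    return re.sub(r"\{\{([A-Z_]+)\}\}", sub, template)
```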
```shell
pip install pytest
python -m pytest tests/ -v
```

See `requirements.txt` for dependencies: `requests`, `beautifulsoup4`, `pydantic`, `google-genai`, `anthropic`, `openai`, `pymupdf`.
Contributions are welcome. Please ensure changes pass the existing test suite.