
AI Code Review Workflow for Lean Projects

This GitHub Action provides an AI-powered code review for Pull Requests in Lean 4 projects, with a strong focus on detecting misformalization issues. It supports multiple LLM providers (Gemini, Anthropic Claude, OpenAI GPT) and analyzes code changes against formal specifications and project dependencies through a multi-agent pipeline.

Features

  • 5-Agent Review Pipeline (a deterministic pre-check stage followed by five LLM agents):
    1. Mechanical Pre-Checks: Deterministic scanning for escape hatches (sorry, axiom, native_decide, opaque, implemented_by, sorryAx) in both newly introduced and pre-existing code, with comment- and string-awareness (including Lean 4 nested block comments); a sketch of this scan follows the feature list.
    2. Specification Analyst (Agent A): Reads external PDFs and math papers (native multimodal across Gemini, Anthropic, and OpenAI) to extract a "Formalization Checklist" — a mapping from paper results to mathematical content that any correct formalization must preserve. Receives repository structure and dependency graph for awareness of existing formalizations.
    3. Triage Agent: Groups changed files into review clusters based on the dependency graph and type signatures, prioritizing tightly-coupled files for joint review. Produces a review strategy and key hypotheses per cluster that guide the per-file reviewers.
    4. Code Reviewer (Agent B): Evaluates each Lean file's diff and full content against the spec checklist, repository context, and Lean 4 best practices. Writes a structured analysis before producing findings (what the code does mathematically, risk assessment, spec mapping). Runs in parallel across files (up to 5 concurrent workers) with cluster-level type signatures, review strategy, and key hypotheses.
    5. Cross-File Analysis Agent: Analyzes composition chains, type-flow across files, axiom/escape-hatch impact propagation, and external dependency correctness.
    6. Lead Synthesizer (Agent C): Aggregates per-file reviews (both formatted and structured data) into a prioritized executive summary with deduplication.
  • Transitive Dependency Discovery: Uses lake exe graph --json with BFS traversal (configurable depth, default 2) to find both direct and transitive dependencies. Asymmetric depth: dependencies (what we import) go to depth 2; dependents (what imports us) stay at depth 1.
  • Lean Toolchain Extraction: Post-build extraction of axiom dependencies (#print axioms), sorry/admit locations, and compiler diagnostics for all changed files, plus lightweight sorry/admit scanning for overflow files.
  • Tiered Context Management: Full file content for up to 50 files (configurable via CONTEXT_LIMIT), with type-signature-only summaries for overflow files. Depth-1 dependencies are prioritized over depth-2 in the full-context tier, and the cap is enforced against the total tier size so very-large PRs gracefully demote excess files to summaries rather than blowing the prompt budget. Per-file reviewers additionally filter out sibling changed files from REPO_CONTEXT (each changed file gets its own review pass; cross-file coupling is surfaced by the dedicated cross-file analyzer and the triage-produced cluster signatures). A per-call character budget (MAX_PROMPT_CHARS, default ~2.5M chars ≈ 830K tokens) defensively trims dependency content if an individual review's assembled prompt would still exceed the model's context limit.
  • Context Completeness Guarantees: External reference fetch errors, Lean toolchain extraction failures, and content cache failures are surfaced as warnings in the review output. External references fall back to inline delivery if caching fails.
  • Agent Deliberation: Agents think before reviewing. Extended thinking (configurable token budget) gives the model internal reasoning time. FileReview and CrossFileAnalysis schemas include an analysis field where the model writes its step-by-step reasoning before committing to findings. Triage produces per-cluster review strategies and testable hypotheses that flow to the per-file reviewers.
  • Multi-Provider Support: Supports Gemini, Anthropic (Claude), and OpenAI (GPT) as interchangeable backends via a provider abstraction layer. Each provider handles structured output, PDF content, caching, and extended thinking according to its native API.
  • Structured Output: All agents produce Pydantic-validated JSON responses (a schema sketch follows the feature list). Line-level annotations are posted via the GitHub Review API using the modern line/side parameters.
  • Per-Agent Model Selection: Each pipeline stage can use a different model via CLI flags (--spec-model, --review-model, --cross-file-model, --synthesis-model).
  • Adaptive Pipeline: Single-file PRs skip triage, cross-file analysis, and synthesis (the per-file review is the output, with a deterministic downstream impact note from the dependency graph). Two-file PRs skip triage but get cross-file analysis.
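
To make the pre-check stage concrete, here is a minimal Python sketch of a comment- and string-aware escape-hatch scan. It is illustrative only: the function names are hypothetical, and the action's own scanner (in review.py and lean_utils.py) differs in detail.

import re

ESCAPE_HATCHES = ("sorry", "sorryAx", "axiom", "native_decide", "opaque", "implemented_by")

def strip_comments_and_strings(source: str) -> str:
    """Blank out string literals and comments, preserving newlines so line numbers survive."""
    out, i, depth, n = [], 0, 0, len(source)
    while i < n:
        two = source[i:i + 2]
        if depth > 0:                              # inside a (possibly nested) block comment
            if two == "/-":
                depth += 1; i += 2
            elif two == "-/":
                depth -= 1; i += 2
            else:
                if source[i] == "\n":
                    out.append("\n")               # keep line structure
                i += 1
            continue
        if source[i] == '"':                       # string literal: skip it, honoring escapes
            i += 1
            while i < n and source[i] != '"':
                i += 2 if source[i] == "\\" else 1
            i += 1
            continue
        if two == "--":                            # line comment: skip to end of line
            while i < n and source[i] != "\n":
                i += 1
            continue
        if two == "/-":                            # block comment opens (Lean 4 nests them)
            depth += 1
            i += 2
            continue
        out.append(source[i])
        i += 1
    return "".join(out)

def find_escape_hatches(source: str) -> list[tuple[int, str]]:
    """Return (line_number, keyword) pairs for escape hatches outside comments and strings."""
    hits = []
    for lineno, line in enumerate(strip_comments_and_strings(source).splitlines(), start=1):
        for keyword in ESCAPE_HATCHES:
            if re.search(rf"\b{keyword}\b", line):
                hits.append((lineno, keyword))
    return hits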
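
As a rough picture of the structured-output contract, a Pydantic schema along these lines captures the analysis-before-findings pattern. Only the analysis field is taken from the description above; the other field names are illustrative assumptions, not the action's actual FileReview model.

from pydantic import BaseModel, Field

class Finding(BaseModel):
    severity: str                    # e.g. "critical", "major", "minor" (illustrative values)
    file: str
    line: int | None = None          # used for line-level PR annotations when it maps to the diff
    description: str

class FileReview(BaseModel):
    analysis: str = Field(description="Step-by-step reasoning written before the findings")
    findings: list[Finding]
    verdict: str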

How it Works

  1. Checkout & Environment Setup: Fetches full Git history, sets up Python 3.13 and Lean/Lake via lean-action.
  2. Build: Builds the Lean project with lake build (with optional linting).
  3. Discover Related Files: Identifies changed .lean files, then uses the Lake dependency graph for BFS-based transitive dependency and dependent discovery (see the sketch after this list). Splits results into full-context and summary-context tiers.
  4. Extract Lean Toolchain Info: Runs #print axioms per declaration, scans for sorry/admit, and captures compiler diagnostics for changed files. Performs lightweight sorry/admit scanning on summary-context overflow files. Operates within a configurable time budget (default 300s).
  5. Run Multi-Agent Review Pipeline:
    • Pre-checks (deterministic): Scans diffs for escape hatches with nested block comment and string literal awareness.
    • Agent A (spec analysis): Reads external PDFs/papers with repository structure context, produces a formalization checklist.
    • Triage: Groups files into review clusters using dependency graph and type signatures. Produces review strategies and key hypotheses per cluster.
    • Agent B (per-file review, parallel): Writes a step-by-step analysis of each file, then derives findings from the analysis. Receives cluster review strategy, key hypotheses, and type signatures of related files.
    • Cross-File Agent: Traces composition chains, type-flow, and axiom propagation, then reports issues grounded in that analysis.
    • Synthesis: Aggregates structured review data (finding counts, verdicts, violated checklist items) and formatted reviews into an executive summary.
  6. Post Review: Publishes or updates an AI review comment on the PR, with collapsible per-file details grouped by cluster. Line-level annotations are posted as GitHub Review comments where findings map to diff lines.
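
Step 3's asymmetric-depth traversal can be pictured with a short Python sketch. The graph representation and function names below are assumptions for illustration; the actual discovery logic lives in discover_files.py and reads the output of lake exe graph --json.

from collections import deque

def bfs(start: set[str], edges: dict[str, set[str]], max_depth: int) -> set[str]:
    """Breadth-first traversal over a module adjacency map, up to max_depth hops."""
    seen, queue = set(start), deque((module, 0) for module in start)
    while queue:
        module, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbour in edges.get(module, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen - start

def discover_related(changed: set[str], imports: dict[str, set[str]],
                     imported_by: dict[str, set[str]], dependency_depth: int = 2) -> set[str]:
    # Dependencies (what the changed modules import) follow the configured depth;
    # dependents (what imports the changed modules) stay at depth 1.
    return bfs(changed, imports, dependency_depth) | bfs(changed, imported_by, 1)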

Usage

This is a composite action. It supports both automatic review on PR open and on-demand review via /review comments, ideally combined in a single workflow.

Recommended: Combined Workflow (Auto + ChatOps)

Create a workflow file at .github/workflows/ai-review.yml:

name: PR Review

on:
  pull_request:
    types: [opened]
  issue_comment:
    types: [created]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.event.issue.number }}
  cancel-in-progress: true

jobs:
  review:
    if: >-
      (
        github.event_name == 'pull_request' &&
        (
          github.event.pull_request.author_association == 'OWNER' ||
          github.event.pull_request.author_association == 'MEMBER' ||
          github.event.pull_request.author_association == 'COLLABORATOR'
        )
      ) ||
      (
        github.event_name == 'issue_comment' &&
        github.event.issue.pull_request &&
        startsWith(github.event.comment.body, '/review') &&
        (
          github.event.comment.author_association == 'OWNER' ||
          github.event.comment.author_association == 'MEMBER' ||
          github.event.comment.author_association == 'COLLABORATOR'
        )
      )
    runs-on: ubuntu-latest
    timeout-minutes: 90
    permissions:
      contents: read
      pull-requests: write
    steps:
      - name: Extract arguments from comment
        id: get_args
        if: github.event_name == 'issue_comment'
        env:
          COMMENT_BODY: ${{ github.event.comment.body }}
        run: |
          # Generate a random EOF marker for the GITHUB_OUTPUT heredoc.
          # This variable is passed to awk which handles all output writing.
          EOF=$(openssl rand -hex 8)

          awk -v eof="$EOF" -v gh_out="$GITHUB_OUTPUT" '
            BEGIN {
              ext = ""
              repo = ""
              com = ""
              section = ""
            }
            /^External:/ { section="ext"; next }
            /^Internal:/ { section="repo"; next }
            /^Comments:/ { section="com"; next }
            {
              gsub(/\r/, "", $0);
              gsub(/^[ \t]+|[ \t]+$/, "", $0);
              if ($0 == "" || $0 == "/review") { next }

              if (section == "ext") {
                sub(/^- +/, "");
                if (ext != "") { ext = ext "," $0 } else { ext = $0 }
              }
              else if (section == "repo") {
                sub(/^- +/, "");
                if (repo != "") { repo = repo "," $0 } else { repo = $0 }
              }
              else if (section == "com") {
                if (com != "") { com = com "\n" $0 } else { com = $0 }
              }
            }
            END {
              printf "external_refs=%s\n", ext >> gh_out
              printf "repo_context_refs=%s\n", repo >> gh_out
              printf "additional_comments<<%s\n", eof >> gh_out
              printf "%s\n", com >> gh_out
              printf "%s\n", eof >> gh_out
            }
          ' <<< "$COMMENT_BODY"
        shell: bash

      - uses: alexanderlhicks/lean-review-workflow@main
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          api_key: ${{ secrets.LLM_API_KEY }}
          provider: anthropic  # or: gemini, openai
          model: claude-opus-4-7  # or: gemini-3.1-pro-preview, gpt-5.4
          pr_number: ${{ github.event.issue.number || github.event.pull_request.number }}
          external_refs: "${{ steps.get_args.outputs.external_refs }}"
          repo_context_refs: "${{ steps.get_args.outputs.repo_context_refs }}"
          additional_comments: "${{ steps.get_args.outputs.additional_comments }}"

Key features of this workflow:

  • Concurrency control prevents duplicate reviews on the same PR.
  • Access control restricts triggers to owners, members, and collaborators only (prevents abuse on public repos).
  • Combined triggers handle both auto-review on PR open and on-demand /review comments.
  • Multiline comment parsing supports structured sections with bullet points.

How developers use it:

Automatic review runs on PR open. For re-review with context, leave a comment:

/review
External:
- https://arxiv.org/pdf/2301.12345.pdf
- https://eprint.iacr.org/2024/001.pdf
Internal:
- docs/spec.md
Comments:
Check that the definition of the ring homomorphism in Section 4
matches the formalization, especially the commutativity hypothesis.

Alternative: Minimal Push Workflow (No ChatOps)

name: AI Code Review for Lean PRs

on:
  pull_request:
    types: [opened, synchronize]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  ai_review_lean:
    if: >-
      github.event.pull_request.author_association == 'OWNER' ||
      github.event.pull_request.author_association == 'MEMBER' ||
      github.event.pull_request.author_association == 'COLLABORATOR'
    runs-on: ubuntu-latest
    timeout-minutes: 90
    permissions:
      contents: read
      pull-requests: write

    steps:
      - uses: alexanderlhicks/lean-review-workflow@main
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          api_key: ${{ secrets.LLM_API_KEY }}
          provider: anthropic  # or: gemini, openai
          model: claude-opus-4-7  # or: gemini-3.1-pro-preview, gpt-5.4
          pr_number: ${{ github.event.pull_request.number }}

Inputs

Input | Required | Default | Description
github_token | Yes | — | GitHub Token for API calls
api_key | Yes | — | API key for the LLM provider
provider | No | gemini | LLM provider: gemini, anthropic, or openai
model | Yes | — | Model name (e.g., claude-opus-4-7, gemini-3.1-pro-preview, gpt-5.4)
pr_number | Yes | — | The Pull Request number
external_refs | No | "" | Comma-separated URLs to external documents (PDFs, HTML, raw source)
repo_context_refs | No | "" | Comma-separated paths to additional internal context files/directories
additional_comments | No | "" | Extra focus instructions for the AI reviewer
lint | No | false | Whether to run the Lean linter
dependency_depth | No | 2 | Depth of transitive dependency traversal (1=direct only, 2=imports of imports)
thinking_budget | No | 10240 | Thinking token budget for deep-analysis agents. On Anthropic: mapped to output_config.effort (low/medium/high) for adaptive-thinking models (Opus 4.7/4.6, Sonnet 4.6, Mythos); ignored with warning on other models. On OpenAI: mapped to reasoning.effort for reasoning-capable models (o1/o3/o4/gpt-5). On Gemini 3: mapped to ThinkingConfig.thinking_level.
spec_model | No | model | Model override for the Specification Analyst agent
triage_model | No | model | Model override for the Triage agent
review_model | No | model | Model override for the per-file Code Reviewer agent
cross_file_model | No | model | Model override for the Cross-File Analysis agent
synthesis_model | No | model | Model override for the Synthesis agent

Provider Feature Notes

Feature | Gemini | Anthropic (Claude) | OpenAI (GPT)
Structured output | Native schema | Tool use pattern | Native schema (Responses API text_format)
Native PDF | Yes | Yes (document blocks) | Yes (input_file via Responses API)
Extended thinking | ThinkingConfig.thinking_level (Gemini 3 only) | Adaptive output_config.effort (Opus 4.7/4.6, Sonnet 4.6, Mythos only); ignored with warning on other models | reasoning.effort on reasoning models (o1/o3/o4/gpt-5); ignored with warning on non-reasoning models
Content caching | Server-side (TTL) | Per-request (ephemeral) | Automatic
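
As a rough illustration of how a single thinking_budget value could translate into each provider's native extended-thinking parameter (per the table above), consider the sketch below. The function name and numeric thresholds are assumptions for illustration, not the action's actual code in llm_provider.py.

def thinking_params(provider: str, thinking_budget: int) -> dict:
    """Map one token budget onto provider-specific extended-thinking settings (illustrative)."""
    # Coarse effort level for providers that take an effort level rather than a raw budget.
    effort = "low" if thinking_budget <= 4096 else "medium" if thinking_budget <= 16384 else "high"
    if provider == "anthropic":
        return {"output_config": {"effort": effort}}       # adaptive-thinking models only
    if provider == "openai":
        return {"reasoning": {"effort": effort}}            # reasoning-capable models only
    if provider == "gemini":
        # Gemini 3 takes a discrete thinking level; the cutoff here is an assumption.
        return {"thinking_config": {"thinking_level": "high" if thinking_budget > 8192 else "low"}}
    return {}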

Project Structure

lean-review-workflow/
  action.yml                  # GitHub Actions composite action definition
  review.py                   # Main review orchestration (multi-agent pipeline)
  llm_provider.py             # LLM provider abstraction (Gemini, Anthropic, OpenAI)
  discover_files.py           # Dependency discovery via lake graph (BFS)
  lean_info_extractor.py      # Lean toolchain data extraction (axioms, sorry, diagnostics)
  lean_utils.py               # Shared utilities (module names, comment parsing, file cache)
  requirements.txt            # Python dependencies
  prompts/
    analyze_spec.md           # Agent A: specification analysis prompt
    triage.md                 # Triage agent: file clustering prompt
    review_file.md            # Agent B: per-file review (no spec)
    review_code_with_spec.md  # Agent B: per-file review (with spec checklist)
    cross_file_analysis.md    # Cross-file analysis prompt
    synthesize_summary.md     # Synthesis agent: executive summary prompt
    lean4_checklist.md        # Lean 4 best practices checklist (injected into Agent B)
    verdict_rules.md          # Hard verdict rules (injected into Agent B and Synthesis)
  tests/
    test_review.py
    test_llm_provider.py
    test_discover_files.py
    test_lean_info_extractor.py
    test_lean_utils.py

Customizing AI Prompts

The intelligence and behavior of the AI reviewer are governed by Markdown prompt templates in the prompts/ directory. Each template uses {{PLACEHOLDER}} syntax for dynamic content injection at runtime.

Key prompt files and their placeholders:

analyze_spec.md (Agent A — Specification Analyst): {{EXTERNAL_CONTEXT}}, {{FILE_DIFFS}}, {{REPO_STRUCTURE}}, {{DEPENDENCY_GRAPH}}

triage.md (Triage Agent): {{DEPENDENCY_GRAPH}}, {{ALL_DIFFS}}, {{CHANGED_FILE_SIGNATURES}}, {{SPEC_CHECKLIST}}, {{ADDITIONAL_COMMENTS}}

review_file.md / review_code_with_spec.md (Agent B — Code Reviewer): {{REPO_CONTEXT}}, {{FILE_PATH}}, {{FULL_CONTENT}}, {{FILE_DIFF}}, {{SPEC_CHECKLIST}}, {{ADDITIONAL_COMMENTS}}, {{CLUSTER_CONTEXT}}, {{LEAN4_CHECKLIST}}, {{VERDICT_RULES}}

cross_file_analysis.md (Cross-File Agent): {{SPEC_CHECKLIST}}, {{PRE_CHECK_FINDINGS}}, {{ALL_DIFFS}}, {{ALL_CHANGED_CONTENTS}}, {{DEPENDENCY_CONTEXT}}, {{ADDITIONAL_COMMENTS}}

synthesize_summary.md (Synthesis Agent): {{SPEC_CHECKLIST}}, {{PRE_CHECK_FINDINGS}}, {{CROSS_FILE_ANALYSIS}}, {{PER_FILE_REVIEWS}}, {{STRUCTURED_REVIEWS}}, {{VERDICT_RULES}}

lean4_checklist.md and verdict_rules.md are static content injected into Agent B and Synthesis prompts. They contain the Lean 4 best practices checklist and hard verdict rules respectively.
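
A minimal sketch of how the {{PLACEHOLDER}} substitution could work at runtime is shown below; the helper name is hypothetical, and the action's actual rendering code may differ.

from pathlib import Path

def render_prompt(template_path: str, values: dict[str, str]) -> str:
    """Fill {{PLACEHOLDER}} markers in a prompt template with runtime content."""
    prompt = Path(template_path).read_text(encoding="utf-8")
    for key, value in values.items():
        prompt = prompt.replace("{{" + key + "}}", value)
    return prompt

# Usage (values are whatever the pipeline assembled for the given agent), e.g.:
# prompt = render_prompt("prompts/triage.md", {"DEPENDENCY_GRAPH": graph_json, ...})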

Development

Running Tests

pip install -r requirements.txt pytest
python -m pytest tests/ -v

Dependencies

See requirements.txt: requests, beautifulsoup4, pydantic, google-genai, anthropic, openai, pymupdf.

Contributions are welcome. Please ensure changes pass the existing test suite.
