
rendermani/PR-Distiller


PR-Distiller

Extract reusable coding rules from your team's PR review history and serve them to AI coding assistants in real time.

PR-Distiller crawls your GitHub repositories for merged pull request review comments, uses an LLM to classify and extract generalizable coding rules (security, architecture, performance, correctness, code-style, testing), deduplicates them semantically using ChromaDB vector similarity, and exposes them through a Model Context Protocol (MCP) server. IDE agents such as Cursor and Claude Code query the MCP server on every code change and receive project-specific constraints grounded in your team's actual review history.

How It Works

flowchart LR
    GH[GitHub PRs]
    CR[Deep Crawler]
    SR[Security Redactor]
    LLM[LLM Extractor]
    SF[Semantic Fuser]
    DB[(ChromaDB)]
    MCP[MCP Server]
    IDE[IDE Agent\nCursor / Claude Code]

    GH -->|review comments| CR
    CR -->|raw comments| SR
    SR -->|PII-redacted text| LLM
    LLM -->|extracted rule JSON| SF
    SF -->|deduplicated rule| DB
    DB -->|vector query| MCP
    MCP -->|ranked constraints| IDE
  1. Deep Crawler — fetches PR review comments from GitHub via the REST API, filtered by keyword heuristics and bot exclusion lists. Supports incremental crawling via per-repo cursor checkpoints.
  2. Security Redactor — runs Microsoft Presidio PII detection and custom regex patterns (AWS keys, tokens) before any text reaches the LLM.
  3. LLM Extractor — classifies each comment as EXTRACT or SKIP, then extracts structured JSON: title, description, enforcement prompt, category, confidence score, and file-path scoping patterns.
  4. Semantic Fuser — queries ChromaDB for vector-similar existing rules (cosine distance threshold 0.32). Duplicate hits are merged by the LLM rather than stored as redundant entries. Occurrence counts accumulate on merged rules.
  5. ChromaDB — persists rules with full metadata. Rules start in needs_review state; they are auto-promoted to active at confidence >= 0.80 or after human approval in the dashboard.
  6. MCP Server — a TypeScript @modelcontextprotocol/sdk server exposing query_architectural_constraints. IDEs send the active code diff; the server returns the top-k semantically relevant rules ranked by occurrence density.
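The Semantic Fuser's merge-or-insert decision can be sketched in plain Python. This is a simplified stand-in, not the actual implementation: the ChromaDB nearest-neighbor query is replaced by a linear scan, the LLM-assisted merge is reduced to an occurrence-count bump, and the `occurrences` field name is illustrative.

```python
import math

DEDUP_DISTANCE_THRESHOLD = 0.32  # cosine distance below this counts as a duplicate

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def fuse(new_rule, new_vec, existing):
    """existing: list of (rule_dict, embedding) pairs.

    Returns ("merge", rule) when the nearest existing rule is within the
    threshold, otherwise ("insert", new_rule). The real pipeline asks the
    LLM to merge the two rule texts; here we only bump the counter.
    """
    best, best_dist = None, float("inf")
    for rule, vec in existing:
        d = cosine_distance(new_vec, vec)
        if d < best_dist:
            best, best_dist = rule, d
    if best is not None and best_dist < DEDUP_DISTANCE_THRESHOLD:
        best["occurrences"] += 1
        return ("merge", best)
    new_rule["occurrences"] = 1
    return ("insert", new_rule)
```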

Features

  • Automated PR comment crawling with incremental cursor checkpoints (no re-processing)
  • Two-stage LLM pipeline: fast classification pass, then structured extraction
  • PII and secret redaction via Microsoft Presidio before any LLM call
  • Semantic deduplication with ChromaDB cosine distance and LLM-assisted rule merging
  • Rule review workflow: needs_review → active (auto or manual), blocked
  • Per-rule effectiveness tracking: served, applied, dismissed, acceptance rate
  • Auto-promotion of rules with >= 5 serves and > 70% acceptance rate
  • File-path scoping: rules match only files whose paths satisfy configured glob patterns
  • GitHub webhook integration: triggers incremental extraction on each merged PR
  • Export to multiple formats: system prompt, Claude skill markdown, OpenAI function spec, AGENTS.md
  • Multi-language AST support via tree-sitter (Python, TypeScript, JavaScript, Go, Rust)
  • Next.js dashboard for rule browsing, manual approval, and pipeline configuration
  • LLM-agnostic via LiteLLM: Ollama (default), vLLM, OpenAI, Anthropic, Google, OpenRouter
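The auto-promotion thresholds above can be expressed as a small predicate. A sketch only: the counter names `served`, `applied`, and `dismissed` follow the feature list, but the actual schema may differ.

```python
MIN_SERVES = 5         # at least 5 serves
MIN_ACCEPTANCE = 0.70  # strictly greater than 70% acceptance

def acceptance_rate(stats):
    """applied / (applied + dismissed); 0.0 when nothing has been decided yet."""
    decided = stats["applied"] + stats["dismissed"]
    return stats["applied"] / decided if decided else 0.0

def should_auto_promote(stats):
    return stats["served"] >= MIN_SERVES and acceptance_rate(stats) > MIN_ACCEPTANCE
```

Note that a rule served 10 times with exactly 7 applications does not qualify: the acceptance bound is strict.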

Quick Start

Pick the path that matches your machine. Backend + Web UI always run in Docker; only the LLM placement differs.

Common steps (all platforms)

git clone https://github.com/rendermani/PR-Distiller.git
cd PR-Distiller
cp .env.example .env           # set GITHUB_TOKEN at minimum
make preflight                 # detects OS + available accelerators

macOS (Apple Silicon)

Docker on Mac cannot access the GPU, so the LLM runs on the host. Ollama on macOS uses Metal and unified memory automatically — full GPU acceleration, no config.

brew install ollama            # or download from https://ollama.com
ollama serve &                 # or launch the Ollama.app
make run-model                 # pulls qwen3:8b (~5.2GB)
make up                        # starts backend + web-ui in Docker

Windows — host Ollama (WSL2 or native)

# Install Ollama from https://ollama.com/download (native Windows installer)
ollama serve                   # runs in background automatically after install
make run-model
make up                        # points Docker at host Ollama

Run make inside WSL2 or Git Bash. Docker Desktop must be running.

Linux + NVIDIA GPU — everything in Docker

# Requires NVIDIA Container Toolkit: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/
make up-linux-gpu              # backend + web-ui + Ollama (GPU) all in Docker
make run-model

Linux / any OS — CPU-only fallback

make up-cpu                    # backend + web-ui + Ollama (CPU) all in Docker
make run-model

Finishing up

Open the dashboard at http://localhost:4096. Enter a repo in owner/repo format and click Start to trigger a crawl.

To stop everything: make down.

Architecture

flowchart TB
    subgraph Docker Compose
        OL[Ollama\nlocalhost:11434]
        BE[Backend\nFastAPI + ChromaDB\nlocalhost:8923]
        UI[Web UI\nNext.js dashboard\nlocalhost:4096]
        MCP[MCP Server\nstdio transport]
    end

    IDE[IDE Agent] -->|stdio MCP| MCP
    MCP -->|HTTP /api/mcp/query| BE
    UI -->|HTTP /api/*| BE
    BE -->|HTTP OpenAI-compat| OL

All services are containerized. The MCP server uses stdio transport and is started on demand (docker compose run --rm mcp-server) rather than as a persistent service.
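The actual server is TypeScript, but the core of each on-demand invocation is simple enough to sketch in Python: take one JSON-RPC tool call, forward its arguments to the backend's HTTP endpoint, and wrap the result. Field names here are illustrative, not the exact wire format.

```python
import json

def handle_tool_call(request, post):
    """One JSON-RPC tool call in, one response out.

    `post(path, body)` abstracts the HTTP call to the backend so the relay
    logic stays testable; the real server POSTs to /api/mcp/query.
    """
    args = request["params"]["arguments"]
    rules = post("/api/mcp/query", args)
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {"content": [{"type": "text", "text": json.dumps(rules)}]},
    }
```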

Configuration

All configuration is via environment variables. Copy .env.example to .env before first run. Runtime settings (deduplication threshold, auto-approve confidence, crawl depth) can also be adjusted through the dashboard Settings panel and are persisted to data/config.json.
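A minimal sketch of how such settings can be layered, with the environment taking precedence over built-in defaults (the helper name `env` is illustrative, not the backend's actual API):

```python
import os

def env(name, default=None, cast=str):
    """Return the environment value cast to the right type, else the default."""
    raw = os.environ.get(name)
    return cast(raw) if raw is not None else default

# Defaults mirror the variable table below.
DEDUP_DISTANCE_THRESHOLD = env("DEDUP_DISTANCE_THRESHOLD", 0.32, float)
AUTO_APPROVE_CONFIDENCE = env("AUTO_APPROVE_CONFIDENCE", 0.80, float)
API_PORT = env("API_PORT", 8923, int)
```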

Variables (defaults in parentheses):

  • GITHUB_TOKEN (no default): GitHub PAT with repo read access. Required for crawling.
  • LLM_MODEL (ollama/qwen3:8b): LiteLLM model string. Must include a provider prefix (ollama/, openai/, anthropic/, etc.).
  • LLM_API_BASE (http://host.docker.internal:11434/v1): OpenAI-compatible base URL for the LLM. Auto-overridden to http://ollama:11434/v1 when using the Ollama container overlay.
  • LLM_API_KEY (no default): API key for external providers (OpenAI, Anthropic, etc.).
  • EMBEDDING_MODEL (BAAI/bge-base-en-v1.5): Sentence-transformer model used for ChromaDB embeddings.
  • FERNET_KEY (no default): 32-byte base64 Fernet key for encrypting stored tokens.
  • API_AUTH_TOKEN (no default): Optional Bearer token to restrict API access.
  • GITHUB_WEBHOOK_SECRET (no default): HMAC secret for validating GitHub webhook payloads.
  • DEDUP_DISTANCE_THRESHOLD (0.32): Cosine distance below which two rules are considered duplicates.
  • AUTO_APPROVE_CONFIDENCE (0.80): LLM confidence score above which rules are auto-approved.
  • WEBHOOK_MIN_INTERVAL (60): Minimum seconds between webhook-triggered jobs per repository.
  • API_PORT (8923): Host port for the FastAPI backend.
  • WEB_PORT (4096): Host port for the Next.js dashboard.
  • OLLAMA_PORT (11434): Host port for the Ollama service.
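GITHUB_WEBHOOK_SECRET is used the way GitHub documents for webhooks: the delivery body is HMAC-SHA256 signed and the signature arrives in the X-Hub-Signature-256 header. Verification reduces to a constant-time comparison:

```python
import hashlib
import hmac

def verify_github_signature(secret: bytes, payload: bytes, signature_header: str) -> bool:
    """Check a raw webhook body against its X-Hub-Signature-256 header."""
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

`hmac.compare_digest` avoids timing side channels that a plain `==` comparison would leak.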

MCP Server Integration

Add the MCP server to your IDE configuration. The server communicates over stdio.

Cursor (~/.cursor/mcp.json):

{
  "mcpServers": {
    "pr-distiller": {
      "command": "docker",
      "args": ["compose", "-f", "/path/to/PR-Distiller/docker-compose.yml",
               "run", "--rm", "mcp-server"],
      "env": { "PR_DISTILLER_API_URL": "http://localhost:8923" }
    }
  }
}

Claude Code (.claude/mcp.json in your project):

{
  "mcpServers": {
    "pr-distiller": {
      "command": "docker",
      "args": ["compose", "-f", "/path/to/PR-Distiller/docker-compose.yml",
               "run", "--rm", "mcp-server"]
    }
  }
}

The server exposes one tool: query_architectural_constraints. Pass the active code diff; optionally pass file_path to restrict results to path-scoped rules.

Tech Stack

  • Backend: Python 3.11, FastAPI, ChromaDB, LiteLLM, Presidio, tree-sitter
  • Frontend: Next.js 14, React, Tailwind CSS
  • MCP Server: TypeScript, @modelcontextprotocol/sdk, Zod
  • LLM (default): Ollama with qwen3:8b (~5.2GB); supports larger Qwen3/Qwen3-Coder/Gemma 4/Llama 4 and vLLM, OpenAI, Anthropic, Google Gemini, OpenRouter via LiteLLM
  • Embeddings: BAAI/bge-base-en-v1.5 (sentence-transformers, runs locally in the backend container)
  • AST parsing: tree-sitter grammars for Python, TypeScript, JavaScript, Go, Rust
  • Containerization: Docker Compose with optional NVIDIA GPU override

License

Business Source License 1.1. See LICENSE for full terms.

Free for non-commercial use. Converts to Apache License 2.0 on 2030-04-13.
