ghinks · Mar 8, 2026
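Reviewer note: the updated README below describes a classification window in which statistics come from PRs merged within `--start` / `--end` and only PRs merged after `--end` are evaluated. A minimal Python sketch of that behaviour follows. The `PullRequest` record, the `classify` function, the single `additions` metric, the use of the sample standard deviation, and the strict `>` comparison are all assumptions made for illustration; none of them is taken from the tool's actual code.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean, stdev


@dataclass
class PullRequest:
    """Hypothetical record; not the tool's real schema."""
    number: int
    merged: date
    additions: int


def classify(prs, start, end, threshold=2.0, min_samples=30):
    """Return (number, z) for PRs merged after `end` with |z| above `threshold`.

    The baseline is built only from PRs merged within [start, end], so an
    evaluated PR never contributes to the statistics it is scored against.
    """
    baseline = [p.additions for p in prs if start <= p.merged <= end]
    if len(baseline) < min_samples:
        raise ValueError("not enough baseline PRs for analysis")
    mu, sigma = mean(baseline), stdev(baseline)  # sample std dev (assumption)
    if sigma == 0:
        return []  # no variation in the baseline, so no z-score is defined
    flagged = []
    for p in prs:
        if p.merged > end:
            z = (p.additions - mu) / sigma
            if abs(z) > threshold:
                flagged.append((p.number, round(z, 2)))
    return flagged


# Thirty baseline PRs alternating between 90 and 110 added lines.
start, end = date(2024, 1, 1), date(2024, 6, 30)
prs = [
    PullRequest(i, start + timedelta(days=i), 90 if i % 2 else 110)
    for i in range(30)
]
# Two PRs merged after the window: one typical, one very large.
prs.append(PullRequest(101, date(2024, 7, 1), 105))
prs.append(PullRequest(102, date(2024, 7, 2), 500))

flagged = classify(prs, start, end)
print(flagged)
```

Because PR 102 is scored against a baseline it did not contribute to, its size cannot inflate the mean or standard deviation and hide it, which is the problem the README text sets out to avoid.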
diff --git a/README.md b/README.md
@@ -5,9 +5,11 @@ A CLI tool to identify pull request outliers in GitHub repositories using Z-scor
 ## Features
 
 - **Fetch & Store**: retrieve PR data from GitHub (with rate-limit handling) and store it in a local SQLite database.
-- **Outlier Detection**: Z-score analysis across multiple metrics — additions, deletions, changed files, comments, review duration, code churn, and comment density.
+- **Classify**: Z-score analysis across multiple metrics — additions, deletions, changed files, comments, review duration, code churn, and comment density.
 - **Baseline window**: define a historical measurement period so recent PRs are evaluated against an independent baseline rather than skewing their own statistics.
+- **Primary-branch filter**: focus analysis on PRs that were not merged into the primary branch (e.g. feature-to-feature or abandoned branches).
 - **Flexible output**: view results as a terminal table or export to JSON/CSV.
+- **Deferred output**: when processing multiple repositories, results for all repos are printed together after all processing completes, with a summary of any repos that could not be classified.
 
 ## Installation
 
@@ -21,7 +23,7 @@ uv sync
 
 ## Usage
 
-The tool works in two steps: **fetch** data, then **detect-outliers**.
+The tool has three commands: **fetch**, **classify**, and **fetch-and-classify**.
 
 ### 1. Configure GitHub Token
 
@@ -34,17 +36,19 @@ Without a token the GitHub API rate limit is very low.
 ### 2. `fetch` — retrieve and store PR data
 
 ```bash
-# Fetch PRs merged in the last 30 days (default) for a specific repo
+# Fetch PRs created in the last 30 days (default) for a specific repo
 uv run review-classify fetch --repo owner/repo
 
 # Fetch PRs for an entire organization
 uv run review-classify fetch --org your-org
 
 # Fetch PRs within a specific date range
-uv run review-classify fetch --repo owner/repo --start 2024-01-01 --end 2024-06-30
+uv run review-classify fetch --repo owner/repo \
+  --collate-start 2024-01-01 --collate-end 2024-06-30
 
 # Clear existing data before fetching
-uv run review-classify fetch --repo owner/repo --reset-db --start 2024-01-01
+uv run review-classify fetch --repo owner/repo \
+  --reset-db --collate-start 2024-01-01
 
 # Run fetching using a TOML configuration file
 uv run review-classify fetch --config config.toml
@@ -55,25 +59,30 @@ uv run review-classify fetch --config config.toml
 | `--repo` / `-r` | GitHub repository (owner/repo). Can be specified multiple times. |
 | `--org` / `-o` | GitHub organization. Fetches all repositories in the org. Can be specified multiple times. |
 | `--config` / `-c` | Path to a TOML config file defining multiple repositories/organizations. |
-| `--start` / `-s` | Start date for PR range (YYYY-MM-DD). Defaults to 30 days ago. |
-| `--end` / `-e` | End date for PR range (YYYY-MM-DD). |
+| `--collate-start` | Start date for PR collation range (YYYY-MM-DD). Defaults to 30 days ago. |
+| `--collate-end` | End date for PR collation range (YYYY-MM-DD). |
 | `--reset-db` | Delete all stored data before fetching. |
 | `--verbose` / `-v` | Print progress details. |
 
-### 3. `detect-outliers` — find unusual PRs
+### 3. `classify` — find unusual PRs
+
+Operates on data already fetched with `fetch`. Results for all repositories are printed together after all repos have been processed.
 
 ```bash
-# Detect outliers across all stored PRs for a repo
-uv run review-classify detect-outliers --repo owner/repo
+# Classify all stored PRs for a repo
+uv run review-classify classify --repo owner/repo
 
-# Detect outliers for an entire organization
-uv run review-classify detect-outliers --org your-org
+# Classify PRs for an entire organization
+uv run review-classify classify --org your-org
 
 # Stricter threshold (fewer, more extreme outliers)
-uv run review-classify detect-outliers --repo owner/repo --threshold 3.0
+uv run review-classify classify --repo owner/repo --threshold 3.0
 
 # Export to JSON
-uv run review-classify detect-outliers --repo owner/repo --format json > outliers.json
+uv run review-classify classify --repo owner/repo --format json > outliers.json
+
+# Exclude PRs merged into the primary branch (main/master)
+uv run review-classify classify --repo owner/repo --exclude-primary-merged
 ```
 
 | Option | Description |
@@ -84,69 +93,126 @@ uv run review-classify detect-outliers --repo owner/repo --format json > outlier
 | `--threshold` / `-t` | Z-score threshold for flagging an outlier. Default: `2.0`. |
 | `--min-samples` | Minimum number of PRs required for analysis. Default: `30`. |
 | `--format` / `-f` | Output format: `table` (default), `json`, or `csv`. |
-| `--classify-start` | Start of the baseline measurement window (YYYY-MM-DD). |
-| `--classify-end` | End of the baseline measurement window (YYYY-MM-DD). |
+| `--start` | Start of the classification window (YYYY-MM-DD). |
+| `--end` | End of the classification window (YYYY-MM-DD). |
+| `--exclude-primary-merged` | Exclude PRs whose base branch is `main` or `master`. |
 | `--verbose` / `-v` | Print progress details. |
 
-#### Baseline window (`--classify-start` / `--classify-end`)
+#### Classification window (`--start` / `--end`)
 
 By default all stored PRs feed both the baseline statistics and the outlier evaluation. This is problematic: an unusually large PR inflates the mean and standard deviation it is measured against, masking itself as normal.
 
-Use `--classify-start` and `--classify-end` to define a historical baseline window. Statistics are computed from PRs merged **within** that window; only PRs merged **after** `--classify-end` are evaluated and reported.
+Use `--start` and `--end` to define the classification window, which serves as the historical baseline. Statistics are computed from PRs merged **within** that window; only PRs merged **after** `--end` are evaluated and reported.
 
 ```
-[--classify-start ────────── --classify-end]   >classify-end
-         ↑                         ↑                 ↑
-   baseline start            baseline end     PRs evaluated here
+[--start ──────────────── --end]   >end
+      ↑                     ↑          ↑
+baseline start         baseline end  PRs evaluated here
 ```
 
 ```bash
 # Use Jan–Jun 2024 as the baseline; evaluate PRs merged after 2024-06-30
-uv run review-classify detect-outliers --repo owner/repo \
-  --classify-start 2024-01-01 \
-  --classify-end   2024-06-30
+uv run review-classify classify --repo owner/repo \
+  --start 2024-01-01 \
+  --end   2024-06-30
 
 # Same, with stricter threshold and JSON output
-uv run review-classify detect-outliers --repo owner/repo \
-  --classify-start 2024-01-01 \
-  --classify-end   2024-06-30 \
+uv run review-classify classify --repo owner/repo \
+  --start 2024-01-01 \
+  --end   2024-06-30 \
   --threshold 2.5 \
   --format json > outliers.json
 ```
 
-#### Per-repository analysis
+#### Excluding primary-branch PRs (`--exclude-primary-merged`)
+
+Pass `--exclude-primary-merged` to restrict analysis to PRs that were **not** merged into `main` or `master`. This is useful for focusing on PRs that target feature or release branches, or that may have been abandoned.
+
+```bash
+uv run review-classify classify --repo owner/repo --exclude-primary-merged
+```
+
+### 4. `fetch-and-classify` — fetch and classify in one step
+
+Combines `fetch` and `classify` in a single run. If PR data already exists in the local database for a repository, the fetch is skipped automatically. Use `--reset-db` to force a fresh fetch.
+
+```bash
+# Fetch (if needed) and classify in one command
+uv run review-classify fetch-and-classify --repo owner/repo
+
+# With explicit date ranges for both collation and classification
+uv run review-classify fetch-and-classify --repo owner/repo \
+  --collate-start 2024-01-01 --collate-end 2024-12-31 \
+  --start 2024-01-01 --end 2024-06-30
+
+# Force a fresh fetch even if data already exists
+uv run review-classify fetch-and-classify --repo owner/repo --reset-db
+
+# Exclude primary-branch PRs from the classification
+uv run review-classify fetch-and-classify --repo owner/repo \
+  --exclude-primary-merged
+```
+
+| Option | Description |
+| --- | --- |
+| `--repo` / `-r` | GitHub repository (owner/repo). Can be specified multiple times. |
+| `--org` / `-o` | GitHub organization. Can be specified multiple times. |
+| `--config` / `-c` | Path to a TOML config file. |
+| `--collate-start` | Start date for PR collation range (YYYY-MM-DD). |
+| `--collate-end` | End date for PR collation range (YYYY-MM-DD). |
+| `--start` | Start of the classification window (YYYY-MM-DD). |
+| `--end` | End of the classification window (YYYY-MM-DD). |
+| `--threshold` / `-t` | Z-score threshold for flagging an outlier. Default: `2.0`. |
+| `--min-samples` | Minimum number of PRs required for analysis. Default: `30`. |
+| `--format` / `-f` | Output format: `table` (default), `json`, or `csv`. |
+| `--exclude-primary-merged` | Exclude PRs whose base branch is `main` or `master`. |
+| `--reset-db` | Delete existing data and force a fresh fetch. |
+| `--verbose` / `-v` | Print progress details. |
+
+### Per-repository analysis
 
 Outlier detection is always **scoped to a single repository**. When you target multiple repositories (via `--org`, multiple `--repo` flags, or a config file), each repository is analysed independently:
 
-1. **Baseline statistics** — mean and standard deviation for every metric are computed from that repository's own merged PRs (optionally restricted to the baseline window).
+1. **Baseline statistics** — mean and standard deviation for every metric are computed from that repository's own merged PRs (optionally restricted to the classification window).
 2. **Z-scores** — each PR is scored against its own repository's statistics, not a cross-repository pool.
 3. **Isolation** — a PR in `owner/repo-a` is never compared against PRs from `owner/repo-b`.
 
-This means thresholds adapt to each project's natural pace and size. A large PR in a small, infrequently-updated repository is judged against that repository's history, not the (potentially very different) norms of a busier sibling repository in the same organisation.
+This means thresholds adapt to each project's natural pace and size.
 
 ```
 repo-a  ──►  stats(repo-a)  ──►  z-scores(repo-a PRs)
 repo-b  ──►  stats(repo-b)  ──►  z-scores(repo-b PRs)
              (independent)
 ```
 
+### Deferred output
+
+When processing multiple repositories, per-repo results are **not** printed as they are produced. Instead:
+
+- After all repositories have been processed, results for every successfully classified repo are printed.
+- Repositories that could not be classified (insufficient data, no PRs found, etc.) are listed in a summary block on stderr at the end.
+
 ### End-to-end example
 
 ```bash
-# 1. Fetch a full year of history as the baseline
+# Option A — two explicit steps
 uv run review-classify fetch --repo owner/repo \
-  --start 2024-01-01 --end 2024-12-31
+  --collate-start 2024-01-01 --collate-end 2024-12-31
 
-# 2. Evaluate PRs from January 2025 against that baseline
-uv run review-classify detect-outliers --repo owner/repo \
-  --classify-start 2024-01-01 \
-  --classify-end   2024-12-31 \
+uv run review-classify classify --repo owner/repo \
+  --start 2024-01-01 \
+  --end   2024-12-31 \
   --format table
+
+# Option B — single combined command
+uv run review-classify fetch-and-classify --repo owner/repo \
+  --collate-start 2024-01-01 --collate-end 2024-12-31 \
+  --start 2024-01-01 --end 2024-12-31
 ```
 
 ## Configuration file
 
-Both `fetch` and `detect-outliers` accept `--config <file.toml>` as an alternative to passing `--repo` / `--org` flags. The file is TOML and supports three sections:
+`fetch`, `classify`, and `fetch-and-classify` all accept `--config <file.toml>` as an alternative to passing `--repo` / `--org` flags. The file is TOML and supports three sections:
 
 | Section | Purpose |
 | --- | --- |
@@ -160,12 +226,12 @@ Both `fetch` and `detect-outliers` accept `--config <file.toml>` as an alternati
 # config.toml
 
 [defaults]
-start           = "2024-01-01"
-end             = "2024-12-31"
-threshold       = 2.0
-min_samples     = 30
-classify_start  = "2024-01-01"
-classify_end    = "2024-06-30"
+collate_start = "2024-01-01"
+collate_end   = "2024-12-31"
+threshold     = 2.0
+min_samples   = 30
+start         = "2024-01-01"
+end           = "2024-06-30"
 
 # Individual repositories ─────────────────────────────────────────────────────
 
@@ -174,11 +240,11 @@ name = "owner/repo-a"
 # inherits all [defaults]
 
 [[repositories]]
-name           = "owner/repo-b"
-start          = "2024-06-01"   # overrides [defaults] start
-threshold      = 2.5            # stricter outlier threshold for this repo
-classify_start = "2024-06-01"
-classify_end   = "2024-09-30"
+name          = "owner/repo-b"
+collate_start = "2024-06-01"   # overrides [defaults] collate_start
+threshold     = 2.5            # stricter outlier threshold for this repo
+start         = "2024-06-01"
+end           = "2024-09-30"
 
 # Organizations ───────────────────────────────────────────────────────────────
 
@@ -188,9 +254,9 @@ name = "my-org"
 exclude_repos = ["my-org/archived-repo", "my-org/fork-only"]
 
 [[organizations]]
-name        = "another-org"
-start       = "2024-03-01"
-min_samples = 20
+name          = "another-org"
+collate_start = "2024-03-01"
+min_samples   = 20
 ```
 
 ### Key rules