Implement `commitlens cochange [--min N] [--top N]` — pairs of files that frequently appear in the same commit.
Deliverable
Module: `src/commitlens/cochange.py`. Register in `_register_subcommands`.
CLI args: `--min N` (minimum co-occurrence count to include; default 3), `--top N` (cap output rows; default 30).
Output schema
```json
{
"_render": "",
"pairs": [
{"a": "src/auth.py", "b": "tests/test_auth.py", "together": 18, "a_total": 22, "b_total": 19, "jaccard": 0.78},
...
]
}
```
`together` is the count of commits that touched BOTH files. `a_total` / `b_total` are the total commits touching each. `jaccard` is `together / (a_total + b_total - together)`, rounded to 2 decimals. Sort by `jaccard` desc, then `together` desc.
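The formula and sort order above can be sketched as follows. This is a minimal illustration, not the required implementation: the helper names `jaccard` and `sort_pairs` are hypothetical, and the zero-union guard is an added assumption (it should not trigger for real pairs, since `together` is at least 1).

```python
def jaccard(together, a_total, b_total):
    """Jaccard index of two files' commit sets, rounded to 2 decimals."""
    union = a_total + b_total - together
    # Assumption: return 0.0 rather than dividing by zero on an empty union.
    return round(together / union, 2) if union else 0.0


def sort_pairs(rows):
    """Order rows by jaccard descending, then together descending."""
    return sorted(rows, key=lambda r: (-r["jaccard"], -r["together"]))


rows = [
    {"a": "foo", "b": "baz", "together": 3, "jaccard": jaccard(3, 5, 5)},
    {"a": "foo", "b": "bar", "together": 4, "jaccard": jaccard(4, 4, 4)},
]
ordered = sort_pairs(rows)
```

With the schema example's numbers, `jaccard(18, 22, 19)` is 18 / 23, which rounds to 0.78.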
Filter out:
- pairs where `together` < `--min`
- pairs where both entries are the same path
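A sketch of the two filters, assuming the pair counts live in a mapping from `(a, b)` tuples to co-occurrence counts. The function name `keep_pairs` and the parameter `min_count` are hypothetical.

```python
from collections import Counter


def keep_pairs(together, min_count=3):
    """Drop pairs below the threshold and any pair of identical paths."""
    return {
        pair: n
        for pair, n in together.items()
        if n >= min_count and pair[0] != pair[1]
    }


counts = Counter({
    ("a.py", "b.py"): 4,   # kept: at or above the threshold
    ("a.py", "c.py"): 1,   # dropped: below the threshold
    ("a.py", "a.py"): 9,   # dropped: same path on both sides
})
kept = keep_pairs(counts)
```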
Human render
Fixed-width table. Columns: a (truncate to 35 characters), b (truncate to 35 characters), together, jaccard (always two decimals, e.g. 0.78).
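One possible shape for the table, as a sketch. The function names, the column spacing, and the choice to truncate by keeping the first 35 characters are all assumptions; the spec only fixes the columns and the width limit.

```python
WIDTH = 35  # maximum displayed length of a path column


def truncate(path, width=WIDTH):
    """Assumption: keep the first `width` characters of an over-long path."""
    return path[:width]


def render_table(pairs):
    """Return a fixed-width table with one line per pair, plus a header."""
    lines = [f"{'a':<{WIDTH}}  {'b':<{WIDTH}}  {'together':>8}  {'jaccard':>7}"]
    for p in pairs:
        lines.append(
            f"{truncate(p['a']):<{WIDTH}}  {truncate(p['b']):<{WIDTH}}  "
            f"{p['together']:>8}  {p['jaccard']:>7.2f}"
        )
    return "\n".join(lines)


table = render_table([
    {"a": "src/auth.py", "b": "tests/test_auth.py", "together": 18, "jaccard": 0.78},
    {"a": "x" * 50, "b": "y.py", "together": 3, "jaccard": 0.5},
])
```

Formatting with `.2f` keeps a value such as 0.5 aligned as `0.50`.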
Algorithm
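For each commit (`git log --name-only --pretty=format:%H`), collect the set of paths. Increment a counter for each unique pair (a, b) with a < b, and aggregate at the end. Commits that touched only one file produce no pairs, so skip them when counting pairs; they still count toward the per-file totals `a_total` / `b_total`.
Memory: a 1000-commit repo with ~10 files/commit yields at most 1000 × 45 = 45,000 pair entries (10 files give 45 unordered pairs), so ~50000 is a safe upper estimate. Use `collections.Counter` keyed on tuples; this is fine. Don't worry about scaling beyond that.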
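The counting step might look like the sketch below. The names `parse_log` and `count_cochange` are hypothetical, and the parser assumes that any line of exactly 40 hex characters is a commit hash and every other non-blank line is a path; that heuristic is an assumption about the log output, not something the spec states.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a line of exactly 40 hex characters starts a new commit.
HASH_LINE = re.compile(r"^[0-9a-f]{40}$")


def parse_log(text):
    """Split `git log --name-only --pretty=format:%H` output into path sets."""
    commits, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if HASH_LINE.match(line):
            current = set()
            commits.append(current)
        elif current is not None:
            current.add(line)
    return commits


def count_cochange(commits):
    """Return (per-file totals, per-pair counts) as two Counters."""
    totals, together = Counter(), Counter()
    for paths in commits:
        totals.update(paths)  # every commit counts toward file totals
        if len(paths) < 2:
            continue  # a single-file commit produces no pairs
        # Sorting first guarantees a < b in every emitted tuple.
        together.update(combinations(sorted(paths), 2))
    return totals, together


sample = "\n".join([
    "a" * 40, "src/auth.py", "tests/test_auth.py", "",
    "b" * 40, "src/auth.py", "",
    "c" * 40, "tests/test_auth.py", "src/auth.py",
])
totals, together = count_cochange(parse_log(sample))
```

Keying on sorted tuples means `(a, b)` and `(b, a)` can never be counted separately.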
Tests
`tests/test_cochange.py` using `git_repo`:
- Two files committed together 4 times → appears in output (above default min=3).
- Two files committed together 1 time → does NOT appear.
- A pair `(foo, bar)` and a pair `(foo, baz)` — assert ordering by jaccard.
- A commit touching only one file doesn't blow up the algorithm.