Skip to content

Fix unused imports, min_redundant_inds remainder bug, and ClassBalancedSampler index mapping#70

Merged
rhoadesScholar merged 4 commits intorewritefrom
copilot/sub-pr-65
Mar 20, 2026
Merged

Fix unused imports, min_redundant_inds remainder bug, and ClassBalancedSampler index mapping#70
rhoadesScholar merged 4 commits intorewritefrom
copilot/sub-pr-65

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 20, 2026

Addresses several correctness and lint issues flagged in code review on the rewrite PR.

Unused imports

Removed F401 violations across test files:

  • import json from test_dataset.py
  • import pytest from test_geometry.py, test_sampler.py
  • import tempfile, pathlib, import torch from test_multidataset.py
  • import zarr as z from test_image.py

min_redundant_inds remainder bug

When replacement=False and k > n, the previous implementation silently dropped indices when k % n != 0:

# Before: k=7, n=3 → returned 6 indices, not 7
return torch.cat([torch.randperm(n, generator=rng) for _ in range(k // n)])

# After: always returns exactly k indices
full_perms = k // n
remainder = k % n
parts = [torch.randperm(n, generator=rng) for _ in range(full_perms)]
if remainder > 0:
    parts.append(torch.randperm(n, generator=rng)[:remainder])
return torch.cat(parts)

ClassBalancedSampler fixes

Empty active classes guard__init__ now raises ValueError immediately if the crop-class matrix has no annotated classes, rather than crashing with a cryptic error on active_counts.min() at iteration time.

Correct sample index mapping__iter__ previously yielded crop_idx directly, which is a row in the crop-class matrix (one row per dataset/sub-dataset), not a sample index. This meant only indices [0, n_crops) were ever yielded, ignoring the vast majority of samples. Now maps crop rows to actual sample indices:

  • n_crops == len(dataset): direct mapping (no-op, preserves existing behavior)
  • Dataset has datasets + cumulative_sizes (CellMapMultiDataset / ConcatDataset): samples randomly within the matched sub-dataset's global index range
  • Generic fallback: partitions [0, len(dataset)) into n_crops contiguous segments and samples within the crop's segment

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

…dSampler

Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
Agent-Logs-Url: https://github.com/janelia-cellmap/cellmap-data/sessions/1e323ffb-2a14-4aae-99e1-8e70c1d393d3
Copilot AI changed the title [WIP] Rewrite CellMap-Data I/O and test suite to Zarr implementation Fix unused imports, min_redundant_inds remainder bug, and ClassBalancedSampler index mapping Mar 20, 2026
Copilot AI requested a review from rhoadesScholar March 20, 2026 17:40
@rhoadesScholar rhoadesScholar marked this pull request as ready for review March 20, 2026 18:27
@rhoadesScholar rhoadesScholar requested a review from Copilot March 20, 2026 18:27
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 57.14286% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.97%. Comparing base (c7141ce) to head (4448a56).
⚠️ Report is 9 commits behind head on rewrite.

Files with missing lines Patch % Lines
src/cellmap_data/sampler.py 48.27% 15 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           rewrite      #70      +/-   ##
===========================================
- Coverage    81.36%   80.97%   -0.40%     
===========================================
  Files           23       23              
  Lines         1567     1603      +36     
===========================================
+ Hits          1275     1298      +23     
- Misses         292      305      +13     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses lint/correctness issues in CellMap-Data’s sampling utilities and cleans up unused test imports, improving both reliability and code hygiene.

Changes:

  • Remove unused imports in multiple test modules to resolve F401 lint violations.
  • Fix min_redundant_inds to always return exactly k indices when replacement=False and k > n (including remainder handling).
  • Fix ClassBalancedSampler by adding an early “no active classes” guard and mapping crop-matrix rows to actual dataset sample indices during iteration.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/test_sampler.py Removes unused pytest import.
tests/test_multidataset.py Removes unused imports (torch, tempfile, pathlib).
tests/test_image.py Removes unused local zarr import.
tests/test_geometry.py Removes unused pytest import.
tests/test_dataset.py Removes unused local json import.
src/cellmap_data/utils/misc.py Fixes remainder handling in min_redundant_inds for k > n without replacement.
src/cellmap_data/sampler.py Adds empty-active-class guard and corrects sampler output to yield dataset sample indices instead of crop-matrix row indices.

Comment on lines +95 to +99
# Map crop index (dataset-level row) to an actual sample index.
# If n_crops equals len(dataset), the crop index IS the sample index.
if self.n_crops == len(self.dataset):
sample_idx = crop_idx
elif hasattr(self.dataset, "datasets") and hasattr(
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sampler now yields a mapped sample_idx, but the surrounding terminology still calls this a “crop index” (and the class docstring describes yielding “that crop index”). This makes the API/behavior confusing for callers and future maintainers. Consider renaming crop_idx to something like row_idx (matrix row) and updating the docstring/comments to reflect that __iter__ yields dataset sample indices (global indices for Concat/MultiDataset).

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
… fix naming and docstrings

Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
Agent-Logs-Url: https://github.com/janelia-cellmap/cellmap-data/sessions/b1ce1f6e-fdc4-47b9-aa82-562c86c4800d
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comment on lines +127 to +128
if self.n_crops <= 1 or total <= 0:
start, end = 0, max(total, 1)
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the generic fallback mapping, when len(self.dataset) == 0 this branch sets end = max(total, 1) and can yield sample_idx = 0, which is an invalid index for an empty dataset (and will likely crash downstream when DataLoader tries to fetch it). Consider explicitly erroring out when total == 0 (or forcing samples_per_epoch to 0) instead of manufacturing a [0, 1) range.

Suggested change
if self.n_crops <= 1 or total <= 0:
start, end = 0, max(total, 1)
if total <= 0:
raise ValueError(
"ClassBalancedSampler: cannot sample from an empty dataset "
"in generic fallback mapping (len(dataset) == 0). "
"Consider setting samples_per_epoch=0 or removing the "
"sampler for this split."
)
if self.n_crops <= 1:
start, end = 0, total

Copilot uses AI. Check for mistakes.
Comment on lines +112 to +116
def test_raises_when_no_active_classes(self):
"""All-False crop-class matrix must raise ValueError immediately."""
matrix = np.zeros((4, 3), dtype=bool)
with pytest.raises(ValueError, match="no active classes"):
self._make_sampler(matrix, samples_per_epoch=5)
Copy link

Copilot AI Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says import pytest was removed from tests/test_sampler.py, but this file still imports (and now uses) pytest in the newly added test_raises_when_no_active_classes. Consider updating the PR description to avoid confusion when reviewing the unused-import cleanup.

Copilot uses AI. Check for mistakes.
@rhoadesScholar rhoadesScholar merged commit 6fb390c into rewrite Mar 20, 2026
15 of 17 checks passed
@rhoadesScholar rhoadesScholar deleted the copilot/sub-pr-65 branch March 20, 2026 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants