Fix class_weights background voxel calculation to use actual data volume sizes by Copilot · Pull Request #68 · janelia-cellmap/cellmap-data

Copilot · 2026-03-13T18:32:53Z

class_weights computed background as sum(all_fg_counts) - fg_class. That quantity is the other classes' foreground, not background, so the result was semantically wrong. True background is total_voxels_in_volume - fg_class.

Changes

CellMapImage.total_voxels: New property reading the s0 array's spatial shape, computing np.prod(spatial_shape), scaled to training resolution via the existing _scale_count helper
EmptyImage.total_voxels: Returns 0 (no annotated data)
CellMapDataset.total_voxels: Aggregates per-class total_voxels from each class's target source
CellMapMultiDataset.class_weights: Aggregates actual total_voxels across all datasets per class; computes bg = total - fg. Adds a warning log when fg > total_voxels to surface upstream counting inconsistencies

# Before: bg = sum(all fg counts) - fg  ← wrong; counts other-class fg as bg
total_voxels = sum(counts.values())
bg = total_voxels - fg

# After: bg = actual data volume voxels - fg
total = total_voxels[cls]   # from s0 array spatial shape, resolution-scaled
bg = max(total - fg, 0)
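To make the difference concrete, here is a minimal sketch comparing the two formulas on made-up numbers. The class names, counts, and the `old_weights` / `new_weights` helpers are hypothetical and exist only for illustration; they are not part of cellmap-data.

```python
# Hypothetical foreground counts for two classes annotated in one volume.
fg_counts = {"mito": 50_000, "er": 150_000}
# Assumed per-class volume sizes (e.g. a 100 x 100 x 100 crop for each class).
total_voxels = {"mito": 1_000_000, "er": 1_000_000}


def old_weights(fg_counts):
    """Previous behaviour: 'background' is really the other classes' foreground."""
    summed_fg = sum(fg_counts.values())
    return {cls: (summed_fg - fg) / max(fg, 1) for cls, fg in fg_counts.items()}


def new_weights(fg_counts, total_voxels):
    """Corrected behaviour: background is the volume size minus this class's foreground."""
    return {
        cls: max(total_voxels.get(cls, 0) - fg, 0) / max(fg, 1)
        for cls, fg in fg_counts.items()
    }


old = old_weights(fg_counts)
new = new_weights(fg_counts, total_voxels)
print("old:", old)  # mito weight is 3.0: depends only on how much "er" exists
print("new:", new)  # mito weight is 19.0: depends on the actual volume size
```

With the old formula, adding or removing an unrelated class changes every other class's weight; with the new one, each weight depends only on that class's own volume and foreground count.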



codecov · 2026-03-13T18:56:51Z

Codecov Report

❌ Patch coverage is 18.18182% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.73%. Comparing base (675b614) to head (c8151b3).

Files with missing lines	Patch %	Lines
src/cellmap_data/multidataset.py	0.00%	11 Missing ⚠️
src/cellmap_data/image.py	16.66%	10 Missing ⚠️
src/cellmap_data/dataset.py	28.57%	5 Missing ⚠️
src/cellmap_data/empty_image.py	66.66%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           rewrite      #68      +/-   ##
===========================================
- Coverage    80.84%   79.73%   -1.12%     
===========================================
  Files           23       23              
  Lines         1535     1564      +29     
===========================================
+ Hits          1241     1247       +6     
- Misses         294      317      +23
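The figures in the report are internally consistent, which can be checked with a few lines of arithmetic. In the sketch below, the per-file covered-line counts are not stated in the report; they are inferred from the reported percentages and missing-line counts, so treat them as an assumption.

```python
# (covered, missing) per file; "covered" is inferred from the reported patch %.
patch = {
    "multidataset.py": (0, 11),  # 0.00%
    "image.py": (2, 10),         # 2/12 ~ 16.66%
    "dataset.py": (2, 5),        # 2/7  ~ 28.57%
    "empty_image.py": (2, 1),    # 2/3  ~ 66.66%
}

covered = sum(c for c, _ in patch.values())
missing = sum(m for _, m in patch.values())
patch_pct = 100 * covered / (covered + missing)

# Project-level numbers taken directly from the coverage diff.
base_pct = 100 * 1241 / 1535
head_pct = 100 * 1247 / 1564
delta_pct = head_pct - base_pct

print(f"patch: {patch_pct:.5f}% ({missing} lines missing)")
print(f"project: {base_pct:.3f}% -> {head_pct:.3f}% ({delta_pct:+.3f})")
```

The project coverage drops because 23 of the 29 net new lines are uncovered, not because existing tests regressed.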


Copilot

Pull request overview

Fixes CellMapMultiDataset.class_weights background voxel computation by using true per-class volume sizes (total_voxels - fg) rather than “other classes’ foreground”.

Changes:

Add total_voxels to CellMapImage (s0 spatial shape → voxel count, scaled to training resolution) and EmptyImage (0).
Add CellMapDataset.total_voxels aggregation across target sources.
Update CellMapMultiDataset.class_weights to compute bg = max(total - fg, 0) and warn when fg > total.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
src/cellmap_data/multidataset.py	Recomputes class weights using aggregated per-class total voxels and corrected background calculation.
src/cellmap_data/image.py	Introduces `CellMapImage.total_voxels` derived from s0 spatial shape and scaled to training resolution.
src/cellmap_data/empty_image.py	Adds `EmptyImage.total_voxels = 0` for unannotated placeholders.
src/cellmap_data/dataset.py	Adds per-class `CellMapDataset.total_voxels` mapping from target sources.

src/cellmap_data/image.py

+    @property
+    def total_voxels(self) -> int:
+        """Total number of voxels in the s0 data volume, normalised to training-resolution voxels."""
+        try:
+            s0_path = self._level_info[0][0]
+            s0_arr = zarr.open_array(f"{self.path}/{s0_path}", mode="r")
+            n_spatial = len(self.axes)
+            spatial_shape = s0_arr.shape[-n_spatial:]
+            s0_total = int(np.prod(spatial_shape))
+            return self._scale_count(s0_total, s0_idx=0)
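The hunk above is cut off, and `_scale_count` is not shown in this PR. As a rough, self-contained sketch of the idea, the function below counts spatial voxels from an array shape and rescales them to a coarser training resolution. The scaling rule used here (ratio of physical voxel volumes) is an assumption about what "normalised to training-resolution voxels" means, and `total_voxels_at_training_res` and its parameters are hypothetical names, not the library's API.

```python
import numpy as np


def total_voxels_at_training_res(shape, n_spatial, s0_voxel_size, training_voxel_size):
    """Count spatial voxels at s0, then express them in training-resolution voxels."""
    # Keep only the trailing spatial axes, dropping any leading channel axis.
    spatial_shape = shape[-n_spatial:]
    s0_total = int(np.prod(spatial_shape))
    # Assumed scaling: ratio of physical voxel volumes (s0 / training).
    ratio = float(np.prod(s0_voxel_size)) / float(np.prod(training_voxel_size))
    return int(round(s0_total * ratio))


# A (channel, z, y, x) array at 4 nm isotropic, training at 8 nm isotropic:
# each training voxel covers 2 x 2 x 2 = 8 s0 voxels.
n = total_voxels_at_training_res((1, 64, 128, 128), 3, (4, 4, 4), (8, 8, 8))
print(n)
```

Reading only the array shape means no chunk data is loaded, so a property like this should stay cheap even for large volumes.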


src/cellmap_data/multidataset.py

+        fg_counts = self.class_counts["totals"]
+        # Aggregate actual total voxels per class across all datasets
+        total_voxels: dict[str, int] = {cls: 0 for cls in self.classes}
+        for ds in self.datasets:
+            ds_total = ds.total_voxels
+            for cls in self.classes:


src/cellmap_data/multidataset.py

    def class_weights(self) -> dict[str, float]:
-        """Per-class sampling weight: ``bg_voxels / fg_voxels``."""
-        counts = self.class_counts["totals"]
-        total_voxels = sum(counts.values())
+        """Per-class sampling weight: ``bg_voxels / fg_voxels``.
+
+        Background voxels are computed as the total voxels in the data volume
+        minus the foreground voxels for each class.
+        """
+        fg_counts = self.class_counts["totals"]
+        # Aggregate actual total voxels per class across all datasets
+        total_voxels: dict[str, int] = {cls: 0 for cls in self.classes}
+        for ds in self.datasets:
+            ds_total = ds.total_voxels
+            for cls in self.classes:
+                total_voxels[cls] += ds_total.get(cls, 0)
        weights: dict[str, float] = {}
        for cls in self.classes:
-            fg = counts.get(cls, 0)
-            bg = total_voxels - fg
+            fg = fg_counts.get(cls, 0)
+            total = total_voxels.get(cls, 0)
+            if fg > total > 0:
+                logger.warning(
+                    "class_weights: fg (%d) > total_voxels (%d) for class %r; "
+                    "this may indicate a counting error upstream.",
+                    fg,
+                    total,
+                    cls,
+                )
+            bg = max(total - fg, 0)
            weights[cls] = float(bg) / float(max(fg, 1))
        return weights
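For reviewers who want to exercise the new logic outside the class, here is a standalone sketch that mirrors the diff above. The free function `class_weights` and its arguments are stand-ins for the method and its `self.classes`, `self.class_counts["totals"]`, and `ds.total_voxels` inputs; the input values are invented.

```python
import logging

logger = logging.getLogger(__name__)


def class_weights(classes, fg_counts, per_dataset_totals):
    """bg/fg weight per class, with bg derived from summed volume sizes."""
    # Sum each class's volume size over all datasets.
    total_voxels = {cls: 0 for cls in classes}
    for ds_total in per_dataset_totals:
        for cls in classes:
            total_voxels[cls] += ds_total.get(cls, 0)

    weights = {}
    for cls in classes:
        fg = fg_counts.get(cls, 0)
        total = total_voxels[cls]
        if fg > total > 0:
            logger.warning(
                "class_weights: fg (%d) > total_voxels (%d) for class %r; "
                "this may indicate a counting error upstream.",
                fg, total, cls,
            )
        bg = max(total - fg, 0)            # never negative
        weights[cls] = bg / max(fg, 1)     # avoid division by zero
    return weights


w = class_weights(
    ["a", "b", "c"],
    {"a": 10, "b": 0, "c": 500},
    [{"a": 100, "b": 50}, {"a": 100, "c": 100}],
)
print(w)
```

Three edge cases follow from the code as written: a class with `fg == 0` gets a weight equal to its total voxel count, a class with `fg > total` is clamped to weight 0 after the warning, and a class whose total is 0 (for example, only `EmptyImage` sources) gets weight 0 without any warning because of the `total > 0` guard.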


src/cellmap_data/dataset.py

+        """Total voxels in the data volume per class, normalised to training-resolution voxels."""
+        totals: dict[str, int] = {}
+        for cls in self.classes:
+            src = self.target_sources.get(cls)
+            totals[cls] = src.total_voxels if src is not None else 0
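The hunk above is truncated, but the per-class lookup can be sketched on its own. `StubImage`, `StubEmptyImage`, and `dataset_total_voxels` below are hypothetical stand-ins used to show how a missing source or an empty placeholder both contribute 0 to the total.

```python
from dataclasses import dataclass


@dataclass
class StubImage:
    """Hypothetical stand-in for an annotated target image."""
    total_voxels: int


class StubEmptyImage:
    """Hypothetical stand-in for an unannotated placeholder."""
    total_voxels = 0


def dataset_total_voxels(classes, target_sources):
    """Map each class to the voxel count of its target source, or 0 if absent."""
    totals = {}
    for cls in classes:
        src = target_sources.get(cls)
        totals[cls] = src.total_voxels if src is not None else 0
    return totals


totals = dataset_total_voxels(
    ["mito", "er", "nuc"],
    {"mito": StubImage(1_000), "er": StubEmptyImage()},
)
print(totals)
```

Because unannotated classes contribute 0, a dataset only adds to a class's background budget when that class is actually annotated in it.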


Initial plan

2b31a91

Copilot AI assigned Copilot and rhoadesScholar Mar 13, 2026

Copilot AI mentioned this pull request Mar 13, 2026

Rewrite #65

Merged

Copilot started work on behalf of rhoadesScholar March 13, 2026 18:32

fix: compute bg_voxels from actual total_voxels in data volumes for c…

c8151b3

…lass_weights Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] [WIP] Address feedback on total_voxels calculation~~ Fix class_weights background voxel calculation to use actual data volume sizes Mar 13, 2026

Copilot AI requested a review from rhoadesScholar March 13, 2026 18:41

Copilot finished work on behalf of rhoadesScholar March 13, 2026 18:41

rhoadesScholar requested a review from Copilot March 13, 2026 18:54

Copilot started reviewing on behalf of rhoadesScholar March 13, 2026 18:55

Copilot AI reviewed Mar 13, 2026

rhoadesScholar closed this Mar 13, 2026

Copilot wants to merge 2 commits into rewrite from copilot/sub-pr-65-another-one
