
Fix class_weights background voxel calculation to use actual data volume sizes#68

Closed
Copilot wants to merge 2 commits into rewrite from copilot/sub-pr-65-another-one

Conversation

Contributor

Copilot AI commented Mar 13, 2026

class_weights was computing background as sum(all_fg_counts) - fg_class, which is semantically wrong — that's "other classes' foreground", not background. True background requires total_voxels_in_volume - fg_class.

Changes

  • CellMapImage.total_voxels: New property reading the s0 array's spatial shape, computing np.prod(spatial_shape), scaled to training resolution via the existing _scale_count helper
  • EmptyImage.total_voxels: Returns 0 (no annotated data)
  • CellMapDataset.total_voxels: Aggregates per-class total_voxels from each class's target source
  • CellMapMultiDataset.class_weights: Aggregates actual total_voxels across all datasets per class; computes bg = total - fg. Adds a warning log when fg > total_voxels to surface upstream counting inconsistencies
# Before: bg = sum(all fg counts) - fg  ← wrong; counts other-class fg as bg
total_voxels = sum(counts.values())
bg = total_voxels - fg

# After: bg = actual data volume voxels - fg
total = total_voxels[cls]   # from s0 array spatial shape, resolution-scaled
bg = max(total - fg, 0)
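
To make the size of the discrepancy concrete, here is a small worked comparison. The class names, foreground counts, and the 100×100×100 volume are made-up illustration values, and `weights_before` / `weights_after` are hypothetical stand-ins for the old and new logic, not the library's functions.

```python
# Hypothetical foreground counts for two classes annotated in one volume.
fg_counts = {"mito": 50_000, "er": 150_000}
volume_voxels = 100 * 100 * 100  # 1_000_000 voxels in the assumed volume


def weights_before(fg_counts):
    """Old behaviour: 'background' is really the other classes' foreground."""
    total = sum(fg_counts.values())
    return {cls: (total - fg) / max(fg, 1) for cls, fg in fg_counts.items()}


def weights_after(fg_counts, total_voxels):
    """New behaviour: background is the volume size minus this class's foreground."""
    return {
        cls: max(total_voxels[cls] - fg, 0) / max(fg, 1)
        for cls, fg in fg_counts.items()
    }


before = weights_before(fg_counts)
after = weights_after(fg_counts, {cls: volume_voxels for cls in fg_counts})

print(before)  # mito weight is 150_000 / 50_000
print(after)   # mito weight is 950_000 / 50_000
```

With these numbers the old formula gives `mito` a weight of 3.0, while the volume-based formula gives 19.0, because most of the volume belongs to neither class.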

…lass_weights

Co-authored-by: rhoadesScholar <37990507+rhoadesScholar@users.noreply.github.com>
Copilot AI changed the title from "[WIP] [WIP] Address feedback on total_voxels calculation" to "Fix class_weights background voxel calculation to use actual data volume sizes" on Mar 13, 2026
Copilot AI requested a review from rhoadesScholar March 13, 2026 18:41
@rhoadesScholar rhoadesScholar requested a review from Copilot March 13, 2026 18:54

codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 18.18182% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.73%. Comparing base (675b614) to head (c8151b3).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/cellmap_data/multidataset.py | 0.00% | 11 Missing ⚠️ |
| src/cellmap_data/image.py | 16.66% | 10 Missing ⚠️ |
| src/cellmap_data/dataset.py | 28.57% | 5 Missing ⚠️ |
| src/cellmap_data/empty_image.py | 66.66% | 1 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           rewrite      #68      +/-   ##
===========================================
- Coverage    80.84%   79.73%   -1.12%     
===========================================
  Files           23       23              
  Lines         1535     1564      +29     
===========================================
+ Hits          1241     1247       +6     
- Misses         294      317      +23     

Contributor

Copilot AI left a comment

Pull request overview

Fixes CellMapMultiDataset.class_weights background voxel computation by using true per-class volume sizes (total_voxels - fg) rather than “other classes’ foreground”.

Changes:

  • Add total_voxels to CellMapImage (s0 spatial shape → voxel count, scaled to training resolution) and EmptyImage (0).
  • Add CellMapDataset.total_voxels aggregation across target sources.
  • Update CellMapMultiDataset.class_weights to compute bg = max(total - fg, 0) and warn when fg > total.
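
As a rough illustration of the "s0 spatial shape → voxel count, scaled to training resolution" step, the sketch below assumes the scaling is the ratio of s0 voxel volume to training voxel volume. The behaviour of the real `_scale_count` helper is not shown in this PR, so `total_voxels_at_training_resolution` and its parameters are hypothetical.

```python
import math


def total_voxels_at_training_resolution(spatial_shape, s0_voxel_size, training_voxel_size):
    """Count s0 voxels, then express that count in training-resolution voxels.

    Assumption: one training voxel covers prod(training / s0) s0 voxels, so the
    count is multiplied by prod(s0 / training). This may differ from _scale_count.
    """
    s0_total = math.prod(spatial_shape)
    ratio = math.prod(s / t for s, t in zip(s0_voxel_size, training_voxel_size))
    return round(s0_total * ratio)


# A 100 x 200 x 200 volume stored at 4 nm isotropic, trained on at 8 nm isotropic.
scaled = total_voxels_at_training_resolution((100, 200, 200), (4, 4, 4), (8, 8, 8))
print(scaled)
```

Each 8 nm voxel covers eight 4 nm voxels here, so the 4,000,000 s0 voxels correspond to 500,000 training-resolution voxels.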

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| src/cellmap_data/multidataset.py | Recomputes class weights using aggregated per-class total voxels and corrected background calculation. |
| src/cellmap_data/image.py | Introduces CellMapImage.total_voxels derived from s0 spatial shape and scaled to training resolution. |
| src/cellmap_data/empty_image.py | Adds EmptyImage.total_voxels = 0 for unannotated placeholders. |
| src/cellmap_data/dataset.py | Adds per-class CellMapDataset.total_voxels mapping from target sources. |

Comment on lines +556 to +565
    @property
    def total_voxels(self) -> int:
        """Total number of voxels in the s0 data volume, normalised to training-resolution voxels."""
        try:
            s0_path = self._level_info[0][0]
            s0_arr = zarr.open_array(f"{self.path}/{s0_path}", mode="r")
            n_spatial = len(self.axes)
            spatial_shape = s0_arr.shape[-n_spatial:]
            s0_total = int(np.prod(spatial_shape))
            return self._scale_count(s0_total, s0_idx=0)

Comment on lines +76 to +81
        fg_counts = self.class_counts["totals"]
        # Aggregate actual total voxels per class across all datasets
        total_voxels: dict[str, int] = {cls: 0 for cls in self.classes}
        for ds in self.datasets:
            ds_total = ds.total_voxels
            for cls in self.classes:

Comment on lines 70 to 97
     def class_weights(self) -> dict[str, float]:
-        """Per-class sampling weight: ``bg_voxels / fg_voxels``."""
-        counts = self.class_counts["totals"]
-        total_voxels = sum(counts.values())
+        """Per-class sampling weight: ``bg_voxels / fg_voxels``.
+
+        Background voxels are computed as the total voxels in the data volume
+        minus the foreground voxels for each class.
+        """
+        fg_counts = self.class_counts["totals"]
+        # Aggregate actual total voxels per class across all datasets
+        total_voxels: dict[str, int] = {cls: 0 for cls in self.classes}
+        for ds in self.datasets:
+            ds_total = ds.total_voxels
+            for cls in self.classes:
+                total_voxels[cls] += ds_total.get(cls, 0)
         weights: dict[str, float] = {}
         for cls in self.classes:
-            fg = counts.get(cls, 0)
-            bg = total_voxels - fg
+            fg = fg_counts.get(cls, 0)
+            total = total_voxels.get(cls, 0)
+            if fg > total > 0:
+                logger.warning(
+                    "class_weights: fg (%d) > total_voxels (%d) for class %r; "
+                    "this may indicate a counting error upstream.",
+                    fg,
+                    total,
+                    cls,
+                )
+            bg = max(total - fg, 0)
             weights[cls] = float(bg) / float(max(fg, 1))
         return weights
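
Since Codecov reports the new branches in multidataset.py as uncovered, it may help to see the edge cases in isolation. The sketch below copies the weighting logic into a free function, `compute_class_weights`, which is a hypothetical stand-in rather than the method itself; the logger name, the list-collecting handler, and the sample counts are likewise invented for illustration.

```python
import logging

logger = logging.getLogger("class_weights_sketch")  # hypothetical logger name


def compute_class_weights(fg_counts, total_voxels):
    """Stand-alone copy of the bg = max(total - fg, 0) weighting logic."""
    weights = {}
    for cls, total in total_voxels.items():
        fg = fg_counts.get(cls, 0)
        if fg > total > 0:
            logger.warning(
                "class_weights: fg (%d) > total_voxels (%d) for class %r; "
                "this may indicate a counting error upstream.",
                fg, total, cls,
            )
        bg = max(total - fg, 0)              # clamp so the weight is never negative
        weights[cls] = float(bg) / float(max(fg, 1))  # avoid division by zero
    return weights


class ListHandler(logging.Handler):
    """Collects log records so the warning branch can be asserted on."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.WARNING)

weights = compute_class_weights(
    fg_counts={"mito": 250, "er": 1200, "nucleus": 0},
    total_voxels={"mito": 1000, "er": 1000, "nucleus": 1000, "empty": 0},
)
print(weights)
print(len(handler.records), "warning(s) logged")
```

The four classes exercise the normal path (`mito`), the `fg > total` warning with the clamp to zero (`er`), the zero-foreground guard (`nucleus`), and a class whose total is 0, as an `EmptyImage` source would report (`empty`).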
Comment on lines +424 to +428
        """Total voxels in the data volume per class, normalised to training-resolution voxels."""
        totals: dict[str, int] = {}
        for cls in self.classes:
            src = self.target_sources.get(cls)
            totals[cls] = src.total_voxels if src is not None else 0
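
The two aggregation levels (per class within a dataset, then per class across datasets) can be sketched as follows. `FakeSource`, `dataset_total_voxels`, and `aggregate_total_voxels` are hypothetical names introduced here; they only mirror the shape of the snippets quoted in this review.

```python
from dataclasses import dataclass


@dataclass
class FakeSource:
    """Hypothetical stand-in for an image object exposing total_voxels."""
    total_voxels: int


def dataset_total_voxels(classes, target_sources):
    """Per-class totals for one dataset; a missing source counts as 0 voxels."""
    totals = {}
    for cls in classes:
        src = target_sources.get(cls)
        totals[cls] = src.total_voxels if src is not None else 0
    return totals


def aggregate_total_voxels(classes, per_dataset_totals):
    """Sum the per-class totals over all datasets."""
    combined = {cls: 0 for cls in classes}
    for ds_total in per_dataset_totals:
        for cls in classes:
            combined[cls] += ds_total.get(cls, 0)
    return combined


classes = ["mito", "er"]
ds_a = dataset_total_voxels(classes, {"mito": FakeSource(1000), "er": FakeSource(1000)})
ds_b = dataset_total_voxels(classes, {"mito": FakeSource(500)})  # no "er" source
combined = aggregate_total_voxels(classes, [ds_a, ds_b])
print(combined)
```

A class that is unannotated in one dataset contributes nothing to that class's total, so its background count only reflects volumes where it was actually labelled.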