Aggregator: opt-in raw-residual mode for SafaPlusNormalizedLandmark fusion#631
Merged
Conversation
…usion #626 introduced SafaPlusNormalizedLandmarkAggregator with the landmark stream as a per-row /max-normalized residual: r_norm = 1 − sim_t/row_max, Gaussian on r_norm at landmark_sigma. That shape is the correct inverse transform when the landmark matrix is on a different scale than the image matrix (the original normalized-correspondence use case). The OSM-tile baseline produces a SAFA-form similarity matrix on the same raw-cosine scale as the image stream. Applying /max-normalization to it suppresses the informative content; we want the same SAFA-form residual (row_max − sim_t) the image stream uses, with sum-of-streams fusion. Add an opt-in landmark_use_raw_residual flag on the aggregator config (default False, preserves #626 behavior bit-exact). When True: - landmark stream calls wag_observation_log_likelihood_from_similarity_matrix (same helper as the image stream), so the output is the elementwise sum of two SAFA streams. - NaN landmark entries are replaced with the finite row-min before the helper call so its .max() isn't poisoned; lm_finite_mask still zeroes those positions out → image-only at NaN indices. Two new tests cover sum-of-streams equivalence and NaN-finite output. This change is what makes the WAG+OSM evaluation in the paper figure meaningfully distinct from #626's "ours" — without it the /max-normalized residual shape mismatches the OSM-baseline matrix's scale.
The aggregator previously called _replace_nan_with_zero on the final log-likelihood, silently swallowing any NaN/±inf produced by the SAFA helper. The aggregator's inputs are expected to be dense similarity matrices (e.g. safa_dinov3_wag_chicago.pt, early_fusion_*.pt) — NaN there indicates a bug (stale matrix, broken export) or an unsuitable sigma, not a legitimate missing-data signal. Make the error loud so it surfaces immediately instead of degrading the eval silently. Replace the silent _replace_nan_with_zero with the same _raise_if_nonfinite helper SafaPlusNormalizedLandmarkAggregator already uses post-fusion. The helper now takes a cls_name parameter so each aggregator's error message names the right class (callers pass type(self).__name__). Adds three tests for SingleSimilarityMatrixAggregator: pass-through matches the SAFA helper on finite input; NaN and ±inf in the similarity row both raise RuntimeError. (ImageLandmarkPrivilegedInformationFusion and EntropyAdaptiveAggregator still use _replace_nan_with_zero; tightening those is left as a follow-up since their landmark/image-only fall-throughs are a more involved policy question.)
The raw-residual branch previously masked NaN landmark entries to 0 (image-only at those positions) by pre-substituting them with sim_min_lm so the SAFA helper's `.max()` wouldn't propagate the NaN, then `torch.where`-ing them back to 0 after. That dance only existed to keep the helper happy with sparse-with-holes input. The raw-residual mode is intended for two SAFA-form dense similarity matrices on the same scale (e.g. WAG image + WAG-on-OSM landmark). A NaN/inf in that landmark matrix is a bug (stale matrix, broken export), not a "no landmark info for this cell" signal. Mirror the contract we just set for SingleSimilarityMatrixAggregator: throw via _raise_if_nonfinite, drop the substitution and the output mask. The normalized-residual branch (default, #626's original behavior) keeps the old image-only-at-NaN fallback — its landmark matrices are sparse by design. Test changes: - test_raw_residual_mode_zeroes_nan_landmark_entries → renamed and rewritten as test_raw_residual_mode_raises_on_nan_landmark. - Added test_raw_residual_mode_raises_on_inf_landmark for symmetry with the SingleSim inf test. - Added test_normalized_residual_mode_falls_back_to_image_only_on_nan to lock in the asymmetric contract on the other branch. Docstring updated to spell out the asymmetric NaN contract.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add an opt-in `landmark_use_raw_residual` flag on `SafaPlusNormalizedLandmarkAggregatorConfig` (default `False`, preserves #626 behavior bit-exact). When `True`, the landmark stream uses the SAFA-form raw residual (`row_max − sim_t`, same helper as the image stream) instead of the per-row /max-normalized residual — appropriate when the second matrix is itself a SAFA-form similarity matrix (e.g. the OSM-tile baseline) on the same raw-cosine scale as the image stream.
NaN handling preserved by substituting the finite row-min into the helper before masking those positions to image-only output, so the helper's `.max()` isn't poisoned.
Why
#626 introduced `SafaPlusNormalizedLandmarkAggregator` with a single landmark-stream shape: `r_norm = 1 − sim_t/row_max`, Gaussian on `r_norm`. That's the correct inverse transform when the landmark matrix is on a different scale than the image matrix (the original normalized-correspondence case).
The WAG+OSM evaluation in the paper figure uses an OSM-tile-baseline SAFA matrix as the second stream — same raw-cosine scale as the image stream. Applying /max-normalization to it suppresses the informative content and we see degraded performance. The new flag matches the residual shape to the data: effective output becomes `wag_helper(img) + wag_helper(lm)`.
This is what makes the WAG+OSM eval in the paper figure meaningfully distinct from #626's "ours" approach.
Test plan