paper_convergence_plot: 3-method support, variable-length paths#628

Merged: efahnestock merged 6 commits into main from paper-plot-multi-method on May 20, 2026
@efahnestock efahnestock commented May 11, 2026

Summary

Refactor paper_convergence_plot.py so the figure-B convergence-curves plot and its companion summary table can render the current paper figure: three methods compared across NY → SF → Boston → Boston Night → Middletown → Norway → Fort Myers → Framingham, with Seattle dropped.

  • N-method generalization. The figure and table previously supported only a 2-method paired comparison; both now generalize to N methods. A third method (osm) is wired in alongside the existing safa/ours slots, with subset coverage allowed: envs without that method's results are silently skipped, and the table shows -- for them. Per-row bolding picks the minimum across methods.
  • Variable-length paths. The previous code asserted every path under a given (env, method) had the same step count; that broke once the eval started writing paths of different lengths. Each path's prob_mass is now interpolated onto a common distance grid; aggregation uses nanmean / nanstd, and grid points where fewer than min_paths_for_ci (=10) paths contributed are truncated from the rendered curve.
  • Env list updated for the current figure. Drops Seattle, adds NewYork, reorders to NY → SF → Boston Snowy → Boston Night → Middletown → Norway → Fort Myers → Framingham. 4×2 grid fits cleanly.
  • CLI labels swappable. New --osm_label and --baseline_label flags so the displayed names (WAG, WAG+OSM, Ours, etc.) can be set without code edits. Existing --osm_dir / --osm_method already cover swapping the third method's source.
  • Defaults point at the current v6 results dirs. No-arg invocation produces a sane figure.
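
The variable-length aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the code in paper_convergence_plot.py: the function name `aggregate_paths`, the (distance, prob_mass) pair layout, and the normal-approximation 95% band are all assumptions made for the example. Only the interpolation onto a common grid, the nan-aware reductions, and the min_paths_for_ci = 10 cutoff come from the description.

```python
import numpy as np

MIN_PATHS_FOR_CI = 10  # value of min_paths_for_ci quoted in the PR description


def aggregate_paths(paths, grid, min_paths=MIN_PATHS_FOR_CI):
    """Hypothetical helper: paths is a list of (distance, prob_mass) array pairs.

    Each path may have a different number of steps; distance must be increasing.
    """
    stacked = np.full((len(paths), len(grid)), np.nan)
    for i, (dist, mass) in enumerate(paths):
        # np.interp would clamp outside the sampled range; request NaN instead
        # so short paths do not contribute beyond their own final distance.
        stacked[i] = np.interp(grid, dist, mass, left=np.nan, right=np.nan)

    # Number of paths that actually cover each grid point.
    counts = np.sum(~np.isnan(stacked), axis=0)
    keep = counts >= min_paths  # truncate poorly supported grid points

    mean = np.nanmean(stacked[:, keep], axis=0)
    std = np.nanstd(stacked[:, keep], axis=0)
    # Assumption: normal-approximation 95% half-width; the real script may differ.
    ci95 = 1.96 * std / np.sqrt(counts[keep])
    return grid[keep], mean, ci95
```

With 15 paths of which only 5 extend past 50 m, the returned grid would stop at 50 m, since the tail is supported by fewer than 10 paths.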

Test plan

  • Render with three current-paper methods: --osm_dir .../260504_160045_osm_baseline --osm_method safa_plus_osm_safa_raw --osm_label WAG+OSM — produces 4×2 grid, 8 envs, all three methods plotted with shaded 95% CIs.
  • LaTeX summary table has the expected per-row bolding and -- cells where a method lacks an env.
  • No-arg invocation still works.
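
For reference, the per-row bolding and -- behaviour checked above could look something like the sketch below. The helper name `format_row` and the dict-of-floats input are hypothetical; the method keys safa, osm, and ours are the slots named in the summary, and one-decimal formatting is an assumption made for the example.

```python
def format_row(label, values, methods):
    """Hypothetical helper: build one LaTeX table row.

    values maps method name -> metric (lower is better). Methods with no
    entry for this env render as '--' and are ignored when picking the best.
    """
    present = [values[m] for m in methods if m in values]
    best = min(present) if present else None

    cells = []
    for m in methods:
        if m not in values:
            cells.append("--")  # method has no results for this env
        elif values[m] == best:
            cells.append(rf"\textbf{{{values[m]:.1f}}}")  # bold the row minimum
        else:
            cells.append(f"{values[m]:.1f}")
    return " & ".join([label] + cells) + r" \\"


row = format_row("Norway", {"safa": 12.34, "ours": 9.87}, ["safa", "osm", "ours"])
print(row)
```

In this sketch ties are all bolded; whether the actual script does the same is not stated in the PR.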

Related

Built on top of the matrix-row-assert fix in #627, which is what allowed the v6 results for Boston / Boston Night to be regenerated and finally show non-saturated curves on this plot.


Update (commit 1478642)

Follow-up on top of the 3-method refactor with everything needed to produce the current paper figure:

  • 4-method support. New --early_dir / --early_method / --early_label CLI flags wire a fourth method (early, green #4CAF50) into method_dirs / method_colors / method_labels with the same subset-coverage semantics as osm. Skipped silently when --early_dir is not provided, so the 3-method invocation still works.
  • Table polish. Use \begin{table*} for a two-column page span; add $\downarrow$ markers on metric headers (all metrics are "lower is better"); rename "Error" → "Final Error"; drop the CC₅₀ row (CC₁₀₀ + Final Error is the canonical reporting set); print CIs to 1 decimal place so very tight bands don't render as a misleading $\pm$ 0.
  • post_hurricane_ian_sw. Swap post_hurricane_ian → post_hurricane_ian_sw in ENVIRONMENTS, with a matching DISPLAY_NAMES entry so the visible label stays "Fort Myers". This reflects the newer Fort Myers subset (1073 panos / 471 822 sat patches, slightly shifted bbox).
  • y-axis label "P(mass within r)" → "Probability mass within radius".
  • dataset_statistics.py (2-line addition): PANO_LANDMARK_DIR_MAP gains NewYork and mapillary/post_hurricane_ian_sw so per-pano landmark counts populate for the paper table's NY row and the swapped Fort Myers row.
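
To illustrate the CI rounding change from the table-polish item, here is a small sketch. The helper name `format_cell` is hypothetical, and printing the mean with zero decimals is an assumption chosen only to show the effect on the CI half-width.

```python
def format_cell(mean, ci_half_width):
    """Hypothetical helper: render 'mean ± CI' for a LaTeX table cell."""
    # One decimal on the CI keeps a tight band such as 0.08 visible as 0.1;
    # zero decimals would print it as 0 and suggest there is no uncertainty.
    return rf"{mean:.0f} $\pm$ {ci_half_width:.1f}"


print(format_cell(9.2, 0.08))
```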

Example invocation for the final 4-method figure:

bazel run //experimental/overhead_matching/swag/scripts:paper_convergence_plot -- \
  --output_dir=/data/overhead_matching/evaluation \
  --osm_dir=/data/overhead_matching/evaluation/results/260504_160045_osm_baseline \
  --osm_method=safa_plus_osm_safa_raw --osm_label="WAG+OSM" \
  --early_dir=/data/overhead_matching/evaluation/results/260513_early_fusion_no_positions_v1 \
  --early_method=early_fusion_no_positions_v1 --early_label="Early Fusion"
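
The optional fourth slot behind those flags might be wired along these lines. This is a sketch under assumptions: it uses argparse (the real script's flag library is not shown in the PR), the default label and the tuple stored in method_dirs are invented for illustration, and `register_early` is a hypothetical name. The flag names, the 'early' key, and the green #4CAF50 colour come from the description.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing only the three early-method flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--early_dir", default=None)
    parser.add_argument("--early_method", default=None)
    parser.add_argument("--early_label", default="Early Fusion")  # assumed default
    return parser


def register_early(args, method_dirs, method_colors, method_labels):
    """Add the 'early' slot only when --early_dir was provided."""
    if args.early_dir is None:
        return False  # 3-method invocation: leave the dicts untouched
    method_dirs["early"] = (args.early_dir, args.early_method)
    method_colors["early"] = "#4CAF50"
    method_labels["early"] = args.early_label
    return True
```

Keeping the slot out of all three dicts when the flag is absent means downstream plotting and table code can iterate over whatever methods are registered without a special case.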

…pable labels

- Generalize the figure and table to N methods (was paired-2). Adds an OSM
  third method that can cover a subset of envs and skips silently for envs
  it's missing. Bolding picks the per-row minimum.
- Replace fixed-length-path assumption with per-path interpolation onto a
  common distance grid; mean/CI use nan-aware reductions. Resolves the eval
  format change to variable-length paths.
- Env list updated for the current paper figure: drop Seattle, add NewYork,
  order NY → SF → Boston → Boston Night → Middletown → Norway → Fort Myers
  → Framingham.
- CLI: --osm_dir, --osm_method, --osm_label, --baseline_label for swapping
  the third-method source and the displayed names. Defaults: WAG baseline,
  DINOv3+OSM third method (override per call as needed).
- Defaults point at the current v6 results dirs.
…e_ian_sw

Builds on the 3-method refactor at the head of this branch.

- 4th method 'early' wired into method_dirs/method_colors (green #4CAF50)
  /method_labels with subset-coverage semantics matching 'osm'; new
  --early_dir/--early_method/--early_label CLI flags. The 'early' slot is
  skipped when --early_dir is not provided so the no-arg / 3-method
  invocation still works.
- Table polish:
  - \begin{table*} so the table spans both columns in a 2-column paper
    layout
  - down-arrow markers on metric headers (all metrics are "lower is better")
  - rename "Error" → "Final Error"
  - drop CC50 row; CC100 + Final Error is the canonical reporting set
  - CI to 1 decimal place (e.g. 9±0.1 instead of 9±0)
- y-axis label "P(mass within r)" → "Probability mass within radius"
- ENVIRONMENTS: swap post_hurricane_ian → post_hurricane_ian_sw to match
  the newer Fort Myers subset (1073 panos / 471 822 sat patches, slightly
  shifted bbox); DISPLAY_NAMES gains a post_hurricane_ian_sw → "Fort Myers"
  entry so the visible label is unchanged.
- dataset_statistics.py: add NewYork and mapillary/post_hurricane_ian_sw
  to PANO_LANDMARK_DIR_MAP so per-pano landmark counts populate for the
  paper table's NY row and the swapped Fort Myers row.
Comment thread experimental/overhead_matching/swag/scripts/paper_convergence_plot.py Outdated
@efahnestock efahnestock merged commit 870a336 into main May 20, 2026
3 checks passed
@efahnestock efahnestock deleted the paper-plot-multi-method branch May 20, 2026 18:05