From fa9c229621158ec2ca5192021eddf784d3273baa Mon Sep 17 00:00:00 2001
From: Joel Natividad <1980690+jqnatividad@users.noreply.github.com>
Date: Sun, 28 Jun 2026 01:03:19 -0400
Subject: [PATCH 1/6] feat(viz): point-in-polygon choropleth from user GeoJSON
(smart + standalone)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Bin each row's lat/lon directly into a user-supplied GeoJSON's polygons and
use the matched feature id as the location — exact, works at any admin level,
and needs no geocoding or GeoNames lookup. Even-odd ray casting handles holes
and MultiPolygon. Points outside every region snap to the nearest feature by
default; --no-snap drops them. Either way a stderr coverage note reports how
many points missed every polygon (pip_assign distinguishes Inside/Snapped/
Outside so snapped points are visible, not silently absorbed).
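
For reviewers, here is a minimal standalone sketch of the binning described above. It is not the code in this patch: the names `Ring`, `Polygon`, `Outcome`, `contains`, `seg_dist2`, `assign` and `square` are hypothetical, and it skips the bbox prefilter and GeoJSON parsing. It assumes rings are already closed and distances are measured in plain degree space.

```rust
/// A ring is a closed list of [lon, lat] vertices (first vertex repeated at the end).
type Ring = Vec<[f64; 2]>;
/// A polygon is an exterior ring followed by zero or more hole rings.
type Polygon = Vec<Ring>;

/// Hypothetical tri-state result, so snapped points stay distinguishable from contained ones.
#[derive(Debug, PartialEq)]
enum Outcome {
    Inside(usize),
    Snapped(usize),
    Outside,
}

/// Even-odd ray casting: cast a ray toward +lon and toggle on every edge it crosses.
/// Hole edges toggle too, so a point inside a hole ends up reported as outside.
fn contains(polygon: &Polygon, lon: f64, lat: f64) -> bool {
    let mut inside = false;
    for ring in polygon {
        for edge in ring.windows(2) {
            let ([xi, yi], [xj, yj]) = (edge[0], edge[1]);
            // The first test guarantees yi != yj, so the division is safe.
            if (yi > lat) != (yj > lat) && lon < (xj - xi) * (lat - yi) / (yj - yi) + xi {
                inside = !inside;
            }
        }
    }
    inside
}

/// Squared distance from point `p` to the segment `a`-`b`.
fn seg_dist2(p: [f64; 2], a: [f64; 2], b: [f64; 2]) -> f64 {
    let (dx, dy) = (b[0] - a[0], b[1] - a[1]);
    let len2 = dx * dx + dy * dy;
    let t = if len2 <= f64::EPSILON {
        0.0
    } else {
        (((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2).clamp(0.0, 1.0)
    };
    let (ex, ey) = (p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy));
    ex * ex + ey * ey
}

/// Exact containment first; otherwise snap to the region with the nearest edge,
/// or report `Outside` when snapping is disabled.
fn assign(regions: &[Polygon], lon: f64, lat: f64, snap: bool) -> Outcome {
    if let Some(i) = regions.iter().position(|r| contains(r, lon, lat)) {
        return Outcome::Inside(i);
    }
    if !snap {
        return Outcome::Outside;
    }
    let mut best: Option<(usize, f64)> = None;
    for (i, region) in regions.iter().enumerate() {
        for ring in region {
            for edge in ring.windows(2) {
                let d = seg_dist2([lon, lat], edge[0], edge[1]);
                if best.map_or(true, |(_, bd)| d < bd) {
                    best = Some((i, d));
                }
            }
        }
    }
    best.map_or(Outcome::Outside, |(i, _)| Outcome::Snapped(i))
}

/// Helper for examples: an axis-aligned closed square ring.
fn square(x0: f64, y0: f64, x1: f64, y1: f64) -> Ring {
    vec![[x0, y0], [x1, y0], [x1, y1], [x0, y1], [x0, y0]]
}
```

With a region made of `square(0.0, 0.0, 10.0, 10.0)` plus a hole `square(4.0, 4.0, 6.0, 6.0)`, the point (5, 5) lies in the hole. With snapping off it is `Outside`; with snapping on it is `Snapped` to that same region, because the hole's edges are the nearest ones. A caller can therefore count both cases as "missed every polygon" while still coloring the snapped ones.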
Wired into `viz smart` as the "Regions" panel when --geojson is supplied, and
into standalone `viz choropleth`. Zero new Cargo.lock crates (hand-rolled on
geojson 0.24, default-features off). Gallery: seismic dashboard now leads with
a Japan-prefecture choropleth (examples/viz/japan_prefectures.geojson).
Why PIP and not a code/name join: GeoNames admin1 `code` is alphabetical
(JP.01=Aichi), not ISO 3166-2 (JP-46=Kagoshima), and no dump carries ISO
3166-2, so the only sub-national key is the locale-fragile UTF-8 name. PIP on
lat/lon sidesteps the join entirely.
Closes part of #302.
Co-Authored-By: Claude Opus 4.8 (1M context)
---
CHANGELOG.md | 1 +
Cargo.lock | 1 +
Cargo.toml | 9 +-
docs/help/viz.md | 11 +-
examples/viz/README.md | 13 +-
examples/viz/gallery.html | 4 +-
examples/viz/gen_gallery.py | 14 +-
examples/viz/japan_prefectures.geojson | 1 +
examples/viz/smart_geospatial.html | 30 +-
src/cmd/viz.rs | 586 ++++++++++++++++++++++++-
tests/test_viz.rs | 119 +++++
11 files changed, 742 insertions(+), 47 deletions(-)
create mode 100644 examples/viz/japan_prefectures.geojson
diff --git a/CHANGELOG.md b/CHANGELOG.md
index e45d82f63..3253d8cb1 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `viz smart` now auto-wires the new chart types: a **3D scatter** of the strongest-correlation triple when there are 3+ numeric columns; a **2D density contour** instead of the correlated-pair scatter for large datasets (where a scatter overplots); and an offline **ScatterGeo projection** world-overview instead of mapbox tiles when the coordinates span a continental/global extent ([#302](https://github.com/dathere/qsv/issues/302)).
- `viz smart` box plots now overlay sample points via a size-based heuristic — all points for small data, Tukey outliers for medium, none for large (a fast cache-only quartile box) — overridable with `--box-points` (now accepted by `smart`, not just `box`) ([#302](https://github.com/dathere/qsv/issues/302)).
- `viz smart` frequency bar charts now show a `(NULL)` bar for empty cells and an `Other (N)` aggregate bar for the categories beyond `--limit` (N = the count of distinct categories rolled up), matching `qsv frequency`'s default output. Both aggregate bars are drawn in a muted grey so they read as summaries rather than real categories. New `--no-nulls` and `--no-other` flags suppress them ([#302](https://github.com/dathere/qsv/issues/302)).
+- `viz choropleth` & `viz smart` can now build a choropleth from a user-supplied GeoJSON by **point-in-polygon binning**: each row's `--lat`/`--lon` is tested directly against the GeoJSON polygons (even-odd ray casting, handling holes & MultiPolygon) and the matched feature id becomes the location. This is exact, works for any country/admin level, and needs no geocoding or GeoNames lookup. Points outside every region snap to the nearest feature by default (`--no-snap` drops them instead); either way a stderr coverage note reports how many points missed every polygon. Wired into the `viz smart` dashboard as the "Regions" panel when a `--geojson` is supplied. Adds no new crates to `Cargo.lock`; `geojson` becomes a direct, optional dependency of the `viz` feature ([#302](https://github.com/dathere/qsv/issues/302)).
### Changed
- `viz smart`: the leading **overview panels** (map/geo, correlation heatmap and its scatter/contour/3D drill-downs, and the time-series trend) now each span the **full dashboard width** on their own row, instead of being squeezed into a half-width grid cell. The per-column box/bar/histogram panels still flow in the `--grid-cols`-wide grid below. Applies to all render paths (typed subplot grid, raw-JSON static export, and the inline-div HTML grid) ([#302](https://github.com/dathere/qsv/issues/302)).
diff --git a/Cargo.lock b/Cargo.lock
index 1501818b6..419db8269 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -6841,6 +6841,7 @@ dependencies = [
"futures",
"futures-util",
"gender_guesser",
+ "geojson",
"geosuggest-core",
"geosuggest-utils",
"geozero",
diff --git a/Cargo.toml b/Cargo.toml
index 9a2b2a110..09e296efd 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -209,6 +209,7 @@ flexi_logger = { version = "0.31", features = [
futures = "0.3"
futures-util = "0.3"
gender_guesser = { version = "0.2", optional = true }
+geojson = { version = "0.24", default-features = false, optional = true }
geosuggest-core = { version = "0.8", features = ["geoip2"], optional = true }
geosuggest-utils = { version = "0.8", optional = true }
geozero = { version = "0.15", features = [
@@ -632,7 +633,13 @@ to = ["csvs_convert"]
# (`viz smart`) from CSV data via the plotly crate. Base feature = self-contained
# interactive HTML output (plotly_embed_js); NO polars dependency. `viz smart`
# reuses qsv's in-process stats + frequency caches. See `qsv viz --help` and #302.
-viz = ["dep:plotly", "dep:opener", "plotly/plotly_embed_js", "base64-simd"]
+viz = [
+ "dep:plotly",
+ "dep:opener",
+ "dep:geojson",
+ "plotly/plotly_embed_js",
+ "base64-simd",
+]
# viz_static: adds static PNG/SVG/PDF/JPEG/WebP export to `viz` via plotly_static,
# which drives a headless Chromium/Firefox (webdriver auto-downloaded). Requires a
# browser at runtime; keep out of big-endian / headless-only publish targets.
diff --git a/docs/help/viz.md b/docs/help/viz.md
index 1612deb8e..418847539 100644
--- a/docs/help/viz.md
+++ b/docs/help/viz.md
@@ -259,6 +259,12 @@ qsv viz choropleth counties.csv --locations fips --value pop --map --geojson cou
qsv viz choropleth stops.csv --geocode --lat lat --lon lon -o by_country.html
```
+> Point-in-polygon: bin lat/lon points into custom GeoJSON regions by count (no geocode)
+
+```console
+qsv viz choropleth quakes.csv --lat lat --lon lon --geojson prefectures.geojson --feature-id-key properties.id -o by_pref.html
+```
+
For more examples, see [tests](https://github.com/dathere/qsv/blob/master/tests/test_viz.rs).
See also
@@ -346,9 +352,10 @@ qsv viz --help
| `‑‑location‑mode` | string | How --locations values are matched to regions. One of: iso3 (the default, ISO-3166-1 alpha-3 country codes), usa-states (2-letter US state codes), country-names (full country names), geojson-id (match a --geojson feature id). | `iso3` |
| `‑‑color‑scale` | string | Colorscale for the region fill. One of: viridis (the default), cividis, greys, greens, blues, reds, ylgnbu, ylorrd, bluered, rdbu, portland, electric, jet, hot, blackbody, earth, picnic, rainbow. | `viridis` |
| `‑‑map` | flag | Render on a token-free MapLibre tile basemap (a ChoroplethMap) instead of the default projection basemap. Requires --geojson and --feature-id-key. Reuses --style for the basemap. | |
-| `‑‑geojson` | string | Custom region polygons as a local file path or an http(s) URL to a GeoJSON FeatureCollection. Required for --map, and for the geojson-id location mode. | |
-| `‑‑feature‑id‑key` | string | Property path in each GeoJSON feature whose value matches an entry in the locations column (e.g. id, properties.fips). | `id` |
+| `‑‑geojson` | string | Custom region polygons as a local file path or an http(s) URL to a GeoJSON FeatureCollection. Required for --map, and for the geojson-id location mode. Also enables point-in-polygon binning: with --lat/--lon (and without --geocode), each row's point is binned into the region whose polygon contains it (exact, no geocoding) and colored by --value/--agg or counts. | |
+| `‑‑feature‑id‑key` | string | Property path in each GeoJSON feature whose value matches an entry in the locations column, or that labels each binned region (e.g. id, properties.fips). | `id` |
| `‑‑geocode` | flag | Derive the region codes by reusing qsv's geocode engine (needs a build with the geocode feature). Either reverse-geocode the lat/lon points, or forward-geocode the locations name column. Only valid with location modes iso3 or usa-states. `viz choropleth` also reuses --value, --agg, --style and the lat/lon options. | |
+| `‑‑no‑snap` | flag | For point-in-polygon binning (lat/lon points binned into a custom GeoJSON without geocoding): drop points that fall outside every region instead of snapping each to its nearest region (the default). A stderr note reports coverage either way. | |
diff --git a/examples/viz/README.md b/examples/viz/README.md
index 76314ff74..1afec5547 100644
--- a/examples/viz/README.md
+++ b/examples/viz/README.md
@@ -56,7 +56,7 @@ as `text/plain`, so a browser won't render it):
| `world_cities.csv` | 33 major cities spanning all **seven continents** (incl. two Antarctic stations): `country`, `continent`, `lat`/`lon`, `metro_population_m`, `elevation_m`, `avg_annual_temp_c` | `smart --dictionary infer` (global geo map + per-COUNTRY choropleth via `fitbounds` with `geocode` + a seven-continent bar + box panels) |
| `us_cities.csv` | 54 US cities across ~35 states: `lat`/`lon`, `census_region`, `population_m`, `median_age` | `smart` (US point map + per-US-STATE choropleth with `geocode` + box/bar/correlation panels) |
| `customer_spend.csv` | 300 customers: a bimodal `monthly_spend`, a right-skewed `account_age_days`, plan/region categoricals, an ID | `smart --smarter` (moarstats-informed: histogram + box hints) |
-| `seismic_events.csv` | 417 synthetic Japanese earthquakes: `timestamp`, `lat`/`lon`, a bimodal `depth_km`, a right-skewed `magnitude` correlated with `felt_reports`, a `tsunami` boolean, `region`, an ID | `smart --smarter` (the full geospatial dashboard: map + time-series + correlation + scatter + histogram + boxes + bars) |
+| `seismic_events.csv` + `japan_prefectures.geojson` | 417 synthetic Japanese earthquakes (`timestamp`, `lat`/`lon`, a bimodal `depth_km`, a right-skewed `magnitude` correlated with `felt_reports`, a `tsunami` boolean, `region`, an ID), plus a GeoJSON of the 47 prefectures keyed by `properties.id` (ISO 3166-2) | `smart --smarter --geojson japan_prefectures.geojson --feature-id-key properties.id` (the full geospatial dashboard: map + **prefecture choropleth via point-in-polygon binning** + time-series + correlation + scatter + histogram + boxes + bars) |
| `delivery_stops.csv` | 90 delivery stops clustered in metro Denver + 4 bad-geocode strays in neighboring states, with `zone`/`vehicle` categoricals, `packages`, and correlated `weight_kg`/`distance_km`/`delivery_minutes` numerics over a `delivered_date` | `smart` (geographic outlier markers + core/full extent boxes, Core/Full zoom buttons & spatial-extent call-out with `geocode`; plus boxes, bars, correlation heatmap, strongest-pair scatter & a time-series — no `--smarter` needed) |
## The smart dashboard
@@ -162,11 +162,16 @@ qsv viz smart quakes.csv -o quakes_dashboard.html
# them out in the spatial-extent label, e.g. "... — 4 outliers (Wyoming, Kansas & Nebraska)"
qsv viz smart delivery_stops.csv -o delivery_dashboard.html
-# the full geospatial dashboard: a map, a time-series, a correlation heatmap + drill-down
-# scatter, a bimodal-depth histogram, annotated boxes and frequency bars — all auto-chosen.
+# the full geospatial dashboard: a map, a prefecture choropleth, a time-series, a correlation
+# heatmap + drill-down scatter, a bimodal-depth histogram, annotated boxes and frequency bars —
+# all auto-chosen. --geojson + --feature-id-key add a point-in-polygon prefecture choropleth: each
+# quake is binned into the GeoJSON region that contains it (no geocoding). This catalog is mostly
+# offshore, so each such quake snaps to its nearest prefecture; add --no-snap to drop offshore
+# points and color on-land prefectures only. A stderr note reports coverage either way.
# Recognized lat/lon columns are charted on the map only, not as redundant distribution panels.
# Rendered with the built-in plotly_dark theme (--theme works on every chart type, incl. smart).
-qsv viz smart seismic_events.csv --smarter --theme plotly_dark --grid-cols 3 -o seismic_dashboard.html
+qsv viz smart seismic_events.csv --smarter --theme plotly_dark --grid-cols 3 \
+ --geojson japan_prefectures.geojson --feature-id-key properties.id -o seismic_dashboard.html
```
### dictionary-guided hierarchy panels (treemap / sunburst)
diff --git a/examples/viz/gallery.html b/examples/viz/gallery.html
index 2bf81ccf0..b9872f3cf 100644
--- a/examples/viz/gallery.html
+++ b/examples/viz/gallery.html
@@ -29,7 +29,7 @@
qsv viz — chart gallery
examples/viz/. Generated with the viz feature; each chart is fully interactive.
-smart dashboard (--smarter, geospatial)One `qsv viz smart seismic_events.csv --smarter --theme plotly_dark --grid-cols 3` command, 10 auto-chosen panels — nearly every panel type at once on a synthetic catalog of Japanese earthquakes. Things the raw table hides but the dashboard makes obvious: depth_km is bimodal (two populations — shallow interplate quakes ~20 km and the deep Wadati-Benioff slab ~450 km — so --smarter draws a histogram, not a box that would average the peaks away); the points trace Japan's subduction arcs on the map; magnitude vs felt_reports is almost perfectly correlated (r=0.95); magnitude and felt_reports are right-skewed with flagged outliers; and the magnitude-over-time trend spikes during a September aftershock sequence. Coordinate columns are shown on the map only, not re-charted as distributions. Rendered with the built-in plotly_dark theme.
+smart dashboard (--smarter, geospatial)One `qsv viz smart seismic_events.csv --smarter --theme plotly_dark --grid-cols 3 --geojson japan_prefectures.geojson --feature-id-key properties.id` command, 11 auto-chosen panels — nearly every panel type at once on a synthetic catalog of Japanese earthquakes. Things the raw table hides but the dashboard makes obvious: depth_km is bimodal (two populations — shallow interplate quakes ~20 km and the deep Wadati-Benioff slab ~450 km — so --smarter draws a histogram, not a box that would average the peaks away); the points trace Japan's subduction arcs on the map; and a prefecture choropleth bins each quake into the GeoJSON region that contains it (point-in-polygon, no geocoding) — most of this catalog is offshore Pacific seismicity, so each such quake is snapped to its nearest prefecture (coloring the Tōhoku/Hokkaidō coast; pass --no-snap for an on-land-only view). magnitude vs felt_reports is almost perfectly correlated (r=0.95); magnitude and felt_reports are right-skewed with flagged outliers; and the magnitude-over-time trend spikes during a September aftershock sequence. Coordinate columns are shown on the map only, not re-charted as distributions. Rendered with the built-in plotly_dark theme.smart dashboard (geographic outliers)`qsv viz smart delivery_stops.csv` — delivery stops clustered in metro Denver with four bad-geocode strays. Points far from the cluster centroid (beyond the Tukey far-out fence of their distances) are flagged as geographic outliers: drawn as distinct amber markers, drawn outside the purple (filled) spatial-extent box, and excluded from the auto-zoom — so the default view stays tight on the core cluster. A second, dashed-magenta no-fill box marks the full extent (core + outliers); use the Core extent / Full extent buttons at the top-left of the map to jump between the tight core view and the full spread (where the strays and the magenta box become visible). 
In the full qsv viz smart HTML output the spatial-extent label calls them out — Colorado, United States — 4 outliers (Wyoming, Kansas & Nebraska) — while strays within the core's own jurisdiction are folded back in silently instead. Each stop also carries delivery attributes (packages, weight_kg, distance_km, delivery_minutes, a vehicle class and a delivered_date), so beyond the map the auto-profiler fills the dashboard out with box plots, frequency bars, a correlation heatmap, the strongest-pair scatter (packages vs weight_kg) and a delivered-over-time trend — all without --smarter.smart dashboardAuto-profiled overview: correlation heatmap + box plots + frequency bars, led by a drill-down sunburst. `viz smart` now SKIPS an auto hierarchy when the candidate dimensions are statistically independent (nesting them would just replicate each level's marginal); sales_sample's region/payment_method/product_category are independent, so `--hierarchy-style sunburst` is passed to deliberately showcase the interactive sunburst.smart dashboard (--smarter)Same auto-profiler with `--smarter`, which runs `qsv moarstats --advanced` itself to enrich the stats cache in one step: the bimodal monthly_spend column renders as a histogram (a box plot would hide its two peaks), and the skewed account_age_days box is annotated with its skew direction and outlier share.
@@ -65,7 +65,7 @@
qsv viz — chart gallery
smart dashboard (--dictionary infer, world choropleth)`qsv viz smart world_cities.csv --dictionary infer` — cities across all seven continents: `viz smart` reverse-geocodes the points and adds a per-country choropleth (cities-per-country, ISO-3) framed to the filled-country geometries via Plotly fitbounds — so the regions are never clipped at the viewport edge — beside the natural-earth point map (crimson markers so coastal/island points read against the ocean), plus a seven-continent breakdown. A describegpt-inferred Data Dictionary supplies the friendly field labels (e.g. Metro Population, Avg Annual Temp). Note: the choropleth is reverse-geocoded from lat/lon, so the two Antarctic stations — which have no sovereign country — snap to the nearest administering territory (McMurdo → NZ's Ross Dependency, Rothera → the Argentine sector); the seven-continent grouping instead comes from the dataset's own continent column. Requires a local LLM; the committed HTML is reused on regen.
@@ -51,7 +51,7 @@
seismic_events.csv — data overview
@@ -59,7 +59,7 @@
seismic_events.csv — data overview
@@ -67,15 +67,15 @@
seismic_events.csv — data overview
-
-
+
+
@@ -83,7 +83,7 @@
seismic_events.csv — data overview
@@ -91,7 +91,7 @@
seismic_events.csv — data overview
@@ -99,7 +99,7 @@
seismic_events.csv — data overview
@@ -107,7 +107,15 @@
seismic_events.csv — data overview
+
+
+
+
+
+
diff --git a/src/cmd/viz.rs b/src/cmd/viz.rs
index dbf31fd8d..1378025c0 100644
--- a/src/cmd/viz.rs
+++ b/src/cmd/viz.rs
@@ -162,6 +162,9 @@ Examples:
# Reverse-geocode lat/lon points to ISO-3 codes, then count per country (needs geocode feature)
qsv viz choropleth stops.csv --geocode --lat lat --lon lon -o by_country.html
+ # Point-in-polygon: bin lat/lon points into custom GeoJSON regions by count (no geocode)
+ qsv viz choropleth quakes.csv --lat lat --lon lon --geojson prefectures.geojson --feature-id-key properties.id -o by_pref.html
+
For more examples, see https://github.com/dathere/qsv/blob/master/tests/test_viz.rs.
See also https://github.com/dathere/qsv/wiki/Visualization
@@ -288,16 +291,23 @@ choropleth options:
and --feature-id-key. Reuses --style for the basemap.
--geojson Custom region polygons as a local file path or an http(s) URL
to a GeoJSON FeatureCollection. Required for --map, and for
- the geojson-id location mode.
+ the geojson-id location mode. Also enables point-in-polygon
+ binning: with --lat/--lon (and without --geocode), each row's
+ point is binned into the region whose polygon contains it
+ (exact, no geocoding) and colored by --value/--agg or counts.
--feature-id-key Property path in each GeoJSON feature whose value matches an
- entry in the locations column (e.g. id, properties.fips).
- [default: id]
+ entry in the locations column, or that labels each binned
+ region (e.g. id, properties.fips). [default: id]
--geocode Derive the region codes by reusing qsv's geocode engine
(needs a build with the geocode feature). Either reverse-geocode
the lat/lon points, or forward-geocode the locations name
column. Only valid with location modes iso3 or usa-states.
`viz choropleth` also reuses --value, --agg, --style and the
lat/lon options.
+ --no-snap For point-in-polygon binning (lat/lon points binned into a
+ custom GeoJSON without geocoding): drop points that fall
+ outside every region instead of snapping each to its nearest
+ region (the default). A stderr note reports coverage either way.
smart options:
--max-charts Maximum number of panels in the dashboard. 0 (the default)
@@ -814,6 +824,7 @@ struct Args {
flag_geojson: Option<String>,
flag_feature_id_key: Option<String>,
flag_geocode: bool,
+ flag_no_snap: bool,
flag_bins: Option,
flag_agg: Option,
flag_box_points: Option,
@@ -2271,6 +2282,257 @@ fn load_geojson(spec: &str) -> CliResult<serde_json::Value> {
.map_err(|e| crate::CliError::Other(format!("--geojson '{spec}' is not valid JSON: {e}")))
}
+/// One GeoJSON feature reduced to its polygon rings for point-in-polygon binning. `polygons` holds
+/// one entry per polygon (a MultiPolygon yields several); each polygon is a list of linear rings
+/// (ring 0 is the exterior, the rest are holes); each ring is a closed list of `[lon, lat]`
+/// vertices. `bbox` is `[min_lon, min_lat, max_lon, max_lat]` over all vertices, for cheap
+/// candidate prefiltering.
+struct PipFeature {
+ id: String,
+ polygons: Vec<Vec<Vec<[f64; 2]>>>,
+ bbox: [f64; 4],
+}
+
+/// Resolve a GeoJSON feature's id by a dotted `--feature-id-key` path. Supports the top-level
+/// `"id"` and `"properties.<...>"` paths (mirroring plotly's `featureidkey` convention). Strings
+/// and numbers both coerce to `String` (CSV cells and plotly match feature ids as strings).
+/// Returns `None` when the path is absent or the value isn't a string/number.
+fn feature_id_by_path(feature: &geojson::Feature, key: &str) -> Option<String> {
+ let coerce = |v: &serde_json::Value| -> Option<String> {
+ match v {
+ serde_json::Value::String(s) => Some(s.clone()),
+ serde_json::Value::Number(n) => Some(n.to_string()),
+ _ => None,
+ }
+ };
+ if key == "id" {
+ return feature.id.as_ref().map(|id| match id {
+ geojson::feature::Id::String(s) => s.clone(),
+ geojson::feature::Id::Number(n) => n.to_string(),
+ });
+ }
+ let rest = key.strip_prefix("properties.")?;
+ let props = feature.properties.as_ref()?;
+ let mut segs = rest.split('.');
+ let mut cur = props.get(segs.next()?)?;
+ for seg in segs {
+ cur = cur.get(seg)?;
+ }
+ coerce(cur)
+}
+
+/// Convert a `geojson::Value::Polygon` ring set to closed `[lon, lat]` rings (appending the first
+/// vertex when a ring isn't already closed, so even-odd ray-casting via `windows(2)` covers every
+/// edge). Rings with fewer than 3 usable vertices (positions with at least two coordinates) are dropped.
+fn geojson_rings_to_closed(poly: &[Vec<geojson::Position>]) -> Vec<Vec<[f64; 2]>> {
+ poly.iter()
+ .filter_map(|ring| {
+ let mut pts: Vec<[f64; 2]> = ring
+ .iter()
+ .filter(|p| p.len() >= 2)
+ .map(|p| [p[0], p[1]])
+ .collect();
+ if pts.len() < 3 {
+ return None;
+ }
+ if pts.first() != pts.last() {
+ pts.push(pts[0]);
+ }
+ Some(pts)
+ })
+ .collect()
+}
+
+/// Flatten a `geojson::Value` into a list of polygons (each = exterior + hole rings). Handles
+/// Polygon, MultiPolygon, and nested GeometryCollection; other geometry types yield nothing.
+fn geojson_value_to_polygons(value: &geojson::Value) -> Vec<Vec<Vec<[f64; 2]>>> {
+ match value {
+ geojson::Value::Polygon(poly) => {
+ let rings = geojson_rings_to_closed(poly);
+ if rings.is_empty() {
+ vec![]
+ } else {
+ vec![rings]
+ }
+ },
+ geojson::Value::MultiPolygon(mp) => mp
+ .iter()
+ .map(|poly| geojson_rings_to_closed(poly))
+ .filter(|rings| !rings.is_empty())
+ .collect(),
+ geojson::Value::GeometryCollection(geoms) => geoms
+ .iter()
+ .flat_map(|g| geojson_value_to_polygons(&g.value))
+ .collect(),
+ _ => vec![],
+ }
+}
+
+/// Parse a GeoJSON FeatureCollection into [`PipFeature`]s keyed by `feature_id_key`. Features
+/// missing the id key or lacking a Polygon/MultiPolygon geometry are skipped (and counted in a
+/// stderr note). Errors when the input isn't a FeatureCollection or yields no usable features.
+fn build_pip_features(
+ geojson: &serde_json::Value,
+ feature_id_key: &str,
+) -> CliResult<Vec<PipFeature>> {
+ let fc = geojson::FeatureCollection::from_json_value(geojson.clone()).map_err(|e| {
+ crate::CliError::Other(format!(
+ "--geojson is not a valid GeoJSON FeatureCollection: {e}"
+ ))
+ })?;
+ let mut out: Vec<PipFeature> = Vec::with_capacity(fc.features.len());
+ let mut skipped = 0_usize;
+ for feature in &fc.features {
+ let Some(id) = feature_id_by_path(feature, feature_id_key) else {
+ skipped += 1;
+ continue;
+ };
+ let polygons = match &feature.geometry {
+ Some(g) => geojson_value_to_polygons(&g.value),
+ None => Vec::new(),
+ };
+ if polygons.is_empty() {
+ skipped += 1;
+ continue;
+ }
+ let mut bbox = [
+ f64::INFINITY,
+ f64::INFINITY,
+ f64::NEG_INFINITY,
+ f64::NEG_INFINITY,
+ ];
+ for ring in polygons.iter().flatten() {
+ for &[lon, lat] in ring {
+ bbox[0] = bbox[0].min(lon);
+ bbox[1] = bbox[1].min(lat);
+ bbox[2] = bbox[2].max(lon);
+ bbox[3] = bbox[3].max(lat);
+ }
+ }
+ out.push(PipFeature { id, polygons, bbox });
+ }
+ if out.is_empty() {
+ return fail_clierror!(
+ "--geojson has no usable Polygon/MultiPolygon features with a '{feature_id_key}' id. \
+ Check --feature-id-key (e.g. 'id' or 'properties.')."
+ );
+ }
+ if skipped > 0 {
+ eprintln!(
+ "viz: skipped {skipped} GeoJSON feature(s) lacking a '{feature_id_key}' id or polygon \
+ geometry."
+ );
+ }
+ Ok(out)
+}
+
+/// Even-odd ray-casting test: is `(lon, lat)` inside this single polygon (exterior + holes)? A
+/// point in a hole crosses an even number of edges and is correctly reported as outside. Rings are
+/// pre-closed by [`geojson_rings_to_closed`], so `windows(2)` enumerates every edge.
+fn point_in_polygon(polygon: &[Vec<[f64; 2]>], lon: f64, lat: f64) -> bool {
+ let mut inside = false;
+ for ring in polygon {
+ for edge in ring.windows(2) {
+ let &[[xi, yi], [xj, yj]] = edge else {
+ continue;
+ };
+ if ((yi > lat) != (yj > lat)) && (lon < (xj - xi) * (lat - yi) / (yj - yi) + xi) {
+ inside = !inside;
+ }
+ }
+ }
+ inside
+}
+
+/// Is `(lon, lat)` inside this feature (bbox prefilter, then any constituent polygon)?
+fn feature_contains(feature: &PipFeature, lon: f64, lat: f64) -> bool {
+ if lon < feature.bbox[0]
+ || lon > feature.bbox[2]
+ || lat < feature.bbox[1]
+ || lat > feature.bbox[3]
+ {
+ return false;
+ }
+ feature
+ .polygons
+ .iter()
+ .any(|poly| point_in_polygon(poly, lon, lat))
+}
+
+/// Squared Euclidean (degree-space) distance from a point to a line segment.
+fn point_seg_dist2(px: f64, py: f64, ax: f64, ay: f64, bx: f64, by: f64) -> f64 {
+ let (dx, dy) = (bx - ax, by - ay);
+ let len2 = dx * dx + dy * dy;
+ let t = if len2 <= f64::EPSILON {
+ 0.0
+ } else {
+ (((px - ax) * dx + (py - ay) * dy) / len2).clamp(0.0, 1.0)
+ };
+ let (cx, cy) = (ax + t * dx, ay + t * dy);
+ let (ex, ey) = (px - cx, py - cy);
+ ex * ex + ey * ey
+}
+
+/// Squared distance from `(lon, lat)` to the nearest edge of any of the feature's rings.
+fn feature_dist2(feature: &PipFeature, lon: f64, lat: f64) -> f64 {
+ let mut best = f64::INFINITY;
+ for ring in feature.polygons.iter().flatten() {
+ for edge in ring.windows(2) {
+ let &[[ax, ay], [bx, by]] = edge else {
+ continue;
+ };
+ let d = point_seg_dist2(lon, lat, ax, ay, bx, by);
+ if d < best {
+ best = d;
+ }
+ }
+ }
+ best
+}
+
+/// Lower-bound squared distance from `(lon, lat)` to a feature's bbox (0 when inside the bbox).
+fn bbox_dist2(bbox: &[f64; 4], lon: f64, lat: f64) -> f64 {
+ let dx = (bbox[0] - lon).max(0.0).max(lon - bbox[2]);
+ let dy = (bbox[1] - lat).max(0.0).max(lat - bbox[3]);
+ dx * dx + dy * dy
+}
+
+/// Result of binning one point into a GeoJSON feature set.
+#[derive(Debug, PartialEq, Eq)]
+enum PipOutcome {
+ /// Contained by feature at this index.
+ Inside(usize),
+ /// Outside every polygon; snapped to the nearest feature at this index.
+ Snapped(usize),
+ /// Outside every polygon and not snapped (dropped).
+ Outside,
+}
+
+/// Assign a point to a feature: exact containment first; if none and `snap`, the nearest feature by
+/// edge distance (bbox lower-bound pruned); if none and `!snap`, [`PipOutcome::Outside`]. The
+/// `Inside`/`Snapped` distinction lets callers report how many points missed every polygon.
+fn pip_assign(features: &[PipFeature], lat: f64, lon: f64, snap: bool) -> PipOutcome {
+ if let Some(i) = features.iter().position(|f| feature_contains(f, lon, lat)) {
+ return PipOutcome::Inside(i);
+ }
+ if !snap {
+ return PipOutcome::Outside;
+ }
+ let mut best_i = None;
+ let mut best_d2 = f64::INFINITY;
+ for (i, f) in features.iter().enumerate() {
+ if bbox_dist2(&f.bbox, lon, lat) >= best_d2 {
+ continue;
+ }
+ let d2 = feature_dist2(f, lon, lat);
+ if d2 < best_d2 {
+ best_d2 = d2;
+ best_i = Some(i);
+ }
+ }
+ best_i.map_or(PipOutcome::Outside, PipOutcome::Snapped)
+}
+
/// Collect every `[lon, lat]` vertex from a GeoJSON value into parallel `(lats, lons)` vectors, so
/// a `--map` choropleth can frame the MapLibre basemap to its regions instead of opening at
/// plotly's whole-world default (where county/city polygons are effectively invisible). Descends
@@ -2322,7 +2584,20 @@ fn geojson_lat_lons(geojson: &serde_json::Value) -> (Vec<f64>, Vec<f64>) {
/// — with `--geocode` — are derived from `--lat`/`--lon` (reverse) or a `--locations` name column
/// (forward) by reusing qsv's geocode engine.
fn build_choropleth_plot(args: &Args, out_format: OutFormat) -> CliResult {
- let mode = parse_location_mode(args.flag_location_mode.as_deref().unwrap_or("iso3"))?;
+ // Point-in-polygon binning: lat/lon points + a custom --geojson, without --geocode. It colors
+ // regions by the GeoJSON feature that CONTAINS each point, so it always uses the geojson-id
+ // location mode regardless of any --location-mode default. --geocode (when also present) takes
+ // precedence — it is an explicit request for the geocode engine.
+ let pip = args.flag_geojson.is_some()
+ && args.flag_lat.is_some()
+ && args.flag_lon.is_some()
+ && !args.flag_geocode;
+ let snap = !args.flag_no_snap;
+ let mode = if pip {
+ LocationMode::GeoJsonId
+ } else {
+ parse_location_mode(args.flag_location_mode.as_deref().unwrap_or("iso3"))?
+ };
let palette = parse_color_scale(args.flag_color_scale.as_deref().unwrap_or("viridis"))?;
// --value drives the colored measure; aggregation defaults to sum when a --value is given,
@@ -2357,8 +2632,16 @@ fn build_choropleth_plot(args: &Args, out_format: OutFormat) -> CliResult
won't match a GeoJSON feature id. Use the default geo basemap with --geocode."
);
}
+ if args.flag_no_snap && !pip {
+ return fail_incorrectusage_clierror!(
+ "--no-snap only applies to point-in-polygon binning (--lat/--lon + --geojson without \
+ --geocode)."
+ );
+ }
- let (locations, z, measure_label) = if args.flag_geocode {
+ let (locations, z, measure_label) = if pip {
+ choropleth_pip_locations(args, agg, snap)?
+ } else if args.flag_geocode {
choropleth_geocoded_locations(args, mode.clone(), agg)?
} else {
choropleth_literal_locations(args, agg)?
@@ -2497,6 +2780,81 @@ fn choropleth_literal_locations(
Ok((locs, z, measure_label))
}
+/// Resolve choropleth `(locations, z, measure_label)` by point-in-polygon binning: each row's
+/// `--lat`/`--lon` point is assigned to the GeoJSON region whose polygon contains it (or, unless
+/// `snap` is false, to the nearest region), and the `--value` measure (or row counts) is
+/// aggregated per region id. Emits a stderr coverage note when points fall outside every region.
+fn choropleth_pip_locations(
+ args: &Args,
+ agg: Agg,
+ snap: bool,
+) -> CliResult<(Vec<String>, Vec<f64>, String)> {
+ let (mut rdr, headers, nh) = reader_and_headers(args)?;
+ let lat_idx = resolve_one(args.flag_lat.as_ref(), &headers, nh, "lat")?;
+ let lon_idx = resolve_one(args.flag_lon.as_ref(), &headers, nh, "lon")?;
+ let value_idx = match args.flag_value.as_ref() {
+ Some(s) => Some(resolve_one(Some(s), &headers, nh, "value")?),
+ None => None,
+ };
+ let measure_label = match value_idx {
+ Some(i) => col_label(&headers, i, nh),
+ None => "count".to_string(),
+ };
+
+ let feature_id_key = args.flag_feature_id_key.as_deref().unwrap_or("id");
+ let geojson = load_geojson(args.flag_geojson.as_deref().unwrap())?;
+ let features = build_pip_features(&geojson, feature_id_key)?;
+
+ let mut raw_locs: Vec<String> = Vec::new();
+ let mut values: Vec<f64> = Vec::new();
+ let mut total = 0_usize;
+ let mut outside = 0_usize;
+ let mut record = csv::ByteRecord::new();
+ while rdr.read_byte_record(&mut record)? {
+ let (Some(lat), Some(lon)) = (
+ parse_f64(record.get(lat_idx)),
+ parse_f64(record.get(lon_idx)),
+ ) else {
+ continue;
+ };
+ if !((-90.0..=90.0).contains(&lat) && (-180.0..=180.0).contains(&lon)) {
+ continue;
+ }
+ let value = match value_idx {
+ Some(i) => match parse_f64(record.get(i)) {
+ Some(v) => v,
+ None => continue,
+ },
+ None => 1.0,
+ };
+ total += 1;
+ match pip_assign(&features, lat, lon, snap) {
+ PipOutcome::Inside(fi) => {
+ raw_locs.push(features[fi].id.clone());
+ values.push(value);
+ },
+ PipOutcome::Snapped(fi) => {
+ raw_locs.push(features[fi].id.clone());
+ values.push(value);
+ outside += 1;
+ },
+ PipOutcome::Outside => outside += 1,
+ }
+ }
+
+ if outside > 0 {
+ let how = if snap {
+ "snapped to nearest region"
+ } else {
+ "dropped"
+ };
+ eprintln!("viz choropleth: {outside} of {total} points fell outside every region ({how}).");
+ }
+
+ let (locs, z) = aggregate(raw_locs, values, agg);
+ Ok((locs, z, measure_label))
+}
+
/// Resolve choropleth `(locations, z)` via qsv's geocode engine: reverse-geocode `--lat`/`--lon`
/// points, or forward-geocode a `--locations` name column, into ISO-3 / US-state codes per
/// `--location-mode`, then aggregate the `--value` measure (or row counts) per region.
@@ -4464,9 +4822,13 @@ enum PanelKind {
/// and `z` (counts). Like `Geo`, the `geo` subplot doesn't compose with the typed x/y grid,
/// so a dashboard containing this panel always renders via the inline path.
Choropleth {
- locations: Vec<String>,
- z: Vec<f64>,
- location_mode: LocationMode,
+ locations: Vec<String>,
+ z: Vec<f64>,
+ location_mode: LocationMode,
+ /// User GeoJSON for a point-in-polygon (`geojson-id`) panel; `None` for the built-in
+ /// geocode-derived iso3/usa-states path. Carried as a value so render does no I/O.
+ geojson: Option<serde_json::Value>,
+ feature_id_key: Option<String>,
},
/// Categorical part-to-whole hierarchy (`Treemap` or `Sunburst`, per `style`) over 2–3
/// nested low-cardinality dimensions. Carries the fully precomputed flat plotly arrays
@@ -5793,6 +6155,76 @@ fn build_smart_choropleth_panel(lats: &[f64], lons: &[f64]) -> Option {
locations: order,
z,
location_mode,
+ geojson: None,
+ feature_id_key: None,
+ },
+ ))
+}
+
+/// Build a `viz smart` choropleth panel by point-in-polygon binning the core lat/lon points into a
+/// user-supplied GeoJSON (`--geojson`). Each point is assigned to the region whose polygon contains
+/// it (or, unless `snap` is false, the nearest region); the panel is colored by per-region counts.
+/// Unlike [`build_smart_choropleth_panel`] this needs no geocode engine. Returns `None` when the
+/// GeoJSON can't be parsed/binned or fewer than 2 regions receive points.
+fn build_smart_pip_choropleth_panel(
+ geojson_spec: &str,
+ feature_id_key: &str,
+ lats: &[f64],
+ lons: &[f64],
+ snap: bool,
+) -> Option<Panel> {
+ let geojson = load_geojson(geojson_spec).ok()?;
+ let features = build_pip_features(&geojson, feature_id_key).ok()?;
+ let mut order: Vec<String> = Vec::new();
+ let mut counts: std::collections::HashMap<String, f64> = std::collections::HashMap::new();
+ let mut total = 0_usize;
+ let mut outside = 0_usize;
+ for (&lat, &lon) in lats.iter().zip(lons.iter()) {
+ if !((-90.0..=90.0).contains(&lat) && (-180.0..=180.0).contains(&lon)) {
+ continue;
+ }
+ total += 1;
+ let fi = match pip_assign(&features, lat, lon, snap) {
+ PipOutcome::Inside(fi) => fi,
+ PipOutcome::Snapped(fi) => {
+ outside += 1;
+ fi
+ },
+ PipOutcome::Outside => {
+ outside += 1;
+ continue;
+ },
+ };
+ let id = &features[fi].id;
+ if let Some(c) = counts.get_mut(id) {
+ *c += 1.0;
+ } else {
+ counts.insert(id.clone(), 1.0);
+ order.push(id.clone());
+ }
+ }
+ if order.len() < 2 {
+ return None;
+ }
+ // only report after we know a panel will actually render, so the note never describes a map the
+ // dashboard drops.
+ if outside > 0 {
+ let how = if snap {
+ "snapped to nearest region"
+ } else {
+ "dropped"
+ };
+ eprintln!("viz smart: {outside} of {total} points fell outside every region ({how}).");
+ }
+ let z: Vec<f64> = order.iter().map(|key| counts[key]).collect();
+ Some(Panel::new(
+ "Regions".to_string(),
+ PanelKind::Choropleth {
+ locations: order,
+ z,
+ location_mode: LocationMode::GeoJsonId,
+ geojson: Some(geojson),
+ feature_id_key: Some(feature_id_key.to_string()),
},
))
}
@@ -5883,13 +6315,27 @@ fn build_map_panel(
// per-country from what actually resolved — accurate even above `MAX_SMART_POINTS`,
// embedding only the aggregates (never the raw points), and its own 2-region check drops
// single-region data.
- #[cfg(feature = "geocode")]
- let choropleth_panel = (lon_span >= SMART_CHOROPLETH_MIN_SPAN_DEG
- || lat_span >= SMART_CHOROPLETH_MIN_SPAN_DEG)
- .then(|| build_smart_choropleth_panel(&core_lats, &core_lons))
- .flatten();
- #[cfg(not(feature = "geocode"))]
- let choropleth_panel: Option<Panel> = None;
+ // A user `--geojson` switches the companion to a point-in-polygon choropleth (ungated — no
+ // geocode engine needed). It bypasses the span gate: an explicit `--geojson` is explicit
+ // intent, so a small custom-district file shouldn't be span-suppressed. Without
+ // `--geojson`, fall back to the geocode-derived iso3/US-state panel (gated, span-gated as
+ // before).
+ let choropleth_panel = if let Some(spec) = args.flag_geojson.as_deref() {
+ let snap = !args.flag_no_snap;
+ let key = args.flag_feature_id_key.as_deref().unwrap_or("id");
+ build_smart_pip_choropleth_panel(spec, key, &core_lats, &core_lons, snap)
+ } else {
+ #[cfg(feature = "geocode")]
+ {
+ (lon_span >= SMART_CHOROPLETH_MIN_SPAN_DEG || lat_span >= SMART_CHOROPLETH_MIN_SPAN_DEG)
+ .then(|| build_smart_choropleth_panel(&core_lats, &core_lons))
+ .flatten()
+ }
+ #[cfg(not(feature = "geocode"))]
+ {
+ None
+ }
+ };
let kind = if global {
PanelKind::Geo {
@@ -8169,17 +8615,23 @@ fn smart_inline_panel_plot(
locations,
z,
location_mode,
+ geojson,
+ feature_id_key,
} = &panel.kind
{
let mut plot = Plot::new();
- plot.add_trace(
- Choropleth::new(locations.clone(), z.clone())
- .location_mode(location_mode.clone())
- .color_scale(ColorScale::Palette(ColorScalePalette::Viridis))
- .show_scale(true)
- .color_bar(ColorBar::new().title("count"))
- .marker(ChoroplethMarker::new().line(Line::new().width(0.5))),
- );
+ let mut trace = Choropleth::new(locations.clone(), z.clone())
+ .location_mode(location_mode.clone())
+ .color_scale(ColorScale::Palette(ColorScalePalette::Viridis))
+ .show_scale(true)
+ .color_bar(ColorBar::new().title("count"))
+ .marker(ChoroplethMarker::new().line(Line::new().width(0.5)));
+ // a point-in-polygon panel carries its own GeoJSON polygons (geojson-id mode); the
+ // built-in geocode-derived panels (iso3 / usa-states) carry neither.
+ if let (Some(gj), Some(key)) = (geojson, feature_id_key) {
+ trace = trace.geojson(gj.clone()).feature_id_key(key.clone());
+ }
+ plot.add_trace(trace);
// frame to the FILLED REGION GEOMETRIES, not the source points (whose bounding box clips
// the countries/states at the edges — e.g. a city near a country's center).
// US-states use the albers-usa composite (CONUS + AK/HI insets), which self-frames
@@ -11374,4 +11826,92 @@ mod tests {
"max-pair V={v} should catch nested level"
);
}
+
+ #[test]
+ fn pip_inside_outside_and_snap() {
+ // one 0..10 square keyed by properties.id
+ let gj = serde_json::json!({
+ "type": "FeatureCollection",
+ "features": [{
+ "type": "Feature",
+ "properties": {"id": "A"},
+ "geometry": {"type": "Polygon", "coordinates":
+ [[[0.0, 0.0], [0.0, 10.0], [10.0, 10.0], [10.0, 0.0], [0.0, 0.0]]]}
+ }]
+ });
+ let feats = build_pip_features(&gj, "properties.id").unwrap();
+ assert_eq!(feats.len(), 1);
+ assert_eq!(feats[0].id, "A");
+ // (lat, lon) inside the square
+ assert_eq!(pip_assign(&feats, 5.0, 5.0, false), PipOutcome::Inside(0));
+ // far outside, no snap -> dropped
+ assert_eq!(pip_assign(&feats, 50.0, 50.0, false), PipOutcome::Outside);
+ // far outside, snap -> nearest (the only) feature
+ assert_eq!(pip_assign(&feats, 50.0, 50.0, true), PipOutcome::Snapped(0));
+ }
+
+ #[test]
+ fn pip_multipolygon_and_holes() {
+ let gj = serde_json::json!({
+ "type": "FeatureCollection",
+ "features": [
+ {"type": "Feature", "properties": {"id": "M"}, "geometry": {
+ "type": "MultiPolygon", "coordinates": [
+ [[[0.0, 0.0], [0.0, 2.0], [2.0, 2.0], [2.0, 0.0], [0.0, 0.0]]],
+ [[[10.0, 10.0], [10.0, 12.0], [12.0, 12.0], [12.0, 10.0], [10.0, 10.0]]]
+ ]}},
+ {"type": "Feature", "properties": {"id": "H"}, "geometry": {
+ "type": "Polygon", "coordinates": [
+ [[20.0, 20.0], [20.0, 30.0], [30.0, 30.0], [30.0, 20.0], [20.0, 20.0]],
+ [[23.0, 23.0], [23.0, 27.0], [27.0, 27.0], [27.0, 23.0], [23.0, 23.0]]
+ ]}}
+ ]
+ });
+ let feats = build_pip_features(&gj, "properties.id").unwrap();
+ let m = feats.iter().position(|f| f.id == "M").unwrap();
+ let h = feats.iter().position(|f| f.id == "H").unwrap();
+ // inside the SECOND polygon of the MultiPolygon
+ assert_eq!(pip_assign(&feats, 11.0, 11.0, false), PipOutcome::Inside(m));
+ // inside H's filled band (between exterior and hole)
+ assert_eq!(pip_assign(&feats, 21.0, 21.0, false), PipOutcome::Inside(h));
+ // inside H's HOLE -> not contained; no snap -> dropped
+ assert_eq!(pip_assign(&feats, 25.0, 25.0, false), PipOutcome::Outside);
+ // same hole point, snap -> snaps back to H (nearest boundary)
+ assert_eq!(pip_assign(&feats, 25.0, 25.0, true), PipOutcome::Snapped(h));
+ }
+
+ #[test]
+ fn pip_feature_id_top_level_numeric_and_missing_skipped() {
+ // first feature has a numeric top-level id; second has no top-level id (skipped under "id")
+ let gj = serde_json::json!({
+ "type": "FeatureCollection",
+ "features": [
+ {"type": "Feature", "id": 7, "properties": {}, "geometry": {
+ "type": "Polygon", "coordinates":
+ [[[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0]]]}},
+ {"type": "Feature", "properties": {"name": "x"}, "geometry": {
+ "type": "Polygon", "coordinates":
+ [[[5.0, 5.0], [5.0, 6.0], [6.0, 6.0], [6.0, 5.0], [5.0, 5.0]]]}}
+ ]
+ });
+ let feats = build_pip_features(&gj, "id").unwrap();
+ assert_eq!(feats.len(), 1, "feature without a top-level id is skipped");
+ assert_eq!(feats[0].id, "7", "numeric id coerces to string");
+ assert_eq!(pip_assign(&feats, 0.5, 0.5, false), PipOutcome::Inside(0));
+ }
+
+ #[test]
+ fn pip_build_errors_when_no_feature_matches_key() {
+ let gj = serde_json::json!({
+ "type": "FeatureCollection",
+ "features": [{
+ "type": "Feature",
+ "properties": {"name": "x"},
+ "geometry": {"type": "Polygon", "coordinates":
+ [[[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0]]]}
+ }]
+ });
+ // no feature has properties.id -> usable set is empty -> error
+ assert!(build_pip_features(&gj, "properties.id").is_err());
+ }
}
diff --git a/tests/test_viz.rs b/tests/test_viz.rs
index 4ab7fadb3..2dc868ca8 100644
--- a/tests/test_viz.rs
+++ b/tests/test_viz.rs
@@ -4243,6 +4243,125 @@ fn viz_choropleth_map_frames_ignore_properties() {
);
}
+// point-in-polygon binning: lat/lon points + a custom --geojson (no --geocode) bins each point into
+// the region whose polygon contains it; the location IS the feature id (exact, no name/code match).
+#[test]
+fn viz_choropleth_pip_bins_points() {
+ let wrk = Workdir::new("viz_choropleth_pip_bins_points");
+ // A = lon 0..10, B = lon 10..20 (both lat 0..10); one point far outside
+ wrk.create_from_string("pts.csv", "lat,lon\n5,5\n5,15\n5,15\n50,50\n");
+ wrk.create_from_string(
+ "regions.geojson",
+ r#"{"type":"FeatureCollection","features":[{"type":"Feature","properties":{"id":"A"},"geometry":{"type":"Polygon","coordinates":[[[0,0],[0,10],[10,10],[10,0],[0,0]]]}},{"type":"Feature","properties":{"id":"B"},"geometry":{"type":"Polygon","coordinates":[[[10,0],[10,10],[20,10],[20,0],[10,0]]]}}]}"#,
+ );
+
+ let mut cmd = wrk.command("viz");
+ cmd.args([
+ "choropleth",
+ "pts.csv",
+ "--lat",
+ "lat",
+ "--lon",
+ "lon",
+ "--geojson",
+ "regions.geojson",
+ "--feature-id-key",
+ "properties.id",
+ ]);
+ let out = wrk.output(&mut cmd);
+ assert!(out.status.success());
+ let html = String::from_utf8_lossy(&out.stdout);
+ // a geo Choropleth in geojson-id mode, matched on properties.id, with the geojson embedded
+ assert!(html.contains(r#""type":"choropleth""#));
+ assert!(html.contains(r#""locationmode":"geojson-id""#));
+ assert!(html.contains(r#""featureidkey":"properties.id""#));
+ assert!(html.contains(r#""geojson":{"type":"FeatureCollection""#));
+ // A gets the one (5,5) point; B gets two (5,15) + the snapped (50,50) outlier (snap is default)
+ assert!(html.contains(r#""locations":["A","B"]"#));
+ assert!(html.contains(r#""z":[1.0,3.0]"#));
+ // the snapped outlier is reported on stderr so the user knows a point missed every polygon
+ let stderr = String::from_utf8_lossy(&out.stderr);
+ assert!(
+ stderr.contains("1 of 4 points fell outside every region (snapped to nearest region)"),
+ "missing snap coverage note; stderr was: {stderr}"
+ );
+}
+
+// --no-snap drops points outside every region (instead of snapping to nearest) and reports coverage
+// on stderr.
+#[test]
+fn viz_choropleth_pip_no_snap_drops_and_reports() {
+ let wrk = Workdir::new("viz_choropleth_pip_no_snap_drops_and_reports");
+ wrk.create_from_string("pts.csv", "lat,lon\n5,5\n5,15\n5,15\n50,50\n");
+ wrk.create_from_string(
+ "regions.geojson",
+ r#"{"type":"FeatureCollection","features":[{"type":"Feature","properties":{"id":"A"},"geometry":{"type":"Polygon","coordinates":[[[0,0],[0,10],[10,10],[10,0],[0,0]]]}},{"type":"Feature","properties":{"id":"B"},"geometry":{"type":"Polygon","coordinates":[[[10,0],[10,10],[20,10],[20,0],[10,0]]]}}]}"#,
+ );
+
+ let mut cmd = wrk.command("viz");
+ cmd.args([
+ "choropleth",
+ "pts.csv",
+ "--lat",
+ "lat",
+ "--lon",
+ "lon",
+ "--geojson",
+ "regions.geojson",
+ "--feature-id-key",
+ "properties.id",
+ "--no-snap",
+ ]);
+ let out = wrk.output(&mut cmd);
+ assert!(out.status.success());
+ let html = String::from_utf8_lossy(&out.stdout);
+ // the (50,50) point is dropped: B keeps only its two contained points
+ assert!(html.contains(r#""z":[1.0,2.0]"#));
+ let stderr = String::from_utf8_lossy(&out.stderr);
+ assert!(
+ stderr.contains("1 of 4 points fell outside every region (dropped)"),
+ "missing coverage note; stderr was: {stderr}"
+ );
+}
+
+// --no-snap is only meaningful on the point-in-polygon path; reject it otherwise.
+#[test]
+fn viz_choropleth_no_snap_requires_pip() {
+ let wrk = Workdir::new("viz_choropleth_no_snap_requires_pip");
+ wrk.create_from_string("rg.csv", "iso3,val\nUSA,10\nCAN,5\n");
+ let mut cmd = wrk.command("viz");
+ cmd.args(["choropleth", "rg.csv", "--locations", "iso3", "--no-snap"]);
+ wrk.assert_err(&mut cmd);
+}
+
+// `viz smart` builds a point-in-polygon prefecture/region choropleth panel when given a --geojson,
+// with no geocode engine involved.
+#[test]
+fn viz_smart_pip_choropleth_panel() {
+ let wrk = Workdir::new("viz_smart_pip_choropleth_panel");
+ wrk.create_from_string("pts.csv", "lat,lon,mag\n5,5,1\n6,6,2\n5,15,3\n6,16,4\n");
+ wrk.create_from_string(
+ "regions.geojson",
+ r#"{"type":"FeatureCollection","features":[{"type":"Feature","properties":{"id":"A"},"geometry":{"type":"Polygon","coordinates":[[[0,0],[0,10],[10,10],[10,0],[0,0]]]}},{"type":"Feature","properties":{"id":"B"},"geometry":{"type":"Polygon","coordinates":[[[10,0],[10,10],[20,10],[20,0],[10,0]]]}}]}"#,
+ );
+
+ let mut cmd = wrk.command("viz");
+ cmd.args([
+ "smart",
+ "pts.csv",
+ "--geojson",
+ "regions.geojson",
+ "--feature-id-key",
+ "properties.id",
+ ]);
+ let out = wrk.output(&mut cmd);
+ assert!(out.status.success());
+ let html = String::from_utf8_lossy(&out.stdout);
+ assert!(html.contains(r#""type":"choropleth""#));
+ assert!(html.contains(r#""locationmode":"geojson-id""#));
+ assert!(html.contains(r#""featureidkey":"properties.id""#));
+}
+
// the projection (non-`--map`) path must frame the `geo` subplot to a custom GeoJSON extent —
// plotly only auto-scopes its built-in location modes, so without framing the polygons would sit
// tiny on the default whole-world view.
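
Illustrative note, not part of the patch: the even-odd rule that the unit tests above exercise (holes and MultiPolygon) can be sketched in isolation. This is a minimal approximation under stated assumptions: rings are closed lists of `[lon, lat]` vertices, and every name below (`ring_is_odd`, `polygon_contains`, `multipolygon_contains`) is hypothetical. The patch's own `pip_assign` also handles nearest-region snapping, which this sketch leaves out.

```rust
/// A closed ring of `[lon, lat]` vertices (first vertex repeated at the end).
type Ring = Vec<[f64; 2]>;

/// Cast a ray from the point toward +lon and report whether it crosses the
/// ring's edges an odd number of times.
fn ring_is_odd(ring: &[[f64; 2]], lon: f64, lat: f64) -> bool {
    let mut odd = false;
    for edge in ring.windows(2) {
        let (x1, y1) = (edge[0][0], edge[0][1]);
        let (x2, y2) = (edge[1][0], edge[1][1]);
        // Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat) {
            // Longitude at which the edge meets the horizontal ray.
            let x_at = x1 + (lat - y1) / (y2 - y1) * (x2 - x1);
            if lon < x_at {
                odd = !odd;
            }
        }
    }
    odd
}

/// A polygon is an exterior ring followed by zero or more hole rings. Under
/// the even-odd rule a point is contained when it lies in an odd number of
/// rings: in the exterior only, not in the exterior plus a hole.
fn polygon_contains(rings: &[Ring], lon: f64, lat: f64) -> bool {
    let hits = rings
        .iter()
        .filter(|r| ring_is_odd(r.as_slice(), lon, lat))
        .count();
    hits % 2 == 1
}

/// A MultiPolygon contains the point when any member polygon does.
fn multipolygon_contains(polys: &[Vec<Ring>], lon: f64, lat: f64) -> bool {
    polys.iter().any(|p| polygon_contains(p, lon, lat))
}
```

With the `H` fixture from `pip_multipolygon_and_holes`, a point in the band between the exterior and the hole is contained, while a point inside the hole is not; that is the behavior the `--no-snap` assertions above rely on.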
From 0e05b98d84619e1b9a3660189c7ba16624604021 Mon Sep 17 00:00:00 2001
From: Joel Natividad <1980690+jqnatividad@users.noreply.github.com>
Date: Sun, 28 Jun 2026 05:57:57 -0400
Subject: [PATCH 2/6] feat(viz): richer choropleth hover (name, labeled value,
% of total, rank)
Choropleth hover previously showed only a cryptic feature id and a bare
number. Each region now shows a human-readable name + id (Kagoshima (JP46)),
the value labeled with its measure (count: 65), the share of total
(15.6% of total, for count/sum aggregations only), and the rank
(rank 1 of 47).
The whole label is pre-rendered in Rust and attached via hover_text_array +
HoverInfo::Text (the proven ScatterGeo pattern), so it sidesteps plotly
hovertemplate token-binding and handles the Rust-computed pct/rank uniformly.
Region names come from the new --feature-name-key flag, or are auto-detected
from common name properties (properties.name, etc.). PipFeature now retains an
HTML-escaped name; the resolvers realign names to aggregate's output order and
return the hover array, which PanelKind::Choropleth carries to the smart render
arm. Applies to all paths (point-in-polygon, literal --locations, geocoded) and
both the geo Choropleth and MapLibre ChoroplethMap (--map) basemaps.
Browser-verified the rendered tooltip; regenerated the gallery so the showcase
dashboards reflect the new hover. 6 new unit + 4 new integration tests.
Part of #302.
Co-Authored-By: Claude Opus 4.8 (1M context)
---
CHANGELOG.md | 1 +
docs/help/viz.md | 3 +-
examples/viz/gallery.html | 2 +-
examples/viz/smart_geospatial.html | 2 +-
examples/viz/smart_us_choropleth.html | 2 +-
src/cmd/viz.rs | 320 +++++++++++++++++++++++---
tests/test_viz.rs | 142 ++++++++++++
7 files changed, 441 insertions(+), 31 deletions(-)
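
Illustrative note, not part of the patch: the label arithmetic described in the commit message (labeled value, share of total, rank) can be sketched on its own. The block below is a minimal standalone approximation, assuming plain `&str` ids of the same length as `z` and `<br>` as the line separator (plotly renders `<br>` in hover text as a line break). `hover_labels` and `ranks_desc` are hypothetical names; only `fmt_measure` mirrors a function added in this patch, and name lookup and HTML escaping are omitted.

```rust
use std::cmp::Ordering;

/// Whole numbers print without a decimal point; anything else prints with up
/// to three decimals and trailing zeros trimmed.
fn fmt_measure(v: f64) -> String {
    if v.is_finite() && v.fract() == 0.0 && v.abs() < 1e15 {
        format!("{v:.0}")
    } else {
        let s = format!("{v:.3}");
        s.trim_end_matches('0').trim_end_matches('.').to_string()
    }
}

/// 1-based rank by descending value. The sort is stable, so ties keep their
/// input order.
fn ranks_desc(z: &[f64]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..z.len()).collect();
    order.sort_by(|&a, &b| z[b].partial_cmp(&z[a]).unwrap_or(Ordering::Equal));
    let mut rank = vec![0_usize; z.len()];
    for (pos, &i) in order.iter().enumerate() {
        rank[i] = pos + 1;
    }
    rank
}

/// One pre-rendered label per region: id, labeled value, optional share of
/// the total, and rank. Assumes `ids.len() == z.len()`.
fn hover_labels(ids: &[&str], z: &[f64], measure: &str, include_pct: bool) -> Vec<String> {
    let n = ids.len();
    let total: f64 = if include_pct {
        z.iter().copied().filter(|v| v.is_finite()).sum()
    } else {
        0.0
    };
    let rank = ranks_desc(z);
    (0..n)
        .map(|i| {
            let mut lines = vec![
                ids[i].to_string(),
                format!("{measure}: {}", fmt_measure(z[i])),
            ];
            // The share line is skipped when the total is zero.
            if include_pct && total > 0.0 {
                lines.push(format!("{:.1}% of total", z[i] / total * 100.0));
            }
            lines.push(format!("rank {} of {n}", rank[i]));
            lines.join("<br>")
        })
        .collect()
}
```

Pre-rendering the whole string keeps the percentage and rank, which are computed in Rust, out of plotly's template machinery; the trace only needs the finished text per region.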
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3253d8cb1..ddf06a472 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `viz smart` box plots now overlay sample points via a size-based heuristic — all points for small data, Tukey outliers for medium, none for large (a fast cache-only quartile box) — overridable with `--box-points` (now accepted by `smart`, not just `box`) ([#302](https://github.com/dathere/qsv/issues/302)).
- `viz smart` frequency bar charts now show a `(NULL)` bar for empty cells and an `Other (N)` aggregate bar for the categories beyond `--limit` (N = the count of distinct categories rolled up), matching `qsv frequency`'s default output. Both aggregate bars are drawn in a muted grey so they read as summaries rather than real categories. New `--no-nulls` and `--no-other` flags suppress them ([#302](https://github.com/dathere/qsv/issues/302)).
- `viz choropleth` & `viz smart` can now build a choropleth from a user-supplied GeoJSON by **point-in-polygon binning**: each row's `--lat`/`--lon` is tested directly against the GeoJSON polygons (even-odd ray casting, handling holes & MultiPolygon) and the matched feature id becomes the location — exact, works for any country/admin level, and needs no geocoding or GeoNames lookup. Points outside every region snap to the nearest feature by default (`--no-snap` drops them instead); either way a coverage note reports how many points missed every polygon. Wired into the `viz smart` dashboard as the "Regions" panel when a `--geojson` is supplied. Zero new dependencies ([#302](https://github.com/dathere/qsv/issues/302)).
+- `viz choropleth` & `viz smart` choropleths now have **richer hover tooltips**. Instead of a bare feature id and value, each region shows a human-readable name + id (e.g. `Kagoshima (JP46)`), the value labeled with its measure (`count: 65`), the share of total (`15.6% of total`, for count/sum aggregations only), and the rank (`rank 1 of 47`). Region names are read from the GeoJSON via the new `--feature-name-key` flag, or auto-detected from common name properties (`properties.name`, etc.) when omitted. Applies to all paths (point-in-polygon, literal `--locations`, geocoded) and both the geo and MapLibre (`--map`) basemaps ([#302](https://github.com/dathere/qsv/issues/302)).
### Changed
- `viz smart`: the leading **overview panels** (map/geo, correlation heatmap and its scatter/contour/3D drill-downs, and the time-series trend) now each span the **full dashboard width** on their own row, instead of being squeezed into a half-width grid cell. The per-column box/bar/histogram panels still flow in the `--grid-cols`-wide grid below. Applies to all render paths (typed subplot grid, raw-JSON static export, and the inline-div HTML grid) ([#302](https://github.com/dathere/qsv/issues/302)).
diff --git a/docs/help/viz.md b/docs/help/viz.md
index 418847539..3a5331275 100644
--- a/docs/help/viz.md
+++ b/docs/help/viz.md
@@ -346,7 +346,7 @@ qsv viz --help
## Choropleth Options [↩](#nav)
-| Option | Type | Description | Default |
+| Option | Type | Description | Default |
|--------|------|-------------|--------|
| `‑‑locations` | string | Column holding the region key for each row (an ISO-3 country code, a 2-letter US state code, a country name, or a GeoJSON feature id, per --location-mode). With --geocode, this instead names a place-name column to forward-geocode into region codes. | |
| `‑‑location‑mode` | string | How --locations values are matched to regions. One of: iso3 (the default, ISO-3166-1 alpha-3 country codes), usa-states (2-letter US state codes), country-names (full country names), geojson-id (match a --geojson feature id). | `iso3` |
@@ -354,6 +354,7 @@ qsv viz --help
| `‑‑map` | flag | Render on a token-free MapLibre tile basemap (a ChoroplethMap) instead of the default projection basemap. Requires --geojson and --feature-id-key. Reuses --style for the basemap. | |
| `‑‑geojson` | string | Custom region polygons as a local file path or an http(s) URL to a GeoJSON FeatureCollection. Required for --map, and for the geojson-id location mode. Also enables point-in-polygon binning: with --lat/--lon (and without --geocode), each row's point is binned into the region whose polygon contains it (exact, no geocoding) and colored by --value/--agg or counts. | |
| `‑‑feature‑id‑key` | string | Property path in each GeoJSON feature whose value matches an entry in the locations column, or that labels each binned region (e.g. id, properties.fips). | `id` |
+| `‑‑feature‑name‑key` | string | GeoJSON property path whose value is shown as the human-readable region label in choropleth hover (e.g. properties.name). When omitted, common name keys are auto-detected; falls back to the feature id when absent. | |
| `‑‑geocode` | flag | Derive the region codes by reusing qsv's geocode engine (needs a build with the geocode feature). Either reverse-geocode the lat/lon points, or forward-geocode the locations name column. Only valid with location modes iso3 or usa-states. `viz choropleth` also reuses --value, --agg, --style and the lat/lon options. | |
| `‑‑no‑snap` | flag | For point-in-polygon binning (lat/lon points binned into a custom GeoJSON without geocoding): drop points that fall outside every region instead of snapping each to its nearest region (the default). A stderr note reports coverage either way. | |
diff --git a/examples/viz/gallery.html b/examples/viz/gallery.html
index b9872f3cf..c0453ec54 100644
--- a/examples/viz/gallery.html
+++ b/examples/viz/gallery.html
@@ -65,7 +65,7 @@
qsv viz — chart gallery
smart dashboard (--dictionary infer, world choropleth)`qsv viz smart world_cities.csv --dictionary infer` — cities across all seven continents: `viz smart` reverse-geocodes the points and adds a per-country choropleth (cities-per-country, ISO-3) framed to the filled-country geometries via Plotly fitbounds — so the regions are never clipped at the viewport edge — beside the natural-earth point map (crimson markers so coastal/island points read against the ocean), plus a seven-continent breakdown. A describegpt-inferred Data Dictionary supplies the friendly field labels (e.g. Metro Population, Avg Annual Temp). Note: the choropleth is reverse-geocoded from lat/lon, so the two Antarctic stations — which have no sovereign country — snap to the nearest administering territory (McMurdo → NZ's Ross Dependency, Rothera → the Argentine sector); the seven-continent grouping instead comes from the dataset's own continent column. Requires a local LLM; the committed HTML is reused on regen.
diff --git a/examples/viz/smart_us_choropleth.html b/examples/viz/smart_us_choropleth.html
index 29e557a7e..99b8c6fc0 100644
--- a/examples/viz/smart_us_choropleth.html
+++ b/examples/viz/smart_us_choropleth.html
@@ -43,7 +43,7 @@
us_cities.csv — data overview
diff --git a/src/cmd/viz.rs b/src/cmd/viz.rs
index 1378025c0..e93228d5d 100644
--- a/src/cmd/viz.rs
+++ b/src/cmd/viz.rs
@@ -298,6 +298,10 @@ choropleth options:
--feature-id-key Property path in each GeoJSON feature whose value matches an
entry in the locations column, or that labels each binned
region (e.g. id, properties.fips). [default: id]
+ --feature-name-key GeoJSON property path whose value is shown as the
+ human-readable region label in choropleth hover (e.g.
+ properties.name). When omitted, common name keys are
+ auto-detected; falls back to the feature id when absent.
--geocode Derive the region codes by reusing qsv's geocode engine
(needs a build with the geocode feature). Either reverse-geocode
the lat/lon points, or forward-geocode the locations name
@@ -823,6 +827,7 @@ struct Args {
flag_map: bool,
flag_geojson: Option<String>,
flag_feature_id_key: Option<String>,
+ flag_feature_name_key: Option<String>,
flag_geocode: bool,
flag_no_snap: bool,
flag_bins: Option,
@@ -2289,6 +2294,9 @@ fn load_geojson(spec: &str) -> CliResult {
/// candidate prefiltering.
struct PipFeature {
id: String,
+ /// Human-readable region label (e.g. `properties.name` → "Kagoshima") for hover, when a name
+ /// key is given or auto-detected; HTML-escaped at construction. `None` when no name resolves.
+ name: Option<String>,
polygons: Vec>>,
bbox: [f64; 4],
}
@@ -2321,6 +2329,77 @@ fn feature_id_by_path(feature: &geojson::Feature, key: &str) -> Option {
coerce(cur)
}
+/// HTML-escape a string for use inside a plotly hover label (which renders `<b>`/`<br>` as markup).
+/// Escapes `&` first so region names/measure labels with `&`, `<`, `>` show as literal text instead
+/// of being parsed as tags. Note: this does not neutralize a pathological `%{...}` in a label (out
+/// of scope — choropleth hover here is pre-rendered text, not a plotly template).
+fn escape_hover(s: &str) -> String {
+ s.replace('&', "&amp;")
+ .replace('<', "&lt;")
+ .replace('>', "&gt;")
+}
+
+/// Format a measure value for a hover label: a whole number prints without a decimal point
+/// (`65`), otherwise up to 3 decimals with trailing zeros trimmed (`12.5`, `0.333`).
+fn fmt_measure(v: f64) -> String {
+ if v.is_finite() && v.fract() == 0.0 && v.abs() < 1e15 {
+ format!("{v:.0}")
+ } else {
+ let s = format!("{v:.3}");
+ s.trim_end_matches('0').trim_end_matches('.').to_string()
+ }
+}
+
+/// Build the per-region hover text for a choropleth trace, aligned 1:1 to `locs`. Each label has:
+/// a bold region name + id (`<b>Kagoshima</b> (JP46)`) when `names[i]` is non-empty, else the bold
+/// id alone; the value labeled with its measure (`count: 65`); the share of the total
+/// (`15.6% of total`) when `include_pct` (count/sum aggregations only); and the rank by descending
+/// value (`rank 1 of 47`). `names` (when given) and `measure_label` are HTML-escaped by the caller
+/// / here respectively; ids are escaped here. Attached via `.hover_text_array(..)` +
+/// `HoverInfo::Text`, so the whole string is pre-rendered (no plotly template tokens).
+fn choropleth_hover_text(
+ locs: &[String],
+ z: &[f64],
+ names: Option<&[String]>,
+ measure_label: &str,
+ include_pct: bool,
+) -> Vec<String> {
+ let n = locs.len();
+ let label = escape_hover(measure_label);
+ let total: f64 = if include_pct {
+ z.iter().copied().filter(|v| v.is_finite()).sum()
+ } else {
+ 0.0
+ };
+ // rank by descending z (positional, 1-based; ties break by position).
+ let mut order: Vec<usize> = (0..n).collect();
+ order.sort_by(|&a, &b| z[b].partial_cmp(&z[a]).unwrap_or(std::cmp::Ordering::Equal));
+ let mut rank = vec![0_usize; n];
+ for (pos, &i) in order.iter().enumerate() {
+ rank[i] = pos + 1;
+ }
+ (0..n)
+ .map(|i| {
+ let id = escape_hover(&locs[i]);
+ let name = names
+ .and_then(|ns| ns.get(i))
+ .map(String::as_str)
+ .filter(|s| !s.is_empty());
+ let mut lines: Vec<String> = Vec::with_capacity(4);
+ match name {
+ Some(name) => lines.push(format!("<b>{name}</b> ({id})")),
+ None => lines.push(format!("<b>{id}</b>")),
+ }
+ lines.push(format!("{label}: {}", fmt_measure(z[i])));
+ if include_pct && total > 0.0 {
+ lines.push(format!("{:.1}% of total", z[i] / total * 100.0));
+ }
+ lines.push(format!("rank {} of {n}", rank[i]));
+ lines.join("<br>")
+ })
+ .collect()
+}
+
/// Convert a `geojson::Value::Polygon` ring set to closed `[lon, lat]` rings (appending the first
/// vertex when a ring isn't already closed, so even-odd ray-casting via `windows(2)` covers every
/// edge). Rings with fewer than 3 distinct vertices are dropped.
@@ -2374,12 +2453,29 @@ fn geojson_value_to_polygons(value: &geojson::Value) -> Vec>>
fn build_pip_features(
geojson: &serde_json::Value,
feature_id_key: &str,
+ feature_name_key: Option<&str>,
) -> CliResult<Vec<PipFeature>> {
let fc = geojson::FeatureCollection::from_json_value(geojson.clone()).map_err(|e| {
crate::CliError::Other(format!(
"--geojson is not a valid GeoJSON FeatureCollection: {e}"
))
})?;
+ // Resolve the name key once: an explicit --feature-name-key, else auto-detect by probing common
+ // name properties on the FIRST feature (a heterogeneous collection whose first feature lacks
+ // the property won't auto-detect — use --feature-name-key to force it).
+ let name_key: Option<String> = feature_name_key.map(str::to_string).or_else(|| {
+ let first = fc.features.first()?;
+ [
+ "properties.name",
+ "properties.NAME",
+ "properties.Name",
+ "properties.NAME_1",
+ "name",
+ ]
+ .into_iter()
+ .find(|k| feature_id_by_path(first, k).is_some())
+ .map(str::to_string)
+ });
let mut out: Vec<PipFeature> = Vec::with_capacity(fc.features.len());
let mut skipped = 0_usize;
for feature in &fc.features {
@@ -2387,6 +2483,10 @@ fn build_pip_features(
skipped += 1;
continue;
};
+ let name = name_key
+ .as_deref()
+ .and_then(|k| feature_id_by_path(feature, k))
+ .map(|n| escape_hover(&n));
let polygons = match &feature.geometry {
Some(g) => geojson_value_to_polygons(&g.value),
None => Vec::new(),
@@ -2409,7 +2509,12 @@ fn build_pip_features(
bbox[3] = bbox[3].max(lat);
}
}
- out.push(PipFeature { id, polygons, bbox });
+ out.push(PipFeature {
+ id,
+ name,
+ polygons,
+ bbox,
+ });
}
if out.is_empty() {
return fail_clierror!(
@@ -2639,7 +2744,7 @@ fn build_choropleth_plot(args: &Args, out_format: OutFormat) -> CliResult
);
}
- let (locations, z, measure_label) = if pip {
+ let (locations, z, measure_label, hover_text) = if pip {
choropleth_pip_locations(args, agg, snap)?
} else if args.flag_geocode {
choropleth_geocoded_locations(args, mode.clone(), agg)?
@@ -2666,7 +2771,9 @@ fn build_choropleth_plot(args: &Args, out_format: OutFormat) -> CliResult
.color_scale(ColorScale::Palette(palette))
.show_scale(true)
.color_bar(ColorBar::new().title(measure_label))
- .marker(ChoroplethMarker::new().line(Line::new().width(0.5)));
+ .marker(ChoroplethMarker::new().line(Line::new().width(0.5)))
+ .hover_text_array(hover_text)
+ .hover_info(HoverInfo::Text);
plot.add_trace(trace);
// --style carries a global docopt default of open-street-map (a token-free MapLibre style).
let style =
@@ -2695,7 +2802,9 @@ fn build_choropleth_plot(args: &Args, out_format: OutFormat) -> CliResult
.color_scale(ColorScale::Palette(palette))
.show_scale(true)
.color_bar(ColorBar::new().title(measure_label))
- .marker(ChoroplethMarker::new().line(Line::new().width(0.5)));
+ .marker(ChoroplethMarker::new().line(Line::new().width(0.5)))
+ .hover_text_array(hover_text)
+ .hover_info(HoverInfo::Text);
let mut geo = LayoutGeo::new()
.resolution(GeoResolution::OneOverFiftyMillion)
.showland(true)
@@ -2741,12 +2850,12 @@ fn build_choropleth_plot(args: &Args, out_format: OutFormat) -> CliResult
Ok(plot)
}
-/// Resolve choropleth `(locations, z, measure_label)` from a literal `--locations` region-key
-/// column, aggregating the `--value` measure (or row counts) per region.
+/// Resolve choropleth `(locations, z, measure_label, hover_text)` from a literal `--locations`
+/// region-key column, aggregating the `--value` measure (or row counts) per region.
fn choropleth_literal_locations(
args: &Args,
agg: Agg,
-) -> CliResult<(Vec<String>, Vec<f64>, String)> {
+) -> CliResult<(Vec<String>, Vec<f64>, String, Vec<String>)> {
let (mut rdr, headers, nh) = reader_and_headers(args)?;
let loc_idx = resolve_one(args.flag_locations.as_ref(), &headers, nh, "locations")?;
let value_idx = match args.flag_value.as_ref() {
@@ -2777,18 +2886,20 @@ fn choropleth_literal_locations(
values.push(value);
}
let (locs, z) = aggregate(raw_locs, values, agg);
- Ok((locs, z, measure_label))
+ let include_pct = matches!(agg, Agg::Count | Agg::Sum);
+ let hover_text = choropleth_hover_text(&locs, &z, None, &measure_label, include_pct);
+ Ok((locs, z, measure_label, hover_text))
}
-/// Resolve choropleth `(locations, z, measure_label)` by point-in-polygon binning: each row's
-/// `--lat`/`--lon` point is assigned to the GeoJSON region whose polygon contains it (or, unless
-/// `snap` is false, to the nearest region), and the `--value` measure (or row counts) is
+/// Resolve choropleth `(locations, z, measure_label, hover_text)` by point-in-polygon binning: each
+/// row's `--lat`/`--lon` point is assigned to the GeoJSON region whose polygon contains it (or,
+/// unless `snap` is false, to the nearest region), and the `--value` measure (or row counts) is
/// aggregated per region id. Emits a stderr coverage note when points fall outside every region.
fn choropleth_pip_locations(
args: &Args,
agg: Agg,
snap: bool,
-) -> CliResult<(Vec<String>, Vec<f64>, String)> {
+) -> CliResult<(Vec<String>, Vec<f64>, String, Vec<String>)> {
let (mut rdr, headers, nh) = reader_and_headers(args)?;
let lat_idx = resolve_one(args.flag_lat.as_ref(), &headers, nh, "lat")?;
let lon_idx = resolve_one(args.flag_lon.as_ref(), &headers, nh, "lon")?;
@@ -2803,7 +2914,11 @@ fn choropleth_pip_locations(
let feature_id_key = args.flag_feature_id_key.as_deref().unwrap_or("id");
let geojson = load_geojson(args.flag_geojson.as_deref().unwrap())?;
- let features = build_pip_features(&geojson, feature_id_key)?;
+ let features = build_pip_features(
+ &geojson,
+ feature_id_key,
+ args.flag_feature_name_key.as_deref(),
+ )?;
let mut raw_locs: Vec<String> = Vec::new();
let mut values: Vec<f64> = Vec::new();
@@ -2852,18 +2967,40 @@ fn choropleth_pip_locations(
}
let (locs, z) = aggregate(raw_locs, values, agg);
- Ok((locs, z, measure_label))
+ // realign region names to aggregate's output order; "" (empty) where a region has no name, so
+ // choropleth_hover_text falls back to the bold id. `None` overall when NO region has a name.
+ let names: Option<Vec<String>> = if features.iter().any(|f| f.name.is_some()) {
+ let map: std::collections::HashMap<&str, &str> = features
+ .iter()
+ .filter_map(|f| f.name.as_deref().map(|n| (f.id.as_str(), n)))
+ .collect();
+ Some(
+ locs.iter()
+ .map(|id| {
+ map.get(id.as_str())
+ .map_or(String::new(), |n| (*n).to_string())
+ })
+ .collect(),
+ )
+ } else {
+ None
+ };
+ let include_pct = matches!(agg, Agg::Count | Agg::Sum);
+ let hover_text =
+ choropleth_hover_text(&locs, &z, names.as_deref(), &measure_label, include_pct);
+ Ok((locs, z, measure_label, hover_text))
}
-/// Resolve choropleth `(locations, z)` via qsv's geocode engine: reverse-geocode `--lat`/`--lon`
-/// points, or forward-geocode a `--locations` name column, into ISO-3 / US-state codes per
-/// `--location-mode`, then aggregate the `--value` measure (or row counts) per region.
+/// Resolve choropleth `(locations, z, measure_label, hover_text)` via qsv's geocode engine:
+/// reverse-geocode `--lat`/`--lon` points, or forward-geocode a `--locations` name column, into
+/// ISO-3 / US-state codes per `--location-mode`, then aggregate the `--value` measure (or row
+/// counts) per region.
#[cfg(feature = "geocode")]
fn choropleth_geocoded_locations(
args: &Args,
mode: LocationMode,
agg: Agg,
-) -> CliResult<(Vec<String>, Vec<f64>, String)> {
+) -> CliResult<(Vec<String>, Vec<f64>, String, Vec<String>)> {
if !matches!(mode, LocationMode::Iso3 | LocationMode::UsaStates) {
return fail_incorrectusage_clierror!(
"--geocode only resolves --location-mode iso3 or usa-states (the codes geocode can \
@@ -2951,7 +3088,9 @@ fn choropleth_geocoded_locations(
}
}
let (locs, z) = aggregate(locations, kept_values, agg);
- Ok((locs, z, measure_label))
+ let include_pct = matches!(agg, Agg::Count | Agg::Sum);
+ let hover_text = choropleth_hover_text(&locs, &z, None, &measure_label, include_pct);
+ Ok((locs, z, measure_label, hover_text))
}
/// Non-geocode build: `--geocode` is unsupported, so reject it with an actionable message.
@@ -2960,7 +3099,7 @@ fn choropleth_geocoded_locations(
_args: &Args,
_mode: LocationMode,
_agg: Agg,
-) -> CliResult<(Vec<String>, Vec<f64>, String)> {
+) -> CliResult<(Vec<String>, Vec<f64>, String, Vec<String>)> {
fail_incorrectusage_clierror!(
"--geocode requires a qsv build with the `geocode` feature (or a prebuilt qsv binary). \
Supply ready-made region codes via --locations instead."
@@ -4829,6 +4968,9 @@ enum PanelKind {
/// geocode-derived iso3/usa-states path. Carried as a value so render does no I/O.
geojson: Option<serde_json::Value>,
feature_id_key: Option<String>,
+ /// Pre-rendered per-region hover label (aligned to `locations`): name+id, labeled count,
+ /// share of total, rank. Attached via `hover_text_array` + `HoverInfo::Text` at render.
+ hover_text: Vec<String>,
},
/// Categorical part-to-whole hierarchy (`Treemap` or `Sunburst`, per `style`) over 2–3
/// nested low-cardinality dimensions. Carries the fully precomputed flat plotly arrays
@@ -6149,6 +6291,7 @@ fn build_smart_choropleth_panel(lats: &[f64], lons: &[f64]) -> Option {
};
let z: Vec<f64> = order.iter().map(|key| counts[key]).collect();
+ let hover_text = choropleth_hover_text(&order, &z, None, "count", true);
Some(Panel::new(
name.to_string(),
PanelKind::Choropleth {
@@ -6157,6 +6300,7 @@ fn build_smart_choropleth_panel(lats: &[f64], lons: &[f64]) -> Option {
location_mode,
geojson: None,
feature_id_key: None,
+ hover_text,
},
))
}
@@ -6169,12 +6313,13 @@ fn build_smart_choropleth_panel(lats: &[f64], lons: &[f64]) -> Option {
fn build_smart_pip_choropleth_panel(
geojson_spec: &str,
feature_id_key: &str,
+ feature_name_key: Option<&str>,
lats: &[f64],
lons: &[f64],
snap: bool,
) -> Option<Panel> {
let geojson = load_geojson(geojson_spec).ok()?;
- let features = build_pip_features(&geojson, feature_id_key).ok()?;
+ let features = build_pip_features(&geojson, feature_id_key, feature_name_key).ok()?;
let mut order: Vec<String> = Vec::new();
let mut counts: std::collections::HashMap<String, f64> = std::collections::HashMap::new();
let mut total = 0_usize;
@@ -6217,6 +6362,25 @@ fn build_smart_pip_choropleth_panel(
eprintln!("viz smart: {outside} of {total} points fell outside every region ({how}).");
}
let z: Vec<f64> = order.iter().map(|key| counts[key]).collect();
+ // realign region names to the panel's location order; "" where a region has no name.
+ let names: Option<Vec<String>> = if features.iter().any(|f| f.name.is_some()) {
+ let map: std::collections::HashMap<&str, &str> = features
+ .iter()
+ .filter_map(|f| f.name.as_deref().map(|n| (f.id.as_str(), n)))
+ .collect();
+ Some(
+ order
+ .iter()
+ .map(|id| {
+ map.get(id.as_str())
+ .map_or(String::new(), |n| (*n).to_string())
+ })
+ .collect(),
+ )
+ } else {
+ None
+ };
+ let hover_text = choropleth_hover_text(&order, &z, names.as_deref(), "count", true);
Some(Panel::new(
"Regions".to_string(),
PanelKind::Choropleth {
@@ -6225,6 +6389,7 @@ fn build_smart_pip_choropleth_panel(
location_mode: LocationMode::GeoJsonId,
geojson: Some(geojson),
feature_id_key: Some(feature_id_key.to_string()),
+ hover_text,
},
))
}
@@ -6323,7 +6488,8 @@ fn build_map_panel(
let choropleth_panel = if let Some(spec) = args.flag_geojson.as_deref() {
let snap = !args.flag_no_snap;
let key = args.flag_feature_id_key.as_deref().unwrap_or("id");
- build_smart_pip_choropleth_panel(spec, key, &core_lats, &core_lons, snap)
+ let name_key = args.flag_feature_name_key.as_deref();
+ build_smart_pip_choropleth_panel(spec, key, name_key, &core_lats, &core_lons, snap)
} else {
#[cfg(feature = "geocode")]
{
@@ -8617,6 +8783,7 @@ fn smart_inline_panel_plot(
location_mode,
geojson,
feature_id_key,
+ hover_text,
} = &panel.kind
{
let mut plot = Plot::new();
@@ -8625,7 +8792,9 @@ fn smart_inline_panel_plot(
.color_scale(ColorScale::Palette(ColorScalePalette::Viridis))
.show_scale(true)
.color_bar(ColorBar::new().title("count"))
- .marker(ChoroplethMarker::new().line(Line::new().width(0.5)));
+ .marker(ChoroplethMarker::new().line(Line::new().width(0.5)))
+ .hover_text_array(hover_text.clone())
+ .hover_info(HoverInfo::Text);
// a point-in-polygon panel carries its own GeoJSON polygons (geojson-id mode); the
// built-in geocode-derived panels (iso3 / usa-states) carry neither.
if let (Some(gj), Some(key)) = (geojson, feature_id_key) {
@@ -11839,7 +12008,7 @@ mod tests {
[[[0.0, 0.0], [0.0, 10.0], [10.0, 10.0], [10.0, 0.0], [0.0, 0.0]]]}
}]
});
- let feats = build_pip_features(&gj, "properties.id").unwrap();
+ let feats = build_pip_features(&gj, "properties.id", None).unwrap();
assert_eq!(feats.len(), 1);
assert_eq!(feats[0].id, "A");
// (lat, lon) inside the square
@@ -11867,7 +12036,7 @@ mod tests {
]}}
]
});
- let feats = build_pip_features(&gj, "properties.id").unwrap();
+ let feats = build_pip_features(&gj, "properties.id", None).unwrap();
let m = feats.iter().position(|f| f.id == "M").unwrap();
let h = feats.iter().position(|f| f.id == "H").unwrap();
// inside the SECOND polygon of the MultiPolygon
@@ -11894,7 +12063,7 @@ mod tests {
[[[5.0, 5.0], [5.0, 6.0], [6.0, 6.0], [6.0, 5.0], [5.0, 5.0]]]}}
]
});
- let feats = build_pip_features(&gj, "id").unwrap();
+ let feats = build_pip_features(&gj, "id", None).unwrap();
assert_eq!(feats.len(), 1, "feature without a top-level id is skipped");
assert_eq!(feats[0].id, "7", "numeric id coerces to string");
assert_eq!(pip_assign(&feats, 0.5, 0.5, false), PipOutcome::Inside(0));
@@ -11912,6 +12081,103 @@ mod tests {
}]
});
// no feature has properties.id -> usable set is empty -> error
- assert!(build_pip_features(&gj, "properties.id").is_err());
+ assert!(build_pip_features(&gj, "properties.id", None).is_err());
+ }
+
+ #[test]
+ fn escape_hover_escapes_markup() {
+ assert_eq!(escape_hover("A & <B>"), "A &amp; &lt;B&gt;");
+ // & must be escaped first so an already-escaped entity isn't double-escaped wrongly
+ assert_eq!(escape_hover("<b>"), "&lt;b&gt;");
+ }
+
+ #[test]
+ fn fmt_measure_whole_vs_decimal() {
+ assert_eq!(fmt_measure(65.0), "65");
+ assert_eq!(fmt_measure(3.27), "3.27");
+ assert_eq!(fmt_measure(3.3630), "3.363");
+ assert_eq!(fmt_measure(0.5), "0.5");
+ }
+
+ #[test]
+ fn build_pip_features_autodetects_and_overrides_name() {
+ // properties.name is auto-detected when no key is given
+ let gj = serde_json::json!({
+ "type": "FeatureCollection",
+ "features": [{
+ "type": "Feature",
+ "properties": {"id": "A", "name": "Alpha", "label": "Other"},
+ "geometry": {"type": "Polygon", "coordinates":
+ [[[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0]]]}
+ }]
+ });
+ let feats = build_pip_features(&gj, "properties.id", None).unwrap();
+ assert_eq!(feats[0].name.as_deref(), Some("Alpha"));
+ // explicit --feature-name-key overrides auto-detect
+ let feats = build_pip_features(&gj, "properties.id", Some("properties.label")).unwrap();
+ assert_eq!(feats[0].name.as_deref(), Some("Other"));
+ // no name property anywhere -> None (universal hover, no name line)
+ let gj2 = serde_json::json!({
+ "type": "FeatureCollection",
+ "features": [{
+ "type": "Feature",
+ "properties": {"id": "A"},
+ "geometry": {"type": "Polygon", "coordinates":
+ [[[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0]]]}
+ }]
+ });
+ let feats = build_pip_features(&gj2, "properties.id", None).unwrap();
+ assert_eq!(feats[0].name, None);
+ }
+
+ #[test]
+ fn build_pip_features_escapes_name() {
+ let gj = serde_json::json!({
+ "type": "FeatureCollection",
+ "features": [{
+ "type": "Feature",
+ "properties": {"id": "A", "name": "R&D <x>"},
+ "geometry": {"type": "Polygon", "coordinates":
+ [[[0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0]]]}
+ }]
+ });
+ let feats = build_pip_features(&gj, "properties.id", None).unwrap();
+ assert_eq!(feats[0].name.as_deref(), Some("R&amp;D &lt;x&gt;"));
+ }
+
+ #[test]
+ fn choropleth_hover_text_content_and_alignment() {
+ // z deliberately unsorted so rank (by descending z) differs from index order
+ let locs = vec!["JP01".to_string(), "JP02".to_string(), "JP03".to_string()];
+ let z = vec![10.0, 30.0, 60.0];
+ let names = vec!["Hokkaido".to_string(), String::new(), "Okinawa".to_string()];
+ let h = choropleth_hover_text(&locs, &z, Some(&names), "count", true);
+ assert_eq!(h.len(), 3);
+ // named region: name (id), labeled count, share, rank (10/100 = 10%, lowest -> rank 3)
+ assert_eq!(
+ h[0],
+ "Hokkaido (JP01) count: 10 10.0% of total rank 3 of 3"
+ );
+ // empty name -> bold id fallback; 30/100 = 30%, rank 2
+ assert_eq!(
+ h[1],
+ "JP02 count: 30 30.0% of total rank 2 of 3"
+ );
+ // highest z -> rank 1
+ assert_eq!(
+ h[2],
+ "Okinawa (JP03) count: 60 60.0% of total rank 1 of 3"
+ );
+ }
+
+ #[test]
+ fn choropleth_hover_text_no_pct_when_not_count_or_sum() {
+ let locs = vec!["A".to_string(), "B".to_string()];
+ let z = vec![3.5, 1.5];
+ // include_pct = false (mean/min/max): no "% of total" line, value still labeled + ranked
+ let h = choropleth_hover_text(&locs, &z, None, "magnitude", false);
+ assert_eq!(h[0], "A magnitude: 3.5 rank 1 of 2");
+ assert_eq!(h[1], "B magnitude: 1.5 rank 2 of 2");
+ assert!(!h[0].contains("% of total"));
}
}
diff --git a/tests/test_viz.rs b/tests/test_viz.rs
index 2dc868ca8..8633b0f6f 100644
--- a/tests/test_viz.rs
+++ b/tests/test_viz.rs
@@ -4362,6 +4362,148 @@ fn viz_smart_pip_choropleth_panel() {
assert!(html.contains(r#""featureidkey":"properties.id""#));
}
+// PIP choropleth hover shows the human-readable region name (auto-detected from properties.name),
+// the labeled count, the share of total, and the rank.
+#[test]
+fn viz_choropleth_pip_hover_names() {
+ let wrk = Workdir::new("viz_choropleth_pip_hover_names");
+ // 1 point in A, 3 in B
+ wrk.create_from_string("pts.csv", "lat,lon\n5,5\n5,15\n5,15\n6,16\n");
+ wrk.create_from_string(
+ "regions.geojson",
+ r#"{"type":"FeatureCollection","features":[{"type":"Feature","properties":{"id":"A","name":"Alpha"},"geometry":{"type":"Polygon","coordinates":[[[0,0],[0,10],[10,10],[10,0],[0,0]]]}},{"type":"Feature","properties":{"id":"B","name":"Bravo"},"geometry":{"type":"Polygon","coordinates":[[[10,0],[10,10],[20,10],[20,0],[10,0]]]}}]}"#,
+ );
+
+ let mut cmd = wrk.command("viz");
+ cmd.args([
+ "choropleth",
+ "pts.csv",
+ "--lat",
+ "lat",
+ "--lon",
+ "lon",
+ "--geojson",
+ "regions.geojson",
+ "--feature-id-key",
+ "properties.id",
+ ]);
+ let out = wrk.output(&mut cmd);
+ assert!(out.status.success());
+ let html = String::from_utf8_lossy(&out.stdout);
+ assert!(html.contains(r#""hovertext":["#), "hovertext array missing");
+ assert!(
+ html.contains(r#""hoverinfo":"text""#),
+ "hoverinfo:text missing"
+ );
+ // names auto-detected from properties.name; labeled count, share, and rank present
+ assert!(html.contains("Alpha"), "region name Alpha missing");
+ assert!(html.contains("Bravo"), "region name Bravo missing");
+ assert!(html.contains("count: 1"), "labeled count missing");
+ assert!(html.contains("% of total"), "share-of-total missing");
+ assert!(html.contains("rank 1 of 2"), "rank missing");
+}
+
+// literal choropleth with a non-count aggregation (mean): hover is labeled and ranked, but the
+// share-of-total line is suppressed (a share is meaningless for a mean).
+#[test]
+fn viz_choropleth_literal_hover_labeled() {
+ let wrk = Workdir::new("viz_choropleth_literal_hover_labeled");
+ wrk.create_from_string("rg.csv", "region,mag\nUSA,2\nUSA,4\nCAN,5\n");
+ let mut cmd = wrk.command("viz");
+ cmd.args([
+ "choropleth",
+ "rg.csv",
+ "--locations",
+ "region",
+ "--value",
+ "mag",
+ "--agg",
+ "mean",
+ ]);
+ let out = wrk.output(&mut cmd);
+ assert!(out.status.success());
+ let html = String::from_utf8_lossy(&out.stdout);
+ assert!(html.contains(r#""hovertext":["#), "hovertext array missing");
+ assert!(html.contains("mag: 3"), "labeled mean value missing");
+ assert!(html.contains("rank "), "rank missing");
+ assert!(
+ !html.contains("% of total"),
+ "share-of-total must be suppressed for mean agg"
+ );
+}
+
+// the --map (MapLibre ChoroplethMap) path also carries the enriched hover.
+#[test]
+fn viz_choropleth_map_hover() {
+ let wrk = Workdir::new("viz_choropleth_map_hover");
+ wrk.create_from_string("pts.csv", "lat,lon\n5,5\n5,15\n");
+ wrk.create_from_string(
+ "regions.geojson",
+ r#"{"type":"FeatureCollection","features":[{"type":"Feature","properties":{"id":"A","name":"Alpha"},"geometry":{"type":"Polygon","coordinates":[[[0,0],[0,10],[10,10],[10,0],[0,0]]]}},{"type":"Feature","properties":{"id":"B","name":"Bravo"},"geometry":{"type":"Polygon","coordinates":[[[10,0],[10,10],[20,10],[20,0],[10,0]]]}}]}"#,
+ );
+ let mut cmd = wrk.command("viz");
+ cmd.args([
+ "choropleth",
+ "pts.csv",
+ "--lat",
+ "lat",
+ "--lon",
+ "lon",
+ "--geojson",
+ "regions.geojson",
+ "--feature-id-key",
+ "properties.id",
+ "--map",
+ ]);
+ let out = wrk.output(&mut cmd);
+ assert!(out.status.success());
+ let html = String::from_utf8_lossy(&out.stdout);
+ assert!(
+ html.contains(r#""type":"choroplethmap""#),
+ "not a choroplethmap"
+ );
+ assert!(html.contains(r#""hovertext":["#), "hovertext array missing");
+ assert!(
+ html.contains("Alpha") || html.contains("Bravo"),
+ "region name missing"
+ );
+ assert!(html.contains("rank "), "rank missing");
+}
+
+// `viz smart` PIP choropleth panel carries the enriched hover (names + count + share + rank).
+#[test]
+fn viz_smart_pip_choropleth_hover_names() {
+ let wrk = Workdir::new("viz_smart_pip_choropleth_hover_names");
+ wrk.create_from_string("pts.csv", "lat,lon,mag\n5,5,1\n6,6,2\n5,15,3\n6,16,4\n");
+ wrk.create_from_string(
+ "regions.geojson",
+ r#"{"type":"FeatureCollection","features":[{"type":"Feature","properties":{"id":"A","name":"Alpha"},"geometry":{"type":"Polygon","coordinates":[[[0,0],[0,10],[10,10],[10,0],[0,0]]]}},{"type":"Feature","properties":{"id":"B","name":"Bravo"},"geometry":{"type":"Polygon","coordinates":[[[10,0],[10,10],[20,10],[20,0],[10,0]]]}}]}"#,
+ );
+ let mut cmd = wrk.command("viz");
+ cmd.args([
+ "smart",
+ "pts.csv",
+ "--geojson",
+ "regions.geojson",
+ "--feature-id-key",
+ "properties.id",
+ ]);
+ let out = wrk.output(&mut cmd);
+ assert!(out.status.success());
+ let html = String::from_utf8_lossy(&out.stdout);
+ assert!(html.contains(r#""hovertext":["#), "hovertext array missing");
+ assert!(
+ html.contains(r#""hoverinfo":"text""#),
+ "hoverinfo:text missing"
+ );
+ assert!(
+ html.contains("Alpha") && html.contains("Bravo"),
+ "region names missing"
+ );
+ assert!(html.contains("% of total"), "share-of-total missing");
+ assert!(html.contains("rank "), "rank missing");
+}
+
// the projection (non-`--map`) path must frame the `geo` subplot to a custom GeoJSON extent —
// plotly only auto-scopes its built-in location modes, so without framing the polygons would sit
// tiny on the default whole-world view.
From ee11e9a30c6da4d6e05321cd254d02b45f107cda Mon Sep 17 00:00:00 2001
From: Joel Natividad <1980690+jqnatividad@users.noreply.github.com>
Date: Sun, 28 Jun 2026 06:15:35 -0400
Subject: [PATCH 3/6] fix(viz): disambiguate choropleth inputs + propagate
smart GeoJSON errors
Address roborev job #3243 (two Medium findings on the PIP choropleth commit):
1. build_choropleth_plot selected point-in-polygon mode whenever --geojson +
--lat/--lon were present, silently ignoring an also-supplied --locations
column. lat/lon binning and a pre-keyed --locations column are two different
ways to identify regions; supplying both without --geocode now errors as
ambiguous instead of quietly dropping --locations.
2. build_smart_pip_choropleth_panel used `.ok()?`, swallowing GeoJSON
load/parse errors and an unmatched --feature-id-key so `viz smart --geojson`
succeeded with no Regions panel and no diagnostic. It now returns
CliResult