
feat(analyses): drop missing/NaN rows, error on Inf #757

Open: jkrumbiegel wants to merge 11 commits into master from jk/analysis-missings

jkrumbiegel (Member) commented May 18, 2026

Summary

  • Analyses (density, histogram, linear, smooth, frequency, expectation) now silently drop rows where any positional input is missing or NaN. Multi-input analyses drop a row if any of its columns is bad. This matches Makie, which shows nothing for NaNs where appropriate (no scatter marker, a gap in a line, etc.). A NaN or missing value can't meaningfully be included in a histogram or similar analysis, but erroring by default would be too inconvenient. Handling NaN/missing values is still the user's responsibility; erroring constantly while exploring data is just not a good default.
  • Inf/-Inf in any positional input throws a clean ArgumentError naming the column, replacing inconsistent downstream errors ("Bandwidth must be positive", "start and stop must be finite") and silent garbage-in/garbage-out behavior (frequency counting Inf as a category, expectation propagating it into the mean, histogram quietly dropping it outside bins).
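A minimal Julia sketch of the rule in the summary, assuming the columns arrive as a vector of equal-length vectors. `drop_value` and `filter_rows` are hypothetical names for illustration, not AoG's internals: missing/NaN in any column drops that row, and Inf/-Inf anywhere throws an `ArgumentError` naming the column.

```julia
# Hypothetical sketch of the row filter; not AoG's actual implementation.

# A value drops its row if it is `missing` or a NaN number.
drop_value(v) = ismissing(v) || (v isa Number && isnan(v))

# `columns` is a vector of equal-length columns, `names` their labels.
# Returns the columns restricted to the rows that survive in every column.
function filter_rows(columns::AbstractVector, names::AbstractVector)
    keep = trues(length(first(columns)))
    for (col, name) in zip(columns, names)
        # Inf/-Inf is an error rather than something to drop silently.
        if any(v -> v isa Number && isinf(v), col)
            throw(ArgumentError("column $name contains Inf or -Inf"))
        end
        # Clear the mask wherever this column is missing or NaN.
        keep .&= .!drop_value.(col)
    end
    return [col[keep] for col in columns]
end

x = [1.0, missing, 3.0, NaN]
w = [10, 20, 30, 40]
fx, fw = filter_rows(AbstractVector[x, w], [:x, :weight])  # rows 1 and 3 survive
```

Building the argument with `AbstractVector[...]` keeps each column's own element type; a plain `[x, w]` would let Julia promote `w` to the element type of `x`.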

jkrumbiegel and others added 11 commits May 18, 2026 14:55
Analyses (density, histogram, linear, smooth, frequency, expectation) now
silently drop rows where any positional input is missing or NaN. Inf/-Inf in
any positional input or retained weights throws an explicit ArgumentError
naming the offending column. Weights that are missing/NaN on a retained
data row also error explicitly.
…oring

Symmetric with how positional inputs are handled: missing/NaN drops the
row, Inf/-Inf still throws.
…umn barriers

Passing positional as a Tuple specialized the filter on every tuple shape.
Take the positional as a plain collection and delegate per-column work to
small barrier functions so only those specialize per column type.
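The function-barrier idea in this commit, sketched with hypothetical names (`mask_column!`, `row_mask`). The outer loop only sees each column as an `AbstractVector`, so the call into the small barrier dispatches at runtime and only the barrier is compiled per concrete column type. It assumes the caller collects the columns into an `AbstractVector[...]` rather than a Tuple.

```julia
# Illustrative names only; this is not AoG's implementation.

# Per-value predicate: `missing`, or a Number that is NaN.
drops(v) = ismissing(v) || (v isa Number && isnan(v))

# Function barrier: Julia compiles a specialized method for each concrete
# column type (Vector{Float64}, Vector{Union{Missing,Int}}, ...).
function mask_column!(keep::BitVector, col::AbstractVector)
    keep .&= .!drops.(col)
    return keep
end

# Outer function: takes a plain vector of columns instead of a Tuple. When the
# caller always passes a Vector{AbstractVector}, its argument type does not
# change with the number or types of the columns.
function row_mask(columns::AbstractVector)
    keep = trues(length(first(columns)))
    for col in columns
        # `col` is only known as AbstractVector here; this call dispatches at
        # runtime into the specialized barrier.
        mask_column!(keep, col)
    end
    return keep
end

x = [1.0, NaN, 3.0]
y = [missing, 2, 3]
row_mask(AbstractVector[x, y])  # only the third row is kept
```

A Tuple would let the compiler infer every column type inside the outer function as well, but at the cost of compiling it again for each combination of column types.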
Categorical inputs (Vector{<:Union{Missing,String}}, etc.) keep `missing`
as a value rather than dropping the row — matching how AoG already treats
categorical missings as a category elsewhere. Numeric columns still drop
missing/NaN and error on Inf.

Drop the `weight_keys` parameter: positional and named columns are now
filtered uniformly via the same column-type rule, no special casing for
weights.

Also collapses the row loop to `keep .&= .!_is_missing_or_nan.(col)`.
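A sketch of the per-column rule from these commits, again with hypothetical names. It uses a plain `nonmissingtype(eltype(col)) <: Number` test as a stand-in for the column classification (a later commit switches to AoG's `iscontinuous`) and applies the collapsed broadcast mask only to numeric columns. The Inf check is omitted here.

```julia
# Hypothetical sketch; `numeric_column` stands in for AoG's column classification.

drop_value(v) = ismissing(v) || (v isa Number && isnan(v))

# Numeric if the element type, ignoring Missing, is a Number.
numeric_column(col::AbstractVector) = nonmissingtype(eltype(col)) <: Number

function mask_column!(keep::BitVector, col::AbstractVector)
    if numeric_column(col)
        # Numeric: missing/NaN drop the row (one broadcast, no explicit loop).
        keep .&= .!drop_value.(col)
    end
    # Categorical: `missing` is a regular value, so the mask is unchanged.
    return keep
end

keep = trues(3)
mask_column!(keep, [1.0, missing, 3.0])   # drops row 2
mask_column!(keep, ["a", "b", missing])   # keeps row 3 despite `missing`
```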
Replaces the ad-hoc `<: Number` check with AoG's existing `iscontinuous`
helper, which also recognises `TimeType` (Date/DateTime) as continuous and
treats Bool as categorical. `_is_missing_or_nan` and `_is_inf` guard their
`isnan`/`isinf` calls with `v isa AbstractFloat` so they don't error on Date
values now that those flow through.
Unitful.Quantity and DynamicQuantities.Quantity are <: Number but not
<: AbstractFloat, so the earlier AbstractFloat guard let NaN*u"m" and
Inf*u"m" slip past the filter. `v isa Number` covers them while still
excluding Date/DateTime (which are <: TimeType, not <: Number), so Date
columns don't trigger an undefined `isnan(::Date)`.
`groupreduce` is a generic helper; filtering rows is a behavior the
calling analyses choose. Move the filter into `frequency` and
`expectation` (the only two callers) so `groupreduce` stays neutral.
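A sketch of the split this commit describes: a generic grouping helper that never filters, and a caller that applies the row policy before delegating. `group_reduce`, `frequency_counts`, and `numeric_column` are hypothetical and much simpler than AoG's `groupreduce` and `frequency`; the Inf check is omitted.

```julia
# Hypothetical sketch; names are illustrative, not AoG's API.

# Generic helper: groups `vals` by `groups` and folds each group with `op`.
# It applies no row policy of its own.
function group_reduce(op, init, groups::AbstractVector, vals::AbstractVector)
    acc = Dict{eltype(groups), typeof(init)}()
    for (g, v) in zip(groups, vals)
        acc[g] = op(get(acc, g, init), v)
    end
    return acc
end

drop_value(v) = ismissing(v) || (v isa Number && isnan(v))
numeric_column(col::AbstractVector) = nonmissingtype(eltype(col)) <: Number

# Caller: decides which rows to keep, then delegates to the neutral helper.
# Numeric columns drop missing/NaN; categorical columns keep `missing`.
function frequency_counts(x::AbstractVector)
    kept = numeric_column(x) ? filter(!drop_value, x) : x
    return group_reduce((n, _) -> n + 1, 0, kept, kept)
end

frequency_counts([1.0, NaN, 1.0, missing, 2.0])  # counts: 1.0 => 2, 2.0 => 1
```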