feat(analyses): drop missing/NaN rows, error on Inf #757
Open
jkrumbiegel wants to merge 11 commits into
Analyses (density, histogram, linear, smooth, frequency, expectation) now silently drop rows where any positional input is missing or NaN. Inf/-Inf in any positional input or retained weights throws an explicit ArgumentError naming the offending column. Weights that are missing/NaN on a retained data row also error explicitly.
…oring Symmetric with how positional inputs are handled: missing/NaN drops the row, Inf/-Inf still throws.
…umn barriers Passing positional as a Tuple specialized the filter on every tuple shape. Take the positional as a plain collection and delegate per-column work to small barrier functions so only those specialize per column type.
Categorical inputs (Vector{<:Union{Missing,String}}, etc.) keep `missing`
as a value rather than dropping the row — matching how AoG already treats
categorical missings as a category elsewhere. Numeric columns still drop
missing/NaN and error on Inf.
Drop the `weight_keys` parameter: positional and named columns are now
filtered uniformly via the same column-type rule, no special casing for
weights.
Also collapses the row loop to `keep .&= .!_is_missing_or_nan.(col)`.
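The collapsed row mask can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `is_missing_or_nan` and `drop_bad_rows` are hypothetical names, and the sketch assumes every column passed in is numeric and of equal length.

```julia
# Hypothetical stand-in for the PR's `_is_missing_or_nan`; the real definition may differ.
is_missing_or_nan(v) = ismissing(v) || (v isa Number && isnan(v))

# Build one Boolean mask across all columns, then apply it to each column.
# Assumes all columns are numeric and have the same length.
function drop_bad_rows(cols...)
    keep = trues(length(first(cols)))
    for col in cols
        # A row survives only if no column is missing/NaN at that index.
        keep .&= .!is_missing_or_nan.(col)
    end
    return map(col -> col[keep], cols)
end

x = [1.0, NaN, 3.0, 4.0]
y = [10.0, 20.0, missing, 40.0]
xs, ys = drop_bad_rows(x, y)
```

Here row 2 is dropped because of `x` and row 3 because of `y`, so only rows 1 and 4 remain in both outputs.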
Replaces the ad-hoc `<: Number` check with AoG's existing `iscontinuous` helper, which also recognises `TimeType` (Date/DateTime) as continuous and treats Bool as categorical. `_is_missing_or_nan` and `_is_inf` guard their `isnan`/`isinf` calls with `v isa AbstractFloat` so they don't error on Date values now that those flow through.
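To illustrate the column-type rule described here, the following is a rough stand-in written only from the description above. It is not AoG's actual `iscontinuous`; `looks_continuous` is a hypothetical name and the real helper may classify more cases.

```julia
using Dates

# Hypothetical approximation of the rule: numbers and TimeType values count as
# continuous, Bool counts as categorical, and `missing` in the eltype is ignored.
function looks_continuous(col)
    T = nonmissingtype(eltype(col))
    return T <: Union{Number, TimeType} && !(T <: Bool)
end

looks_continuous([1.0, missing])        # numeric column
looks_continuous([Date(2024, 1, 1)])    # Date column
looks_continuous([true, false])         # Bool column, treated as categorical
looks_continuous(["a", missing])        # String column, categorical
```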
Unitful.Quantity and DynamicQuantities.Quantity are <: Number but not <: AbstractFloat. The earlier AbstractFloat guard let NaN*u"m" and Inf*u"m" slip past the filter. `v isa Number` covers them while still excluding Date/DateTime values (which are <: TimeType, not <: Number), so Date columns don't hit the undefined `isnan(::Date)` method.
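A guard of this shape might look like the sketch below. The function names are hypothetical and the bodies are an assumption based on the description, not the PR's definitions. Unit-carrying quantities are not exercised here because that would require loading Unitful or DynamicQuantities.

```julia
using Dates

# Only call isnan/isinf on values that are Numbers; everything else is
# treated as neither NaN nor Inf.
is_missing_or_nan(v) = ismissing(v) || (v isa Number && isnan(v))
is_inf(v) = v isa Number && isinf(v)

d = Date(2024, 1, 1)
is_missing_or_nan(d)        # false; the isnan call is skipped for a Date
is_inf(d)                   # false; the isinf call is skipped for a Date
is_missing_or_nan(NaN32)    # true; any Number subtype reaches isnan
is_inf(-Inf)                # true
is_inf(missing)             # false; missing is not a Number
```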
`groupreduce` is a generic helper; filtering rows is a behavior the calling analyses choose. Move the filter into `frequency` and `expectation` (the only two callers) so `groupreduce` stays neutral.
Summary
- Analyses (`density`, `histogram`, `linear`, `smooth`, `frequency`, `expectation`) now silently drop rows where any positional input is `missing` or `NaN`. Multi-input analyses drop a row if any of its columns is bad. This matches Makie's behavior of showing nothing for NaNs where appropriate (no scatter marker, a gap in a line, etc.). A NaN or missing can't be meaningfully included in a histogram etc., but having the analysis fail by default is too inconvenient. NaN/missing handling is of course still the responsibility of the user, but erroring all the time when exploring data is not a good solution.
- `Inf`/`-Inf` in any positional input throws a clean `ArgumentError` naming the column, replacing inconsistent downstream errors (`"Bandwidth must be positive"`, `"start and stop must be finite"`) and silent garbage-in/garbage-out behavior (`frequency` counting `Inf` as a category, `expectation` propagating it into the mean, `histogram` quietly dropping it outside bins).
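As a rough sketch of the second point, an up-front finiteness check could look like this. `check_no_inf` and the error message wording are hypothetical, chosen only for illustration; the PR's actual message and call site may differ.

```julia
# Hypothetical check: throw an ArgumentError that names the offending column
# as soon as any numeric value in it is Inf or -Inf.
function check_no_inf(name, col)
    if any(v -> v isa Number && isinf(v), col)
        throw(ArgumentError("column $name contains Inf or -Inf"))
    end
    return nothing
end

check_no_inf("x", [1.0, NaN, missing])   # passes; NaN/missing are handled by dropping rows

err = try
    check_no_inf("y", [1.0, -Inf])
    nothing
catch e
    e
end
```

Failing early with the column name avoids surfacing the problem later as an unrelated-looking error from deeper in the analysis.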