- Started testing `lambert` type transforms again.
- Adding `lambert` of type "hh" as an option when `allow_lambert_h` is set to `TRUE`, per request from issue 24
- Fixing deprecation of `dplyr::progress_estimated` (#26)
- New function, `bestLogConstant`, that uses the same machinery to pick the best value of a constant to use when logging a variable, i.e., the one that makes the distribution look the most normal. This is especially useful for non-positive or zero-inflated data. Currently experimental.
- Taking out tests that failed due to a dependent package update (does not impact default `bestNormalize` behavior). See issue gmgeorg/LambertW#3.
- Add S3 methods that help `step_orderNorm()` work with parallel processing.
- Add S3 methods that help `step_best_normalize()` work with parallel processing.
- Add a new transformation: the double reversed log (@rempsyc #18)
- Fix issues in CRAN checks
- Updating print functionality to remain compatible with `recipes`.
- Updated term selection machinery to remain compatible with `recipes`.
- Improving scalability of `boxcox` in response to issue 10; thank you to Krzysztof Dyba (kadyb) for the suggestions.
- Improved scalability of `yeojohnson`, thanks to Emil Hvitfeldt (EmilHvitfeldt) for his work on this problem for the `recipes` package.
- Updated tests to remain compatible with the new `recipes` package (> 0.1.16)
- update citation (new R Journal publication!)
- Fixed/added features so the `tidy` method works more generally and provides easy access to the chosen transformations (responding to issue 9)
- Added pkgdown website here: https://petersonr.github.io/bestNormalize
- Implemented GitHub Actions (code coverage and R CMD check) via `usethis` in response to issue 7
- Improved scalability of the ORQ transformation via the `n_logit_fit` argument, with a default of 10000. This should substantially decrease memory use of `orderNorm` while only minimally affecting the out-of-domain approximations.
- Updated documentation
- Changed `step_bestNormalize` to `step_best_normalize`, responding to issue 8
- Fixed error in documentation regarding `LambertW` transformation types (thank you to Georg M. Goerg, the author of `LambertW`, for pointing this out).
- Added `center_scale` transform as the default when `standardize == TRUE`
- Added an error when trying to use repeated CV with folds that are far too small
- Changed a few `T` and `F` to `TRUE` and `FALSE`
- Added documentation of how one can use `scales` and `ggplot2` to visualize all transformations.
- Added `butcher` and `axe` functionality in order to improve scalability of `step_*` functions
- Improved `tidy` functionality with `bestNormalize` and `step_best_normalize`
- Fixed bug that was causing simple transforms to fail in `bestNormalize`
- Updated to new `LambertW` version in dependencies (request from CRAN)
- Added ability to supply user-defined transformations and associated vignette
- Added the ability to supply user-defined normalization statistics and (the same) associated vignette
- Take out `standardize` option from `no_transform` so `x.t` always matches the input vector.
- Minor programming improvements
- Added `step_bestNormalize` and `step_orderNorm` functions for implementation within `recipes`.
- Changed default to `warn = FALSE` when calling `bestNormalize`. If a transformation doesn't work, warnings will no longer be shown by default unless `warn` is set to `TRUE`.
- Allow options to be passed through bestNormalize to specific transformation functions
- Slight bug fix to square root transformation (a = 0 by default, not .001)
- Slight bug fix in the "quiet" argument for bestNormalize with LOO
- Slight bug fix to `plot.bestNormalize`, which was improperly labeling transformations
- `exp_x` was having trouble with the `standardize` option, so added option `allow_exp_x` to `bestNormalize` to allow a workaround, and changed it so that if any infinite values are produced during the transformation, `exp_x` will not work (that way, `bestNormalize` will not include it in its results).
- Progress bar will now only be displayed if `quiet` is `FALSE` and `length(x) > 2000`
- Update citation to point to newly published work.
- Update maintainer email to new address (same person, new affiliation).
- Correctly subtract 1/2 from ranks in ORQ transformation to make quantile estimation unbiased (this was a bug in 1.3.0, as ranks start at 1, not zero). The adjusted ranks are now divided by n instead of n + 1.
- Specify the weights for the GLM in the ORQ transformation to be the number of observations. This doesn't change the transformation but seems to be slightly faster computationally, and it's more mathematically tractable.
- Other various bug fixes to tests and to plotting functions.
- Add 1/2 to ranks in ORQ transformation to make quantile estimation unbiased (should have minimal impact)
- Add option `loo` for leave-one-out cross-validation
- Add progress bar for cross-validation methods (both with/without parallel)
- Add "no_transform" function - does the same thing as I(x) but in the syntax of other transformations (this allows the normalization statistics to also be calculated if no transformation is performed).
- Add support for lambert transforms of type "h" in the `bestNormalize` function via the `allow_lambert_h` argument.
- Add "before standardization" to printout of different transforms' means and sds to clarify output
- Added other transformations commonly used to normalize a vector
  - exponential, log, square root, arcsinh
- Lambert WxF is no longer done by default by bestNormalize since it is unstable on certain OS (Linux, Solaris), and does not abide by the CRAN policy.
- Clarified that the transformations are standardized by default, and provided an option to not standardize in transformations
- Updated tests to run a bit faster and to use proper S3 classes
- Added references for original papers (Van der Waerden, Bartlett) that cite the basis for the orderNorm transformation, as well as discussion in Beasley (2009)
- Edited description to clarify that this procedure is a new adaptation of an older technique rather than a new technique in itself
- Added feature to estimate out-of-sample normality statistics in `bestNormalize` instead of in-sample ones via repeated cross-validation
  - Note: set `out_of_sample = FALSE` to maintain backward-compatibility with prior versions and set `allow_orderNorm = FALSE` as well so that it isn't automatically selected
- Improved extrapolation of the ORQ (`orderNorm`) method
  - Instead of linear extrapolation, it uses a binomial (logit-link) model on ranks
  - No more issues with Cauchy transformation
- Added plotting feature for transformation objects
- Cleared up some documentation
- Changed the name of the orderNorm technique to "Ordered Quantile normalization".
- Made description more clear in response to comments from CRAN
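Several of the ORQ entries above fit together into a single idea. These are subtracting 1/2 from the ranks and dividing by n, the logit-link binomial model used for extrapolation, the GLM weights equal to the number of observations, and the `n_logit_fit` subsample. The base-R sketch below illustrates how those pieces might combine. It is not the package's implementation, and the helper names `orq_sketch()` and `orq_predict_sketch()` are hypothetical. Two further choices are assumptions made purely for illustration: in-range values are handled by linear interpolation, and the logit fit uses a random subsample.

```r
# Hypothetical sketch of Ordered Quantile (ORQ) normalization; not the
# bestNormalize source. Uses only base R (stats is attached by default).

orq_sketch <- function(x, n_logit_fit = 10000) {
  n <- length(x)
  # Quantiles from ranks: subtract 1/2 and divide by n, so all lie in (0, 1)
  q <- (rank(x) - 0.5) / n
  x_t <- qnorm(q)

  # Assumption: fit the logit model on a random subsample of at most
  # n_logit_fit points to bound memory use
  idx <- if (n > n_logit_fit) sample.int(n, n_logit_fit) else seq_len(n)
  df <- data.frame(q = q[idx], x = x[idx], w = n)
  # Binomial (logit-link) GLM of the quantiles on x, weighted by n;
  # non-integer "successes" trigger a warning, which is suppressed here
  fit <- suppressWarnings(glm(q ~ x, family = binomial, data = df, weights = w))
  list(x = x, x_t = x_t, coef = unname(coef(fit)))
}

orq_predict_sketch <- function(obj, newx) {
  out <- numeric(length(newx))
  inside <- newx >= min(obj$x) & newx <= max(obj$x)
  # In-range values: linear interpolation between observed (x, x_t) pairs
  if (any(inside)) {
    out[inside] <- approx(obj$x, obj$x_t, xout = newx[inside])$y
  }
  # Out-of-range values: extrapolate with the logit fit, working on the
  # log-probability scale so large |eta| is not rounded to exactly 0 or 1
  if (any(!inside)) {
    eta <- obj$coef[1] + obj$coef[2] * newx[!inside]
    out[!inside] <- qnorm(plogis(eta, log.p = TRUE), log.p = TRUE)
  }
  out
}
```

With untied input, such as `orq_sketch(rexp(200))`, the transformed values are symmetric around zero by construction. Values outside the observed range fall back to the logit extrapolation rather than being clamped.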