Conversation
|
Thanks @matejklemen 👍, will look at the PR as soon as I find time for it. |
* swap argument order in private case class to prevent conflicts between auto-generated apply method and custom apply method that is exposed to the public
|
Alright, no rush. I've fixed the compilation error that was previously present and tried to address Codacy's issues with the PR, though it seems that I don't understand what the issue "Found unnamed result or parameter" actually means 😅 |
inejc
left a comment
There was a problem hiding this comment.
Thanks again for the PR @matejklemen 🙂. This is definitely in the right direction but requires some small changes. Let me know if my comments aren't clear or you have any further questions.
src/main/scala/io/picnicml/doddlemodel/preprocessing/Binarizer.scala
Outdated
Show resolved
Hide resolved
src/main/scala/io/picnicml/doddlemodel/preprocessing/Binarizer.scala
Outdated
Show resolved
Hide resolved
src/main/scala/io/picnicml/doddlemodel/preprocessing/Binarizer.scala
Outdated
Show resolved
Hide resolved
src/main/scala/io/picnicml/doddlemodel/preprocessing/Binarizer.scala
Outdated
Show resolved
Hide resolved
src/main/scala/io/picnicml/doddlemodel/preprocessing/RangeScaler.scala
Outdated
Show resolved
Hide resolved
src/main/scala/io/picnicml/doddlemodel/preprocessing/RangeScaler.scala
Outdated
Show resolved
Hide resolved
src/test/scala/io/picnicml/doddlemodel/preprocessing/RangeScalerTest.scala
Show resolved
Hide resolved
src/test/scala/io/picnicml/doddlemodel/preprocessing/RangeScalerTest.scala
Outdated
Show resolved
Hide resolved
* no-op instead of exception when no numerical columns in input to Binarizer and RangeScaler * fix tests accordingly * expose norms in preprocessing.Norms * style fixes
|
Alright, I think I got all of them. Thanks for the helpful comments 🙂 I was a bit confused about sealed traits, so I did it slightly differently than what you said. It seemed redundant to me to define available |
src/main/scala/io/picnicml/doddlemodel/preprocessing/Normalizer.scala
Outdated
Show resolved
Hide resolved
src/main/scala/io/picnicml/doddlemodel/preprocessing/Binarizer.scala
Outdated
Show resolved
Hide resolved
src/main/scala/io/picnicml/doddlemodel/preprocessing/Binarizer.scala
Outdated
Show resolved
Hide resolved
src/main/scala/io/picnicml/doddlemodel/preprocessing/RangeScaler.scala
Outdated
Show resolved
Hide resolved
src/main/scala/io/picnicml/doddlemodel/preprocessing/RangeScaler.scala
Outdated
Show resolved
Hide resolved
src/main/scala/io/picnicml/doddlemodel/preprocessing/RangeScaler.scala
Outdated
Show resolved
Hide resolved
src/test/scala/io/picnicml/doddlemodel/preprocessing/RangeScalerTest.scala
Show resolved
Hide resolved
* Avoid unnecessarily copying data in preprocessors (remove toDenseMatrix) * Make some constructors and values public * Add docs for RangeScaler, Binarizer, Normalizer * Add RangeScaler test with different range than [0, 1] * Optimize imports
|
New batch of changes:
|
|
@matejklemen everything LGTM except the comment I wrote regarding compilation (a single and simple change). I will merge once this one is pushed 😄. |
Description of Changes
Addresses #7. Implements 3 new preprocessors:
Binarizer: convert numerical columns into binary columns based on a threshold. Works with either a
Doublescalar or aDenseVector[Double]. In the second case, each value of vector represents a threshold for a separate column, e.g. 1st value is used as threshold for 1st numeric column.Motivated by sklearn's Binarizer.
Normalizer: converts rows to unit norm. Currently, there are 3 norms supported (L1, L2, max). These should probably be exposed in some kind of distance module for further code reuse, I am open to ideas on where to put these norm functions.
Motivated by sklearn's Normalizer.
RangeScaler: converts numerical columns to a certain range, e.g. from 0 to 1.
Motivated by sklearn's MinMaxScaler (RangeScaler or IntervalScaler seems like a more intuitive name to me, though I can change it if min-max scaling is a "standardized" name for this).
Includes