Robust IRLS for regression #130
Conversation
    if self.solver == "SGD":
        base_estimator.partial_fit(X, y)
    else:
        base_estimator.fit(X, y)
Why this difference? If it's OK to call fit, can't we call fit for all estimators? For SGD it's mostly equivalent.
partial_fit only does one iteration, fit does not. I could use fit with max_iter=1 instead; would that be better?
It is important to use only one iteration because there may be outliers in the data; training to convergence on the whole dataset would mean that many steps are non-robust, and with SGD's decreasing step size the estimator may never recover. This is different for IRLS.
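The one-SGD-step-per-outer-loop idea can be sketched roughly as follows. This is an illustrative toy, not the PR's implementation: the Huber-style weight formula, the cutoff choice, and the loop length are all assumptions made here for the example.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(200)
y[:10] += 20.0  # inject outliers

est = SGDRegressor(learning_rate="invscaling", eta0=0.01, random_state=0)
est.partial_fit(X, y)  # initialize coefficients with one pass
for epoch in range(50):
    # recompute robust weights from current residuals, then take a
    # single SGD pass so early (non-robust) weights cannot derail training
    residuals = np.abs(y - est.predict(X))
    c = np.median(residuals) + 1e-12
    weights = np.where(residuals <= c, 1.0, c / residuals)  # Huber-like weights
    est.partial_fit(X, y, sample_weight=weights)
```

Because the step size decreases across calls, each reweighting only nudges the model, which is exactly why a single non-converged pass per outer loop is safer here than a full fit on possibly bad weights.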
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
The IRLS algorithm is a bit implicit, I agree. The principle is that we do least squares at each iteration (cf. line 392), and this least squares is reweighted using the custom robust weights:

    for epoch in range(self.max_iter):
        weights = ...
        base_estimator.fit(X, y, sample_weight=weights)

So this is exactly an IRLS algorithm.
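Filling in the `weights = ...` step, a self-contained IRLS sketch could look like the following. The Huber weight function and the 1.35 scale factor are assumptions for illustration; the PR's actual robust weighting may differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.randn(300, 2)
y = X @ np.array([3.0, -1.0]) + 0.05 * rng.randn(300)
y[:15] -= 30.0  # outliers

base_estimator = LinearRegression()
base_estimator.fit(X, y)  # plain OLS start
for epoch in range(20):  # plays the role of max_iter
    # each outer iteration solves a FULL weighted least-squares problem,
    # unlike the SGD variant which takes a single pass
    residuals = np.abs(y - base_estimator.predict(X))
    c = 1.35 * np.median(residuals) + 1e-12
    weights = np.minimum(1.0, c / residuals)  # Huber weights
    base_estimator.fit(X, y, sample_weight=weights)
```

After a few iterations the outliers receive weights close to `c / 30`, so the weighted least-squares solution essentially ignores them.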
I see your point. I just find calling this option "solver" confusing. Wouldn't it be clearer to accept a base_estimator directly? So ideally, sgd_kwargs would no longer be needed.
Yes, you are right, although we can use alternative losses for SGD. Maybe I can replace "solver" with "iterative_scheme" or something like that; it could take 'SGD' or 'OLS' (or 'Ridge') as input. But the algorithm is a little different for the two approaches: with SGD you do only one iteration in each loop, while for least squares we do the whole optimization at each step. Maybe I can use a base_estimator and then check: if max_iter is in the parameters, I set it to 1; otherwise I do the whole optimization at each (outer) step.
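The parameter check described above could be sketched like this; the helper name is made up for the example, but `get_params`/`set_params`/`clone` are the standard scikit-learn API:

```python
from sklearn.base import clone
from sklearn.linear_model import SGDRegressor, LinearRegression

def prepare_inner_estimator(base_estimator):
    """Force one inner iteration per outer loop when the estimator allows it."""
    est = clone(base_estimator)
    if "max_iter" in est.get_params():
        est.set_params(max_iter=1)  # one inner step per outer robust-reweighting loop
    return est

print(prepare_inner_estimator(SGDRegressor()).get_params()["max_iter"])        # 1
print("max_iter" in prepare_inner_estimator(LinearRegression()).get_params())  # False
```

Estimators without `max_iter` (e.g. `LinearRegression`, which solves the least-squares problem directly) are left untouched and run their full optimization at each outer step, matching the IRLS behavior.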
Yes, it is doable, but I really want to keep the one-SGD-iteration-per-loop approach because it helps with stability. If you do the whole optimization at each iteration, there can be instability because in the first (outer) iterations the weights are not robust yet. OK for Ridge, but I don't understand why you say that this would remove the need for sgd_kwargs?
I will check.
It wouldn't. I mean, if we go with supporting multiple base estimators, we probably should rename sgd_kwargs to something that's independent from SGD.
New optimization scheme for RobustWeightedRegressor.
Use IRLS (iteratively reweighted least squares). Faster in low dimension than the SGD counterpart.
More extensive experiments would be needed to compare the IRLS solver to the SGD solver, but on simple examples IRLS seems to perform rather well.