Description of the issue
Currently, if user inadvertently provide formulas inducing a singular model matrix $X$, the initialization is still carried out using lm.fit with a low rank model. This is problematic since the residual matrix of this model is used to initialize the variational means matrix $M$, but it is not full rank.
This causes the first iteration of the C++ optimization will then fail when inverting $MM^\top$ with a rather obscure C++ error not linked to the origin of the problem which is singularity of the model matrix.
|
arma::mat Omega = w_bar * inv_sympd(M.t() * (M.each_col() % w) + diagmat(w.t() * S2)); |
Possible improvements
I see two different approaches to deal with the issue
- Better handling of error messages when model matrix is singular. (Preferred option)
- Using regularization (ridge, lasso ?) in the initial linear regression.
Arguably, causing an error with singular model is a desirable feature (this can help users detect problems with their model). Hence, I will open an MR going with option 1, but we can still discuss here if you'd like to implement option 2 as well.
Description of the issue
Currently, if user inadvertently provide formulas inducing a singular model matrix$X$ , the initialization is still carried out using $M$ , but it is not full rank.
lm.fitwith a low rank model. This is problematic since the residual matrix of this model is used to initialize the variational means matrixThis causes the first iteration of the C++ optimization will then fail when inverting$MM^\top$ with a rather obscure C++ error not linked to the origin of the problem which is singularity of the model matrix.
PLNmodels/src/optim_full_cov.cpp
Line 50 in 15b2107
Possible improvements
I see two different approaches to deal with the issue
Arguably, causing an error with singular model is a desirable feature (this can help users detect problems with their model). Hence, I will open an MR going with option 1, but we can still discuss here if you'd like to implement option 2 as well.