[API 2]: Desparsified Lasso #381
Conversation
Codecov Report
❌ Patch coverage is …

Additional details and impacted files:

    @@           Coverage Diff           @@
    ##             main     #381   +/-  ##
    ======================================
    - Coverage   99.19%   98.94%   -0.25%
    ======================================
      Files          24       23       -1
      Lines        1364     1424      +60
    ======================================
    + Hits         1353     1409      +56
    - Misses         11       15       +4
bthirion left a comment:
Looks good overall. I left a few suggestions.
    clf = clone(self.lasso).set_params(
        alpha=alphas[i],
        precompute=np.delete(np.delete(gram, i, axis=0), i, axis=1),
        random_state=np.random.RandomState(
            np.random.default_rng(streams[i]).bit_generator
        ),
    )

It will not be intuitive for the user that the parameters of the model are overwritten inside the method. IMO, the model's parameters should be left unchanged. The _alpha_max_ trick could be shown in an example as a way to find the list of alphas, but not hidden in the function.
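For context, the _alpha_max_ recipe mentioned here can be sketched as follows. This is a generic illustration, not the hidimstat implementation; the function name and defaults are assumptions. `alpha_max = max_j |x_j^T y| / n` is the smallest penalty for which the Lasso solution is exactly zero, and candidate alphas are spread on a geometric grid below it:

```python
import numpy as np

def lasso_alpha_grid(X, y, n_alphas=10, eps=1e-3):
    """Sketch of the standard alpha_max trick for the Lasso.

    Returns a decreasing geometric grid from alpha_max down to
    eps * alpha_max. Illustrative helper, not the hidimstat API.
    """
    n_samples = X.shape[0]
    # Smallest alpha that shrinks all Lasso coefficients to zero
    alpha_max = np.max(np.abs(X.T @ y)) / n_samples
    return np.geomspace(alpha_max, eps * alpha_max, n_alphas)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
y = rng.standard_normal(50)
alphas = lasso_alpha_grid(X, y)
```

Exposing such a grid in an example (and accepting it as a parameter) keeps the model's own settings untouched, which is the reviewer's point.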
I don't see a problem with working with a given random generator or seed, since the problems are different. I agree that we should not hide parameter settings in the functions/methods.

I added an option to change the alphas.
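On the random-state point, a sketch of how independent per-estimator streams can be spawned with NumPy's `SeedSequence` and then bridged into a legacy `RandomState` (illustrative; not necessarily how hidimstat builds `streams[i]`):

```python
import numpy as np

# Spawn statistically independent child seed sequences from one root seed.
root = np.random.SeedSequence(42)
streams = root.spawn(3)

# RandomState accepts a BitGenerator, so each child stream can drive a
# legacy RandomState-based estimator without sharing state.
states = [
    np.random.RandomState(np.random.default_rng(s).bit_generator)
    for s in streams
]
draws = [rs.rand() for rs in states]
```

This pattern gives reproducibility from a single seed while keeping the per-task generators independent.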
I made a pass to resolve the last discussion points. I also added an example using EnCluDL on the MNIST dataset. While writing it, I got a bit confused by the output of …
bthirion left a comment:
There is mostly the one_minus_pvalue_ thing to fix. It's looking good otherwise.
    selected_dl = np.logical_or(
    -    pval_corr < fwer_target / 2, one_minus_pval_corr < fwer_target / 2
    +    desparsified_lasso.pvalues_corr_ < fwer_target / 2,
    +    1 - desparsified_lasso.pvalues_corr_ < fwer_target / 2,
    )

No, this is incorrect I'm afraid: because of the correction, one_minus_pval_corr is not equal to 1 - desparsified_lasso.pvalues_corr_.
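The reviewer's point can be shown with a toy example (a sketch, not hidimstat code): a multiplicity correction clipped at 1, such as Bonferroni's min(n·p, 1), does not commute with taking complements, so correcting the one-sided p-values on the other side is not the same as taking 1 minus the corrected p-values:

```python
import numpy as np

# Toy illustration: Bonferroni correction min(n * p, 1) does not commute
# with complements, so one_minus_pval_corr != 1 - pval_corr in general.
pval = np.array([0.004, 0.3, 0.9])
one_minus_pval = 1 - pval
n = pval.size

pval_corr = np.minimum(n * pval, 1.0)                     # corrected p-values
one_minus_pval_corr = np.minimum(n * one_minus_pval, 1.0) # corrected complements

# 1 - pval_corr is [0.988, 0.1, 0.0], while the properly corrected
# complements are [1.0, 1.0, 0.3]: the two disagree on every entry.
print(1 - pval_corr)
print(one_minus_pval_corr)
```

Hence selections on the "other side" of the test must use the separately corrected one-minus-p-values, not the complement of the corrected ones.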
    beta_hat_1_7, selected_ecdl_1_7 = ensemble_clustered_inference_pvalue(
        X_1_7.shape[0], False, ward_1_7, desparsified_lassos_1_7, fdr=fdr
    )

I'd rather use FWER control than FDR control; I think it makes more sense for EnCluDL.
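For readers weighing the two guarantees, here is a generic sketch (standard procedures, not the hidimstat API): FWER control bounds the probability of making *any* false discovery, e.g. via Bonferroni, while FDR control (Benjamini–Hochberg) only bounds the *expected fraction* of false discoveries, and is therefore more lenient:

```python
import numpy as np

def bonferroni_select(pvals, alpha):
    # FWER control: P(any false discovery) <= alpha
    return pvals < alpha / pvals.size

def benjamini_hochberg_select(pvals, alpha):
    # FDR control: E[#false discoveries / max(#discoveries, 1)] <= alpha
    n = pvals.size
    order = np.argsort(pvals)
    thresh = alpha * np.arange(1, n + 1) / n
    passed = pvals[order] <= thresh
    selected = np.zeros(n, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest index passing the BH line
        selected[order[: k + 1]] = True
    return selected

pvals = np.array([0.001, 0.012, 0.025, 0.035, 0.6])
print(bonferroni_select(pvals, 0.05))           # selects 1 variable
print(benjamini_hochberg_select(pvals, 0.05))   # selects 4 variables
```

On this toy input Bonferroni keeps only the smallest p-value while BH keeps four, which illustrates why the reviewer prefers the stricter FWER guarantee here.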
    # %%
    # References
    # ----------
    # TODO
    interior_support = support_size - margin_size
    extended_support = support_size + margin_size
    n_bootstraps = 4
    fdr = 0.3

This is a bit lenient as a control, and we should rather check that FWER control is rigorous.

Is it possible to change this?
    from hidimstat.ensemble_clustered_inference import ensemble_clustered_inference_pvalue

    n_jobs = 5
    fdr = 0.5

Sounds too weak: "on average, half of your discoveries are garbage".
bthirion left a comment:
LGTM, thx. I have 2 relatively minor comments pending.
    interior_support = support_size - margin_size
    extended_support = support_size + margin_size
    n_bootstraps = 4
    fdr = 0.3

Is it possible to change this?

Done.