Add selection with fdr and associate test by lionelkusch · Pull Request #361 · mind-inria/hidimstat

lionelkusch · 2025-08-28T17:44:04Z

Add the FDR selection method and the associated tests for these two methods.
Fix bug in selection

codecov · 2025-08-29T10:46:55Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.43%. Comparing base (7d642a4) to head (f10bf06).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #361      +/-   ##
==========================================
+ Coverage   97.87%   99.43%   +1.56%     
==========================================
  Files          22       22              
  Lines        1223     1247      +24     
==========================================
+ Hits         1197     1240      +43     
+ Misses         26        7      -19


bthirion

FDR control should be based on p-values or e-values only.
LGTM otherwise.
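
For reference, here is a minimal sketch of FDR control on p-values (Benjamini-Hochberg) and on e-values (e-BH), written in plain Python and independent of the code in this PR. The function names `benjamini_hochberg` and `e_benjamini_hochberg` and the example values are illustrative assumptions, not the hidimstat API.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Boolean mask of hypotheses rejected by the BH step-up procedure."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # ascending p-values
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k with p_(k) <= k * fdr / n.
        if p_values[idx] <= rank * fdr / n:
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(n)]


def e_benjamini_hochberg(e_values, fdr=0.1):
    """Boolean mask of hypotheses rejected by the e-BH procedure."""
    n = len(e_values)
    # Large e-values are evidence against the null, so sort in descending order.
    order = sorted(range(n), key=lambda i: e_values[i], reverse=True)
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank k with k * e_[k] >= n / fdr.
        if rank * e_values[idx] >= n / fdr:
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(n)]


# Illustrative inputs (made up for this sketch).
p_mask = benjamini_hochberg([0.001, 0.8, 0.012, 0.04, 0.6], fdr=0.1)
e_mask = e_benjamini_hochberg([80.0, 0.5, 30.0, 3.0, 1.0], fdr=0.1)
```

Both procedures only need a vector of p-values or e-values, which is why raw importance scores are not a suitable input for them.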

bthirion

oops, I had a few comments not pushed yet. HTH.

bthirion

Just started a pass.

Co-authored-by: bthirion <bertrand.thirion@inria.fr>

Co-authored-by: Joseph Paillard <joseph.paillard@inria.fr>

bthirion

Thx for taking care of that.
I have a few simplification suggestions.

bthirion · 2025-10-10T06:05:43Z

-            Selects features with importance scores above the specified threshold.
-        threshold_pvalue : float, optional, default=None
-            Selects features with p-values below the specified threshold.
+        threshold_max : float, default=None


I'm not sure this argument really makes sense.
I think I would have a single threshold argument for this function.

It's because sometimes we want to select against a maximum and sometimes against a minimum.

see issue #481

bthirion · 2025-10-10T06:06:36Z

+            Selects features based on a specified percentile of p-values.
+        threshold_max : float, default=0.05
+            Selects features with p-values below the specified maximum threshold (0 to 1).
+        threshold_min : float, default=None


Similarly, I don't see any use case for threshold_min here.

The initial idea is to offer a generic way of doing selection.
My first use case for it is selecting the features to discard.

see issue #481
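
To make the two thresholds concrete, here is a hedged sketch of one-criterion selection on p-values. The helper `select_by_threshold` is hypothetical, and the rule that exactly one threshold must be given is an assumption of this sketch, not necessarily the behaviour merged in this PR.

```python
def select_by_threshold(p_values, threshold_max=None, threshold_min=None):
    """Boolean mask of features selected by a single p-value threshold.

    threshold_max keeps p-values strictly below it (features to retain);
    threshold_min keeps p-values strictly above it (features to discard).
    """
    # Assumption of this sketch: exactly one criterion may be active.
    if (threshold_max is None) == (threshold_min is None):
        raise ValueError("Provide exactly one of threshold_max or threshold_min.")
    if threshold_max is not None:
        return [p < threshold_max for p in p_values]
    return [p > threshold_min for p in p_values]


p_values = [0.01, 0.2, 0.96]
to_keep = select_by_threshold(p_values, threshold_max=0.05)
to_discard = select_by_threshold(p_values, threshold_min=0.95)
```

With a single `threshold` argument, the second call would have to be expressed by the caller, for example by thresholding `1 - p`.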

bthirion · 2025-10-10T06:07:36Z

+            Selects features with p-values below the specified maximum threshold (0 to 1).
+        threshold_min : float, default=None
+            Selects features with p-values above the specified minimum threshold (0 to 1).
+        alternative_hypothesis : bool, default=False


I don't see the use case for alternative hypothesis.

This was present in EnCluDL; I added the option to keep the same possibilities.

see issue #481

bthirion · 2025-10-10T06:10:06Z

+        reshaping_function: callable or None, default=None
+            Optional reshaping function for FDR control methods.
+            If None, defaults to sum of reciprocals for 'bhy'.
+        alternative_hippothesis: bool or None, default=False


Same thing here, I don't see any reason to consider an alternative hypothesis. Importance tests are all one-sided tests that check whether the importance is greater than 0 (which, in that case, means significantly different from 0).

This was present in EnCluDL; I added the option to keep the same possibilities.

Yes, but there are good reasons for that: EnCluDL yields a signed statistic, whereas dCRT does not.

see issue #481
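
The distinction can be illustrated with a small sketch. It assumes a signed statistic that is a z-score with a standard normal null distribution, which is an assumption made for illustration and not a description of the EnCluDL or dCRT code. The helper name `one_sided_p_values` is hypothetical.

```python
import math


def one_sided_p_values(z_scores):
    """Return the two one-sided p-values for each signed z-score.

    Assumes a standard normal null distribution (illustrative assumption).
    """
    sqrt2 = math.sqrt(2.0)
    # H1: effect > 0, i.e. the upper tail P(Z >= z).
    p_greater = [0.5 * math.erfc(z / sqrt2) for z in z_scores]
    # H1: effect < 0, i.e. the lower tail P(Z <= z), equal to 1 - p_greater.
    p_less = [0.5 * math.erfc(-z / sqrt2) for z in z_scores]
    return p_greater, p_less


p_greater, p_less = one_sided_p_values([3.0, -3.0, 0.0])
```

With a signed statistic, both tails carry information, so selecting on `1 - p` is meaningful. With a nonnegative importance, only the "greater than 0" tail is tested.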

bthirion · 2025-10-10T06:13:30Z

    random_state=None,
    reuse_screening_model=True,
-    k_best=None,
+    k_lowest=None,


k_lowest is hard to interpret: it only makes sense because we're considering p-values.

Users won't use it if they can't interpret it.

For dCRT, only the p-value is considered.
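
As a point of comparison, here is a minimal sketch of what a `k_lowest` selection on p-values amounts to. The helper `select_k_lowest` is hypothetical and is not the function changed in this diff.

```python
def select_k_lowest(p_values, k_lowest):
    """Boolean mask selecting the k_lowest features with the smallest p-values."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    keep = set(order[:k_lowest])
    return [i in keep for i in range(len(p_values))]


# The two smallest p-values are at positions 1 and 2.
mask = select_k_lowest([0.3, 0.001, 0.04, 0.9], k_lowest=2)
```

On p-values, the k lowest are also the k most significant features, so the result matches what a user would expect from `k_best`.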

jpaillard

Looks almost ready.

I agree with the comments regarding simplifying the signature of selection functions.
I suggest simplifying the smoke test: one test per function with multiple asserts to explore branching seems enough to me and would cut duplicated code.
It would be good to add an example illustrating how to use the new functions. No need to do it here, but could you open an issue for that?

jpaillard · 2025-10-10T10:47:08Z

+    [0, 2],
+    ids=["default_seed", "another seed"],
+)
+class TestSelection:


The tests in this class are smoke tests and have a lot of duplicated code. I think it would be ok to gather all the smoke tests that explore the different selections in one test, or maybe 2 to separate importance_selection and p_value_selection.

These are not smoke tests, because they check the result directly: the values in the array, not only its shape.

It's better to have only one assertion per test. These are unit tests, and they shouldn't be merged together. I use classes to group them.
I don't see the code duplication. Each test checks one specific parameter.

see issue #483
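
The two positions can be compared on a toy case. This is only a sketch: `select_below` is a hypothetical stand-in for the selection methods, and the tests are written in pytest style without any pytest-specific features.

```python
def select_below(p_values, threshold):
    """Hypothetical stand-in for a selection method."""
    return [p < threshold for p in p_values]


P_VALUES = [0.01, 0.2, 0.04]


class TestSelectBelow:
    """One assertion per test, grouped in a class."""

    def test_default_threshold(self):
        assert select_below(P_VALUES, 0.05) == [True, False, True]

    def test_strict_threshold(self):
        assert select_below(P_VALUES, 0.001) == [False, False, False]


def test_select_below_grouped():
    """Single test with several assertions covering the same branches."""
    assert select_below(P_VALUES, 0.05) == [True, False, True]
    assert select_below(P_VALUES, 0.001) == [False, False, False]
```

The class version reports each failing case separately. The grouped version is shorter but stops at the first failing assertion.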

jpaillard · 2025-10-10T10:49:35Z

I think it would also be good to have a "behaviour test" with simulated data.
Ideally, in high dimensions, with a method that is not computationally costly, to show that it reduces the number of false discoveries.

In issue #375, the "behaviour tests", also called system tests or user acceptance tests, have not been defined yet.
I will open an issue for it.

see issue #484
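
As a starting point for that issue, here is a hedged sketch of such a behaviour test. It simulates p-values directly from z-scores rather than running a hidimstat method, and every name in it (`benjamini_hochberg`, `false_discovery_proportion`, the simulation settings) is an assumption made for illustration.

```python
import math
import random


def benjamini_hochberg(p_values, fdr):
    """Boolean mask of hypotheses rejected by the BH step-up procedure."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / n:
            k_max = rank
    rejected = set(order[:k_max])
    return [i in rejected for i in range(n)]


def false_discovery_proportion(selected, is_signal):
    """Share of selected features that are actually null."""
    n_selected = sum(selected)
    n_false = sum(s and not t for s, t in zip(selected, is_signal))
    return n_false / max(n_selected, 1)


# Simulated setting: 1000 features, the first 50 carry a strong signal.
rng = random.Random(0)
n_features, n_signal = 1000, 50
is_signal = [i < n_signal for i in range(n_features)]
z_scores = [rng.gauss(5.0 if signal else 0.0, 1.0) for signal in is_signal]
# One-sided p-values under a standard normal null.
p_values = [0.5 * math.erfc(z / math.sqrt(2.0)) for z in z_scores]

naive_selection = [p < 0.05 for p in p_values]
bh_selection = benjamini_hochberg(p_values, fdr=0.1)

fdp_naive = false_discovery_proportion(naive_selection, is_signal)
fdp_bh = false_discovery_proportion(bh_selection, is_signal)
```

A behaviour test could then assert that `fdp_bh` is lower than `fdp_naive`, or that the average of `fdp_bh` over several seeds stays close to the target level.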

lionelkusch added 5 commits August 28, 2025 10:48

add method for selection base on FDR

9c2d77c

fix default of the qunatile aggragation

5314c37

fix selection

be837e0

update docstring

1a42592

fix docstring

5854f2e

This was referenced Aug 29, 2025

[FEAT] Add Conditional Randomization Test #359

Draft

[API 2]: Model X knockoff #367

Merged

Add test for 1 test_score

3c08f75

bthirion reviewed Aug 30, 2025

Comment thread src/hidimstat/base_variable_importance.py Outdated

lionelkusch commented Sep 1, 2025

Comment thread src/hidimstat/base_variable_importance.py Outdated

change the usage of test fdr without aggregation

7f3a117

lionelkusch commented Sep 1, 2025

Comment thread src/hidimstat/base_variable_importance.py Outdated

Comment thread src/hidimstat/base_variable_importance.py Outdated

lionelkusch added 2 commits September 1, 2025 18:51

remove a print in test

21250b4

Update selection

17d9d95

lionelkusch added the API 2 label Sep 9, 2025

lionelkusch added 3 commits September 9, 2025 18:53

remove function for knockoff

e8134d8

update selection_fdr

51685e8

fix selection

39ec78f

lionelkusch commented Sep 9, 2025

Comment thread src/hidimstat/base_variable_importance.py Outdated

lionelkusch added 7 commits September 10, 2025 11:21

improve selection

f3ff485

fix some part of the selection

817af11

Merge branch 'main' into PR_selection

846296a

fix test

7e256c2

try to fix test

5cc731c

fix seed in generation of data

90e1425

fix docstring

21d0614

bthirion reviewed Sep 10, 2025

Comment thread src/hidimstat/base_variable_importance.py Outdated

Comment thread src/hidimstat/base_variable_importance.py Outdated

Comment thread src/hidimstat/base_variable_importance.py Outdated

lionelkusch added 2 commits September 11, 2025 11:17

Fix attribute in base_variable_importance

5e19e1b

change name

c0af81a

lionelkusch requested a review from bthirion October 2, 2025 17:13

change defautl value

3b89e1e

bthirion reviewed Oct 2, 2025

jpaillard reviewed Oct 3, 2025

Comment thread src/hidimstat/base_variable_importance.py Outdated

Comment thread src/hidimstat/base_variable_importance.py

lionelkusch and others added 12 commits October 3, 2025 14:30

Update src/hidimstat/base_variable_importance.py

79a58b6

Co-authored-by: bthirion <bertrand.thirion@inria.fr>

Update src/hidimstat/base_variable_importance.py

7e5442b

Co-authored-by: bthirion <bertrand.thirion@inria.fr>

Update src/hidimstat/base_variable_importance.py

9da3607

Co-authored-by: bthirion <bertrand.thirion@inria.fr>

Update src/hidimstat/base_variable_importance.py

ed39b3d

Co-authored-by: Joseph Paillard <joseph.paillard@inria.fr>

update following the comments

d86644d

fix bug

9812660

Merge branch 'main' into PR_selection

626e47a

selection one criteria

b28965c

fix tests

c7e8d69

fix format

529d28a

fix k_lowest

b633e15

Merge branch 'main' into PR_selection

b02a2e9

lionelkusch mentioned this pull request Oct 9, 2025

Selection of importance or pvalue with more than 1 dimension. #480

Open

lionelkusch requested review from bthirion and jpaillard October 9, 2025 16:21

bthirion reviewed Oct 10, 2025

jpaillard reviewed Oct 10, 2025

lionelkusch added 3 commits October 10, 2025 16:35

remove randomization in tests

246bfb6

move all the tests for base importance in one file

62f71a4

fix seed

f10bf06

This was referenced Oct 10, 2025

Simplification of selections #481

Open

Add an example of selection #482

Open

lionelkusch merged commit 894ff9d into mind-inria:main Oct 10, 2025
24 checks passed

lionelkusch deleted the PR_selection branch October 10, 2025 15:05

This was referenced Oct 10, 2025

Group unit test of selection #483

Open

Add "behaviour test" with simulated data for selection #484

Open
