wip: add statistical tests between histograms#503
wip: add statistical tests between histograms#503ikrommyd wants to merge 23 commits intoscikit-hep:mainfrom
Conversation
|
I think so far all 4 implemented tests have a decent core functionality. I'm saying this by running toys and looking at the pvalue distributions because they should be uniform(0, 1) under the null hypothesis. |
|
@nsmith- I don't think having an option to consider flow bins in ks_1samp tests makes sense right? |
|
I'm thinking having |
Hmmm, for chi2 tests it does. You are including the observed number of counts in the flow bins in your tests statistic and comparing vs the expected number of counts (in one sample tests) or with the observed number of counts of another histogram (in two sample tests). |
|
By the way, pre-commit is complaining for "Too many public methods (23/20)" in basehist. |
|
@fabriceMUKARAGE Why did you change it from draft? |
|
@iasonkrom. oooh, that was a mistake I made when checking the unsuccessful checks. My bad, can we bring it back to the draft? |
|
Yup it's back to draft. I was just scared about an accidental merge. Don't worry about failing pytests though. I'm changing things in the statistics and therefore the tests fail if I don't update the values there. But the "real" tests are ran offline where I run toy models to see if my statistics are correct. Proper pytests will be written later since I'm tired of chaging them all the time to pass |
|
I see it's back to draft mode now. That's true, the proper pytests can be done later, thanks for the insights. |
Added pytest.importorskip("mplhep")
Added pytest.importorskip("mplhep")
| ) | ||
|
|
||
| if mode == "exact": | ||
| success, d, pvalue = _attempt_exact_2kssamp(n1, n2, g, d, alternative) |
There was a problem hiding this comment.
FYI, this really worries me. Using a hidden object is a really bad idea - see copier-org/copier#1225 for the most recent time I've seen someone broken by this! Is it possible to avoid this?
There was a problem hiding this comment.
If you mean the function written by scipy, I was trying to avoid writing too many lines since after defining the test statistic from the histograms, performing the test is the same. I could write a similar function and have it here in hist but it would be mainly done by copying code from scipy
There was a problem hiding this comment.
I've just added the code in the stats.py file instead of importing from scipy. Do you think this is resolved?
I'm starting an early PR to add statistical tests to histograms after this feature was requested in #500 and I actually believe it is a good and useful feature.
This is an early WIP PR but I'm starting it now in order to make it publicly visible and ask for feedback on the statistics.