Edit: I double checked and found (0.5×⍳+⍸)⍨ is not the same as TamStat.Util.rank, but have updated text below to use APLCart's alternative that does return the same result for ties.
I'm reviewing the MannWhitney function for my own interest. I tried implementing the procedure on Wikipedia (Calculations, method 2) and got:
MannWhitneyU←{
r←(2÷⍨⍋∘⍋+⍒∘⍋∘⌽)⍵,⍺
r1←+/r↑⍨≢⍵
r1-0.5×(≢×1+≢)⍵
}
which I've verified with TamStat as obtaining the correct U statistic. Part of the code includes cardinal ranking with an averaged score for ties. From APLCart:
Comparing with TamStat.Util.rank
C← 1 3 3 3 3 3 3 3 6
]runtime -c "(2÷⍨⍋∘⍋+⍒∘⍋∘⌽)C" "rank C"
(2÷⍨⍋∘⍋+⍒∘⍋∘⌽)C → 1.0E¯6 | 0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
rank C → 1.0E¯6 | +4% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
C← 1e4⍴0
]runtime -c "(2÷⍨⍋∘⍋+⍒∘⍋∘⌽)C" "rank C"
(2÷⍨⍋∘⍋+⍒∘⍋∘⌽)C → 7.2E¯5 | 0% ⎕
rank C → 5.0E¯3 | +6881% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
the APLCart-based formulation is much faster. It also now works with actually any array thanks to Total Array Ordering and use of grade instead of comparison functions.
Original translation with averaged rank for ties
MannWhitneyU←{
r←(0.5×⍳+⍸)⍨s←s[g←⍋s←⍵,⍺]
r1←+/r[(≢⍵)↑⍋g]
r1-0.5×(≢×1+≢)⍵
}
which I've verified with TamStat as obtaining the correct U statistic. Part of the code includes cardinal ranking with an averaged score for ties. From APLCart:
Use of interval-index (⍺⍸⍵) requires the argument is sorted ascending. Extra work is needed to return the ranks in corresponding values for unordered input:
Rank←{(⊂⍋g)⌷(0.5×⍳+⍸)⍨⍵[g←⍋⍵]}
Comparing this with TamStat.Util.rank:
With integers:
C←?1e4⍴9
]runtime -c "(⊂⍋g)⌷(0.5×⍳+⍸)⍨C[g←⍋C]" "rank C"
(⊂⍋g)⌷(0.5×⍳+⍸)⍨C[g←⍋C] → 6.3E¯5 | 0%
rank C → 6.8E¯3 | +10796% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
with floats:
C←?1e4⍴0
]runtime -c "(⊂⍋g)⌷(0.5×⍳+⍸)⍨C[g←⍋C]" "rank C"
(⊂⍋g)⌷(0.5×⍳+⍸)⍨C[g←⍋C] → 1.5E¯3 | 0% ⎕⎕
rank C → 3.6E¯2 | +2270% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕
the APLCart-based formulation is much faster. It also now works with actually any array thanks to Total Array Ordering and use of grade and set functions instead of comparison functions.
Edit: I double checked and found
(0.5×⍳+⍸)⍨is not the same asTamStat.Util.rank, but have updated text below to use APLCart's alternative that does return the same result for ties.I'm reviewing the MannWhitney function for my own interest. I tried implementing the procedure on Wikipedia (Calculations, method 2) and got:
which I've verified with TamStat as obtaining the correct U statistic. Part of the code includes cardinal ranking with an averaged score for ties. From APLCart:
Comparing with
TamStat.Util.rankthe APLCart-based formulation is much faster. It also now works with actually any array thanks to Total Array Ordering and use of grade instead of comparison functions.
Original translation with averaged rank for ties
which I've verified with TamStat as obtaining the correct U statistic. Part of the code includes cardinal ranking with an averaged score for ties. From APLCart:
Use of interval-index (
⍺⍸⍵) requires the argument is sorted ascending. Extra work is needed to return the ranks in corresponding values for unordered input:Comparing this with
TamStat.Util.rank:With integers:
with floats:
the APLCart-based formulation is much faster. It also now works with actually any array thanks to Total Array Ordering and use of grade and set functions instead of comparison functions.