Rank function could be faster

**Edit:** I double checked and found `(0.5×⍳+⍸)⍨` is not the same as `TamStat.Util.rank`, but have updated text below to use APLCart's alternative that does return the same result for ties.

I'm reviewing the MannWhitney function for my own interest. I tried implementing the procedure [on Wikipedia (Calculations, method 2)](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test#Calculations) and got:

```apl
 MannWhitneyU←{
     r←(2÷⍨⍋∘⍋+⍒∘⍋∘⌽)⍵,⍺
     r1←+/r↑⍨≢⍵
     r1-0.5×(≢×1+≢)⍵
 }
```

which I've verified with TamStat as obtaining the correct U statistic. Part of the code includes cardinal ranking with an averaged score for ties. From APLCart:

```apl
Rank←2÷⍨⍋∘⍋+⍒∘⍋∘⌽
```

Comparing with `TamStat.Util.rank`

```
      C← 1 3 3 3 3 3 3 3 6
      ]runtime -c "(2÷⍨⍋∘⍋+⍒∘⍋∘⌽)C" "rank C"
                                                                          
  (2÷⍨⍋∘⍋+⍒∘⍋∘⌽)C → 1.0E¯6 |  0% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕   
  rank C          → 1.0E¯6 | +4% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
      C← 1e4⍴0
      ]runtime -c "(2÷⍨⍋∘⍋+⍒∘⍋∘⌽)C" "rank C"
                                                                             
  (2÷⍨⍋∘⍋+⍒∘⍋∘⌽)C → 7.2E¯5 |     0% ⎕                                        
  rank C          → 5.0E¯3 | +6881% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
```

the APLCart-based formulation is much faster. It also now works with actually any array thanks to Total Array Ordering and use of grade instead of comparison functions.

---

**Original translation with averaged rank for ties**

```apl
 MannWhitneyU←{
     r←(0.5×⍳+⍸)⍨s←s[g←⍋s←⍵,⍺]
     r1←+/r[(≢⍵)↑⍋g]
     r1-0.5×(≢×1+≢)⍵
 }
```
which I've verified with TamStat as obtaining the correct U statistic. Part of the code includes cardinal ranking with an averaged score for ties. From APLCart:

```apl
Rank←(0.5×⍳+⍸)⍨
```

Use of interval-index (`⍺⍸⍵`) requires the argument is sorted ascending. Extra work is needed to return the ranks in corresponding values for unordered input:

```apl
Rank←{(⊂⍋g)⌷(0.5×⍳+⍸)⍨⍵[g←⍋⍵]}
```

Comparing this with `TamStat.Util.rank`:

With integers:
```
      C←?1e4⍴9
      ]runtime -c "(⊂⍋g)⌷(0.5×⍳+⍸)⍨C[g←⍋C]" "rank C"
                                                                                      
  (⊂⍋g)⌷(0.5×⍳+⍸)⍨C[g←⍋C] → 6.3E¯5 |      0%                                          
  rank C                  → 6.8E¯3 | +10796% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
```

with floats:
```
      C←?1e4⍴0
      ]runtime -c "(⊂⍋g)⌷(0.5×⍳+⍸)⍨C[g←⍋C]" "rank C"
                                                                                     
  (⊂⍋g)⌷(0.5×⍳+⍸)⍨C[g←⍋C] → 1.5E¯3 |     0% ⎕⎕                                       
  rank C                  → 3.6E¯2 | +2270% ⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕⎕ 
```
the APLCart-based formulation is much faster. It also now works with actually any array thanks to Total Array Ordering and use of grade and set functions instead of comparison functions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rank function could be faster #1

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Rank function could be faster #1

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions