Commit 7f0e146
Optimize gpu reductions (#27)
* Add reduction clause to target_teams_distribute
* Add reductions tests for nested for under parallel
* Optimize GPU reductions
- Use a 2-level approach with atomics
- Support DSA_REDUCTION_MUL for nested for directives
* Clean up code
1 parent 05827a9 commit 7f0e146
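
The "2-level approach with atomics" named in the commit message can be illustrated with a minimal CPU-side sketch. This is not the code from this commit: `two_level_reduce` is a hypothetical helper, Python threads stand in for GPU teams, and a lock stands in for a hardware atomic. Each team first reduces its own slice privately (level 1), then performs a single atomic combine into the shared result (level 2), so contention on the shared value scales with the number of teams rather than the number of elements. Passing `operator.mul` with identity `1` shows the multiplicative case.

```python
import operator
import threading
from functools import reduce


def two_level_reduce(values, num_teams, op=operator.add, identity=0):
    """Reduce `values` with `op` in two levels (hypothetical illustration)."""
    result = [identity]      # shared accumulator; stands in for device global memory
    lock = threading.Lock()  # stands in for a hardware atomic update

    def team_work(team_id):
        # Level 1: each team reduces its own strided slice privately.
        partial = reduce(op, values[team_id::num_teams], identity)
        # Level 2: one atomic combine per team into the shared result.
        with lock:
            result[0] = op(result[0], partial)

    teams = [threading.Thread(target=team_work, args=(t,)) for t in range(num_teams)]
    for t in teams:
        t.start()
    for t in teams:
        t.join()
    return result[0]
```

For example, `two_level_reduce(list(range(1, 101)), 4)` sums 1 through 100 across four teams, and `two_level_reduce([1, 2, 3, 4, 5], 2, operator.mul, 1)` computes a product. The sketch assumes `op` is associative and commutative, since the order in which teams combine their partial results is not fixed.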
5 files changed
Lines changed: 290 additions & 250 deletions
File tree
- src/numba/openmp
- libs/pass
- tests