Commit a89002f

ggml webgpu: support for backend sampling (ggml-org#18880)
* ggml webgpu: add SOFTPLUS unary operator

Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32 precision for intermediate calculations to prevent f16 overflow.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
* Follow Vulkan backend numerical stability pattern

* ggml webgpu: add EXPM1 unary operator

Implements EXPM1 (exp(x) - 1) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

* ggml webgpu: add FLOOR unary operator

Implements FLOOR (rounds down to nearest integer) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

* ggml webgpu: add CEIL unary operator

Implements CEIL (rounds up to nearest integer) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

* ggml webgpu: add ROUND unary operator

Implements ROUND (rounds to nearest integer) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

* ggml webgpu: add TRUNC unary operator

Implements TRUNC (truncates towards zero) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

* docs : update WebGPU support for unary operators (FLOOR, CEIL, ROUND, TRUNC, EXPM1, SOFTPLUS)
* Updates to webgpu get_memory
* Add argmax
* Add argmax,cumsum,sum,sum_rows
* Add necessary CPY/GET_ROWS operators
* Support for argsort using multi-pass strategy
* Update set_rows for i32 indices, move to pre-wgsl
* Port unary operators to pre-wgsl and support FILL
* Implement PAD
* Add support for top-k
* clean up, scope pipeline init mutex
* fix newline
* Add support for log
* Update LOG for better precision, and ops doc

---------

Co-authored-by: Abhijit Ramesh <abhijitramesh2k@gmail.com>
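The SOFTPLUS note above (f32 intermediates to prevent f16 overflow) can be illustrated with a small host-side sketch. This is not the WGSL shader from the commit: the function names and the cutoff of 20 are assumptions chosen for illustration. The underlying concern is that exp(x) exceeds the largest finite f16 value (65504) once x passes roughly 11.09, and overflows f32 once x passes roughly 88.7, so a naive log(1 + exp(x)) returns infinity for large inputs even though softplus(x) is approximately x there.

```cpp
#include <cmath>

// Naive form: exp(x) overflows to +inf in f32 once x exceeds roughly 88.7
// (and in f16 once x exceeds roughly 11.09), so the result becomes +inf.
inline float softplus_naive(float x) {
    return std::log(1.0f + std::exp(x));
}

// Hypothetical cutoff, assumed for this sketch: above it, softplus(x)
// differs from x by less than one f32 unit in the last place.
constexpr float kSoftplusCutoff = 20.0f;

// Stable form: skip exp() entirely for large inputs and use log1p
// for the remaining range.
inline float softplus_stable(float x) {
    if (x > kSoftplusCutoff) {
        return x;
    }
    return std::log1p(std::exp(x));
}
```

With these definitions, `softplus_naive(100.0f)` evaluates to infinity while `softplus_stable(100.0f)` returns 100. A shader taking f16 input would apply the same idea after widening the value to f32, which is the approach the commit message describes.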
1 parent 388ce82 commit a89002f

14 files changed

Lines changed: 9453 additions & 8509 deletions

docs/ops.md

Lines changed: 17 additions & 16 deletions
@@ -20,10 +20,10 @@ Legend:
 | ADD1 ||||||||||||
 | ADD_ID ||||||||||||
 | ARANGE ||||||||||||
-| ARGMAX ||||||||| |||
-| ARGSORT |||||| 🟡 | 🟡 || |||
+| ARGMAX ||||||||| |||
+| ARGSORT |||||| 🟡 | 🟡 || |||
 | CEIL |||| 🟡 ||| 🟡 | 🟡 ||||
-| CLAMP ||||| 🟡 | 🟡 || 🟡 | |||
+| CLAMP ||||| 🟡 | 🟡 || 🟡 | |||
 | CONCAT |||| 🟡 || 🟡 ||||||
 | CONT || 🟡 |||| 🟡 | 🟡 || 🟡 |||
 | CONV_2D ||||||||||||
@@ -36,17 +36,17 @@ Legend:
 | CPY || 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 |||
 | CROSS_ENTROPY_LOSS ||||||||||||
 | CROSS_ENTROPY_LOSS_BACK ||||||||||||
-| CUMSUM ||||||||| |||
+| CUMSUM ||||||||| |||
 | DIAG ||||||||||||
 | DIAG_MASK_INF |||||| 🟡 ||||||
 | DIV ||||| 🟡 |||||||
 | DUP |||| 🟡 | 🟡 | 🟡 ||||||
 | ELU |||| 🟡 | 🟡 |||||||
 | EXP |||| 🟡 | 🟡 ||| 🟡 ||||
-| EXPM1 |||| 🟡 | 🟡 |||| |||
-| FILL ||||||||| |||
-| FLASH_ATTN_EXT || 🟡 || 🟡 | 🟡 | 🟡 || 🟡 | |||
-| FLOOR |||| 🟡 ||| 🟡 | 🟡 | |||
+| EXPM1 |||| 🟡 | 🟡 |||| |||
+| FILL ||||||||| |||
+| FLASH_ATTN_EXT || 🟡 || 🟡 | 🟡 | 🟡 || 🟡 | 🟡 |||
+| FLOOR |||| 🟡 ||| 🟡 | 🟡 | |||
 | GATED_LINEAR_ATTN ||||||||||||
 | GEGLU ||||| 🟡 ||| 🟡 ||||
 | GEGLU_ERF ||||| 🟡 ||| 🟡 ||||
@@ -63,7 +63,7 @@ Legend:
 | IM2COL_3D ||||||||||||
 | L2_NORM ||||||||||||
 | LEAKY_RELU ||||| 🟡 ||| 🟡 ||||
-| LOG ||||| 🟡 |||| |||
+| LOG ||||| 🟡 |||| |||
 | MEAN ||||||||||||
 | MUL ||||| 🟡 |||||||
 | MUL_MAT | 🟡 | 🟡 | 🟡 | 🟡 || 🟡 | 🟡 | 🟡 | 🟡 | 🟡 | 🟡 |
@@ -73,8 +73,9 @@ Legend:
 | OPT_STEP_ADAMW ||||||||||||
 | OPT_STEP_SGD ||||||||||||
 | OUT_PROD | 🟡 | 🟡 | 🟡 | 🟡 ||| 🟡 |||| 🟡 |
-| PAD || 🟡 || 🟡 | 🟡 | 🟡 | 🟡 || |||
+| PAD || 🟡 || 🟡 | 🟡 | 🟡 | 🟡 || |||
 | PAD_REFLECT_1D ||||||||||||
+| POOL_1D ||||||||||||
 | POOL_2D || 🟡 ||||||||||
 | REGLU ||||| 🟡 ||| 🟡 ||||
 | RELU |||| 🟡 | 🟡 | 🟡 || 🟡 ||||
@@ -85,7 +86,7 @@ Legend:
 | ROLL ||||||||||||
 | ROPE ||||||||||||
 | ROPE_BACK ||||||||||||
-| ROUND |||| 🟡 ||| 🟡 | 🟡 | |||
+| ROUND |||| 🟡 ||| 🟡 | 🟡 | |||
 | RWKV_WKV6 ||||||||||||
 | RWKV_WKV7 ||||||||||||
 | SCALE || 🟡 ||||||||||
@@ -96,7 +97,7 @@ Legend:
 | SILU |||| 🟡 | 🟡 | 🟡 || 🟡 ||||
 | SILU_BACK ||||||||||||
 | SIN ||||| 🟡 ||| 🟡 ||||
-| SOFTPLUS |||| 🟡 | 🟡 ||| 🟡 | |||
+| SOFTPLUS |||| 🟡 | 🟡 ||| 🟡 | |||
 | SOFT_MAX || 🟡 ||||||||||
 | SOFT_MAX_BACK ||| 🟡 | 🟡 ||| 🟡 |||||
 | SOLVE_TRI |||| 🟡 |||| 🟡 ||||
@@ -106,14 +107,14 @@ Legend:
 | SSM_SCAN |||||||| 🟡 ||||
 | STEP |||| 🟡 | 🟡 ||| 🟡 ||||
 | SUB ||||| 🟡 |||||||
-| SUM || 🟡 || 🟡 | 🟡 || 🟡 | 🟡 | |||
-| SUM_ROWS |||| 🟡 || 🟡 | 🟡 || |||
+| SUM || 🟡 || 🟡 | 🟡 || 🟡 | 🟡 | 🟡 |||
+| SUM_ROWS |||| 🟡 || 🟡 | 🟡 || |||
 | SWIGLU ||||| 🟡 ||| 🟡 ||||
 | SWIGLU_OAI |||||||| 🟡 ||||
 | TANH |||| 🟡 | 🟡 ||| 🟡 ||||
 | TIMESTEP_EMBEDDING ||||||||||||
-| TOP_K |||||||| 🟡 | |||
+| TOP_K |||||||| 🟡 | |||
 | TRI ||||||||||||
-| TRUNC |||| 🟡 ||| 🟡 | 🟡 | |||
+| TRUNC |||| 🟡 ||| 🟡 | 🟡 | |||
 | UPSCALE || 🟡 ||| 🟡 | 🟡 | 🟡 | 🟡 ||||
 | XIELU ||||||||||||
