[RVV] f16 softmax for rvv #9812

Merged
copybara-service[bot] merged 2 commits into google:master from ken-unger:f16-softmax2-rvv
Apr 1, 2026

@ken-unger
Contributor

Add RVV kernels for f16-raddstoreexpminusmax; the u4v variant is the one productized.

Tested on QEMU and bpi-f3.

f16_raddstoreexpminusmax/rvvfp16arith_rr2_p2_u1v/N:7680/real_time       18150 ns        18098 ns        38459 bytes=1.69258G/s cpufreq=1.6G elements=423.145M/s
f16_raddstoreexpminusmax/rvvfp16arith_rr2_p2_u1v/N:65280/real_time     136772 ns       136277 ns         5118 bytes=1.90916G/s cpufreq=1.6G elements=477.29M/s
f16_raddstoreexpminusmax/rvvfp16arith_rr2_p2_u2v/N:7680/real_time       11620 ns        11601 ns        60322 bytes=2.64369G/s cpufreq=1.6G elements=660.924M/s
f16_raddstoreexpminusmax/rvvfp16arith_rr2_p2_u2v/N:65280/real_time      91274 ns        90820 ns         7689 bytes=2.86085G/s cpufreq=1.6G elements=715.212M/s
f16_raddstoreexpminusmax/rvvfp16arith_rr2_p2_u4v/N:7680/real_time        9968 ns         9956 ns        70297 bytes=3.082G/s cpufreq=1.6G elements=770.5M/s
f16_raddstoreexpminusmax/rvvfp16arith_rr2_p2_u4v/N:65280/real_time      89592 ns        89278 ns         7082 bytes=2.91456G/s cpufreq=1.6G elements=728.639M/s
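
As a reading aid, the sketch below shows how the reported rates relate to the timings above. The helper names are hypothetical. The 4 bytes per element used in the comments is an inference from the reported figures (it is consistent with one 2-byte f16 read plus one 2-byte f16 write), not something stated in the PR.

```c
#include <stddef.h>

// Hypothetical helpers for interpreting the benchmark lines above.

// Elements processed per second, given N and the real time in nanoseconds.
static double elements_per_second(size_t n, double real_time_ns) {
  return (double) n / (real_time_ns * 1e-9);
}

// Bytes per second, given an assumed amount of traffic per element.
// Assumption: 4 bytes per element reproduces the reported bytes= values.
static double bytes_per_second(size_t n, double real_time_ns,
                               double bytes_per_element) {
  return elements_per_second(n, real_time_ns) * bytes_per_element;
}

// Speedup of a candidate over a baseline, from their real times.
static double speedup(double baseline_ns, double candidate_ns) {
  return baseline_ns / candidate_ns;
}
```

With the numbers above, `elements_per_second(7680, 18150)` gives about 423M elements/s for u1v, and `speedup(18150, 9968)` puts u4v at roughly 1.82x over u1v at N:7680 (about 1.53x at N:65280).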

Note that the change to the avx_microkernels cmake/bzl files (avx-rr2-p5-nr2-u16) was generated by tools/update-microkernels.py, which corrected a previous, unrelated commit ("Respect numerical consistency flag in sigmoid configs").

copybara-service bot pushed a commit that referenced this pull request Apr 1, 2026
--
8bc70f6 by Ken Unger <ken.j.unger@gmail.com>:

rvv f16 softmax

--
857fb16 by Ken Unger <ken.j.unger@gmail.com>:

rvv f16 softmax cleanup

FUTURE_COPYBARA_INTEGRATE_REVIEW=#9812 from ken-unger:f16-softmax2-rvv 857fb16
PiperOrigin-RevId: 892525250
@copybara-service copybara-service bot merged commit 8ebc71c into google:master Apr 1, 2026
21 checks passed