
Armv8.1-M: Add MVE Keccak-f1600 x4 implementation #911

Merged
hanno-becker merged 4 commits into main from mve-keccak-x4 on Feb 14, 2026

@mkannwischer (Contributor) commented Jan 27, 2026

oqs-bot (Contributor) commented Jan 27, 2026

CBMC Results (ML-DSA-65)

Full Results (175 proofs)
Proof Status Current Previous Change
**TOTAL** 2484s 2472s +0.5%
polyvecl_pointwise_acc_montgomery_c 270s 257s +5%
mld_attempt_signature_generation 256s 261s -2%
sign_verify_internal 218s 216s +1%
poly_pointwise_montgomery_c 160s 160s +0%
rej_uniform_native 155s 154s +1%
mld_invntt_layer 131s 131s +0%
polyvec_matrix_expand 120s 125s -4%
mld_ct_memcmp 87s 87s +0%
polyvec_matrix_expand_serial 69s 69s +0%
mld_ntt_layer 47s 50s -6%
sign_signature_internal 47s 45s +4%
keccak_squeezeblocks_x4 42s 45s -7%
mld_compute_t0_t1_tr_from_sk_components 24s 26s -8%
rej_uniform 22s 21s +5%
fqmul 20s 19s +5%
rej_uniform_c 20s 19s +5%
poly_uniform_eta_4x 19s 17s +12%
polymat_permute_bitrev_to_custom 19s 20s -5%
poly_chknorm_c 18s 18s +0%
poly_uniform_4x 17s 14s +21%
polyveck_decompose 17s 18s -6%
polyvec_matrix_pointwise_montgomery 16s 14s +14%
mld_polyvecl_permute_bitrev_to_custom_native 15s 19s -21%
polyt0_unpack 14s 17s -18%
polyveck_use_hint 14s 14s +0%
keccak_absorb_once_x4 13s 11s +18%
keccakf1600x4_permute_native 13s 13s +0%
mld_ntt_butterfly_block 12s 12s +0%
polyveck_add 11s 9s +22%
polyvecl_ntt 11s 8s +38%
polyveck_caddq 10s 9s +11%
polyveck_ntt 10s 10s +0%
sign 10s 12s -17%
keccakf1600_permute 9s 9s +0%
keccakf1600_permute_native 9s 9s +0%
mld_check_pct 9s 9s +0%
polyveck_invntt_tomont 9s 12s -25%
mld_compute_pack_z 8s 7s +14%
poly_invntt_tomont_c 8s 9s -11%
polyveck_power2round 8s 7s +14%
polyveck_reduce 8s 8s +0%
polyveck_shiftl 8s 7s +14%
poly_decompose_c 7s 7s +0%
polyveck_sub 7s 6s +17%
keccak_absorb 6s 7s -14%
mld_h 6s 4s +50%
pack_sig_c_h 6s 3s +100%
poly_uniform_eta 6s 6s +0%
poly_uniform_gamma1 6s 4s +50%
poly_use_hint_native 6s 3s +100%
polyeta_unpack 6s 8s -25%
polyveck_pointwise_poly_montgomery 6s 7s -14%
polyvecl_chknorm 6s 5s +20%
rej_eta_native 6s 4s +50%
shake256_init 6s 3s +100%
sign_pk_from_sk 6s 10s -40%
sign_verify_extmu 6s 3s +100%
unpack_sig 6s 4s +50%
intt_native_x86_64 5s 4s +25%
mld_sample_s1_s2_serial 5s 6s -17%
poly_challenge 5s 4s +25%
poly_decompose_native 5s 4s +25%
poly_ntt_c 5s 1s +400%
poly_uniform_gamma1_4x 5s 4s +25%
poly_use_hint_c 5s 4s +25%
polyt1_unpack 5s 6s -17%
polyveck_make_hint 5s 5s +0%
power2round 5s 2s +150%
shake256 5s 3s +67%
shake256_squeeze 5s 2s +150%
sign_keypair 5s 3s +67%
sign_keypair_internal 5s 6s -17%
sign_signature_pre_hash_shake256 5s 2s +150%
sign_verify_pre_hash_shake256 5s 6s -17%
unpack_hints 5s 4s +25%
unpack_pk 5s 4s +25%
keccak_init 4s 3s +33%
keccakf1600_xor_bytes (big endian) 4s 4s +0%
keccakf1600x4_permute 4s 3s +33%
mld_ct_cmask_nonzero_u32 4s 5s -20%
mld_sample_s1_s2 4s 6s -33%
mld_value_barrier_u32 4s 1s +300%
ntt_native_x86_64 4s 3s +33%
poly_add 4s 3s +33%
poly_caddq_c 4s 3s +33%
poly_caddq_native_aarch64 4s 4s +0%
poly_ntt_native 4s 2s +100%
poly_power2round 4s 3s +33%
poly_sub 4s 5s -20%
poly_uniform 4s 2s +100%
poly_use_hint 4s 4s +0%
polyt1_pack 4s 4s +0%
polyveck_chknorm 4s 7s -43%
polyveck_unpack_t0 4s 4s +0%
polyvecl_pack_eta 4s 4s +0%
polyvecl_pointwise_acc_montgomery 4s 3s +33%
polyvecl_unpack_z 4s 4s +0%
polyz_unpack_c 4s 4s +0%
rej_eta 4s 2s +100%
shake128x4_absorb_once 4s 3s +33%
shake256x4_absorb_once 4s 3s +33%
sign_open 4s 3s +33%
sign_signature 4s 5s -20%
sign_signature_pre_hash_internal 4s 7s -43%
sign_verify_pre_hash_internal 4s 5s -20%
unpack_sk 4s 3s +33%
caddq 3s 3s +0%
decompose 3s 2s +50%
keccak_finalize 3s 2s +50%
keccak_squeeze 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 1s +200%
keccakf1600x4_xor_bytes 3s 5s -40%
mld_ct_abs_i32 3s 2s +50%
mld_ct_cmask_neg_i32 3s 3s +0%
mld_ct_get_optblocker_u32 3s 3s +0%
mld_ct_get_optblocker_u8 3s 5s -40%
mld_ct_sel_int32 3s 3s +0%
mld_keccakf1600_extract_bytes 3s 2s +50%
mld_prepare_domain_separation_prefix 3s 4s -25%
mld_value_barrier_u8 3s 2s +50%
montgomery_reduce 3s 3s +0%
pack_sk 3s 2s +50%
poly_caddq_native 3s 4s -25%
poly_chknorm_native 3s 2s +50%
poly_decompose 3s 2s +50%
poly_invntt_tomont_native 3s 2s +50%
poly_make_hint 3s 4s -25%
poly_pointwise_montgomery 3s 3s +0%
poly_pointwise_montgomery_native 3s 2s +50%
polyt0_pack 3s 5s -40%
polyveck_pack_eta 3s 3s +0%
polyveck_pack_w1 3s 4s -25%
polyveck_unpack_eta 3s 4s -25%
polyvecl_permute_bitrev_to_custom 3s 2s +50%
polyvecl_pointwise_acc_montgomery_native 3s 4s -25%
polyvecl_uniform_gamma1_serial 3s 2s +50%
polyvecl_unpack_eta 3s 2s +50%
polyw1_pack 3s 4s -25%
polyz_pack 3s 3s +0%
polyz_unpack 3s 3s +0%
polyz_unpack_native 3s 2s +50%
shake128x4_squeezeblocks 3s 2s +50%
sign_verify 3s 4s -25%
use_hint 3s 4s -25%
fqscale 2s 1s +100%
keccakf1600_xor_bytes 2s 3s -33%
keccakf1600x4_extract_bytes 2s 1s +100%
make_hint 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 3s -33%
mld_value_barrier_i64 2s 3s -33%
pack_pk 2s 3s -33%
pack_sig_z 2s 5s -60%
poly_caddq 2s 3s -33%
poly_chknorm 2s 4s -50%
poly_invntt_tomont 2s 3s -33%
poly_ntt 2s 2s +0%
poly_reduce 2s 3s -33%
poly_shiftl 2s 5s -60%
polyeta_pack 2s 5s -60%
polyveck_pack_t0 2s 3s -33%
polyvecl_uniform_gamma1 2s 4s -50%
reduce32 2s 3s -33%
rej_eta_c 2s 3s -33%
shake128_absorb 2s 5s -60%
shake128_finalize 2s 6s -67%
shake128_init 2s 4s -50%
shake128_release 2s 3s -33%
shake128_squeeze 2s 3s -33%
shake256_finalize 2s 1s +100%
shake256_release 2s 3s -33%
shake256x4_squeezeblocks 2s 2s +0%
sign_signature_extmu 2s 5s -60%
mld_ct_get_optblocker_i64 1s 2s -50%
shake256_absorb 1s 2s -50%
sys_check_capability 1s 2s -50%
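The Change column is consistent with the relative difference between the two runs. Here is a minimal sketch that reproduces it. It assumes Change = 100 × (Current - Previous) / Previous, printed as a whole percent for individual proofs and with one decimal for the TOTAL row. `format_change` is a hypothetical helper, not part of the CBMC CI tooling:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the relative change between two CBMC run
 * times the way the table appears to, e.g. "+5%" or "+0.5%".
 * Assumption: Change = 100 * (current - previous) / previous. */
static void format_change(char *buf, size_t len, double current,
                          double previous, int decimals)
{
  /* A zero baseline has no defined relative change. */
  assert(previous > 0.0);
  double pct = 100.0 * (current - previous) / previous;
  /* %+ forces an explicit sign; .* takes the precision from `decimals`. */
  snprintf(buf, len, "%+.*f%%", decimals, pct);
}
```

For example, `format_change(buf, sizeof buf, 270, 257, 0)` yields `+5%` (the polyvecl_pointwise_acc_montgomery_c row), and `format_change(buf, sizeof buf, 2484, 2472, 1)` yields `+0.5%` (TOTAL). Run times are reported in whole seconds, so short proofs swing widely. For instance, poly_ntt_c going from 1s to 5s shows up as +400%. The TOTAL row is therefore the steadier signal.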

oqs-bot (Contributor) commented Jan 27, 2026

CBMC Results (ML-DSA-44)

⚠️ Attention Required

Proof Status Current Previous Change
rej_uniform_c ⚠️ 24s 16s +50%
Full Results (175 proofs)
Proof Status Current Previous Change
**TOTAL** 2509s 2021s +24.1%
mld_attempt_signature_generation 375s 304s +23%
polyvecl_pointwise_acc_montgomery_c 282s 197s +43%
sign_verify_internal 219s 188s +16%
poly_pointwise_montgomery_c 191s 129s +48%
rej_uniform_native 171s 140s +22%
mld_invntt_layer 142s 115s +23%
mld_ct_memcmp 110s 78s +41%
mld_ntt_layer 52s 43s +21%
keccak_squeezeblocks_x4 48s 42s +14%
fqmul 26s 20s +30%
sign_signature_internal 26s 20s +30%
rej_uniform_c ⚠️ 24s 16s +50%
rej_uniform 23s 21s +10%
poly_uniform_eta_4x 20s 19s +5%
mld_compute_t0_t1_tr_from_sk_components 18s 15s +20%
poly_chknorm_c 18s 14s +29%
keccakf1600x4_permute_native 16s 12s +33%
mld_ntt_butterfly_block 16s 13s +23%
polymat_permute_bitrev_to_custom 16s 15s +7%
polyt0_unpack 16s 13s +23%
poly_uniform_4x 15s 12s +25%
polyvec_matrix_expand 15s 14s +7%
mld_polyvecl_permute_bitrev_to_custom_native 14s 13s +8%
polyeta_unpack 14s 12s +17%
polyz_unpack_c 14s 10s +40%
keccak_absorb_once_x4 13s 10s +30%
poly_invntt_tomont_c 10s 7s +43%
polyveck_add 9s 7s +29%
sign_pk_from_sk 9s 6s +50%
keccakf1600_permute_native 8s 11s -27%
mld_compute_pack_z 8s 7s +14%
polyveck_use_hint 8s 6s +33%
polyvecl_chknorm 8s 5s +60%
polyvecl_ntt 8s 8s +0%
keccak_absorb 7s 5s +40%
keccakf1600_permute 7s 8s -12%
mld_check_pct 7s 8s -12%
polyvec_matrix_expand_serial 7s 8s -12%
polyvec_matrix_pointwise_montgomery 7s 5s +40%
polyveck_ntt 7s 9s -22%
polyveck_power2round 7s 5s +40%
polyveck_reduce 7s 8s -12%
polyveck_unpack_t0 7s 3s +133%
polyvecl_unpack_z 7s 4s +75%
sign 7s 6s +17%
mld_prepare_domain_separation_prefix 6s 4s +50%
mld_sample_s1_s2_serial 6s 4s +50%
poly_power2round 6s 2s +200%
poly_use_hint_c 6s 4s +50%
polyt0_pack 6s 7s -14%
polyveck_caddq 6s 4s +50%
polyveck_decompose 6s 5s +20%
polyveck_pack_w1 6s 2s +200%
polyveck_pointwise_poly_montgomery 6s 5s +20%
sign_signature 6s 6s +0%
sign_signature_extmu 6s 3s +100%
sys_check_capability 6s 4s +50%
unpack_hints 6s 6s +0%
unpack_sk 6s 3s +100%
intt_native_x86_64 5s 2s +150%
keccak_finalize 5s 2s +150%
montgomery_reduce 5s 2s +150%
pack_sig_c_h 5s 2s +150%
poly_add 5s 4s +25%
poly_caddq_native 5s 2s +150%
poly_caddq_native_aarch64 5s 3s +67%
poly_decompose 5s 3s +67%
poly_ntt 5s 2s +150%
poly_uniform_eta 5s 4s +25%
poly_uniform_gamma1 5s 3s +67%
poly_uniform_gamma1_4x 5s 4s +25%
polyveck_shiftl 5s 4s +25%
polyveck_unpack_eta 5s 3s +67%
polyvecl_pointwise_acc_montgomery_native 5s 4s +25%
polyw1_pack 5s 4s +25%
rej_eta 5s 2s +150%
sign_open 5s 5s +0%
sign_verify 5s 6s -17%
caddq 4s 3s +33%
keccak_init 4s 6s -33%
keccakf1600x4_permute 4s 1s +300%
mld_ct_cmask_nonzero_u32 4s 4s +0%
mld_sample_s1_s2 4s 7s -43%
mld_value_barrier_u32 4s 2s +100%
pack_pk 4s 3s +33%
poly_chknorm 4s 2s +100%
poly_decompose_c 4s 3s +33%
poly_invntt_tomont_native 4s 2s +100%
poly_ntt_native 4s 4s +0%
poly_sub 4s 4s +0%
poly_uniform 4s 3s +33%
poly_use_hint 4s 2s +100%
poly_use_hint_native 4s 4s +0%
polyt1_pack 4s 3s +33%
polyveck_chknorm 4s 2s +100%
polyveck_invntt_tomont 4s 5s -20%
polyveck_make_hint 4s 5s -20%
polyveck_pack_eta 4s 2s +100%
polyveck_pack_t0 4s 2s +100%
polyveck_sub 4s 3s +33%
polyvecl_pointwise_acc_montgomery 4s 5s -20%
polyvecl_uniform_gamma1 4s 5s -20%
polyz_pack 4s 2s +100%
rej_eta_c 4s 5s -20%
rej_eta_native 4s 4s +0%
shake128x4_squeezeblocks 4s 2s +100%
shake256_init 4s 2s +100%
sign_keypair_internal 4s 2s +100%
sign_signature_pre_hash_internal 4s 3s +33%
sign_signature_pre_hash_shake256 4s 5s -20%
sign_verify_extmu 4s 6s -33%
sign_verify_pre_hash_internal 4s 2s +100%
fqscale 3s 1s +200%
keccak_squeeze 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 2s +50%
make_hint 3s 2s +50%
mld_ct_abs_i32 3s 2s +50%
mld_ct_cmask_nonzero_u8 3s 4s -25%
mld_ct_get_optblocker_i64 3s 2s +50%
mld_ct_get_optblocker_u32 3s 2s +50%
mld_h 3s 4s -25%
ntt_native_x86_64 3s 3s +0%
poly_caddq 3s 3s +0%
poly_challenge 3s 3s +0%
poly_decompose_native 3s 2s +50%
poly_make_hint 3s 3s +0%
poly_ntt_c 3s 5s -40%
poly_pointwise_montgomery_native 3s 2s +50%
polyeta_pack 3s 5s -40%
polyt1_unpack 3s 3s +0%
polyvecl_pack_eta 3s 3s +0%
polyvecl_uniform_gamma1_serial 3s 2s +50%
polyvecl_unpack_eta 3s 4s -25%
reduce32 3s 2s +50%
shake128_absorb 3s 3s +0%
shake128_release 3s 4s -25%
shake128_squeeze 3s 3s +0%
shake128x4_absorb_once 3s 3s +0%
shake256 3s 1s +200%
shake256_finalize 3s 2s +50%
shake256_release 3s 2s +50%
shake256x4_squeezeblocks 3s 1s +200%
sign_keypair 3s 3s +0%
unpack_sig 3s 3s +0%
use_hint 3s 3s +0%
decompose 2s 3s -33%
keccakf1600_xor_bytes 2s 2s +0%
keccakf1600_xor_bytes (big endian) 2s 4s -50%
keccakf1600x4_xor_bytes 2s 2s +0%
mld_ct_sel_int32 2s 2s +0%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_value_barrier_i64 2s 2s +0%
mld_value_barrier_u8 2s 2s +0%
pack_sig_z 2s 5s -60%
pack_sk 2s 2s +0%
poly_caddq_c 2s 3s -33%
poly_chknorm_native 2s 3s -33%
poly_invntt_tomont 2s 5s -60%
poly_pointwise_montgomery 2s 1s +100%
poly_reduce 2s 4s -50%
poly_shiftl 2s 5s -60%
polyvecl_permute_bitrev_to_custom 2s 3s -33%
polyz_unpack 2s 3s -33%
polyz_unpack_native 2s 4s -50%
power2round 2s 2s +0%
shake128_finalize 2s 1s +100%
shake256_absorb 2s 2s +0%
shake256_squeeze 2s 3s -33%
shake256x4_absorb_once 2s 3s -33%
sign_verify_pre_hash_shake256 2s 3s -33%
unpack_pk 2s 3s -33%
keccakf1600x4_extract_bytes 1s 2s -50%
mld_ct_cmask_neg_i32 1s 3s -67%
mld_ct_get_optblocker_u8 1s 2s -50%
shake128_init 1s 3s -67%

oqs-bot (Contributor) commented Jan 27, 2026

CBMC Results (ML-DSA-87)

Full Results (175 proofs)
Proof Status Current Previous Change
**TOTAL** 2487s 2583s -3.7%
sign_verify_internal 313s 316s -1%
mld_attempt_signature_generation 228s 230s -1%
polyvecl_pointwise_acc_montgomery_c 187s 186s +1%
polyvec_matrix_expand 162s 168s -4%
rej_uniform_native 141s 154s -8%
poly_pointwise_montgomery_c 135s 154s -12%
mld_invntt_layer 122s 125s -2%
polyvec_matrix_expand_serial 107s 111s -4%
mld_ct_memcmp 81s 86s -6%
sign_signature_internal 77s 75s +3%
mld_ntt_layer 44s 49s -10%
keccak_squeezeblocks_x4 42s 45s -7%
mld_compute_t0_t1_tr_from_sk_components 25s 25s +0%
polymat_permute_bitrev_to_custom 25s 24s +4%
rej_uniform 23s 26s -12%
fqmul 20s 21s -5%
rej_uniform_c 20s 18s +11%
poly_chknorm_c 18s 18s +0%
poly_uniform_eta_4x 16s 18s -11%
polyveck_add 16s 14s +14%
poly_uniform_4x 14s 16s -12%
polyeta_unpack 14s 13s +8%
polyt0_unpack 14s 14s +0%
keccak_absorb_once_x4 13s 12s +8%
mld_ntt_butterfly_block 13s 13s +0%
polyveck_power2round 13s 13s +0%
keccakf1600x4_permute_native 12s 12s +0%
polyvec_matrix_pointwise_montgomery 12s 12s +0%
polyveck_reduce 12s 9s +33%
sign_pk_from_sk 11s 9s +22%
mld_check_pct 10s 9s +11%
mld_compute_pack_z 10s 11s -9%
poly_decompose_c 10s 10s +0%
polyveck_invntt_tomont 10s 6s +67%
polyveck_use_hint 10s 10s +0%
keccakf1600_permute 9s 7s +29%
polyveck_caddq 9s 9s +0%
polyveck_chknorm 9s 8s +12%
polyveck_shiftl 9s 6s +50%
mld_polyvecl_permute_bitrev_to_custom_native 8s 9s -11%
mld_sample_s1_s2_serial 8s 10s -20%
poly_invntt_tomont_c 8s 9s -11%
polyveck_ntt 8s 7s +14%
polyveck_pointwise_poly_montgomery 8s 7s +14%
polyvecl_ntt 8s 8s +0%
sign 8s 8s +0%
sign_signature 8s 3s +167%
polyveck_decompose 7s 8s -12%
caddq 6s 3s +100%
keccakf1600_permute_native 6s 8s -25%
mld_sample_s1_s2 6s 8s -25%
sign_open 6s 2s +200%
unpack_hints 6s 6s +0%
keccak_squeeze 5s 4s +25%
mld_ct_get_optblocker_i64 5s 3s +67%
poly_challenge 5s 6s -17%
poly_ntt 5s 3s +67%
poly_uniform 5s 6s -17%
poly_use_hint 5s 2s +150%
poly_use_hint_c 5s 3s +67%
polyt0_pack 5s 4s +25%
polyveck_pack_eta 5s 3s +67%
polyveck_sub 5s 7s -29%
polyvecl_chknorm 5s 6s -17%
rej_eta_c 5s 4s +25%
rej_eta_native 5s 4s +25%
sign_signature_pre_hash_shake256 5s 2s +150%
unpack_sk 5s 5s +0%
decompose 4s 4s +0%
intt_native_x86_64 4s 5s -20%
keccak_absorb 4s 5s -20%
pack_pk 4s 5s -20%
pack_sig_c_h 4s 5s -20%
poly_add 4s 3s +33%
poly_chknorm 4s 2s +100%
poly_ntt_native 4s 2s +100%
poly_sub 4s 2s +100%
poly_uniform_eta 4s 5s -20%
poly_uniform_gamma1 4s 4s +0%
poly_uniform_gamma1_4x 4s 4s +0%
polyt1_pack 4s 4s +0%
polyt1_unpack 4s 5s -20%
polyveck_make_hint 4s 7s -43%
polyveck_unpack_eta 4s 3s +33%
polyvecl_permute_bitrev_to_custom 4s 2s +100%
polyvecl_pointwise_acc_montgomery_native 4s 3s +33%
polyvecl_uniform_gamma1_serial 4s 4s +0%
polyz_unpack_c 4s 6s -33%
polyz_unpack_native 4s 3s +33%
rej_eta 4s 3s +33%
sign_verify_pre_hash_internal 4s 5s -20%
sign_verify_pre_hash_shake256 4s 5s -20%
unpack_sig 4s 8s -50%
fqscale 3s 3s +0%
keccak_finalize 3s 4s -25%
keccak_init 3s 4s -25%
keccakf1600_extract_bytes (big endian) 3s 2s +50%
keccakf1600_xor_bytes 3s 2s +50%
keccakf1600_xor_bytes (big endian) 3s 2s +50%
keccakf1600x4_extract_bytes 3s 1s +200%
keccakf1600x4_permute 3s 3s +0%
make_hint 3s 3s +0%
mld_ct_get_optblocker_u8 3s 2s +50%
mld_ct_sel_int32 3s 3s +0%
mld_keccakf1600_extract_bytes 3s 3s +0%
mld_prepare_domain_separation_prefix 3s 4s -25%
mld_value_barrier_i64 3s 3s +0%
ntt_native_x86_64 3s 2s +50%
pack_sig_z 3s 4s -25%
pack_sk 3s 2s +50%
poly_chknorm_native 3s 3s +0%
poly_decompose 3s 4s -25%
poly_decompose_native 3s 4s -25%
poly_make_hint 3s 5s -40%
poly_ntt_c 3s 4s -25%
poly_power2round 3s 2s +50%
polyveck_pack_t0 3s 2s +50%
polyveck_unpack_t0 3s 5s -40%
polyvecl_pointwise_acc_montgomery 3s 4s -25%
polyvecl_uniform_gamma1 3s 4s -25%
polyw1_pack 3s 2s +50%
polyz_unpack 3s 3s +0%
power2round 3s 1s +200%
reduce32 3s 5s -40%
shake128_release 3s 5s -40%
shake128x4_absorb_once 3s 3s +0%
shake128x4_squeezeblocks 3s 3s +0%
shake256_init 3s 2s +50%
sign_keypair_internal 3s 6s -50%
sign_signature_extmu 3s 5s -40%
sign_verify_extmu 3s 7s -57%
unpack_pk 3s 5s -40%
keccakf1600x4_xor_bytes 2s 2s +0%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_cmask_nonzero_u8 2s 2s +0%
mld_ct_get_optblocker_u32 2s 3s -33%
mld_h 2s 3s -33%
mld_value_barrier_u32 2s 2s +0%
mld_value_barrier_u8 2s 2s +0%
poly_caddq 2s 2s +0%
poly_caddq_c 2s 3s -33%
poly_caddq_native 2s 3s -33%
poly_caddq_native_aarch64 2s 3s -33%
poly_invntt_tomont 2s 2s +0%
poly_invntt_tomont_native 2s 3s -33%
poly_pointwise_montgomery 2s 3s -33%
poly_pointwise_montgomery_native 2s 2s +0%
poly_reduce 2s 2s +0%
poly_shiftl 2s 4s -50%
poly_use_hint_native 2s 4s -50%
polyveck_pack_w1 2s 3s -33%
polyvecl_pack_eta 2s 4s -50%
polyvecl_unpack_eta 2s 5s -60%
polyz_pack 2s 4s -50%
shake128_finalize 2s 4s -50%
shake128_init 2s 3s -33%
shake128_squeeze 2s 2s +0%
shake256 2s 3s -33%
shake256_absorb 2s 2s +0%
shake256_release 2s 3s -33%
shake256_squeeze 2s 3s -33%
shake256x4_absorb_once 2s 2s +0%
sign_keypair 2s 5s -60%
sign_signature_pre_hash_internal 2s 5s -60%
sign_verify 2s 4s -50%
sys_check_capability 2s 4s -50%
use_hint 2s 3s -33%
mld_ct_abs_i32 1s 2s -50%
mld_ct_cmask_nonzero_u32 1s 1s +0%
montgomery_reduce 1s 3s -67%
polyeta_pack 1s 4s -75%
polyvecl_unpack_z 1s 5s -80%
shake128_absorb 1s 2s -50%
shake256_finalize 1s 3s -67%
shake256x4_squeezeblocks 1s 2s -50%

…ends only

Unit tests for backends that do not support arithmetic do not use various i32
helper functions, resulting in unused-function warnings.
This commit fixes that by introducing appropriate guards.

chknorm is an outlier here: it only uses generate_i32_array_ranged, but not
the other functions.
We hence need three different guards that include or exclude chknorm accordingly.
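The guard structure described above can be sketched with the C preprocessor. The macro names (`HYP_NATIVE_ARITH`, `HYP_NATIVE_CHKNORM`) and the helpers below are hypothetical stand-ins, not the actual mldsa-native flags or test functions. The sketch also shows only two of the three guard levels. The first covers helpers used solely by the arithmetic tests. The second covers `generate_i32_array_ranged`, which the chknorm test needs as well:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build: a backend providing only a native chknorm.
 * Define HYP_NATIVE_ARITH instead to simulate a full arithmetic backend. */
/* #define HYP_NATIVE_ARITH */
#define HYP_NATIVE_CHKNORM

/* Guard A: helpers that only the arithmetic unit tests call. */
#if defined(HYP_NATIVE_ARITH)
#define HYP_NEED_I32_HELPERS 1
#else
#define HYP_NEED_I32_HELPERS 0
#endif

/* Guard B: the ranged generator is also used by the chknorm test. */
#if HYP_NEED_I32_HELPERS || defined(HYP_NATIVE_CHKNORM)
#define HYP_NEED_I32_RANGED 1
#else
#define HYP_NEED_I32_RANGED 0
#endif

#if HYP_NEED_I32_HELPERS
/* Compiled only when an arithmetic test will call it, so other backends
 * see no -Wunused-function warning. */
static void hyp_generate_i32_array_single(int32_t *a, unsigned n, int32_t v)
{
  for (unsigned i = 0; i < n; i++)
    a[i] = v;
}
#endif

#if HYP_NEED_I32_RANGED
/* Fills a[0..n) with pseudo-random values in [lo, hi] using a toy LCG
 * (deterministic test data, not cryptographic randomness). */
static void hyp_generate_i32_array_ranged(int32_t *a, unsigned n,
                                          int32_t lo, int32_t hi)
{
  assert(hi >= lo);
  int64_t width = (int64_t)hi - lo + 1;
  assert(width <= (int64_t)UINT32_MAX);
  uint32_t span = (uint32_t)width;
  uint32_t s = 12345u;
  for (unsigned i = 0; i < n; i++)
  {
    s = s * 1664525u + 1013904223u;
    a[i] = (int32_t)((int64_t)lo + (s % span));
  }
}
#endif
```

With only `HYP_NATIVE_CHKNORM` defined, the ranged generator is compiled and the single-value helper is not. This mirrors the chknorm outlier described in the commit message.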

- Port of pq-code-package/mlkem-native@33c4af5

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
mkannwischer and others added 3 commits February 14, 2026 04:18
Test both optimized and non-optimized builds on M55-AN547.

- Port of pq-code-package/mlkem-native@4215daf

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Add 4-way parallel Keccak-f1600 permutation for Armv8.1-M with MVE,
using bit-interleaved state representation.
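Bit interleaving is a well-known technique for running Keccak on 32-bit cores. Each 64-bit lane is split into one word holding its even-indexed bits and one word holding its odd-indexed bits. Every 64-bit lane rotation then becomes two 32-bit rotations. The portable C sketch below illustrates that identity. The function names are illustrative and do not come from keccak_f1600_x4_mve.c, whose actual data layout may differ:

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit lane: even-indexed bits go to *even, odd-indexed to *odd. */
static void interleave(uint64_t lane, uint32_t *even, uint32_t *odd)
{
  uint32_t e = 0, o = 0;
  for (unsigned j = 0; j < 32; j++)
  {
    e |= (uint32_t)((lane >> (2 * j)) & 1u) << j;
    o |= (uint32_t)((lane >> (2 * j + 1)) & 1u) << j;
  }
  *even = e;
  *odd = o;
}

/* Inverse of interleave(). */
static uint64_t deinterleave(uint32_t even, uint32_t odd)
{
  uint64_t lane = 0;
  for (unsigned j = 0; j < 32; j++)
  {
    lane |= (uint64_t)((even >> j) & 1u) << (2 * j);
    lane |= (uint64_t)((odd >> j) & 1u) << (2 * j + 1);
  }
  return lane;
}

/* Rotate-left helpers; n is reduced so a shift never equals the width. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

static uint64_t rotl64(uint64_t x, unsigned n)
{
  n &= 63;
  return n ? (x << n) | (x >> (64 - n)) : x;
}

/* Rotate an interleaved lane left by r (0 <= r < 64) using only 32-bit
 * rotations. For r = 2k both words rotate by k. For r = 2k+1 the words
 * swap roles: the new even word is the old odd word rotated by k+1, and
 * the new odd word is the old even word rotated by k. */
static void rotl_interleaved(uint32_t *even, uint32_t *odd, unsigned r)
{
  assert(r < 64);
  uint32_t e = *even, o = *odd;
  unsigned k = r / 2;
  if (r % 2 == 0)
  {
    *even = rotl32(e, k);
    *odd = rotl32(o, k);
  }
  else
  {
    *even = rotl32(o, k + 1);
    *odd = rotl32(e, k);
  }
}
```

Keccak-f1600 rotates lanes in both its θ and ρ steps. On a target where 32-bit vector elements are the natural choice, this representation therefore avoids emulating 64-bit rotations.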

- Add keccak_f1600_x4_mve.S: MVE assembly for 4-way Keccak
- Add keccak_f1600_x4_mve.c: C wrapper with temporary bit-interleaving
  (to be eliminated once we have XORBytes and ExtractBytes implementations
   handling the bit-interleaving)
- Adjust simpasm to support Armv8.1-M Thumb assembly simplification
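The temporary C wrapper described above works in three steps: convert all lanes of the four states into bit-interleaved form, run the assembly permutation, and convert back. The sketch below assumes a flat array of 4 × 25 64-bit lanes. Each lane's even bits are packed into its low 32 bits and its odd bits into its high 32 bits. `hyp_permute_x4_fn` and `hyp_identity_permute` are hypothetical placeholders for the MVE assembly entry point, whose real name and layout are not shown here. The conversion uses the branch-free mask-and-shift "perfect unshuffle" from Hacker's Delight, extended to 64 bits:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HYP_LANES_X4 (4 * 25) /* four Keccak-f1600 states of 25 lanes */

/* Swap bits p and p+s of x for every bit position p set in mask m. */
static uint64_t delta_swap(uint64_t x, uint64_t m, unsigned s)
{
  uint64_t t = (x ^ (x >> s)) & m;
  return x ^ t ^ (t << s);
}

/* Move even-indexed bits to the low word, odd-indexed bits to the high word. */
static uint64_t to_bit_interleaved(uint64_t x)
{
  x = delta_swap(x, 0x2222222222222222ull, 1);
  x = delta_swap(x, 0x0C0C0C0C0C0C0C0Cull, 2);
  x = delta_swap(x, 0x00F000F000F000F0ull, 4);
  x = delta_swap(x, 0x0000FF000000FF00ull, 8);
  x = delta_swap(x, 0x00000000FFFF0000ull, 16);
  return x;
}

/* Each delta swap is its own inverse, so undo them in reverse order. */
static uint64_t from_bit_interleaved(uint64_t x)
{
  x = delta_swap(x, 0x00000000FFFF0000ull, 16);
  x = delta_swap(x, 0x0000FF000000FF00ull, 8);
  x = delta_swap(x, 0x00F000F000F000F0ull, 4);
  x = delta_swap(x, 0x0C0C0C0C0C0C0C0Cull, 2);
  x = delta_swap(x, 0x2222222222222222ull, 1);
  return x;
}

/* Hypothetical signature of an assembly permutation on interleaved lanes. */
typedef void (*hyp_permute_x4_fn)(uint64_t *state);

/* Placeholder used in this sketch only: leaves the state unchanged. */
static void hyp_identity_permute(uint64_t *state) { (void)state; }

/* Wrapper: interleave, permute, de-interleave, all in place. */
static void hyp_keccakf1600x4_permute(uint64_t state[HYP_LANES_X4],
                                      hyp_permute_x4_fn permute)
{
  assert(permute != NULL);
  for (size_t i = 0; i < HYP_LANES_X4; i++)
    state[i] = to_bit_interleaved(state[i]);
  permute(state);
  for (size_t i = 0; i < HYP_LANES_X4; i++)
    state[i] = from_bit_interleaved(state[i]);
}
```

The conversion is per lane, so it does not depend on how the four states are ordered in memory. The two conversion loops are the extra work that moving the interleaving into XORBytes and ExtractBytes would remove.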

- Resolves #908

- Port of pq-code-package/mlkem-native@065c735

Co-Authored-By: Brendan Moran <brendan.moran@arm.com>
Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
The Armv8.1-M + MVE backend is still in active development and has
not undergone the same level of audit as the rest of the code.

This commit extends the documentation to make this clear.

The commit also disables the Armv8.1-M + MVE backend by default,
and instead explicitly enables it in the an547 baremetal Makefile.
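An opt-in backend like this is usually expressed as a compile-time check. The minimal sketch below assumes a hypothetical `HYP_ENABLE_ARMV81M_MVE_BACKEND` flag that a board Makefile (such as the an547 one) would pass via `-D`. `__ARM_FEATURE_MVE` is the ACLE macro that compilers define when targeting MVE. The actual mldsa-native configuration macro and mechanism may differ:

```c
#include <assert.h>
#include <string.h>

/* HYP_ENABLE_ARMV81M_MVE_BACKEND is a hypothetical opt-in flag, e.g. passed
 * as -DHYP_ENABLE_ARMV81M_MVE_BACKEND by a board Makefile.
 * __ARM_FEATURE_MVE is defined by ACLE-conforming compilers targeting MVE. */
#if defined(HYP_ENABLE_ARMV81M_MVE_BACKEND)
#if !defined(__ARM_FEATURE_MVE)
/* Design choice in this sketch: fail loudly instead of silently falling back. */
#error "MVE backend requested, but the target does not support MVE"
#endif
#define HYP_USE_MVE_KECCAK_X4 1
#else
/* Default: the backend stays off unless explicitly enabled. */
#define HYP_USE_MVE_KECCAK_X4 0
#endif

/* Reports which Keccak-f1600 x4 implementation this build would use. */
static const char *hyp_keccak_x4_backend(void)
{
  return HYP_USE_MVE_KECCAK_X4 ? "armv81m_mve" : "c";
}
```

A default build reports `c`. Only a build that passes the opt-in flag and targets an MVE-capable core selects the assembly.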

- Port of pq-code-package/mlkem-native@9d2f1c2

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>

@hanno-becker (Contributor) left a comment

This appears to be a faithful port. I tested it locally and confirmed that the assembly is indeed being used. A minor inconvenience is that Ctrl-C does not interrupt QEMU (and hence the tests), but I believe that is not specific to this PR and can be addressed separately.

@hanno-becker hanno-becker merged commit 39fe995 into main Feb 14, 2026
735 of 736 checks passed
@hanno-becker hanno-becker deleted the mve-keccak-x4 branch February 14, 2026 05:06

Labels: None yet
Projects: None yet

Development

Successfully merging this pull request may close these issues.

Build failure for test_unit.c on target without native NTT
Port: Armv8.1-M: Add MVE Keccak-f1600 x4 implementation

3 participants