Skip to content

CBMC: Refine bounds for input and output of base multiplication#906

Draft
hanno-becker wants to merge 8 commits intomainfrom
refine_bounds
Draft

CBMC: Refine bounds for input and output of base multiplication#906
hanno-becker wants to merge 8 commits intomainfrom
refine_bounds

Conversation

@hanno-becker
Copy link
Copy Markdown
Contributor

@hanno-becker hanno-becker commented Mar 24, 2025

Previously, the base multiplication would assume that one of its inputs is bound by 4096 in absolute value, but make no assumptions about the other input ("b-input" henceforth) and its mulcache.

This commit refines the bounds slightly, as follows:

  • The b-input is assumed to be bound by MLK_NTT_BOUND in absolute value. This comes for free since all values for b are results of the NTT.
  • The b-cache-input is assumed to be bound by MLKEM_Q in absolute value.

With those additional bounds in place, it can be showed that the result of the base multiplication is below INT16_MAX/2 in absolute value. Accordingly, this can be added as a precondition for the inverse NTT.

For the native AVX2 backend, the new output bound for the mulcache forces an explicit zeroization of the mulcache. This is not ideal since the cache is in fact entirely unused, but the performance penalty should be marginal (if the compiler can't eliminate the zeroization in the first place).

@hanno-becker hanno-becker added enhancement New feature or request CBMC labels Mar 24, 2025
@hanno-becker hanno-becker force-pushed the refine_bounds branch 4 times, most recently from 1326e42 to f9b25ee Compare March 24, 2025 11:12
@hanno-becker hanno-becker marked this pull request as ready for review March 24, 2025 12:56
@hanno-becker hanno-becker requested a review from a team as a code owner March 24, 2025 12:56
@hanno-becker hanno-becker added the benchmark this PR should be benchmarked in CI label Mar 24, 2025
Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mac Mini (M1, 2020) benchmarks

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 12257 cycles 12254 cycles 1.00
ML-KEM-512 encaps 14848 cycles 14846 cycles 1.00
ML-KEM-512 decaps 19725 cycles 19721 cycles 1.00
ML-KEM-768 keypair 21048 cycles 21051 cycles 1.00
ML-KEM-768 encaps 23677 cycles 23679 cycles 1.00
ML-KEM-768 decaps 30515 cycles 30515 cycles 1
ML-KEM-1024 keypair 29977 cycles 29978 cycles 1.00
ML-KEM-1024 encaps 34479 cycles 34478 cycles 1.00
ML-KEM-1024 decaps 43430 cycles 43442 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i)

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 9597 cycles 9477 cycles 1.01
ML-KEM-512 encaps 11103 cycles 11217 cycles 0.99
ML-KEM-512 decaps 15370 cycles 15166 cycles 1.01
ML-KEM-768 keypair 16376 cycles 16303 cycles 1.00
ML-KEM-768 encaps 17765 cycles 17790 cycles 1.00
ML-KEM-768 decaps 23604 cycles 23444 cycles 1.01
ML-KEM-1024 keypair 22063 cycles 22001 cycles 1.00
ML-KEM-1024 encaps 24180 cycles 24012 cycles 1.01
ML-KEM-1024 decaps 31927 cycles 31587 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 4th gen (c7i) (no-opt)

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 30097 cycles 30164 cycles 1.00
ML-KEM-512 encaps 35523 cycles 35677 cycles 1.00
ML-KEM-512 decaps 44923 cycles 45084 cycles 1.00
ML-KEM-768 keypair 48474 cycles 48448 cycles 1.00
ML-KEM-768 encaps 55666 cycles 55690 cycles 1.00
ML-KEM-768 decaps 68997 cycles 68876 cycles 1.00
ML-KEM-1024 keypair 74954 cycles 74942 cycles 1.00
ML-KEM-1024 encaps 83393 cycles 83410 cycles 1.00
ML-KEM-1024 decaps 99713 cycles 99614 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A55 (Snapdragon 888) benchmarks

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 59336 cycles 59330 cycles 1.00
ML-KEM-512 encaps 66928 cycles 66920 cycles 1.00
ML-KEM-512 decaps 86001 cycles 86014 cycles 1.00
ML-KEM-768 keypair 101044 cycles 101087 cycles 1.00
ML-KEM-768 encaps 112086 cycles 112317 cycles 1.00
ML-KEM-768 decaps 139218 cycles 139358 cycles 1.00
ML-KEM-1024 keypair 153559 cycles 153471 cycles 1.00
ML-KEM-1024 encaps 172905 cycles 170285 cycles 1.02
ML-KEM-1024 decaps 210032 cycles 207958 cycles 1.01

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a)

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 11636 cycles 11702 cycles 0.99
ML-KEM-512 encaps 13259 cycles 13439 cycles 0.99
ML-KEM-512 decaps 18151 cycles 18310 cycles 0.99
ML-KEM-768 keypair 20132 cycles 20125 cycles 1.00
ML-KEM-768 encaps 21192 cycles 21112 cycles 1.00
ML-KEM-768 decaps 28294 cycles 28224 cycles 1.00
ML-KEM-1024 keypair 27080 cycles 27067 cycles 1.00
ML-KEM-1024 encaps 29207 cycles 29123 cycles 1.00
ML-KEM-1024 decaps 38665 cycles 38639 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a)

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 17167 cycles 17264 cycles 0.99
ML-KEM-512 encaps 18926 cycles 19057 cycles 0.99
ML-KEM-512 decaps 24408 cycles 24513 cycles 1.00
ML-KEM-768 keypair 29445 cycles 29552 cycles 1.00
ML-KEM-768 encaps 30852 cycles 30875 cycles 1.00
ML-KEM-768 decaps 38787 cycles 38640 cycles 1.00
ML-KEM-1024 keypair 42726 cycles 42962 cycles 0.99
ML-KEM-1024 encaps 45290 cycles 45604 cycles 0.99
ML-KEM-1024 decaps 55637 cycles 55827 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i)

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 16123 cycles 16149 cycles 1.00
ML-KEM-512 encaps 18258 cycles 18247 cycles 1.00
ML-KEM-512 decaps 24788 cycles 24723 cycles 1.00
ML-KEM-768 keypair 27724 cycles 27803 cycles 1.00
ML-KEM-768 encaps 29501 cycles 29387 cycles 1.00
ML-KEM-768 decaps 38979 cycles 38836 cycles 1.00
ML-KEM-1024 keypair 37668 cycles 37578 cycles 1.00
ML-KEM-1024 encaps 40643 cycles 40554 cycles 1.00
ML-KEM-1024 decaps 53079 cycles 53168 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Intel Xeon 3rd gen (c6i)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: f9b25ee Previous: 8dba13b Ratio
ML-KEM-768 encaps 30902 cycles 29387 cycles 1.05

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 4th gen (c7a) (no-opt)

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 36353 cycles 36358 cycles 1.00
ML-KEM-512 encaps 42971 cycles 42959 cycles 1.00
ML-KEM-512 decaps 55977 cycles 55963 cycles 1.00
ML-KEM-768 keypair 59342 cycles 59249 cycles 1.00
ML-KEM-768 encaps 67754 cycles 67706 cycles 1.00
ML-KEM-768 decaps 84988 cycles 84932 cycles 1.00
ML-KEM-1024 keypair 87597 cycles 87560 cycles 1.00
ML-KEM-1024 encaps 98261 cycles 98317 cycles 1.00
ML-KEM-1024 decaps 120160 cycles 120104 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD EPYC 3rd gen (c6a) (no-opt)

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 39011 cycles 39026 cycles 1.00
ML-KEM-512 encaps 47090 cycles 47097 cycles 1.00
ML-KEM-512 decaps 60662 cycles 60745 cycles 1.00
ML-KEM-768 keypair 63305 cycles 63315 cycles 1.00
ML-KEM-768 encaps 74013 cycles 73946 cycles 1.00
ML-KEM-768 decaps 91862 cycles 91861 cycles 1.00
ML-KEM-1024 keypair 94066 cycles 94031 cycles 1.00
ML-KEM-1024 encaps 107373 cycles 107041 cycles 1.00
ML-KEM-1024 decaps 129368 cycles 129724 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 17808 cycles 17771 cycles 1.00
ML-KEM-512 encaps 21034 cycles 21026 cycles 1.00
ML-KEM-512 decaps 27670 cycles 27696 cycles 1.00
ML-KEM-768 keypair 30684 cycles 30710 cycles 1.00
ML-KEM-768 encaps 33605 cycles 33580 cycles 1.00
ML-KEM-768 decaps 43197 cycles 43195 cycles 1.00
ML-KEM-1024 keypair 44319 cycles 44315 cycles 1.00
ML-KEM-1024 encaps 49573 cycles 49591 cycles 1.00
ML-KEM-1024 decaps 62550 cycles 62497 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intel Xeon 3rd gen (c6i) (no-opt)

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 47469 cycles 47466 cycles 1.00
ML-KEM-512 encaps 55355 cycles 55344 cycles 1.00
ML-KEM-512 decaps 71584 cycles 71556 cycles 1.00
ML-KEM-768 keypair 77391 cycles 77538 cycles 1.00
ML-KEM-768 encaps 87479 cycles 87559 cycles 1.00
ML-KEM-768 decaps 108019 cycles 108276 cycles 1.00
ML-KEM-1024 keypair 113202 cycles 113314 cycles 1.00
ML-KEM-1024 encaps 126336 cycles 126306 cycles 1.00
ML-KEM-1024 decaps 152907 cycles 152929 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton4 (no-opt)

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 35922 cycles 35918 cycles 1.00
ML-KEM-512 encaps 40862 cycles 40842 cycles 1.00
ML-KEM-512 decaps 52270 cycles 52306 cycles 1.00
ML-KEM-768 keypair 60047 cycles 60759 cycles 0.99
ML-KEM-768 encaps 66957 cycles 67536 cycles 0.99
ML-KEM-768 decaps 81391 cycles 81425 cycles 1.00
ML-KEM-1024 keypair 89042 cycles 89028 cycles 1.00
ML-KEM-1024 encaps 98838 cycles 98877 cycles 1.00
ML-KEM-1024 decaps 117682 cycles 117716 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 18864 cycles 18877 cycles 1.00
ML-KEM-512 encaps 22344 cycles 22358 cycles 1.00
ML-KEM-512 decaps 29577 cycles 29566 cycles 1.00
ML-KEM-768 keypair 32236 cycles 32250 cycles 1.00
ML-KEM-768 encaps 35628 cycles 35598 cycles 1.00
ML-KEM-768 decaps 45941 cycles 45965 cycles 1.00
ML-KEM-1024 keypair 46491 cycles 46487 cycles 1.00
ML-KEM-1024 encaps 52069 cycles 52080 cycles 1.00
ML-KEM-1024 decaps 65916 cycles 65962 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 29410 cycles 29444 cycles 1.00
ML-KEM-512 encaps 34613 cycles 34680 cycles 1.00
ML-KEM-512 decaps 45213 cycles 45279 cycles 1.00
ML-KEM-768 keypair 50051 cycles 50139 cycles 1.00
ML-KEM-768 encaps 55328 cycles 55224 cycles 1.00
ML-KEM-768 decaps 70158 cycles 70096 cycles 1.00
ML-KEM-1024 keypair 73082 cycles 73098 cycles 1.00
ML-KEM-1024 encaps 81472 cycles 81514 cycles 1.00
ML-KEM-1024 decaps 101379 cycles 101471 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton3 (no-opt)

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 38966 cycles 39685 cycles 0.98
ML-KEM-512 encaps 44939 cycles 44864 cycles 1.00
ML-KEM-512 decaps 56734 cycles 56842 cycles 1.00
ML-KEM-768 keypair 64234 cycles 64182 cycles 1.00
ML-KEM-768 encaps 72031 cycles 72669 cycles 0.99
ML-KEM-768 decaps 88070 cycles 88056 cycles 1.00
ML-KEM-1024 keypair 96228 cycles 96095 cycles 1.00
ML-KEM-1024 encaps 106307 cycles 106211 cycles 1.00
ML-KEM-1024 decaps 127120 cycles 127057 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SpacemiT K1 8 (Banana Pi F3) benchmarks

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 225601 cycles 225593 cycles 1.00
ML-KEM-512 encaps 271839 cycles 271821 cycles 1.00
ML-KEM-512 decaps 346406 cycles 346326 cycles 1.00
ML-KEM-768 keypair 374348 cycles 374216 cycles 1.00
ML-KEM-768 encaps 433877 cycles 433736 cycles 1.00
ML-KEM-768 decaps 532770 cycles 532650 cycles 1.00
ML-KEM-1024 keypair 554398 cycles 554463 cycles 1.00
ML-KEM-1024 encaps 632515 cycles 632521 cycles 1.00
ML-KEM-1024 decaps 754467 cycles 754358 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A72 (Raspberry Pi 4) benchmarks

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 52821 cycles 53196 cycles 0.99
ML-KEM-512 encaps 60537 cycles 61239 cycles 0.99
ML-KEM-512 decaps 77877 cycles 77578 cycles 1.00
ML-KEM-768 keypair 89670 cycles 89929 cycles 1.00
ML-KEM-768 encaps 97185 cycles 98860 cycles 0.98
ML-KEM-768 decaps 121486 cycles 123436 cycles 0.98
ML-KEM-1024 keypair 135647 cycles 134458 cycles 1.01
ML-KEM-1024 encaps 148998 cycles 147089 cycles 1.01
ML-KEM-1024 decaps 180865 cycles 180649 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Graviton2 (no-opt)

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 59326 cycles 59431 cycles 1.00
ML-KEM-512 encaps 68033 cycles 68011 cycles 1.00
ML-KEM-512 decaps 86704 cycles 86649 cycles 1.00
ML-KEM-768 keypair 98815 cycles 98924 cycles 1.00
ML-KEM-768 encaps 110974 cycles 110402 cycles 1.01
ML-KEM-768 decaps 134574 cycles 134811 cycles 1.00
ML-KEM-1024 keypair 148680 cycles 148822 cycles 1.00
ML-KEM-1024 encaps 163724 cycles 163875 cycles 1.00
ML-KEM-1024 decaps 195449 cycles 195498 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

Copy link
Copy Markdown
Contributor

@oqs-bot oqs-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arm Cortex-A76 (Raspberry Pi 5) benchmarks

Details
Benchmark suite Current: ee52c4e Previous: 8dba13b Ratio
ML-KEM-512 keypair 29409 cycles 29440 cycles 1.00
ML-KEM-512 encaps 34618 cycles 34675 cycles 1.00
ML-KEM-512 decaps 45210 cycles 45264 cycles 1.00
ML-KEM-768 keypair 50053 cycles 50138 cycles 1.00
ML-KEM-768 encaps 55322 cycles 55220 cycles 1.00
ML-KEM-768 decaps 70154 cycles 70075 cycles 1.00
ML-KEM-1024 keypair 73076 cycles 73079 cycles 1.00
ML-KEM-1024 encaps 81445 cycles 81517 cycles 1.00
ML-KEM-1024 decaps 101376 cycles 101473 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@hanno-becker hanno-becker removed the benchmark this PR should be benchmarked in CI label Mar 24, 2025
@rod-chapman rod-chapman force-pushed the refine_bounds branch 4 times, most recently from c9ffb66 to b518349 Compare October 7, 2025 12:09
@rod-chapman
Copy link
Copy Markdown
Contributor

rod-chapman commented Oct 7, 2025

Still to do on this PR:

  1. Resolve proof scaling issues with mlk_matvec_mul() when MLKEM_K=3 or 4. May be improved by CBMC 6.8.0 and/or Z3 4.15.3 releases to come.
  2. Re-check post-conditions of AArch64 HOL-Light specification of polyvec_basemul_acc_cached_native_asm_k[234] and re-prove against C contracts to prove that AArch64 asm code yields coefficients bounded by INT16_MAX/2
  3. Re-implement and re-prove x86_64 native implementations of the same.

@hanno-becker
Copy link
Copy Markdown
Contributor Author

@dkostic Now that you have done all the proofs including the basemul. How much work would you expect it to be to adjust the basemul to use a (slightly) lazy reduction strategy? That is, only reduce after computing each polynomial product, but before accumulation.

@hanno-becker
Copy link
Copy Markdown
Contributor Author

@dkostic Would you (and Claude) have time to do this, and at the same time resolve #1421?

@rod-chapman
Copy link
Copy Markdown
Contributor

I hope that CBMC 6.9.0 will unblock progress on this one. I will rebase the branch first and see if the proofs complete OK.

hanno-becker and others added 6 commits May 4, 2026 11:21
- Change mlk_polyvec back to struct { mlk_poly vec[MLKEM_K]; }
- Change mlk_polymat to struct { mlk_polyvec vec[MLKEM_K]; }
- Update all function signatures to use pointer style
- Fix all implementations to use struct member access
- Update tests, benchmarks, and CBMC harnesses
- Add consistent const annotations

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Previously, the base multiplication would assume that one of its inputs
is bound by 4096 in absolute value, but make no assumptions about the
other input ("b-input" henceforth) and its mulcache.

This commit refines the bounds slightly, as follows:
- The b-input is assumed to be bound by MLK_NTT_BOUND in absolute value.
  This comes for free since all values for b _are_ results of the NTT.
- The b-cache-input is assumed to be bound by MLKEM_Q in absolute value.

With those additional bounds in place, it can be showed that the result
of the base multiplication is below INT16_MAX/2 in absolute value.
Accordingly, this can be added as a precondition for the inverse NTT.

Those refined bounds can help in subsequent commits to improve the
reduction strategy inside the inverse NTT.

For the native AVX2 backend, the new output bound for the mulcache
forces an explicit zeroization of the mulcache. This is not ideal
since the cache is in fact entirely unused, but the performance
penalty should be marginal (if the compiler can't eliminate the
zeroization in the first place).

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
Signed-off-by: Rod Chapman <rodchap@amazon.com>
polyvec_basemul_acc_montgomery_cached_asm_k[234]

TODO - Re-check specifications and proof of AArch64
implementations of these 3 functions.

Signed-off-by: Rod Chapman <rodchap@amazon.com>
Signed-off-by: Rod Chapman <rodchap@amazon.com>
- Change mlk_polyvec back to struct `{ mlk_poly vec[MLKEM_K]; }`
- Change mlk_polymat to struct `{ mlk_polyvec vec[MLKEM_K]; }`
- Update all function signatures to use pointer style
- Fix all implementations to use struct member access
- Update tests, benchmarks, and CBMC harnesses
- Add consistent const annotations

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
@hanno-becker
Copy link
Copy Markdown
Contributor Author

Thanks @rod-chapman! But this also needs significant work on the x86_64 backend.

Signed-off-by: Rod Chapman <rodchap@amazon.com>
Signed-off-by: Rod Chapman <rodchap@amazon.com>
@rod-chapman
Copy link
Copy Markdown
Contributor

Oh yes... the x86 back-end has to brought into line with the AArch64 and C back-ends to produce tighter bounds on the output of basemul, right?

I have re-based and re-running proofs now on the C stuff.

@oqs-bot
Copy link
Copy Markdown
Contributor

oqs-bot commented May 4, 2026

CBMC Results (ML-KEM-512)

⚠️ Attention Required

Proof Status Current Previous Change
**TOTAL** ⚠️ 1847s 1312s +40.8%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 - 4s -
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 - 4s -
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 - 3s -
mlk_indcpa_enc ⚠️ 289s 170s +70%
mlk_indcpa_keypair_derand ⚠️ 412s 244s +69%
mlk_polyvec_basemul_acc_montgomery_cached_c ⚠️ 97s 53s +83%
polyvec_basemul_acc_montgomery_cached_native ⚠️ 218s 8s +2625%
Full Results (191 proofs)
Proof Status Current Previous Change
**TOTAL** ⚠️ 1847s 1312s +40.8%
mlk_indcpa_keypair_derand ⚠️ 412s 244s +69%
mlk_indcpa_enc ⚠️ 289s 170s +70%
polyvec_basemul_acc_montgomery_cached_native ⚠️ 218s 8s +2625%
mlk_rej_uniform_c 117s 128s -9%
mlk_polyvec_basemul_acc_montgomery_cached_c ⚠️ 97s 53s +83%
mlk_poly_rej_uniform 32s 33s -3%
poly_ntt_native 28s 24s +17%
mlk_keccak_squeezeblocks_x4 27s 26s +4%
mlk_ntt_layer 27s 37s -27%
mlk_indcpa_dec 20s 15s +33%
mlk_poly_reduce_native 19s 19s +0%
keccakf1600x4_permute_native_x4 17s 16s +6%
mlk_poly_decompress_d10_native 16s 13s +23%
mlk_poly_decompress_d4_native 14s 12s +17%
mlk_polyvec_add 11s 11s +0%
mlk_keccak_squeezeblocks 10s 8s +25%
mlk_matvec_mul 10s 2s +400%
mlk_poly_frommsg 9s 8s +12%
mlk_ntt_butterfly_block 8s 8s +0%
mlk_poly_frombytes_native 8s 9s -11%
mlk_poly_rej_uniform_x4 8s 6s +33%
mlk_polymat_permute_bitrev_to_custom 8s 3s +167%
mlk_fqmul 7s 16s -56%
mlk_keccak_absorb_once_x4 7s 5s +40%
mlk_keccak_squeeze_once 7s 9s -22%
mlk_poly_ntt 7s 8s -12%
mlk_keccak_absorb_once 5s 5s +0%
mlk_keccakf1600_permute_c 5s 7s -29%
mlk_poly_mulcache_compute_c 5s 5s +0%
mlk_poly_ntt_c 5s 4s +25%
mlk_scalar_compress_d1 5s 4s +25%
poly_compress_d10_native_x86_64 5s 3s +67%
poly_decompress_d10_native_x86_64 5s 5s +0%
kem_enc 4s 1s +300%
mlk_ct_sel_int16 4s 2s +100%
mlk_poly_cbd_eta2 4s 4s +0%
mlk_poly_compress_d10_c 4s 3s +33%
mlk_poly_compress_d4_c 4s 2s +100%
mlk_poly_frombytes_c 4s 3s +33%
mlk_poly_getnoise_eta1_4x 4s 4s +0%
mlk_poly_invntt_tomont_c 4s 2s +100%
mlk_poly_sub 4s 2s +100%
mlk_polyvec_basemul_acc_montgomery_cached 4s 3s +33%
mlk_polyvec_tomont 4s 2s +100%
mlk_scalar_compress_d5 4s 2s +100%
mlk_scalar_decompress_d10 4s 5s -20%
mlk_sha3_256 4s 2s +100%
mlk_shake256x4 4s 5s -20%
mlk_value_barrier_u8 4s 1s +300%
nttunpack_native_x86_64 4s 3s +33%
poly_compress_d4_native_x86_64 4s 1s +300%
poly_decompress_d4_native_x86_64 4s 3s +33%
poly_frombytes_native_x86_64 4s 6s -33%
poly_getnoise_eta1122_4x_native 4s 2s +100%
poly_tobytes_native_aarch64 4s 2s +100%
poly_tobytes_native_x86_64 4s 1s +300%
poly_tomont_native_aarch64 4s 1s +300%
poly_tomont_native_x86_64 4s 1s +300%
keccak_f1600_x1_native_aarch64 3s 2s +50%
kem_check_sk 3s 2s +50%
kem_dec 3s 5s -40%
kem_keypair 3s 2s +50%
mlk_ct_cmask_neg_i16 3s 3s +0%
mlk_ct_get_optblocker_u32 3s 1s +200%
mlk_ct_get_optblocker_u8 3s 3s +0%
mlk_ct_memcmp 3s 1s +200%
mlk_gen_matrix 3s 1s +200%
mlk_gen_matrix_serial 3s 2s +50%
mlk_invntt_layer 3s 7s -57%
mlk_keccakf1600_extract_bytes 3s 4s -25%
mlk_keccakf1600_xor_bytes (big endian) 3s 3s +0%
mlk_keccakf1600x4_extract_bytes_c 3s 2s +50%
mlk_keccakf1600x4_xor_bytes 3s 2s +50%
mlk_montgomery_reduce 3s 2s +50%
mlk_poly_cbd_eta1 3s 4s -25%
mlk_poly_compress_d11 3s 2s +50%
mlk_poly_compress_d11_native 3s 2s +50%
mlk_poly_compress_d5 3s 2s +50%
mlk_poly_compress_du 3s 1s +200%
mlk_poly_decompress_d10 3s 2s +50%
mlk_poly_decompress_d11 3s 2s +50%
mlk_poly_decompress_d4 3s 2s +50%
mlk_poly_frombytes 3s 3s +0%
mlk_poly_getnoise_eta1122_4x 3s 2s +50%
mlk_poly_invntt_tomont 3s 1s +200%
mlk_poly_reduce 3s 2s +50%
mlk_poly_reduce_c 3s 2s +50%
mlk_poly_tomont 3s 3s +0%
mlk_polyvec_permute_bitrev_to_custom 3s 4s -25%
mlk_polyvec_permute_bitrev_to_custom_native 3s 1s +200%
mlk_polyvec_reduce 3s 3s +0%
mlk_scalar_decompress_d5 3s 1s +200%
mlk_value_barrier_i32 3s 3s +0%
ntt_native_aarch64 3s 5s -40%
poly_invntt_tomont_native 3s 3s +0%
poly_mulcache_compute_native_aarch64 3s 4s -25%
poly_reduce_native_x86_64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 3s 2s +50%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 3s 3s +0%
rej_uniform_native 3s 2s +50%
rej_uniform_native_x86_64 3s 4s -25%
intt_native_x86_64 2s 3s -33%
keccak_f1600_x4_native_aarch64_v84a 2s 1s +100%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_avx2 2s 3s -33%
keccakf1600_permute_native 2s 2s +0%
keccakf1600x4_xor_bytes_native 2s 2s +0%
kem_check_pk 2s 2s +0%
kem_keypair_derand 2s 3s -33%
mlk_barrett_reduce 2s 2s +0%
mlk_ct_cmask_nonzero_u16 2s 1s +100%
mlk_ct_cmask_nonzero_u8 2s 3s -33%
mlk_ct_cmov_zero 2s 2s +0%
mlk_ct_get_optblocker_i32 2s 2s +0%
mlk_ct_sel_uint8 2s 1s +100%
mlk_keccakf1600_permute 2s 2s +0%
mlk_keccakf1600x4_permute 2s 2s +0%
mlk_keypair_getnoise_eta1 2s 2s +0%
mlk_poly_compress_d10 2s 2s +0%
mlk_poly_compress_d10_native 2s 4s -50%
mlk_poly_compress_d4 2s 1s +100%
mlk_poly_compress_d4_native 2s 2s +0%
mlk_poly_compress_d5_c 2s 2s +0%
mlk_poly_compress_dv 2s 5s -60%
mlk_poly_decompress_d10_c 2s 2s +0%
mlk_poly_decompress_d11_c 2s 3s -33%
mlk_poly_decompress_d11_native 2s 1s +100%
mlk_poly_decompress_d4_c 2s 3s -33%
mlk_poly_decompress_d5 2s 4s -50%
mlk_poly_decompress_d5_c 2s 1s +100%
mlk_poly_decompress_d5_native 2s 1s +100%
mlk_poly_getnoise_eta1_4x_native 2s 1s +100%
mlk_poly_mulcache_compute 2s 2s +0%
mlk_poly_mulcache_compute_native 2s 3s -33%
mlk_poly_tobytes_native 2s 3s -33%
mlk_poly_tomont_c 2s 3s -33%
mlk_poly_tomont_native 2s 3s -33%
mlk_poly_tomsg 2s 2s +0%
mlk_polyvec_compress_du 2s 4s -50%
mlk_polyvec_decompress_du 2s 2s +0%
mlk_polyvec_frombytes 2s 3s -33%
mlk_polyvec_mulcache_compute 2s 4s -50%
mlk_polyvec_ntt 2s 2s +0%
mlk_polyvec_tobytes 2s 2s +0%
mlk_rej_uniform 2s 2s +0%
mlk_scalar_compress_d10 2s 3s -33%
mlk_scalar_compress_d4 2s 2s +0%
mlk_scalar_decompress_d11 2s 2s +0%
mlk_scalar_decompress_d4 2s 3s -33%
mlk_sha3_512 2s 3s -33%
mlk_shake128_squeezeblocks 2s 3s -33%
mlk_shake128x4_absorb_once 2s 2s +0%
mlk_shake256 2s 2s +0%
mlk_value_barrier_u32 2s 3s -33%
poly_compress_d11_native_x86_64 2s 2s +0%
poly_compress_d5_native_x86_64 2s 2s +0%
poly_decompress_d11_native_x86_64 2s 1s +100%
poly_mulcache_compute_native_x86_64 2s 4s -50%
poly_reduce_native_aarch64 2s 2s +0%
rej_uniform_native_aarch64 2s 2s +0%
sys_check_capability 2s 3s -33%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 - 4s -
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 - 4s -
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 - 3s -
intt_native_aarch64 1s 4s -75%
keccak_f1600_x1_native_aarch64_v84a 1s 2s -50%
keccakf1600x4_extract_bytes_native 1s 2s -50%
kem_enc_derand 1s 2s -50%
mlk_check_pct 1s 3s -67%
mlk_enc_getnoise_eta1_eta2 1s 1s +0%
mlk_keccakf1600_extract_bytes (big endian) 1s 1s +0%
mlk_keccakf1600_xor_bytes 1s 4s -75%
mlk_keccakf1600x4_extract_bytes 1s 3s -67%
mlk_keccakf1600x4_xor_bytes_c 1s 2s -50%
mlk_poly_add 1s 2s -50%
mlk_poly_compress_d11_c 1s 2s -50%
mlk_poly_compress_d5_native 1s 2s -50%
mlk_poly_decompress_du 1s 1s +0%
mlk_poly_decompress_dv 1s 2s -50%
mlk_poly_getnoise_eta2 1s 4s -75%
mlk_poly_tobytes 1s 3s -67%
mlk_poly_tobytes_c 1s 3s -67%
mlk_polyvec_invntt_tomont 1s 5s -80%
mlk_scalar_compress_d11 1s 1s +0%
mlk_scalar_signed_to_unsigned_q 1s 2s -50%
mlk_shake128_absorb_once 1s 2s -50%
mlk_shake128x4_squeezeblocks 1s 1s +0%
ntt_native_x86_64 1s 2s -50%
poly_decompress_d5_native_x86_64 1s 3s -67%

@oqs-bot
Copy link
Copy Markdown
Contributor

oqs-bot commented May 4, 2026

CBMC Results (ML-KEM-768)

⚠️ Attention Required

Proof Status Current Previous Change
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 - 2s -
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 - 3s -
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 - 2s -
polyvec_basemul_acc_montgomery_cached_native - 20s -
mlk_polyvec_basemul_acc_montgomery_cached_c ⚠️ 177s 64s +177%
Full Results (191 proofs)
Proof Status Current Previous Change
**TOTAL** 1364s 1561s -12.6%
mlk_indcpa_keypair_derand 212s 234s -9%
mlk_polyvec_basemul_acc_montgomery_cached_c ⚠️ 177s 64s +177%
mlk_indcpa_enc 152s 210s -28%
mlk_rej_uniform_c 129s 173s -25%
mlk_poly_rej_uniform 32s 38s -16%
mlk_ntt_layer 28s 48s -42%
mlk_indcpa_dec 26s 18s +44%
mlk_keccak_squeezeblocks_x4 25s 34s -26%
poly_ntt_native 25s 34s -26%
mlk_poly_reduce_native 19s 24s -21%
mlk_poly_decompress_d4_native 17s 18s -6%
keccakf1600x4_permute_native_x4 15s 19s -21%
mlk_poly_decompress_d10_native 15s 20s -25%
mlk_ntt_butterfly_block 9s 8s +12%
mlk_polyvec_add 9s 12s -25%
mlk_poly_frommsg 8s 14s -43%
mlk_poly_ntt 8s 9s -11%
mlk_keccak_squeeze_once 7s 12s -42%
mlk_keccak_squeezeblocks 7s 11s -36%
mlk_poly_frombytes_native 7s 11s -36%
mlk_poly_rej_uniform_x4 7s 8s -12%
poly_frombytes_native_x86_64 7s 7s +0%
mlk_fqmul 6s 25s -76%
mlk_keccak_absorb_once_x4 6s 7s -14%
mlk_poly_ntt_c 6s 6s +0%
mlk_polyvec_ntt 6s 1s +500%
poly_compress_d10_native_x86_64 6s 3s +100%
kem_dec 5s 7s -29%
mlk_gen_matrix 5s 3s +67%
nttunpack_native_x86_64 5s 4s +25%
poly_decompress_d10_native_x86_64 5s 5s +0%
rej_uniform_native_x86_64 5s 7s -29%
kem_check_sk 4s 2s +100%
kem_enc_derand 4s 4s +0%
mlk_ct_cmov_zero 4s 4s +0%
mlk_ct_get_optblocker_i32 4s 1s +300%
mlk_gen_matrix_serial 4s 3s +33%
mlk_invntt_layer 4s 9s -56%
mlk_keccak_absorb_once 4s 7s -43%
mlk_keccakf1600_permute 4s 1s +300%
mlk_keccakf1600_permute_c 4s 6s -33%
mlk_poly_cbd_eta2 4s 1s +300%
mlk_poly_compress_d10_c 4s 7s -43%
mlk_poly_compress_d4_native 4s 4s +0%
mlk_poly_decompress_d11_c 4s 2s +100%
mlk_poly_decompress_d5_native 4s 1s +300%
mlk_poly_getnoise_eta1_4x 4s 6s -33%
mlk_poly_getnoise_eta2 4s 2s +100%
mlk_poly_invntt_tomont_c 4s 4s +0%
mlk_poly_mulcache_compute_c 4s 5s -20%
mlk_poly_reduce 4s 2s +100%
mlk_poly_tomont 4s 4s +0%
mlk_sha3_512 4s 2s +100%
poly_decompress_d11_native_x86_64 4s 2s +100%
poly_decompress_d4_native_x86_64 4s 7s -43%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 4s 4s +0%
rej_uniform_native 4s 4s +0%
rej_uniform_native_aarch64 4s 4s +0%
intt_native_x86_64 3s 4s -25%
keccak_f1600_x1_native_aarch64_v84a 3s 1s +200%
keccak_f1600_x4_native_avx2 3s 4s -25%
keccakf1600x4_extract_bytes_native 3s 2s +50%
kem_check_pk 3s 4s -25%
kem_enc 3s 2s +50%
mlk_barrett_reduce 3s 2s +50%
mlk_check_pct 3s 1s +200%
mlk_ct_get_optblocker_u8 3s 4s -25%
mlk_ct_memcmp 3s 3s +0%
mlk_ct_sel_int16 3s 2s +50%
mlk_ct_sel_uint8 3s 2s +50%
mlk_keccakf1600x4_extract_bytes_c 3s 3s +0%
mlk_keypair_getnoise_eta1 3s 5s -40%
mlk_matvec_mul 3s 3s +0%
mlk_poly_compress_d10 3s 2s +50%
mlk_poly_compress_d11 3s 1s +200%
mlk_poly_compress_d4_c 3s 3s +0%
mlk_poly_compress_d5_c 3s 2s +50%
mlk_poly_decompress_d4_c 3s 3s +0%
mlk_poly_decompress_d5_c 3s 3s +0%
mlk_poly_frombytes_c 3s 3s +0%
mlk_poly_mulcache_compute_native 3s 3s +0%
mlk_poly_tomont_c 3s 4s -25%
mlk_poly_tomont_native 3s 2s +50%
mlk_polyvec_frombytes 3s 3s +0%
mlk_polyvec_mulcache_compute 3s 2s +50%
mlk_scalar_compress_d5 3s 2s +50%
mlk_scalar_decompress_d5 3s 6s -50%
mlk_shake128x4_absorb_once 3s 3s +0%
ntt_native_aarch64 3s 3s +0%
poly_compress_d4_native_x86_64 3s 3s +0%
poly_compress_d5_native_x86_64 3s 2s +50%
poly_mulcache_compute_native_aarch64 3s 2s +50%
poly_mulcache_compute_native_x86_64 3s 3s +0%
poly_tobytes_native_aarch64 3s 2s +50%
poly_tobytes_native_x86_64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 3s 2s +50%
sys_check_capability 3s 3s +0%
intt_native_aarch64 2s 3s -33%
keccak_f1600_x1_native_aarch64 2s 1s +100%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 3s -33%
keccakf1600_permute_native 2s 1s +100%
kem_keypair 2s 3s -33%
kem_keypair_derand 2s 5s -60%
mlk_ct_cmask_neg_i16 2s 4s -50%
mlk_ct_cmask_nonzero_u16 2s 2s +0%
mlk_enc_getnoise_eta1_eta2 2s 5s -60%
mlk_keccakf1600_xor_bytes 2s 1s +100%
mlk_keccakf1600_xor_bytes (big endian) 2s 2s +0%
mlk_keccakf1600x4_permute 2s 1s +100%
mlk_keccakf1600x4_xor_bytes 2s 2s +0%
mlk_montgomery_reduce 2s 2s +0%
mlk_poly_cbd_eta1 2s 3s -33%
mlk_poly_compress_d10_native 2s 2s +0%
mlk_poly_compress_d11_c 2s 4s -50%
mlk_poly_compress_d5 2s 3s -33%
mlk_poly_decompress_d10 2s 4s -50%
mlk_poly_decompress_d11 2s 2s +0%
mlk_poly_decompress_d11_native 2s 4s -50%
mlk_poly_decompress_d4 2s 2s +0%
mlk_poly_decompress_du 2s 4s -50%
mlk_poly_frombytes 2s 4s -50%
mlk_poly_getnoise_eta1_4x_native 2s 3s -33%
mlk_poly_mulcache_compute 2s 4s -50%
mlk_poly_reduce_c 2s 3s -33%
mlk_poly_sub 2s 3s -33%
mlk_poly_tobytes 2s 1s +100%
mlk_poly_tobytes_c 2s 3s -33%
mlk_polymat_permute_bitrev_to_custom 2s 5s -60%
mlk_polyvec_basemul_acc_montgomery_cached 2s 3s -33%
mlk_polyvec_compress_du 2s 3s -33%
mlk_polyvec_decompress_du 2s 4s -50%
mlk_polyvec_permute_bitrev_to_custom_native 2s 3s -33%
mlk_polyvec_reduce 2s 2s +0%
mlk_rej_uniform 2s 3s -33%
mlk_scalar_compress_d1 2s 3s -33%
mlk_scalar_compress_d10 2s 4s -50%
mlk_scalar_compress_d11 2s 2s +0%
mlk_scalar_compress_d4 2s 1s +100%
mlk_scalar_decompress_d10 2s 2s +0%
mlk_scalar_decompress_d11 2s 2s +0%
mlk_scalar_decompress_d4 2s 2s +0%
mlk_sha3_256 2s 3s -33%
mlk_shake128_squeezeblocks 2s 2s +0%
mlk_shake128x4_squeezeblocks 2s 1s +100%
mlk_shake256 2s 2s +0%
mlk_shake256x4 2s 3s -33%
mlk_value_barrier_u32 2s 4s -50%
ntt_native_x86_64 2s 4s -50%
poly_getnoise_eta1122_4x_native 2s 3s -33%
poly_reduce_native_aarch64 2s 2s +0%
poly_tomont_native_x86_64 2s 4s -50%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 - 2s -
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 - 3s -
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 - 2s -
polyvec_basemul_acc_montgomery_cached_native - 20s -
keccakf1600x4_xor_bytes_native 1s 1s +0%
mlk_ct_cmask_nonzero_u8 1s 2s -50%
mlk_ct_get_optblocker_u32 1s 3s -67%
mlk_keccakf1600_extract_bytes 1s 3s -67%
mlk_keccakf1600_extract_bytes (big endian) 1s 2s -50%
mlk_keccakf1600x4_extract_bytes 1s 2s -50%
mlk_keccakf1600x4_xor_bytes_c 1s 3s -67%
mlk_poly_add 1s 2s -50%
mlk_poly_compress_d11_native 1s 2s -50%
mlk_poly_compress_d4 1s 3s -67%
mlk_poly_compress_d5_native 1s 4s -75%
mlk_poly_compress_du 1s 1s +0%
mlk_poly_compress_dv 1s 1s +0%
mlk_poly_decompress_d10_c 1s 2s -50%
mlk_poly_decompress_d5 1s 3s -67%
mlk_poly_decompress_dv 1s 2s -50%
mlk_poly_getnoise_eta1122_4x 1s 2s -50%
mlk_poly_invntt_tomont 1s 2s -50%
mlk_poly_tobytes_native 1s 1s +0%
mlk_poly_tomsg 1s 2s -50%
mlk_polyvec_invntt_tomont 1s 2s -50%
mlk_polyvec_permute_bitrev_to_custom 1s 1s +0%
mlk_polyvec_tobytes 1s 3s -67%
mlk_polyvec_tomont 1s 2s -50%
mlk_scalar_signed_to_unsigned_q 1s 2s -50%
mlk_shake128_absorb_once 1s 4s -75%
mlk_value_barrier_i32 1s 4s -75%
mlk_value_barrier_u8 1s 2s -50%
poly_compress_d11_native_x86_64 1s 3s -67%
poly_decompress_d5_native_x86_64 1s 2s -50%
poly_invntt_tomont_native 1s 2s -50%
poly_reduce_native_x86_64 1s 4s -75%
poly_tomont_native_aarch64 1s 3s -67%

@oqs-bot
Copy link
Copy Markdown
Contributor

oqs-bot commented May 4, 2026

CBMC Results (ML-KEM-1024)

⚠️ Attention Required

Proof Status Current Previous Change
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 - 3s -
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 - 4s -
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 - 2s -
polyvec_basemul_acc_montgomery_cached_native - 39s -
mlk_indcpa_dec ⚠️ 21s 10s +110%
mlk_indcpa_enc ⚠️ 303s 145s +109%
mlk_polyvec_basemul_acc_montgomery_cached_c ⚠️ 133s 71s +87%
Full Results (191 proofs)
Proof Status Current Previous Change
**TOTAL** 1404s 1233s +13.9%
mlk_indcpa_enc ⚠️ 303s 145s +109%
mlk_polyvec_basemul_acc_montgomery_cached_c ⚠️ 133s 71s +87%
mlk_indcpa_keypair_derand 126s 132s -5%
mlk_rej_uniform_c 111s 143s -22%
mlk_ntt_layer 31s 29s +7%
mlk_poly_rej_uniform 31s 30s +3%
mlk_keccak_squeezeblocks_x4 26s 24s +8%
poly_ntt_native 26s 22s +18%
mlk_indcpa_dec ⚠️ 21s 10s +110%
mlk_poly_reduce_native 21s 22s -5%
keccakf1600x4_permute_native_x4 17s 19s -11%
mlk_matvec_mul 13s 4s +225%
mlk_poly_decompress_d11_native 13s 13s +0%
mlk_poly_decompress_d5_native 13s 16s -19%
kem_dec 10s 5s +100%
mlk_poly_ntt 10s 5s +100%
mlk_keccak_squeeze_once 9s 9s +0%
mlk_poly_frommsg 9s 8s +12%
mlk_polyvec_add 9s 11s -18%
mlk_keccak_squeezeblocks 8s 7s +14%
mlk_poly_frombytes_native 8s 7s +14%
mlk_poly_ntt_c 8s 3s +167%
mlk_fqmul 7s 14s -50%
mlk_ntt_butterfly_block 7s 9s -22%
mlk_poly_compress_d11_c 7s 6s +17%
mlk_poly_rej_uniform_x4 7s 5s +40%
mlk_shake128x4_absorb_once 7s 2s +250%
kem_check_pk 6s 1s +500%
mlk_keccak_absorb_once 6s 4s +50%
mlk_keccakf1600_permute_c 6s 7s -14%
poly_decompress_d5_native_x86_64 6s 4s +50%
mlk_ct_cmask_nonzero_u8 5s 4s +25%
mlk_invntt_layer 5s 8s -38%
mlk_keccak_absorb_once_x4 5s 6s -17%
mlk_keccakf1600x4_extract_bytes_c 5s 2s +150%
mlk_keypair_getnoise_eta1 5s 1s +400%
mlk_poly_compress_d4_native 5s 2s +150%
mlk_shake128_absorb_once 5s 3s +67%
mlk_shake256x4 5s 3s +67%
intt_native_x86_64 4s 1s +300%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 4s 3s +33%
mlk_ct_cmask_nonzero_u16 4s 2s +100%
mlk_gen_matrix 4s 6s -33%
mlk_poly_cbd_eta2 4s 2s +100%
mlk_poly_decompress_d4_native 4s 1s +300%
mlk_poly_decompress_d5_c 4s 2s +100%
mlk_poly_getnoise_eta2 4s 2s +100%
mlk_poly_invntt_tomont 4s 4s +0%
mlk_poly_invntt_tomont_c 4s 3s +33%
mlk_poly_reduce_c 4s 4s +0%
mlk_poly_tobytes_native 4s 5s -20%
mlk_polyvec_basemul_acc_montgomery_cached 4s 3s +33%
mlk_polyvec_ntt 4s 4s +0%
mlk_scalar_decompress_d10 4s 2s +100%
nttunpack_native_x86_64 4s 3s +33%
poly_compress_d5_native_x86_64 4s 2s +100%
poly_frombytes_native_x86_64 4s 5s -20%
poly_getnoise_eta1122_4x_native 4s 4s +0%
poly_mulcache_compute_native_aarch64 4s 4s +0%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 2s +50%
keccakf1600x4_xor_bytes_native 3s 1s +200%
kem_enc_derand 3s 4s -25%
kem_keypair 3s 1s +200%
mlk_ct_cmask_neg_i16 3s 2s +50%
mlk_ct_cmov_zero 3s 5s -40%
mlk_ct_memcmp 3s 2s +50%
mlk_enc_getnoise_eta1_eta2 3s 2s +50%
mlk_gen_matrix_serial 3s 5s -40%
mlk_keccakf1600x4_extract_bytes 3s 1s +200%
mlk_poly_cbd_eta1 3s 4s -25%
mlk_poly_compress_d10 3s 2s +50%
mlk_poly_compress_d10_c 3s 2s +50%
mlk_poly_compress_d10_native 3s 1s +200%
mlk_poly_compress_d11_native 3s 3s +0%
mlk_poly_compress_d4 3s 1s +200%
mlk_poly_compress_d4_c 3s 5s -40%
mlk_poly_compress_d5_c 3s 2s +50%
mlk_poly_compress_dv 3s 2s +50%
mlk_poly_decompress_d10 3s 2s +50%
mlk_poly_decompress_d11 3s 3s +0%
mlk_poly_decompress_d5 3s 5s -40%
mlk_poly_decompress_dv 3s 2s +50%
mlk_poly_getnoise_eta1_4x 3s 2s +50%
mlk_poly_mulcache_compute 3s 1s +200%
mlk_poly_mulcache_compute_c 3s 3s +0%
mlk_poly_tomont_native 3s 2s +50%
mlk_polymat_permute_bitrev_to_custom 3s 8s -62%
mlk_polyvec_decompress_du 3s 1s +200%
mlk_polyvec_invntt_tomont 3s 2s +50%
mlk_rej_uniform 3s 2s +50%
mlk_scalar_compress_d1 3s 2s +50%
mlk_scalar_compress_d10 3s 2s +50%
mlk_scalar_decompress_d11 3s 2s +50%
mlk_scalar_decompress_d4 3s 1s +200%
mlk_scalar_decompress_d5 3s 3s +0%
mlk_shake256 3s 3s +0%
poly_compress_d10_native_x86_64 3s 1s +200%
poly_compress_d11_native_x86_64 3s 2s +50%
poly_compress_d4_native_x86_64 3s 1s +200%
poly_decompress_d11_native_x86_64 3s 6s -50%
poly_invntt_tomont_native 3s 3s +0%
poly_mulcache_compute_native_x86_64 3s 3s +0%
poly_reduce_native_x86_64 3s 2s +50%
poly_tobytes_native_aarch64 3s 3s +0%
polyvec_basemul_acc_montgomery_cached_k4_native_aarch64 3s 2s +50%
rej_uniform_native 3s 3s +0%
rej_uniform_native_x86_64 3s 5s -40%
intt_native_aarch64 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 2s 1s +100%
keccak_f1600_x4_native_aarch64_v84a 2s 2s +0%
keccakf1600x4_extract_bytes_native 2s 2s +0%
kem_check_sk 2s 2s +0%
kem_enc 2s 2s +0%
kem_keypair_derand 2s 3s -33%
mlk_ct_get_optblocker_i32 2s 4s -50%
mlk_ct_get_optblocker_u32 2s 1s +100%
mlk_ct_get_optblocker_u8 2s 2s +0%
mlk_ct_sel_int16 2s 1s +100%
mlk_keccakf1600_extract_bytes 2s 2s +0%
mlk_keccakf1600_permute 2s 2s +0%
mlk_keccakf1600x4_xor_bytes 2s 4s -50%
mlk_poly_add 2s 3s -33%
mlk_poly_compress_d11 2s 3s -33%
mlk_poly_compress_d5_native 2s 4s -50%
mlk_poly_compress_du 2s 1s +100%
mlk_poly_decompress_d11_c 2s 4s -50%
mlk_poly_decompress_d4_c 2s 2s +0%
mlk_poly_decompress_du 2s 2s +0%
mlk_poly_getnoise_eta1122_4x 2s 2s +0%
mlk_poly_reduce 2s 1s +100%
mlk_poly_tobytes_c 2s 3s -33%
mlk_poly_tomont_c 2s 3s -33%
mlk_polyvec_compress_du 2s 2s +0%
mlk_polyvec_frombytes 2s 4s -50%
mlk_polyvec_mulcache_compute 2s 3s -33%
mlk_polyvec_permute_bitrev_to_custom 2s 2s +0%
mlk_polyvec_permute_bitrev_to_custom_native 2s 3s -33%
mlk_polyvec_tomont 2s 2s +0%
mlk_scalar_compress_d11 2s 1s +100%
mlk_scalar_compress_d4 2s 4s -50%
mlk_scalar_compress_d5 2s 1s +100%
mlk_sha3_256 2s 1s +100%
mlk_sha3_512 2s 2s +0%
mlk_shake128x4_squeezeblocks 2s 4s -50%
mlk_value_barrier_u32 2s 2s +0%
ntt_native_x86_64 2s 1s +100%
poly_decompress_d10_native_x86_64 2s 3s -33%
poly_decompress_d4_native_x86_64 2s 2s +0%
poly_reduce_native_aarch64 2s 3s -33%
poly_tobytes_native_x86_64 2s 2s +0%
poly_tomont_native_aarch64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k2_native_aarch64 2s 2s +0%
polyvec_basemul_acc_montgomery_cached_k2_native_x86_64 - 3s -
polyvec_basemul_acc_montgomery_cached_k3_native_x86_64 - 4s -
polyvec_basemul_acc_montgomery_cached_k4_native_x86_64 - 2s -
polyvec_basemul_acc_montgomery_cached_native - 39s -
keccak_f1600_x1_native_aarch64 1s 1s +0%
keccak_f1600_x4_native_avx2 1s 2s -50%
keccakf1600_permute_native 1s 3s -67%
mlk_barrett_reduce 1s 2s -50%
mlk_check_pct 1s 2s -50%
mlk_ct_sel_uint8 1s 2s -50%
mlk_keccakf1600_extract_bytes (big endian) 1s 2s -50%
mlk_keccakf1600_xor_bytes 1s 2s -50%
mlk_keccakf1600_xor_bytes (big endian) 1s 1s +0%
mlk_keccakf1600x4_permute 1s 2s -50%
mlk_keccakf1600x4_xor_bytes_c 1s 2s -50%
mlk_montgomery_reduce 1s 4s -75%
mlk_poly_compress_d5 1s 2s -50%
mlk_poly_decompress_d10_c 1s 2s -50%
mlk_poly_decompress_d10_native 1s 2s -50%
mlk_poly_decompress_d4 1s 3s -67%
mlk_poly_frombytes 1s 2s -50%
mlk_poly_frombytes_c 1s 2s -50%
mlk_poly_getnoise_eta1_4x_native 1s 3s -67%
mlk_poly_mulcache_compute_native 1s 3s -67%
mlk_poly_sub 1s 3s -67%
mlk_poly_tobytes 1s 2s -50%
mlk_poly_tomont 1s 3s -67%
mlk_poly_tomsg 1s 1s +0%
mlk_polyvec_reduce 1s 2s -50%
mlk_polyvec_tobytes 1s 2s -50%
mlk_scalar_signed_to_unsigned_q 1s 2s -50%
mlk_shake128_squeezeblocks 1s 3s -67%
mlk_value_barrier_i32 1s 2s -50%
mlk_value_barrier_u8 1s 3s -67%
ntt_native_aarch64 1s 3s -67%
poly_tomont_native_x86_64 1s 2s -50%
polyvec_basemul_acc_montgomery_cached_k3_native_aarch64 1s 2s -50%
rej_uniform_native_aarch64 1s 2s -50%
sys_check_capability 1s 2s -50%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

benchmark this PR should be benchmarked in CI CBMC DO-NOT-MERGE enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants