00b155f to 3819863
CBMC Results (ML-DSA-87): Full Results (174 proofs)
CBMC Results (ML-DSA-44): Full Results (174 proofs)
CBMC Results (ML-DSA-65): Full Results (174 proofs)
Mac Mini (M1, 2020) benchmarks (opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46204 cycles | 46203 cycles | 1.00 |
| ML-DSA-44 sign | 131289 cycles | 131278 cycles | 1.00 |
| ML-DSA-44 verify | 47763 cycles | 47762 cycles | 1.00 |
| ML-DSA-65 keypair | 81015 cycles | 81014 cycles | 1.00 |
| ML-DSA-65 sign | 215763 cycles | 215783 cycles | 1.00 |
| ML-DSA-65 verify | 80054 cycles | 80051 cycles | 1.00 |
| ML-DSA-87 keypair | 132159 cycles | 132161 cycles | 1.00 |
| ML-DSA-87 sign | 276888 cycles | 276854 cycles | 1.00 |
| ML-DSA-87 verify | 130426 cycles | 130402 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 114189 cycles | 114160 cycles | 1.00 |
| ML-DSA-44 sign | 418072 cycles | 417949 cycles | 1.00 |
| ML-DSA-44 verify | 122294 cycles | 122254 cycles | 1.00 |
| ML-DSA-65 keypair | 195495 cycles | 195504 cycles | 1.00 |
| ML-DSA-65 sign | 682472 cycles | 682465 cycles | 1.00 |
| ML-DSA-65 verify | 197737 cycles | 197733 cycles | 1.00 |
| ML-DSA-87 keypair | 322648 cycles | 322653 cycles | 1.00 |
| ML-DSA-87 sign | 864619 cycles | 864668 cycles | 1.00 |
| ML-DSA-87 verify | 328624 cycles | 328682 cycles | 1.00 |
Intel Xeon 4th gen (c7i)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 34471 cycles | 34561 cycles | 1.00 |
| ML-DSA-44 sign | 120257 cycles | 119604 cycles | 1.01 |
| ML-DSA-44 verify | 38102 cycles | 38161 cycles | 1.00 |
| ML-DSA-65 keypair | 61645 cycles | 61342 cycles | 1.00 |
| ML-DSA-65 sign | 202965 cycles | 201886 cycles | 1.01 |
| ML-DSA-65 verify | 62950 cycles | 63038 cycles | 1.00 |
| ML-DSA-87 keypair | 94655 cycles | 93985 cycles | 1.01 |
| ML-DSA-87 sign | 237727 cycles | 239107 cycles | 0.99 |
| ML-DSA-87 verify | 94851 cycles | 96550 cycles | 0.98 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 237495 cycles | 229189 cycles | 1.04 |
| ML-DSA-44 sign | 617962 cycles | 646441 cycles | 0.96 |
| ML-DSA-44 verify | 221067 cycles | 226888 cycles | 0.97 |
| ML-DSA-65 keypair | 392612 cycles | 411260 cycles | 0.95 |
| ML-DSA-65 sign | 1045982 cycles | 1058663 cycles | 0.99 |
| ML-DSA-65 verify | 377333 cycles | 393295 cycles | 0.96 |
| ML-DSA-87 keypair | 648120 cycles | 682350 cycles | 0.95 |
| ML-DSA-87 sign | 1340435 cycles | 1396069 cycles | 0.96 |
| ML-DSA-87 verify | 621700 cycles | 651014 cycles | 0.95 |
Intel Xeon 4th gen (c7i) (no-opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 93617 cycles | 93653 cycles | 1.00 |
| ML-DSA-44 sign | 333332 cycles | 333354 cycles | 1.00 |
| ML-DSA-44 verify | 99744 cycles | 99709 cycles | 1.00 |
| ML-DSA-65 keypair | 160092 cycles | 160242 cycles | 1.00 |
| ML-DSA-65 sign | 545851 cycles | 546031 cycles | 1.00 |
| ML-DSA-65 verify | 160873 cycles | 160833 cycles | 1.00 |
| ML-DSA-87 keypair | 268252 cycles | 267347 cycles | 1.00 |
| ML-DSA-87 sign | 707830 cycles | 706548 cycles | 1.00 |
| ML-DSA-87 verify | 270627 cycles | 270921 cycles | 1.00 |
AMD EPYC 3rd gen (c6a)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 69051 cycles | 69255 cycles | 1.00 |
| ML-DSA-44 sign | 187895 cycles | 188098 cycles | 1.00 |
| ML-DSA-44 verify | 69083 cycles | 69380 cycles | 1.00 |
| ML-DSA-65 keypair | 119654 cycles | 120115 cycles | 1.00 |
| ML-DSA-65 sign | 299489 cycles | 301522 cycles | 0.99 |
| ML-DSA-65 verify | 115283 cycles | 115505 cycles | 1.00 |
| ML-DSA-87 keypair | 203725 cycles | 204908 cycles | 0.99 |
| ML-DSA-87 sign | 392930 cycles | 396816 cycles | 0.99 |
| ML-DSA-87 verify | 195673 cycles | 197182 cycles | 0.99 |
Intel Xeon 3rd gen (c6i)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 56503 cycles | 59724 cycles | 0.95 |
| ML-DSA-44 sign | 180796 cycles | 193453 cycles | 0.93 |
| ML-DSA-44 verify | 61187 cycles | 65079 cycles | 0.94 |
| ML-DSA-65 keypair | 98682 cycles | 104370 cycles | 0.95 |
| ML-DSA-65 sign | 298537 cycles | 315933 cycles | 0.94 |
| ML-DSA-65 verify | 100423 cycles | 106176 cycles | 0.95 |
| ML-DSA-87 keypair | 156536 cycles | 162885 cycles | 0.96 |
| ML-DSA-87 sign | 364760 cycles | 379531 cycles | 0.96 |
| ML-DSA-87 verify | 156758 cycles | 164743 cycles | 0.95 |
AMD EPYC 4th gen (c7a)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 42082 cycles | 41564 cycles | 1.01 |
| ML-DSA-44 sign | 134163 cycles | 133585 cycles | 1.00 |
| ML-DSA-44 verify | 45157 cycles | 44717 cycles | 1.01 |
| ML-DSA-65 keypair | 73088 cycles | 72591 cycles | 1.01 |
| ML-DSA-65 sign | 214510 cycles | 214322 cycles | 1.00 |
| ML-DSA-65 verify | 73546 cycles | 73308 cycles | 1.00 |
| ML-DSA-87 keypair | 108117 cycles | 108001 cycles | 1.00 |
| ML-DSA-87 sign | 252050 cycles | 253603 cycles | 0.99 |
| ML-DSA-87 verify | 111866 cycles | 109742 cycles | 1.02 |
oqs-bot left a comment
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: 3819863 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-65 keypair | 75829 cycles | 72591 cycles | 1.04 |
AMD EPYC 3rd gen (c6a) (no-opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 135619 cycles | 134864 cycles | 1.01 |
| ML-DSA-44 sign | 526267 cycles | 523274 cycles | 1.01 |
| ML-DSA-44 verify | 148308 cycles | 147587 cycles | 1.00 |
| ML-DSA-65 keypair | 226650 cycles | 226332 cycles | 1.00 |
| ML-DSA-65 sign | 860097 cycles | 860452 cycles | 1.00 |
| ML-DSA-65 verify | 234863 cycles | 234687 cycles | 1.00 |
| ML-DSA-87 keypair | 370322 cycles | 370327 cycles | 1.00 |
| ML-DSA-87 sign | 1078704 cycles | 1078650 cycles | 1.00 |
| ML-DSA-87 verify | 381978 cycles | 382103 cycles | 1.00 |
Intel Xeon 3rd gen (c6i) (no-opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 157726 cycles | 166712 cycles | 0.95 |
| ML-DSA-44 sign | 550116 cycles | 582606 cycles | 0.94 |
| ML-DSA-44 verify | 169086 cycles | 179293 cycles | 0.94 |
| ML-DSA-65 keypair | 267836 cycles | 285346 cycles | 0.94 |
| ML-DSA-65 sign | 902166 cycles | 964659 cycles | 0.94 |
| ML-DSA-65 verify | 274146 cycles | 292689 cycles | 0.94 |
| ML-DSA-87 keypair | 447769 cycles | 479766 cycles | 0.93 |
| ML-DSA-87 sign | 1157051 cycles | 1244761 cycles | 0.93 |
| ML-DSA-87 verify | 457856 cycles | 490453 cycles | 0.93 |
Graviton4
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 68281 cycles | 68206 cycles | 1.00 |
| ML-DSA-44 sign | 201979 cycles | 201988 cycles | 1.00 |
| ML-DSA-44 verify | 70738 cycles | 70695 cycles | 1.00 |
| ML-DSA-65 keypair | 121375 cycles | 121162 cycles | 1.00 |
| ML-DSA-65 sign | 330717 cycles | 331255 cycles | 1.00 |
| ML-DSA-65 verify | 118005 cycles | 118031 cycles | 1.00 |
| ML-DSA-87 keypair | 198121 cycles | 198330 cycles | 1.00 |
| ML-DSA-87 sign | 426802 cycles | 426779 cycles | 1.00 |
| ML-DSA-87 verify | 194748 cycles | 194253 cycles | 1.00 |
Graviton3
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 72334 cycles | 72197 cycles | 1.00 |
| ML-DSA-44 sign | 212135 cycles | 212050 cycles | 1.00 |
| ML-DSA-44 verify | 75716 cycles | 75727 cycles | 1.00 |
| ML-DSA-65 keypair | 127531 cycles | 127412 cycles | 1.00 |
| ML-DSA-65 sign | 350281 cycles | 350180 cycles | 1.00 |
| ML-DSA-65 verify | 125483 cycles | 125339 cycles | 1.00 |
| ML-DSA-87 keypair | 205301 cycles | 208131 cycles | 0.99 |
| ML-DSA-87 sign | 443389 cycles | 448959 cycles | 0.99 |
| ML-DSA-87 verify | 205204 cycles | 205063 cycles | 1.00 |
AMD EPYC 4th gen (c7a) (no-opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120766 cycles | 120426 cycles | 1.00 |
| ML-DSA-44 sign | 452473 cycles | 449374 cycles | 1.01 |
| ML-DSA-44 verify | 130839 cycles | 131829 cycles | 0.99 |
| ML-DSA-65 keypair | 205544 cycles | 208687 cycles | 0.98 |
| ML-DSA-65 sign | 729796 cycles | 740391 cycles | 0.99 |
| ML-DSA-65 verify | 211042 cycles | 213436 cycles | 0.99 |
| ML-DSA-87 keypair | 338441 cycles | 337635 cycles | 1.00 |
| ML-DSA-87 sign | 929258 cycles | 924886 cycles | 1.00 |
| ML-DSA-87 verify | 348567 cycles | 345917 cycles | 1.01 |
Graviton4 (no-opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 128278 cycles | 128259 cycles | 1.00 |
| ML-DSA-44 sign | 447568 cycles | 447651 cycles | 1.00 |
| ML-DSA-44 verify | 138373 cycles | 138315 cycles | 1.00 |
| ML-DSA-65 keypair | 220146 cycles | 220341 cycles | 1.00 |
| ML-DSA-65 sign | 727221 cycles | 727602 cycles | 1.00 |
| ML-DSA-65 verify | 223062 cycles | 223189 cycles | 1.00 |
| ML-DSA-87 keypair | 365113 cycles | 365093 cycles | 1.00 |
| ML-DSA-87 sign | 926622 cycles | 926051 cycles | 1.00 |
| ML-DSA-87 verify | 372778 cycles | 372761 cycles | 1.00 |
Graviton3 (no-opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138533 cycles | 138530 cycles | 1.00 |
| ML-DSA-44 sign | 484119 cycles | 484127 cycles | 1.00 |
| ML-DSA-44 verify | 148714 cycles | 148699 cycles | 1.00 |
| ML-DSA-65 keypair | 242002 cycles | 242316 cycles | 1.00 |
| ML-DSA-65 sign | 792696 cycles | 792717 cycles | 1.00 |
| ML-DSA-65 verify | 241201 cycles | 241180 cycles | 1.00 |
| ML-DSA-87 keypair | 396212 cycles | 396270 cycles | 1.00 |
| ML-DSA-87 sign | 1012825 cycles | 1012390 cycles | 1.00 |
| ML-DSA-87 verify | 402487 cycles | 402495 cycles | 1.00 |
Graviton2
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113890 cycles | 113785 cycles | 1.00 |
| ML-DSA-44 sign | 356860 cycles | 356400 cycles | 1.00 |
| ML-DSA-44 verify | 118531 cycles | 118156 cycles | 1.00 |
| ML-DSA-65 keypair | 197180 cycles | 196636 cycles | 1.00 |
| ML-DSA-65 sign | 589760 cycles | 589236 cycles | 1.00 |
| ML-DSA-65 verify | 194927 cycles | 194738 cycles | 1.00 |
| ML-DSA-87 keypair | 323665 cycles | 323344 cycles | 1.00 |
| ML-DSA-87 sign | 755812 cycles | 754065 cycles | 1.00 |
| ML-DSA-87 verify | 321048 cycles | 320254 cycles | 1.00 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 829325 cycles | 827416 cycles | 1.00 |
| ML-DSA-44 sign | 3239633 cycles | 3233101 cycles | 1.00 |
| ML-DSA-44 verify | 925127 cycles | 922573 cycles | 1.00 |
| ML-DSA-65 keypair | 1413241 cycles | 1410466 cycles | 1.00 |
| ML-DSA-65 sign | 5353017 cycles | 5337064 cycles | 1.00 |
| ML-DSA-65 verify | 1482432 cycles | 1479411 cycles | 1.00 |
| ML-DSA-87 keypair | 2313345 cycles | 2308452 cycles | 1.00 |
| ML-DSA-87 sign | 6671028 cycles | 6657983 cycles | 1.00 |
| ML-DSA-87 verify | 2417189 cycles | 2413172 cycles | 1.00 |
Graviton2 (no-opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 213826 cycles | 212836 cycles | 1.00 |
| ML-DSA-44 sign | 762167 cycles | 760705 cycles | 1.00 |
| ML-DSA-44 verify | 241958 cycles | 229196 cycles | 1.06 |
| ML-DSA-65 keypair | 381627 cycles | 380999 cycles | 1.00 |
| ML-DSA-65 sign | 1253488 cycles | 1254188 cycles | 1.00 |
| ML-DSA-65 verify | 372913 cycles | 372030 cycles | 1.00 |
| ML-DSA-87 keypair | 606826 cycles | 604389 cycles | 1.00 |
| ML-DSA-87 sign | 1594099 cycles | 1595105 cycles | 1.00 |
| ML-DSA-87 verify | 618467 cycles | 618551 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 309195 cycles | 299195 cycles | 1.03 |
| ML-DSA-44 sign | 1168415 cycles | 1162268 cycles | 1.01 |
| ML-DSA-44 verify | 335171 cycles | 330180 cycles | 1.02 |
| ML-DSA-65 keypair | 561211 cycles | 555502 cycles | 1.01 |
| ML-DSA-65 sign | 1917406 cycles | 1912815 cycles | 1.00 |
| ML-DSA-65 verify | 537657 cycles | 527139 cycles | 1.02 |
| ML-DSA-87 keypair | 863436 cycles | 868917 cycles | 0.99 |
| ML-DSA-87 sign | 2447930 cycles | 2435700 cycles | 1.01 |
| ML-DSA-87 verify | 887441 cycles | 879744 cycles | 1.01 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 309195 cycles | 299195 cycles | 1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
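The alert above lists a ratio of 1.03 against a threshold of 1.03, which looks like a tie. A minimal sketch of the arithmetic, assuming the ratio is current cycles divided by previous cycles and that an alert fires only when the unrounded ratio is strictly greater than the threshold (the helper names `cycle_ratio` and `exceeds_threshold` are hypothetical, not part of github-action-benchmark):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ratio of current to previous cycle counts.
 * Values above 1.0 mean the current commit needs more cycles. */
static double cycle_ratio(uint64_t current, uint64_t previous)
{
  assert(previous > 0); /* a previous measurement must exist */
  return (double)current / (double)previous;
}

/* Hypothetical helper: returns 1 when the unrounded ratio is strictly
 * greater than the threshold (assumed comparison rule), else 0. */
static int exceeds_threshold(uint64_t current, uint64_t previous,
                             double threshold)
{
  return cycle_ratio(current, previous) > threshold;
}
```

Under these assumptions, `exceeds_threshold(309195, 299195, 1.03)` is true because 309195 / 299195 is roughly 1.0334, which the table rounds to 1.03. By the same rule, ML-DSA-44 verify on the same machine (335171 vs 330180 cycles, roughly 1.015) stays below the threshold.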
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 272576 cycles | 271747 cycles | 1.00 |
| ML-DSA-44 sign | 801220 cycles | 799220 cycles | 1.00 |
| ML-DSA-44 verify | 273471 cycles | 272494 cycles | 1.00 |
| ML-DSA-65 keypair | 466875 cycles | 469149 cycles | 1.00 |
| ML-DSA-65 sign | 1312938 cycles | 1319018 cycles | 1.00 |
| ML-DSA-65 verify | 449913 cycles | 451950 cycles | 1.00 |
| ML-DSA-87 keypair | 806611 cycles | 805651 cycles | 1.00 |
| ML-DSA-87 sign | 1800985 cycles | 1810381 cycles | 0.99 |
| ML-DSA-87 verify | 782904 cycles | 783507 cycles | 1.00 |
72bc3f8 to d186f5e
This commit replaces the current caddq AVX2 implementation with x86_64 assembly code. Signed-off-by: willieyz <willie.zhao@chelpis.com>
a467f42 to 10f3614
This commit adds mld_poly_caddq to the benchmark components to evaluate the performance impact of replacing the caddq AVX2 intrinsics with x86_64 assembly code. Signed-off-by: willieyz <willie.zhao@chelpis.com>
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 113342 cycles | 113370 cycles | 1.00 |
| ML-DSA-44 sign | 356021 cycles | 355986 cycles | 1.00 |
| ML-DSA-44 verify | 117872 cycles | 118036 cycles | 1.00 |
| ML-DSA-65 keypair | 196532 cycles | 196544 cycles | 1.00 |
| ML-DSA-65 sign | 589189 cycles | 589033 cycles | 1.00 |
| ML-DSA-65 verify | 194577 cycles | 194759 cycles | 1.00 |
| ML-DSA-87 keypair | 322408 cycles | 322752 cycles | 1.00 |
| ML-DSA-87 sign | 752104 cycles | 753067 cycles | 1.00 |
| ML-DSA-87 verify | 319915 cycles | 320159 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 212629 cycles | 212540 cycles | 1.00 |
| ML-DSA-44 sign | 760055 cycles | 759958 cycles | 1.00 |
| ML-DSA-44 verify | 228848 cycles | 228975 cycles | 1.00 |
| ML-DSA-65 keypair | 380543 cycles | 380692 cycles | 1.00 |
| ML-DSA-65 sign | 1252397 cycles | 1252836 cycles | 1.00 |
| ML-DSA-65 verify | 371721 cycles | 371790 cycles | 1.00 |
| ML-DSA-87 keypair | 604737 cycles | 604270 cycles | 1.00 |
| ML-DSA-87 sign | 1593720 cycles | 1593938 cycles | 1.00 |
| ML-DSA-87 verify | 618504 cycles | 618393 cycles | 1.00 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Graviton2 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 verify | 241958 cycles | 229196 cycles | 1.06 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
Details
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 461720 cycles | 461204 cycles | 1.00 |
| ML-DSA-44 sign | 2132007 cycles | 2133615 cycles | 1.00 |
| ML-DSA-44 verify | 546386 cycles | 546329 cycles | 1.00 |
| ML-DSA-65 keypair | 773928 cycles | 774556 cycles | 1.00 |
| ML-DSA-65 sign | 3496455 cycles | 3505243 cycles | 1.00 |
| ML-DSA-65 verify | 849310 cycles | 849774 cycles | 1.00 |
| ML-DSA-87 keypair | 1253417 cycles | 1251282 cycles | 1.00 |
| ML-DSA-87 sign | 4370207 cycles | 4327691 cycles | 1.01 |
| ML-DSA-87 verify | 1368861 cycles | 1367270 cycles | 1.00 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: f9a6d30 | Previous: 9258ea1 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 237495 cycles | 229189 cycles | 1.04 |
6faaac2 to 5b1b8a7
Signed-off-by: willieyz <willie.zhao@chelpis.com>
5b1b8a7 to f9a6d30
poly_caddq with assembly #491
In this PR, we replace the AVX2 intrinsics implementation of poly_caddq with an x86_64 assembly version.
To estimate the performance impact, we compare the results shown in the two tables below.
Overall, for keypair, sign, and verify (opt), the performance difference is below 1%, which is consistent with the no-opt case.
In the component-level benchmark for mld_poly_caddq, the observed performance differences are at least 17%. After unrolling the loop by a factor of 4, the differences are reduced to approximately 10%.
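For readers unfamiliar with the operation, here is a scalar C sketch of what a conditional add of q computes, together with a 4x unrolled loop to illustrate the unrolling idea mentioned above. This is not the PR's AVX2 or assembly code: the function names are hypothetical, the input range (-Q, Q) is an assumption, and the unrolling in the PR presumably applies to vector registers rather than single coefficients.

```c
#include <assert.h>
#include <stdint.h>

#define N 256     /* coefficients per ML-DSA polynomial */
#define Q 8380417 /* ML-DSA modulus: 2^23 - 2^13 + 1 */

/* Branch-free conditional add: returns a + Q if a < 0, else a.
 * Relies on >> of a negative int32_t being an arithmetic shift, which is
 * implementation-defined in C but holds on mainstream compilers. */
static int32_t caddq(int32_t a)
{
  return a + ((a >> 31) & Q);
}

/* Plain loop: one coefficient per iteration. */
static void poly_caddq_sketch(int32_t a[N])
{
  for (unsigned i = 0; i < N; i++)
  {
    /* Assumed input range (-Q, Q); the output then lies in [0, Q). */
    assert(a[i] > -Q && a[i] < Q);
    a[i] = caddq(a[i]);
  }
}

/* Same computation unrolled by 4: four coefficients per iteration, so the
 * loop counter update and branch run a quarter as often. N is a multiple
 * of 4, so no remainder loop is needed. */
static void poly_caddq_unrolled4_sketch(int32_t a[N])
{
  for (unsigned i = 0; i < N; i += 4)
  {
    a[i + 0] = caddq(a[i + 0]);
    a[i + 1] = caddq(a[i + 1]);
    a[i + 2] = caddq(a[i + 2]);
    a[i + 3] = caddq(a[i + 3]);
  }
}
```

Both loops produce identical results; only the amount of loop overhead per coefficient differs, which is the kind of effect a component-level benchmark can expose even when keypair, sign, and verify totals barely move.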