Hot on the heels of our macOS benchmarks, we now have our Kyber benchmarks for Arm Cortex-M4.
Before getting into the numbers, some information on the conditions under which the benchmarks were taken:
- The hardware platform was STM NUCLEO-F446ZE
- The HCLK in the project was set to 168MHz
- Only 1 core used
- wolfSSL Math Configuration set to “Single Precision ASM Cortex-M3+ Math”
- Optimization flag: -Ofast
- Conventional algorithms are present for comparison purposes
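For readers who configure wolfSSL through a `user_settings.h` file rather than an IDE dialog, the fragment below is a rough sketch of what the math and Kyber options listed above might correspond to. The post does not show the actual settings file, so the mapping from the "Single Precision ASM Cortex-M3+ Math" option to these macros is an assumption. Macro names have also changed between wolfSSL releases, so check them against the version you build.

```c
/* user_settings.h fragment (illustrative only; not taken from the
 * benchmarked project). Verify each macro against your wolfSSL release. */

/* Single Precision (SP) math with Cortex-M assembly */
#define WOLFSSL_SP_MATH
#define WOLFSSL_SP_ASM
#define WOLFSSL_SP_ARM_CORTEX_M_ASM
#define WOLFSSL_HAVE_SP_RSA
#define WOLFSSL_HAVE_SP_DH
#define WOLFSSL_HAVE_SP_ECC

/* Kyber using wolfSSL's own implementation; it relies on SHA-3 and SHAKE */
#define WOLFSSL_EXPERIMENTAL_SETTINGS
#define WOLFSSL_HAVE_KYBER
#define WOLFSSL_WC_KYBER
#define WOLFSSL_SHA3
#define WOLFSSL_SHAKE128
#define WOLFSSL_SHAKE256
```

The `-Ofast` optimization level is a compiler flag and would be set in the project's build options, not in this header.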
Here are our results:
RSA 2048 public 82 ops took 1.020 sec, avg 12.439 ms, 80.392 ops/sec
RSA 2048 private 4 ops took 1.827 sec, avg 456.750 ms, 2.189 ops/sec
DH 2048 key gen 5 ops took 1.181 sec, avg 236.200 ms, 4.234 ops/sec
DH 2048 agree 6 ops took 1.419 sec, avg 236.500 ms, 4.228 ops/sec
ECC SECP256R1 key gen 118 ops took 1.012 sec, avg 8.576 ms, 116.601 ops/sec
ECDHE SECP256R1 agree 56 ops took 1.016 sec, avg 18.143 ms, 55.118 ops/sec
KYBER512 128 key gen 232 ops took 1.004 sec, avg 4.328 ms, 231.076 ops/sec
KYBER512 128 encap 192 ops took 1.008 sec, avg 5.250 ms, 190.476 ops/sec
KYBER512 128 decap 178 ops took 1.004 sec, avg 5.640 ms, 177.291 ops/sec
KYBER768 192 key gen 146 ops took 1.008 sec, avg 6.904 ms, 144.841 ops/sec
KYBER768 192 encap 118 ops took 1.008 sec, avg 8.542 ms, 117.063 ops/sec
KYBER768 192 decap 110 ops took 1.000 sec, avg 9.091 ms, 110.000 ops/sec
KYBER1024 256 key gen 92 ops took 1.011 sec, avg 10.989 ms, 90.999 ops/sec
KYBER1024 256 encap 76 ops took 1.000 sec, avg 13.158 ms, 76.000 ops/sec
KYBER1024 256 decap 72 ops took 1.000 sec, avg 13.889 ms, 72.000 ops/sec
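Each line above reports an operation count, the elapsed time, and two derived figures. As a minimal sketch of how the derived figures relate to the first two, the helpers below recompute them. The function names are hypothetical and are not part of the wolfCrypt benchmark tool.

```c
/* Average latency in milliseconds for `ops` operations
 * that completed in `seconds` of wall-clock time. */
double avg_ms(int ops, double seconds)
{
    return seconds * 1000.0 / (double)ops;
}

/* Throughput in operations per second for the same measurement. */
double ops_per_sec(int ops, double seconds)
{
    return (double)ops / seconds;
}
```

For the "RSA 2048 public" line, `avg_ms(82, 1.020)` comes to about 12.439 and `ops_per_sec(82, 1.020)` to about 80.392, which matches the reported values.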
The performance of our Kyber implementation looks great compared to the other algorithms. It might appear that ECDHE comes close, but not when you consider the mechanics of a key exchange.
Note that ECDHE is a NIKE (Non-Interactive Key Exchange) while Kyber is a KEM (Key Encapsulation Mechanism), so in the context of TLS 1.3 the numbers as they stand are misleading.
For NIKEs, both the server and the client must generate a key pair, and then both must also perform the key agreement step. For KEMs, on the other hand, the client does key generation once, the server does encapsulation once, and the client does decapsulation once. Since a NIKE performs each of its operations twice to achieve a shared secret, a fair comparison needs to double the average time for ECDHE. In this light, the total time for a key exchange looks like this:
| Algorithm | Total Time for Key Exchange |
| --- | --- |
| ECDH SECP256R1 | 26.719 ms |
| Kyber512 (NIST Level 1) | 15.218 ms |
| Kyber768 (NIST Level 3) | 24.537 ms |
| Kyber1024 (NIST Level 5) | 38.036 ms |
Note that Kyber512, from a security perspective, is comparable to ECDH at SECP256R1.
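The totals in the table can be reproduced by adding the per-operation averages from the benchmark output. The sketch below shows that arithmetic. The function names are hypothetical, and the model counts only compute time, not transmission or protocol overhead.

```c
/* ECDH work done by one party: one key generation plus one agreement. */
double ecdh_one_party_ms(double keygen_ms, double agree_ms)
{
    return keygen_ms + agree_ms;
}

/* ECDH work summed over both parties: each side runs both operations. */
double ecdh_both_parties_ms(double keygen_ms, double agree_ms)
{
    return 2.0 * ecdh_one_party_ms(keygen_ms, agree_ms);
}

/* KEM work for one exchange: client key generation,
 * server encapsulation, client decapsulation. */
double kem_total_ms(double keygen_ms, double encap_ms, double decap_ms)
{
    return keygen_ms + encap_ms + decap_ms;
}
```

With the averages above, `kem_total_ms(4.328, 5.250, 5.640)` gives 15.218 ms for Kyber512, and the same sum gives 24.537 ms for Kyber768 and 38.036 ms for Kyber1024. The ECDH entry of 26.719 ms equals `ecdh_one_party_ms(8.576, 18.143)`, that is, one key generation plus one agreement. Summing the work of both parties with `ecdh_both_parties_ms` would give 53.438 ms.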
The numbers speak for themselves: Kyber wins. That said, you can look forward to future optimizations and even better performance gains.
As we’ve noted in the past, Kyber has considerably larger artifacts than ECDHE. Depending on your method of transmission, this margin can easily be lost if your transmission speeds are slow.
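To get a feel for when the size penalty outweighs the compute advantage, here is a rough sketch. The artifact sizes are not from this post: they are the published Kyber512 sizes (800-byte public key, 768-byte ciphertext) and two uncompressed 65-byte P-256 points for ECDH. The model ignores latency, framing, and the rest of the handshake, and the function names are hypothetical.

```c
/* Time in milliseconds to push `bytes` through a link running at
 * `bits_per_sec`, ignoring latency and protocol overhead. */
double transfer_ms(double bytes, double bits_per_sec)
{
    return bytes * 8.0 * 1000.0 / bits_per_sec;
}

/* Link speed (bits per second) at which sending `extra_bytes`
 * costs exactly as much time as the `saved_ms` of computation.
 * Below this speed, the larger artifacts cost more than they save. */
double breakeven_bps(double extra_bytes, double saved_ms)
{
    return extra_bytes * 8.0 * 1000.0 / saved_ms;
}
```

Under these assumptions Kyber512 sends 1568 bytes against 130 for ECDH, so 1438 extra bytes, while the table shows a compute saving of 26.719 - 15.218 = 11.501 ms. `breakeven_bps(1438.0, 11.501)` comes to roughly 1 Mbit/s. On a hypothetical 9600 bit/s serial link, `transfer_ms(1568.0, 9600.0)` is about 1.3 seconds, far more than the compute time saved.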
Want to see further optimizations to our Kyber implementation? Interested in wolfSSL’s other post-quantum algorithm implementations? Let us know so we can prioritize the things you are looking for.
If you have questions about any of the above, please contact us at facts@wolfSSL.com or call us at +1 425 245 8247.