We are excited to share the latest benchmark results of wolfSSL v5.7.0 running on the HiFive Unleashed at 1.4GHz. We implemented AES for ECB, CBC, CTR, GCM, and CCM using assembly for RISC-V. This benchmark demonstrates the performance capabilities of wolfSSL on RISC-V architecture, highlighting our commitment to providing high-performance, lightweight, and secure SSL/TLS solutions across diverse platforms.
The benchmark results show that the new assembly optimizations are roughly 30x to 50x faster for AES-CBC and AES-GCM than the build without them.
With RISC-V assembly optimizations:
./configure --enable-riscv-asm && make
root@HiFiveU:~/wolfssl-riscv# ./wolfcrypt/benchmark/benchmark -aes-cbc -aes-gcm
------------------------------------------------------------------------------
wolfSSL version 5.7.0
------------------------------------------------------------------------------
Math: Multi-Precision: Wolf(SP) word-size=64 bits=3072 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
AES-128-CBC-enc 20 MiB took 1.076 seconds, 18.588 MiB/s
AES-128-CBC-dec 20 MiB took 1.083 seconds, 18.473 MiB/s
AES-192-CBC-enc 20 MiB took 1.245 seconds, 16.062 MiB/s
AES-192-CBC-dec 20 MiB took 1.246 seconds, 16.047 MiB/s
AES-256-CBC-enc 15 MiB took 1.057 seconds, 14.189 MiB/s
AES-256-CBC-dec 15 MiB took 1.055 seconds, 14.212 MiB/s
AES-128-GCM-enc 15 MiB took 1.300 seconds, 11.543 MiB/s
AES-128-GCM-dec 15 MiB took 1.300 seconds, 11.535 MiB/s
AES-192-GCM-enc 15 MiB took 1.425 seconds, 10.526 MiB/s
AES-192-GCM-dec 15 MiB took 1.425 seconds, 10.523 MiB/s
AES-256-GCM-enc 10 MiB took 1.032 seconds, 9.687 MiB/s
AES-256-GCM-dec 10 MiB took 1.032 seconds, 9.691 MiB/s
GMAC Table 4-bit 31 MiB took 1.025 seconds, 30.251 MiB/s
Benchmark complete
Without RISC-V assembly optimizations:
./configure --enable-all && make
root@HiFiveU:~/wolfssl# ./wolfcrypt/benchmark/benchmark -aes-cbc -aes-gcm
------------------------------------------------------------------------------
wolfSSL version 5.7.0
------------------------------------------------------------------------------
Math: Multi-Precision: Wolf(SP) word-size=64 bits=4096 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
AES-128-CBC-enc 5 MiB took 12.798 seconds, 0.391 MiB/s
AES-128-CBC-dec 5 MiB took 12.672 seconds, 0.395 MiB/s
AES-192-CBC-enc 5 MiB took 15.301 seconds, 0.327 MiB/s
AES-192-CBC-dec 5 MiB took 15.181 seconds, 0.329 MiB/s
AES-256-CBC-enc 5 MiB took 17.820 seconds, 0.281 MiB/s
AES-256-CBC-dec 5 MiB took 17.669 seconds, 0.283 MiB/s
AES-128-GCM-enc 5 MiB took 12.870 seconds, 0.388 MiB/s
AES-128-GCM-dec 5 MiB took 12.870 seconds, 0.388 MiB/s
AES-192-GCM-enc 5 MiB took 15.375 seconds, 0.325 MiB/s
AES-192-GCM-dec 5 MiB took 15.376 seconds, 0.325 MiB/s
AES-256-GCM-enc 5 MiB took 17.878 seconds, 0.280 MiB/s
AES-256-GCM-dec 5 MiB took 17.896 seconds, 0.279 MiB/s
AES-128-GCM-STREAM-enc 5 MiB took 12.878 seconds, 0.388 MiB/s
AES-128-GCM-STREAM-dec 5 MiB took 12.878 seconds, 0.388 MiB/s
AES-192-GCM-STREAM-enc 5 MiB took 15.379 seconds, 0.325 MiB/s
AES-192-GCM-STREAM-dec 5 MiB took 15.385 seconds, 0.325 MiB/s
AES-256-GCM-STREAM-enc 5 MiB took 17.881 seconds, 0.280 MiB/s
AES-256-GCM-STREAM-dec 5 MiB took 17.888 seconds, 0.280 MiB/s
GMAC Table 4-bit 30 MiB took 1.006 seconds, 29.831 MiB/s
Benchmark complete
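To put the two runs side by side, the short sketch below divides the throughput reported with the RISC-V assembly by the throughput reported without it. The figures are copied from the benchmark output above; the names `asm`, `baseline`, and `speedups` are only illustrative. Note that the two builds also used different configure options, so this is a rough comparison rather than a controlled measurement.

```python
# Throughput figures (MiB/s) copied from the two benchmark runs above.
asm = {
    "AES-128-CBC-enc": 18.588,
    "AES-256-CBC-enc": 14.189,
    "AES-128-GCM-enc": 11.543,
    "AES-256-GCM-enc": 9.687,
}
baseline = {
    "AES-128-CBC-enc": 0.391,
    "AES-256-CBC-enc": 0.281,
    "AES-128-GCM-enc": 0.388,
    "AES-256-GCM-enc": 0.280,
}


def speedups(fast, slow):
    """Return {name: fast/slow} for every algorithm present in both runs."""
    return {name: fast[name] / slow[name] for name in fast if name in slow}


ratios = speedups(asm, baseline)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x faster with RISC-V assembly")
```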
At wolfSSL we’re excited about stateful hash-based signature schemes and CNSA 2.0, and we just had a webinar on this subject. If you recall, we previously added initial support for LMS/HSS and XMSS/XMSS^MT through external integration with the hash-sigs and xmss-reference implementations.
Recently, however, we completed our own wolfCrypt implementations of these algorithms, and would like to share benchmarking results and some of the build options available. Generally the wolfCrypt implementations of these signature methods are faster, with more options available to tune build size and performance.
With that said, we’ll review some of the more relevant build options and benchmarking data for LMS/HSS and XMSS/XMSS^MT. These benchmarks were obtained on a Fedora 38 workstation with an Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz. Only a single core was used. wolfSSL was built with --enable-intelasm to utilize assembly speedups for all tests. Note: LMS/HSS and XMSS/XMSS^MT support a very wide range of parameters. For the sake of conciseness only a targeted range is benchmarked here.
LMS build options and benchmarking
The five main defines that customize the wolfCrypt LMS/HSS build are the following:
WOLFSSL_LMS_LARGE_CACHES
WOLFSSL_WC_LMS_SMALL
WOLFSSL_LMS_MAX_LEVELS=N
WOLFSSL_LMS_MAX_HEIGHT=H
WOLFSSL_LMS_VERIFY_ONLY
The define WOLFSSL_LMS_LARGE_CACHES will cache more of the authentication path into memory, speeding up signing operations for larger height trees.
The define WOLFSSL_WC_LMS_SMALL reduces code size and memory use overall, with the tradeoff of much slower signing operations. However the performance impact for verification is negligible.
The defines WOLFSSL_LMS_MAX_LEVELS and WOLFSSL_LMS_MAX_HEIGHT set compile-time limits on the size of the LMS/HSS hypertree, and mainly reduce code footprint without impacting performance. These can be used to slim the build size if you are only interested in a specific parameter set range. More specifically, WOLFSSL_LMS_MAX_LEVELS sets the max allowed levels in HSS (the number of trees in the hypertree), while WOLFSSL_LMS_MAX_HEIGHT sets the max allowed height per tree for both LMS and HSS.
The define WOLFSSL_LMS_VERIFY_ONLY restricts the build to a smaller verify-only subset (LMS API and data structures needed for keygen/signing are omitted). This does not impact verify performance, and is intended for embedded targets that need verify-only functionality (e.g. wolfBoot). WOLFSSL_LMS_VERIFY_ONLY can be combined with WOLFSSL_WC_LMS_SMALL, WOLFSSL_LMS_MAX_LEVELS, and WOLFSSL_LMS_MAX_HEIGHT for further footprint reduction.
In Table 1 we show benchmarking results (obtained with ./wolfcrypt/benchmark/benchmark -lms_hss) for these different build options, with the external LMS/HSS implementation provided for comparison.
In general we see the default wolfCrypt LMS/HSS performance (wc_lms) is much faster than the external integration (ext_lms) for all categories of operation (keygen, signing, verifying). The WOLFSSL_LMS_LARGE_CACHES (wc_lms large) option speeds up signing operations for larger height trees, but otherwise does not impact performance. The small variations in verify speed across wc_lms, wc_lms large, and wc_lms small are likely just system noise and do not represent a systematic trend. The WOLFSSL_WC_LMS_SMALL option (wc_lms small) significantly reduces signing speed, but leaves verification speed basically unchanged, making this option attractive for verify-only applications in embedded systems.
Table 1: Comparison of wolfCrypt LMS/HSS (wc_lms), wolfCrypt LMS/HSS with WOLFSSL_LMS_LARGE_CACHES (wc_lms large), wolfCrypt LMS/HSS with WOLFSSL_WC_LMS_SMALL (wc_lms small), and the external integration implementation (ext_lms). All values in units of ops/sec.
| Parameters and operation | wc_lms | wc_lms large | wc_lms small | ext_lms |
| --- | --- | --- | --- | --- |
| L2_H10_W2 keygen | 6.482 | 6.494 | 12.828 | 1.330 |
| L2_H10_W2 sign | 4437.469 | 5521.796 | 6.526 | 786.083 |
| L2_H10_W2 verify | 13954.450 | 14087.794 | 13874.450 | 4789.383 |
| L2_H10_W4 keygen | 3.567 | 3.592 | 6.954 | 0.764 |
| L2_H10_W4 sign | 2452.361 | 3052.326 | 3.562 | 443.225 |
| L2_H10_W4 verify | 6482.891 | 6707.271 | 6962.215 | 2281.440 |
| L3_H5_W4 keygen | 70.926 | 73.673 | 227.376 | 17.467 |
| L3_H5_W4 sign | 4660.370 | 4669.019 | 74.653 | 820.640 |
| L3_H5_W4 verify | 4632.118 | 4670.963 | 4790.742 | 1756.355 |
| L3_H5_W8 keygen | 9.395 | 9.413 | 29.041 | 2.265 |
| L3_H5_W8 sign | 609.408 | 605.199 | 9.542 | 106.059 |
| L3_H5_W8 verify | 561.759 | 554.635 | 573.341 | 214.093 |
| L3_H10_W4 keygen | 2.384 | 2.368 | 7.128 | 0.569 |
| L3_H10_W4 sign | 2459.698 | 3067.848 | 2.376 | 444.601 |
| L3_H10_W4 verify | 4895.203 | 4345.130 | 4793.853 | 1618.676 |
| L4_H5_W8 keygen | 7.045 | 7.017 | 29.258 | 1.770 |
| L4_H5_W8 sign | 608.915 | 607.318 | 7.168 | 106.881 |
| L4_H5_W8 verify | 446.384 | 443.804 | 438.542 | 145.672 |
Graph 1: Signing speeds for wolfCrypt LMS/HSS (wc_lms), wolfCrypt LMS/HSS with WOLFSSL_LMS_LARGE_CACHES (wc_lms large), and the external integration implementation (ext_lms). All values in units of ops/sec.
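As a quick check on the signing numbers, the sketch below computes two ratios per parameter set from the Table 1 figures: wc_lms relative to ext_lms, and wc_lms large relative to wc_lms. The values are copied from the table; the variable and function names are illustrative only.

```python
# Signing rates (ops/sec) copied from Table 1: (wc_lms, wc_lms large, ext_lms).
sign_ops = {
    "L2_H10_W2": (4437.469, 5521.796, 786.083),
    "L2_H10_W4": (2452.361, 3052.326, 443.225),
    "L3_H5_W4": (4660.370, 4669.019, 820.640),
    "L3_H5_W8": (609.408, 605.199, 106.059),
    "L3_H10_W4": (2459.698, 3067.848, 444.601),
    "L4_H5_W8": (608.915, 607.318, 106.881),
}


def sign_ratios(table):
    """Return {params: (wc_lms / ext_lms, wc_lms large / wc_lms)}."""
    return {
        params: (wc / ext, wc_large / wc)
        for params, (wc, wc_large, ext) in table.items()
    }


lms_ratios = sign_ratios(sign_ops)
for params, (vs_ext, large_gain) in lms_ratios.items():
    print(f"{params}: {vs_ext:.2f}x ext_lms, large caches {large_gain:.2f}x wc_lms")
```

With these inputs the large-cache build gains about 25% on the height-10 parameter sets and is essentially unchanged on the height-5 sets, which matches the description of WOLFSSL_LMS_LARGE_CACHES above.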
XMSS build options and benchmarking
Three important defines that customize the wc_xmss build are:
WOLFSSL_WC_XMSS_SMALL
WOLFSSL_XMSS_MAX_HEIGHT=N
WOLFSSL_XMSS_VERIFY_ONLY
The define WOLFSSL_WC_XMSS_SMALL reduces code size and memory use overall, with the tradeoff of much slower signing operations, and roughly 25-35% slower verification in the benchmarks below.
The define WOLFSSL_XMSS_MAX_HEIGHT=N sets a compile-time limit on the max height of the hypertree, and mainly reduces code size without impacting performance.
The define WOLFSSL_XMSS_VERIFY_ONLY restricts the build to a smaller verify-only subset, and can be combined with WOLFSSL_WC_XMSS_SMALL and WOLFSSL_XMSS_MAX_HEIGHT for further size reduction. It does not impact verify performance.
In Table 2 we show benchmarking results for XMSS/XMSS^MT for these options (obtained with ./wolfcrypt/benchmark/benchmark -xmss_xmssmt_sha256), with the external XMSS/XMSS^MT implementation for comparison. The default wolfCrypt XMSS/XMSS^MT (wc_xmss) is in general better than the external integration (ext_xmss), for all operations. There is a smaller difference between wc_xmss and ext_xmss as compared to wc_lms and ext_lms though, because ext_xmss can benefit from assembly speedups whereas ext_lms cannot. Similar to LMS, the WOLFSSL_WC_XMSS_SMALL option (wc_xmss small) significantly reduces signing performance, but verify speeds remain fast, making this a good option for embedded verify-only targets.
Table 2: Comparison of wolfCrypt XMSS/XMSS^MT (wc_xmss), wolfCrypt XMSS/XMSS^MT with WOLFSSL_WC_XMSS_SMALL (wc_xmss small), and the external integration implementation (ext_xmss). All values in units of ops/sec.
| Parameters and operation | wc_xmss | wc_xmss small | ext_xmss |
| --- | --- | --- | --- |
| XMSS-SHA2_10_256 keygen | 1.587 | 1.079 | 0.943 |
| XMSS-SHA2_10_256 sign | 363.693 | 1.106 | 226.782 |
| XMSS-SHA2_10_256 verify | 3050.276 | 2044.995 | 1892.234 |
| XMSSMT-SHA2_20/2_256 keygen | 0.808 | 1.100 | 0.472 |
| XMSSMT-SHA2_20/2_256 sign | 298.138 | 0.551 | 191.214 |
| XMSSMT-SHA2_20/2_256 verify | 1307.295 | 982.836 | 852.348 |
| XMSSMT-SHA2_20/4_256 keygen | 9.880 | 35.274 | 7.309 |
| XMSSMT-SHA2_20/4_256 sign | 390.942 | 8.681 | 290.516 |
| XMSSMT-SHA2_20/4_256 verify | 729.433 | 517.298 | 443.444 |
| XMSSMT-SHA2_40/4_256 keygen | 0.406 | 1.107 | 0.237 |
| XMSSMT-SHA2_40/4_256 sign | 294.738 | 0.276 | 161.656 |
| XMSSMT-SHA2_40/4_256 verify | 750.591 | 487.257 | 424.986 |
| XMSSMT-SHA2_40/8_256 keygen | 5.604 | 35.318 | 3.755 |
| XMSSMT-SHA2_40/8_256 sign | 469.764 | 4.374 | 293.184 |
| XMSSMT-SHA2_40/8_256 verify | 361.289 | 262.160 | 225.254 |
| XMSSMT-SHA2_60/6_256 keygen | 0.266 | 1.099 | 0.159 |
| XMSSMT-SHA2_60/6_256 sign | 280.160 | 0.185 | 144.637 |
| XMSSMT-SHA2_60/6_256 verify | 521.610 | 352.718 | 295.882 |
| XMSSMT-SHA2_60/12_256 keygen | 4.143 | 35.280 | 2.505 |
| XMSSMT-SHA2_60/12_256 sign | 514.658 | 2.910 | 292.371 |
| XMSSMT-SHA2_60/12_256 verify | 247.682 | 170.459 | 152.471 |
Graph 2: Verify speeds for wolfCrypt XMSS/XMSS^MT (wc_xmss), wolfCrypt XMSS/XMSS^MT with WOLFSSL_WC_XMSS_SMALL (wc_xmss small), and the external integration implementation (ext_xmss). All values in units of ops/sec.
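The verification cost of the small build can be read directly off Table 2. The sketch below, using verify rates copied from the table and illustrative names, expresses the wc_xmss small rate as a percentage slowdown relative to the default wc_xmss build.

```python
# Verify rates (ops/sec) copied from Table 2: (wc_xmss, wc_xmss small).
verify_ops = {
    "XMSS-SHA2_10_256": (3050.276, 2044.995),
    "XMSSMT-SHA2_20/2_256": (1307.295, 982.836),
    "XMSSMT-SHA2_20/4_256": (729.433, 517.298),
    "XMSSMT-SHA2_40/4_256": (750.591, 487.257),
    "XMSSMT-SHA2_40/8_256": (361.289, 262.160),
    "XMSSMT-SHA2_60/6_256": (521.610, 352.718),
    "XMSSMT-SHA2_60/12_256": (247.682, 170.459),
}


def slowdown_percent(default_rate, small_rate):
    """Percentage by which the small build is slower than the default build."""
    return (1 - small_rate / default_rate) * 100


slowdowns = {
    params: slowdown_percent(default, small)
    for params, (default, small) in verify_ops.items()
}
for params, pct in slowdowns.items():
    print(f"{params}: small build verifies {pct:.1f}% slower")
```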
Conclusions
In general our wolfCrypt implementations for LMS/HSS and XMSS/XMSS^MT are significantly faster than the external reference implementations, with speedups in the tables above ranging from roughly 35% to more than 5x depending on the combination of operation, algorithm, and parameters.
The small footprint build shows fast verification speeds for all parameters, making it an attractive choice for embedded verify-only applications (e.g. wolfBoot).
Overall our LMS/HSS implementation is faster than XMSS/XMSS^MT (at least on x86), which is consistent with what is known about these two methods. However which of the two is more appropriate for your use case will ultimately depend on other factors as well, such as signature size, target environment, and parameters used.
If you’re interested in learning more about our post-quantum work, or want to learn more about stateful hash-based signature schemes, contact us at wolfSSL by emailing facts@wolfSSL.com or calling us at +1 425 245 8247 to reach out to your regional wolfSSL business director.
It may not be as glamorous as the new ESP32 RISC-V chipsets with all the various hardware acceleration capabilities, but the ESP8266 is a well established device which has a large codebase available with an even larger user community.
Due to high customer demand, we’ve enhanced the wolfSSL libraries for the ESP8266. The recent changes have improved both the ESP-IDF CMake and traditional Makefile builds. This new capability allows for specification of the wolfSSL component source code as an alternative to using the setup script to copy everything locally.
For make, set the WOLFSSL_ROOT value in components/wolfssl/component.mk.
For cmake, there are more options:
Set the WOLFSSL_ROOT value in components/wolfssl/CMakeLists.txt.
Set the WOLFSSL_ROOT environment variable.
Keep the project, including its components/wolfssl/CMakeLists.txt, in a subdirectory of the wolfSSL source tree.
When a project is in a subdirectory of wolfSSL, the cmake file will search parent directories, up to the root, looking for wolfSSL.
The ability to specify the wolfSSL component source code ensures consistent versioning across projects and facilitates easy updates via GitHub.
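The lookup described above can be sketched in a few lines of Python. This is not the actual CMake logic: the check for a `wolfcrypt/src` directory, the order in which the environment variable and the parent-directory walk are tried, and the function names are all assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path


def looks_like_wolfssl(directory):
    # Assumption: a wolfSSL tree is recognised by its wolfcrypt/src directory.
    return (Path(directory) / "wolfcrypt" / "src").is_dir()


def find_wolfssl_root(start, env=None):
    """Return the wolfSSL root for a project directory, or None if not found."""
    env = os.environ if env is None else env
    explicit = env.get("WOLFSSL_ROOT")
    if explicit and looks_like_wolfssl(explicit):
        return Path(explicit)
    # Otherwise walk from the project directory up to the filesystem root.
    current = Path(start).resolve()
    for candidate in (current, *current.parents):
        if looks_like_wolfssl(candidate):
            return candidate
    return None


# Demo on a throwaway directory layout that mimics the client example path.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve() / "wolfssl"
    project = root / "IDE" / "Espressif" / "ESP-IDF" / "examples" / "wolfssl_client"
    (root / "wolfcrypt" / "src").mkdir(parents=True)
    project.mkdir(parents=True)
    found_by_walk = find_wolfssl_root(project, env={}) == root
    found_by_env = find_wolfssl_root(tmp, env={"WOLFSSL_ROOT": str(root)}) == root
    print(found_by_walk, found_by_env)
```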
You may have seen our recent announcement regarding wolfCrypt hardware acceleration for the ESP32 series. There’s no such capability on the ESP8266. However, there’s still a noticeable difference between debug and release optimizations, as shown at the end of this blog.
Once the Espressif ESP8266 RTOS SDK is installed, it is easy to get the wolfSSL examples working (see the README for more details):
# Set your path to RTOS SDK,
# shown here for default from WSL with VisualGDB
WRK_IDF_PATH=/mnt/c/SysGCC/esp8266/rtos-sdk/v3.4
# or
WRK_IDF_PATH=~/esp/ESP8266_RTOS_SDK
# Setup the environment
. $WRK_IDF_PATH/export.sh
# Optional: install as needed / prompted
# /mnt/c/SysGCC/esp8266/rtos-sdk/v3.4/install.sh
# Fetch wolfssl from GitHub if needed:
cd /workspace
git clone https://github.com/wolfSSL/wolfssl.git
# change directory to wolfssl client example.
cd wolfssl/IDE/Espressif/ESP-IDF/examples/wolfssl_client
# Adjust settings as desired
# Set IP address and wifi SSID name & password
idf.py menuconfig
# Build, flash and monitor
idf.py build flash -p /dev/ttyS70 -b 115200
idf.py monitor -p /dev/ttyS70 -b 74880
Are you interested in using the ESP8266 or ESP32 in your next project? Let us know! We love to hear about how wolfSSL is being used, and can optionally help promote your project on social media, with your approval.
The best encryption libraries are now available on the PlatformIO environment!
At wolfSSL, we continue to embrace rapid prototyping environments, including Arduino, Visual Studio, and now PlatformIO for VS Code, among other IDE applications.
The stable release versions will generally follow our standard release cycle. The initial 5.7.0 versions include post-stable-release updates needed for the initial PlatformIO support.
set PATH=%PATH%;C:\Users\%USERNAME%\.platformio\penv\Scripts\
pio --help
pio account show
Our initial release has full support for Espressif ESP32 boards, but other boards should work with just a few modifications to the wolfSSL user_settings.h file. See the example configs.
Is your device working on the PlatformIO environment with wolfSSL? Send us a message and let us help you get started: support@wolfSSL.com or open an issue on GitHub.
PQC support for the Zephyr port was introduced in the last wolfSSL release using liboqs. This involved adding necessary files to the CMakeLists.txt for the Zephyr module. Zephyr is an open-source real-time operating system (RTOS) designed for resource-constrained devices and embedded systems. It is maintained by the Linux Foundation and supported by a vibrant community of developers and contributors.
PR #7026 (https://github.com/wolfSSL/wolfssl/pull/7026) also addressed proper random number generation within liboqs by using the wolfSSL interface. Previously, liboqs random data acquisition relied on various sources, depending on the liboqs build configuration. With the changes, a custom RNG method is provided through the OQS_randombytes_custom_algorithm() interface, enabling liboqs to obtain RNG data from wolfSSL for all generic liboqs uses.
MicroBlaze, developed by Xilinx, is a soft processor core optimized for Xilinx FPGAs. It offers flexibility and scalability, making it suitable for a wide range of applications, including embedded systems and IoT devices. wolfSSL’s AES-GCM has been integrated with MicroBlaze and runs on the soft CPU, and the latest wolfSSL release adds further enhancements to this integration. On a MicroBlaze, wolfSSL’s AES-GCM lets developers of FPGA-based systems implement secure communication protocols and data encryption. wolfSSL can also be set up to use Xilinx’s xilsecure while running on the MicroBlaze, which significantly increases AES-GCM performance.
For more information about using wolfSSL on a MicroBlaze or if you have questions about any of the above, please contact us at facts@wolfSSL.com or call us at +1 425 245 8247.
Did you know wolfSSL integrates RSA-PSS signatures with Certificate Revocation List (CRL) support?
RSA-PSS: Enhancing Security Layers
RSA-PSS, or Probabilistic Signature Scheme, represents a modern approach to digital signatures. Unlike traditional RSA signatures, RSA-PSS offers improved security properties, making it more resilient against various cryptographic attacks. By adopting RSA-PSS, wolfSSL users benefit from heightened security, enhancing the integrity of cryptographic operations.
Certificate Revocation List (CRL): Managing Certificate Integrity
In the realm of certificate management, CRL plays a pivotal role. It serves as a mechanism for indicating the revocation status of digital certificates. With CRL, systems can promptly identify and reject compromised or revoked certificates, bolstering the overall security posture. Integrating CRL support into wolfSSL empowers users with efficient certificate management capabilities, ensuring the authenticity and integrity of cryptographic transactions.
Empowering wolfSSL with RSA-PSS and CRL Integration
The fusion of RSA-PSS with CRL support within wolfSSL is a logical step when providing cutting-edge security solutions. Now, wolfSSL users can leverage the combined strength of RSA-PSS signatures and CRL management to fortify their cryptographic environments.
In the last release of wolfSSL, some house cleaning was done on older RSA implementations. The user RSA layer was removed along with the hooks used for tying in IPP. When those were first introduced, we had yet to implement SP (single precision) versions of RSA. Fast forward to today, and there is a faster implementation of RSA in wolfSSL itself. IPP v0.9 achieved 990.09 RSA 2048-bit sign operations per second, while wolfSSL 5.7.0 achieved 1,015.23 operations per second. Verify operations ran at about the same speed with both libraries, averaging 35,714 operations per second. These measurements were collected on an older Intel(R) Core(TM) i7-4870HQ CPU. Besides a performant implementation of RSA, crypto callbacks are now available for plugging in custom RSA operations. For these reasons, --enable-fastrsa, user RSA, and the IPP hooks were dropped to lower maintenance and reduce bundle size.
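For a sense of scale, the short calculation below converts the two RSA 2048-bit signing rates quoted above into a relative gain and a per-operation time. Only the two rates come from the text; the variable names are illustrative.

```python
# RSA 2048-bit sign rates (operations per second) quoted in the text above.
ipp_sign_ops = 990.09
wolfssl_sign_ops = 1015.23

# Relative gain of wolfSSL 5.7.0 over IPP v0.9, in percent.
gain_percent = (wolfssl_sign_ops / ipp_sign_ops - 1) * 100

# Average time per sign operation, in milliseconds.
ipp_ms = 1000 / ipp_sign_ops
wolfssl_ms = 1000 / wolfssl_sign_ops

print(f"wolfSSL signs {gain_percent:.1f}% faster "
      f"({wolfssl_ms:.3f} ms vs {ipp_ms:.3f} ms per operation)")
```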
Recently, a notable modification was introduced in wolfSSL, a prominent provider of security solutions. Pull request #7245 (https://github.com/wolfSSL/wolfssl/pull/7245) focuses on optimizing memory management by introducing a function to unload intermediate CA certificates and free up memory. Let’s explore the significance of this code change and its potential impact on enhancing efficiency and resource utilization within cryptographic applications.
Specifically, the code change addresses the need to efficiently handle intermediate Certificate Authority (CA) certificates. These certificates, while essential for establishing trust chains in cryptographic operations, can consume valuable memory resources, particularly in resource-constrained environments.
The essence of the code change lies in the introduction of a dedicated function (wolfSSL_CertManagerUnloadIntermediateCerts()) to unload intermediate CA certificates from memory when they are no longer needed. By using this function, developers can optimize resource utilization, thereby enhancing the overall efficiency and stability of cryptographic operations.
Key Benefits: The introduction of the function to unload intermediate CA certificates brings several notable benefits:
Efficient Memory Management: By providing a mechanism to unload intermediate CA certificates from memory, the code change ensures efficient utilization of resources. This is particularly crucial in environments where memory constraints are a concern, such as embedded systems and IoT devices.
Prevention of Memory Leaks: Memory leaks can pose significant security and reliability risks in software applications. The new function helps prevent memory leaks by explicitly releasing memory allocated for intermediate CA certificates when they are no longer required, thereby improving the robustness of cryptographic operations.
Scalability and Performance: Optimal memory management contributes to improved scalability and performance of cryptographic applications. By freeing up memory resources, the code change enables applications to handle larger workloads more efficiently, leading to enhanced responsiveness and overall performance.
By incorporating the function to unload intermediate CA certificates, developers can optimize resource utilization and mitigate potential security risks associated with memory management issues. This not only enhances the reliability and stability of cryptographic applications but also contributes to the overall security resilience of the systems in which they are deployed.
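The idea can be illustrated with a toy model. The class below is not the wolfSSL API and does not call wolfSSL_CertManagerUnloadIntermediateCerts(); it is a hypothetical stand-in that keeps trust anchors and intermediates in separate tables, on the assumption that unloading the intermediates leaves the trust anchors in place.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCertStore:
    """Toy stand-in for a certificate manager; not the wolfSSL API."""
    roots: dict = field(default_factory=dict)
    intermediates: dict = field(default_factory=dict)

    def load(self, name, der, intermediate=False):
        # Store the encoded certificate under the table matching its role.
        table = self.intermediates if intermediate else self.roots
        table[name] = der

    def unload_intermediates(self):
        # Drop every intermediate and report how many bytes were released.
        freed = sum(len(der) for der in self.intermediates.values())
        self.intermediates.clear()
        return freed


store = ToyCertStore()
store.load("root-ca", b"\x30" * 1000)
store.load("issuing-ca-1", b"\x30" * 1200, intermediate=True)
store.load("issuing-ca-2", b"\x30" * 1100, intermediate=True)

freed_bytes = store.unload_intermediates()
print(f"freed {freed_bytes} bytes, {len(store.roots)} root still loaded")
```

In a real application the equivalent step would be a call to the wolfSSL function named above once the intermediates are no longer needed; the exact calling convention should be taken from the wolfSSL documentation.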
wolfSSL is announcing a long term support (LTS) version of the wolfSSL library. The goal of this product will be to provide users with fully ABI compatible releases of wolfSSL that are secure against all known vulnerabilities. Patches for vulnerabilities will be backported to the LTS branch in an ABI compatible way to guarantee security and stability.
ABI (Application Binary Interface) is a low-level interface that defines how functions and data structures are accessed in machine code. ABI specifies how parameters are passed to functions, how return values are retrieved, and how data structures are arranged in memory. Guaranteeing ABI stability means that compiling a newer version of the library with the same configuration will work with programs linked against an older release.
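A small illustration of why data structure layout is part of the ABI: the sketch below uses Python's ctypes to model two versions of a made-up C struct. `ConfigV1` and `ConfigV2` are hypothetical and are not wolfSSL types. Inserting a field in the middle moves the offset of `timeout_ms`, so a program compiled against the old layout reads the wrong bytes from data produced with the new one.

```python
import ctypes


class ConfigV1(ctypes.Structure):
    # Layout that existing programs were compiled against (hypothetical).
    _fields_ = [
        ("version", ctypes.c_uint32),
        ("timeout_ms", ctypes.c_uint32),
    ]


class ConfigV2(ctypes.Structure):
    # Hypothetical later release that inserts a field in the middle.
    _fields_ = [
        ("version", ctypes.c_uint32),
        ("flags", ctypes.c_uint32),
        ("timeout_ms", ctypes.c_uint32),
    ]


old_offset = ConfigV1.timeout_ms.offset  # 4
new_offset = ConfigV2.timeout_ms.offset  # 8

# The library fills in the new layout; an old program reinterprets the
# same bytes with the old layout and picks up "flags" as its timeout.
new_config = ConfigV2(version=2, flags=0xFF, timeout_ms=5000)
old_view = ConfigV1.from_buffer_copy(bytes(new_config))

print(old_offset, new_offset, old_view.timeout_ms)  # 4 8 255
```

An ABI-stable branch avoids this kind of change, for example by only appending fields or by keeping such structures opaque to callers.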
wolfSSL LTS will provide users with a stable and reliable library that can be kept up to date with no additional headache. Your programs will always be compatible with the latest wolfSSL LTS release and will be safe from discovered vulnerabilities. Separating out an LTS branch will allow us to continue developing and improving wolfSSL without the strict restrictions of ABI backwards compatibility. Users will be able to choose the appropriate version for them based on their individual priorities.
wolfSSL LTS, like all of our libraries, will be offered under a dual-license model. GPL will be available for open source users and a commercial license will be available for customers integrating it in commercial products. Additional limited backporting services will also be available for customers.
If you’re interested in wolfSSL LTS or have questions about any of the above, please reach out to us at facts@wolfSSL.com or call us at +1 425 245 8247!