RECENT BLOG NEWS
wolfSSL Performance on Intel x86_64 (Part 6)
Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made and are being discussed over six blog posts of which this is the last part. In this blog, we will talk about the performance of Elliptic Curve (EC) operations over the P-256 curve.
Elliptic curve cryptography (ECC) is the alternative to finite field (FF) cryptography, which includes algorithms like RSA, DSA and DH. ECDSA is the elliptic curve variant of DSA and can be used in place of RSA or DSA signatures, while ECDH is the elliptic curve variant of DH. ECDSA and ECDH can be used anywhere their FF counterparts can be used. ECC requires a pre-defined curve on which to perform the operations. The most commonly used curve is P-256, as it has 128-bit strength and appears in many standards, including TLS, the IETF specifications for certificates, and NIST’s FIPS 186-4. Browsers and web servers prefer ECDH over DH as it is much faster.
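To make the ECDH idea concrete, here is a minimal Python sketch of a key agreement over P-256 using the published curve parameters. It is an illustration only and is not how wolfSSL implements the curve: it uses simple affine arithmetic with a plain double-and-add loop, it is not constant-time, and the helper names (`point_add`, `scalar_mult`, `keypair`, `ecdh`) are made up for this example.

```python
import secrets

# NIST P-256 domain parameters: y^2 = x^3 + A*x + B over GF(P), base point G of order N.
P = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff
A = P - 3
B = 0x5ac635d8aa3a93e7b3ebbd55769886bc651d06b0cc53b0f63bce3c3e27d2604b
G = (0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296,
     0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5)
N = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551


def on_curve(point):
    """Return True if the point satisfies the curve equation (None is infinity)."""
    if point is None:
        return True
    x, y = point
    return (y * y - (x * x * x + A * x + B)) % P == 0


def point_add(p1, p2):
    """Add two affine points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # the points are inverses of each other
    if p1 == p2:
        slope = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Double-and-add scalar multiplication (NOT constant-time; illustration only)."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def keypair():
    """Generate a random private scalar in [1, N-1] and its public point."""
    private_scalar = secrets.randbelow(N - 1) + 1
    return private_scalar, scalar_mult(private_scalar, G)


def ecdh(private_scalar, peer_public):
    """Return the shared x-coordinate after checking the peer's point."""
    if peer_public is None or not on_curve(peer_public):
        raise ValueError("peer public key is not a valid curve point")
    return scalar_mult(private_scalar, peer_public)[0]


if __name__ == "__main__":
    alice_private, alice_public = keypair()
    bob_private, bob_public = keypair()
    # Both sides derive the same secret from their own private key
    # and the other side's public key.
    print(ecdh(alice_private, bob_public) == ecdh(bob_private, alice_public))
```

Each side only ever sends its public point, yet both arrive at the same shared x-coordinate. A production implementation would additionally run in constant time and pass the shared value through a key derivation function.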
wolfSSL 3.13 and later have completely new implementations of the EC algorithms over the P-256 curve. The implementation is constant-time with respect to private key operations. The implementations include variants in C, and assembly code targeted at Intel x86_64 and x86_64 with BMI2 and ADX. There is a small code size variant of the assembly code that is about 1/3rd the size (smaller pre-computed tables) yet remains very fast.
The two charts below show the relative performance of the old wolfSSL code, new small wolfSSL assembly code, new fast wolfSSL assembly code and OpenSSL as compared to the new wolfSSL C implementation on Ivy Bridge and Skylake CPUs. Note that the OpenSSL super-app does not measure the speed of the ECDH key generation operation. The new C implementation is a lot faster than the old generic C/ASM code for both CPUs. The assembly code is many times better than the C code mostly due to the use of larger pre-computed tables of elliptic curve points. The OpenSSL code is around 10% slower than the new fast wolfSSL assembly code using the generic x86_64 code and between 5% and 35% slower than wolfSSL assembly code for x86_64 with BMI2 and ADX instructions.
Contact us at support@wolfssl.com with questions about the performance of the wolfSSL embedded TLS library.
References:
ECDSA (Elliptic Curve Digital Signature Algorithm)
ECDH (Elliptic-curve Diffie–Hellman)
wolfSSL Embedded SSL for Bare Metal and No OS Environments
Are you looking for an SSL/TLS library which will seamlessly integrate into your bare metal or No-OS environment? If so, continue reading to learn why the wolfSSL lightweight SSL library is a perfect fit for such environments.
wolfSSL has been designed with portability and ease of use in mind, allowing developers to easily integrate it into a bare metal or operating systemless environment. As a large percentage of wolfSSL users are running the library on small, embedded devices, we have added several abstraction layers which make tying wolfSSL into these types of environments an easy task.
Available abstraction layers include:
- Custom Input/Output
- Standard C library / Memory
- File system (Able to use cert/key buffers instead)
- Threading
- Operating System
In addition to abstraction layers, we have tried to keep wolfSSL’s memory usage as low as possible. Build sizes for a complete SSL/TLS stack range from 20-100kB depending on build options, with RAM usage between 1-36kB per connection.
To learn more about how to integrate wolfSSL into your environment or get more information about reducing wolfSSL’s memory usage, please see the wolfSSL Manual or contact us directly.
wolfSSL FAQ page
The wolfSSL FAQ page can be useful for information or general questions that need answers immediately. It covers some of the most common questions that the support team receives, along with the support team's responses. It's a great resource for questions about wolfSSL, embedded TLS, and for solutions to problems getting started with wolfSSL.
To view this page for yourself, please follow this link here.
Here is a sample list of 5 questions that the FAQ page covers:
- How do I build wolfSSL on ... (*NIX, Windows, Embedded device) ?
- How do I manage the build configuration of wolfSSL?
- How much Flash/RAM does wolfSSL use?
- How do I extract a public key from a X.509 certificate?
- Is it possible to use no dynamic memory with wolfSSL and/or wolfCrypt?
Have a question that isn't on the FAQ? Feel free to email us at support@wolfssl.com.
Case Study: wolfSSL Enables Sensolus to Easily Secure Communications Between Embedded Systems and the Cloud
Based out of Belgium, Sensolus enables companies to more effectively secure and manage their non-powered assets by providing internet-based tracking solutions over a low-powered, wide-area network. STICKNTRACK, Sensolus's flagship product, lets users easily view an asset's statistics such as current location, temperature, and recent activity in a user friendly way on a map or in dashboards.
In order to ensure the encryption of data from the STICKNTRACK devices to the platform, Sensolus found wolfSSL's wolfCrypt crypto library to be the optimal solution. With its lightweight design and the inclusion of some of the latest ciphers, wolfCrypt was seamlessly integrated into Sensolus's products to provide users with a safe and secure communication channel to manage all of their assets.
The wolfSSL/Sensolus case study can be viewed on our case studies page along with various other case studies that we have also conducted.
To learn more about Sensolus and their products, feel free to visit their website or contact them at info@sensolus.com.
For questions regarding the use of wolfSSL products in your embedded or IoT devices, please contact us at facts@wolfssl.com.
TLS 1.3 is now available in wolfSSL's embedded SSL/TLS library! Learn more here and don't forget to check out our product page.
wolfSSL Performance on Intel x86_64 (Part 5)
Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made and are being discussed over six blog posts of which this is part 5. In this blog, we will talk about the performance of RSA and Diffie-Hellman (DH).
RSA is the most commonly used public key algorithm for certificates. When performing a TLS handshake, the server will sign a hash of the messages seen so far and the client will verify the signature of certificates in the certificate chain and verify the hash of messages with the public key in the certificate. Signing and verifying are the most time-consuming operations in a handshake.
DH has been the key exchange algorithm of choice in handshakes but is falling out of favor as the Elliptic Curve variants are considerably faster at the same security level. Performing the key exchange is the second most time-consuming operation in a TLS handshake.
wolfSSL 3.13 and later have completely new implementations of RSA and DH targeted at specific key sizes: 2048 and 3072 bits. The implementation is constant-time with respect to private key operations. The implementations include variants in C and assembly code targeted at Intel x86_64 and x86_64 with BMI2 and ADX. The new code is significantly better than the old generic code and is about the same speed as OpenSSL on older CPUs and a little faster on newer CPUs.
The two charts below show the relative performance of the old wolfSSL code, new wolfSSL assembly code and OpenSSL as compared to the new wolfSSL C implementation on Ivy Bridge and Skylake CPUs. Note that the OpenSSL super-app does not measure the speed of DH operations. The new C implementation is a lot faster than the old generic C/ASM code for both CPUs. The assembly code for x86_64 is better than the C code by between 23% and 46% on x86_64 and 92% and 144% using BMI2 and ADX instructions. The OpenSSL code is about the same speed as the wolfSSL assembly code.
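As a quick aid to reading these percentages, the short Python sketch below converts "N% better" into a throughput ratio and into the implied reduction in time per operation. It assumes that "better" refers to operations per second relative to the C implementation, which the post does not state explicitly; the function names are hypothetical.

```python
def speedup_ratio(percent_better):
    """Convert 'N% better' into a throughput ratio relative to the baseline."""
    return 1.0 + percent_better / 100.0


def time_saved_percent(percent_better):
    """Percentage reduction in time per operation implied by the same gain."""
    return 100.0 * (1.0 - 1.0 / speedup_ratio(percent_better))


if __name__ == "__main__":
    # Percentages quoted above for the assembly code versus the C code.
    quoted = [
        ("x86_64, low end", 23),
        ("x86_64, high end", 46),
        ("BMI2/ADX, low end", 92),
        ("BMI2/ADX, high end", 144),
    ]
    for label, percent in quoted:
        print(f"{label}: {speedup_ratio(percent):.2f}x throughput, "
              f"{time_saved_percent(percent):.0f}% less time per operation")
```

Under that assumption, a gain above 100% means the assembly code completes more than twice as many operations per second as the C code.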
Contact us at support@wolfssl.com for questions about the performance of the wolfSSL embedded TLS library, using it on your platform, or about our TLS 1.3 support!
wolfSSL Performance on Intel x86_64 (Part 4)
Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made and are being discussed over six blog posts of which this is part 4. In this blog, we will talk about the performance of Curve25519 and Ed25519.
Curve25519 is a set of parameters for a Montgomery elliptic curve and has ~128-bit security. It is used in key exchange and has become popular due to its speed and inclusion in standards. The algorithm is included as part of TLS v1.3 and NIST is considering it as part of SP 800-186. Ed25519 is a set of parameters for a Twisted Edwards curve that is mathematically related to Curve25519 and has the same security properties. A new signature scheme has been designed over Twisted Edwards curves that is fast and included as part of TLS v1.3. A draft specification has been written describing digital certificates using EdDSA with Ed25519.
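For readers who want to see what a Curve25519 key exchange computes, the following Python sketch follows the X25519 Montgomery ladder as described in RFC 7748. It is a reference-style illustration, not the wolfSSL assembly or C code, and the conditional swap uses an ordinary `if`, so it is not constant-time.

```python
import secrets

P = 2**255 - 19          # field prime
A24 = 121665             # (486662 - 2) / 4 for the Montgomery curve
MASK255 = (1 << 255) - 1


def clamp(scalar_bytes):
    """Decode a 32-byte little-endian scalar and clamp it as RFC 7748 specifies."""
    k = int.from_bytes(scalar_bytes, "little")
    k &= ~7            # clear the three lowest bits
    k &= MASK255       # clear the top bit
    k |= 1 << 254      # set the second-highest bit
    return k


def x25519(scalar_bytes, u_bytes):
    """Montgomery ladder on the u-coordinate (NOT constant-time; illustration only)."""
    k = clamp(scalar_bytes)
    x1 = (int.from_bytes(u_bytes, "little") & MASK255) % P
    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        swap ^= bit
        if swap:  # a real implementation uses a branch-free conditional swap
            x2, x3 = x3, x2
            z2, z3 = z3, z2
        swap = bit
        a = (x2 + z2) % P
        aa = a * a % P
        b = (x2 - z2) % P
        bb = b * b % P
        e = (aa - bb) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        da = d * a % P
        cb = c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3 = x3, x2
        z2, z3 = z3, z2
    # Convert from projective (x2 : z2) to the affine u-coordinate.
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


BASE_POINT = (9).to_bytes(32, "little")

if __name__ == "__main__":
    alice_private = secrets.token_bytes(32)
    bob_private = secrets.token_bytes(32)
    alice_public = x25519(alice_private, BASE_POINT)
    bob_public = x25519(bob_private, BASE_POINT)
    print(x25519(alice_private, bob_public) == x25519(bob_private, alice_public))
```

The ladder performs the same sequence of field operations for every scalar bit, which is part of why the curve lends itself to fast, constant-time implementations.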
In a TLS handshake, a key exchange operation should always be performed to ensure forward-secrecy. When used, it will be a significant amount of the processing time during the handshake. Improving the performance of Curve25519, therefore, increases the number of TLS connections that can be made per second.
Older releases of wolfSSL have a C implementation of the algorithms. While the C code was quite fast, the new assembly code is significantly better. There is assembly code for generic Intel x86_64 CPUs, and for CPUs with BMI2 and ADX (Broadwell and newer CPUs).
The two charts below show the relative performance of wolfSSL and OpenSSL compared to the C implementation on Ivy Bridge and Skylake CPUs. On the Ivy Bridge CPU, the new assembly code is between 20% and 60% better than the C code and is better than OpenSSL in the one operation that can be measured. On the Skylake CPU, the assembly code is between 60% and 86% faster. The OpenSSL code has not been optimized for this CPU and is significantly slower.
Contact us at support@wolfssl.com with questions about the performance of the wolfSSL embedded TLS library.
wolfSSL Performance on Intel x86_64 (Part 3)
Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made and are being discussed over six blog posts of which this is part 3. In this blog, we will talk about the performance of SHA-256 and SHA-512.
The most commonly used digest algorithms are SHA-256 and SHA-384. With the introduction of AES-GCM in TLS, SHA-256 and SHA-384 are less commonly used for application data authentication. But, they are still used for handshake message authentication, as a one-way function (as required in a pseudo-random number generator) and digital signatures.
The assembly code has been rewritten to take best advantage of the AVX1 and AVX2 instructions. The performance of SHA-256 and SHA-512 is now as good as or better than OpenSSL. The four charts below show that the performance of wolfSSL has significantly improved from small up to big block sizes. On AVX1, the performance has increased by between 19% and 60% for SHA-256 and between 25% and 53% for SHA-512. Similarly, on AVX2, the performance has increased by between 22% and 40% for SHA-256 and between 23% and 37% for SHA-512. The new wolfSSL assembly code is also significantly better than OpenSSL for small blocks and is about the same at the largest block size. SHA-384 uses the same algorithm as SHA-512 and therefore has the same underlying implementation and thus the same performance improvements.
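To get a feel for how digest throughput varies with block size, the Python sketch below times the standard library's `hashlib` over a few block sizes. It measures whatever backend your Python build uses, not wolfSSL, and the block sizes and the `throughput_mb_per_s` helper are assumptions chosen for illustration rather than the sizes used in the charts.

```python
import hashlib
import time


def throughput_mb_per_s(algorithm, block_size, total_bytes=4 * 1024 * 1024):
    """Hash `total_bytes` of data in independent blocks and return MB/s."""
    block = bytes(block_size)                      # zero-filled test block
    rounds = max(1, total_bytes // block_size)
    start = time.perf_counter()
    for _ in range(rounds):
        # A fresh digest per block, so per-call overhead dominates small blocks.
        hashlib.new(algorithm, block).digest()
    elapsed = max(time.perf_counter() - start, 1e-9)
    return rounds * block_size / elapsed / 1e6


if __name__ == "__main__":
    for algorithm in ("sha256", "sha384", "sha512"):
        for block_size in (16, 256, 1024, 8192, 16384):
            rate = throughput_mb_per_s(algorithm, block_size)
            print(f"{algorithm:>6} {block_size:>6} bytes: {rate:8.1f} MB/s")
```

Small blocks are dominated by fixed per-call costs such as initialization and padding, which is why improvements at small block sizes and at large block sizes are reported separately.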
Please contact us at support@wolfssl.com with any questions about the performance of the wolfSSL embedded TLS library.
References:
Introduction to Intel® Advanced Vector Extensions
Advanced Vector Extensions (Wikipedia)
wolfSSL Contributor Stickers!
Are you a code contributor to the wolfSSL embedded SSL/TLS library, or one of wolfSSL’s other projects? If so, and you are interested in receiving some wolfSSL Contributor stickers, email us at facts@wolfssl.com with a quick mention of your contribution and we will mail you some free stickers!
wolfSSL products are Open Source and dual licensed under both GPLv2 and commercial licenses. We are big fans of Open Source software and enjoy seeing the fun things people have used wolfSSL projects in. We give free support to Open Source projects, and maintain a Community page with links to some of the projects that currently use wolfSSL.
If you have done something cool with wolfSSL, or ported us into your favorite project, we’ll be happy to add you to our Community page. Just let us know about your project and send us a link! Using wolfSSL you can easily add in secure, well-tested, and progressive SSL/TLS and crypto to your project. If you are currently using OpenSSL, we even have an OpenSSL compatibility layer to make the transition easier!
wolfSSL Performance on Intel x86_64 (Part 2)
Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made and are being discussed over six blog posts of which this is part 2. In this blog, we will talk about the performance of ChaCha20-Poly1305.
ChaCha20-Poly1305 is a relatively new authenticated encryption algorithm. It was designed as an alternative to AES-GCM. The algorithm is simple and fast on CPUs that do not have hardware acceleration for AES and GCM.
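The simplicity mentioned above is visible in ChaCha20's core building block, the quarter round, which uses only 32-bit additions, XORs and rotations. The Python sketch below implements the quarter round as specified in RFC 7539 and checks it against the test vector from section 2.1.1 of that RFC; it illustrates the algorithm and is not wolfSSL's code.

```python
MASK32 = 0xffffffff


def rotl32(value, count):
    """Rotate a 32-bit value left by `count` bits."""
    return ((value << count) & MASK32) | (value >> (32 - count))


def quarter_round(a, b, c, d):
    """One ChaCha quarter round: add, XOR, rotate on four 32-bit words."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


if __name__ == "__main__":
    # Test vector from RFC 7539, section 2.1.1.
    result = quarter_round(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567)
    print([hex(word) for word in result])
    # prints ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
```

Because these operations are cheap on any CPU and map directly onto vector instructions, the cipher performs well without dedicated hardware support.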
Older releases of wolfSSL did not have assembly code implementations of ChaCha20 or Poly1305. So, adding assembly code that uses AVX1 and AVX2 instructions has made a significant difference. The two charts below show the performance of wolfSSL with respect to OpenSSL on AVX1 and AVX2 chipsets. In both charts, the new assembly code is a clear improvement over the C code. Compared to OpenSSL, wolfSSL is between 2.5% and 23% faster on AVX1, and on AVX2 it ranges from the same speed to 16% faster!
If you have questions about the performance of the wolfSSL embedded TLS library, please contact us at support@wolfssl.com!
wolfSSL Performance on Intel x86_64 (Part 1)
Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made which are being discussed over a six blog post series. In this first blog, we will talk about the performance of AES-GCM.
The assembly code for AES-GCM has been rewritten to take best advantage of the AVX1 and AVX2 instructions. The performance of AES-GCM is now as good as or better than OpenSSL.
The two charts below show the relative performance of AES-128-GCM encryption on Intel AVX1 and AVX2 chipsets. They compare the performance of wolfSSL and OpenSSL with an older version of wolfSSL (before the assembly code changes).
Small block size performance is important when dealing with locally stored data like keys or data in a database. Meanwhile, large block size performance is important for large data transfers in TLS.
The performance of wolfSSL has significantly improved from small up to big block sizes. On AVX1, the smallest block size performance has increased by over 130% and at the top end, there is a 42% improvement. Similarly, on AVX2, the improvement is over 150% for small block sizes to 11% for large block sizes. The new wolfSSL assembly code is also significantly better than OpenSSL for small blocks and is about the same at the largest block size. Similar performance improvements have been achieved for AES-256-GCM as well.
If you have questions about using the wolfSSL embedded TLS library on your platform, or about performance optimization of the library, contact us at support@wolfssl.com.
References:
Introduction to Intel® Advanced Vector Extensions
Advanced Vector Extensions (Wikipedia)