wolfSSL Performance on Intel x86_64 (Part 4)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made, and they are being discussed in a series of six blog posts, of which this is part 4. In this post, we will talk about the performance of Curve25519 and Ed25519.

Curve25519 is a set of parameters for a Montgomery elliptic curve and provides ~128-bit security. It is used for key exchange and has become popular due to its speed and its inclusion in standards. The algorithm is included as part of TLS v1.3, and NIST is considering it as part of SP 800-186. Ed25519 is a set of parameters for a Twisted Edwards curve that is mathematically related to Curve25519 and has the same security properties. EdDSA, a fast signature scheme designed over Twisted Edwards curves, is also included as part of TLS v1.3. A draft specification has been written describing digital certificates using EdDSA with Ed25519.

In a TLS handshake, a key exchange operation should always be performed to ensure forward secrecy. When it is performed, it accounts for a significant share of the processing time of the handshake. Improving the performance of Curve25519 therefore increases the number of TLS connections that can be made per second.

Older releases of wolfSSL had only a C implementation of these algorithms. While the C code was quite fast, the new assembly code is significantly faster. There is assembly code for generic Intel x86_64 CPUs and for CPUs with BMI2 and ADX (Broadwell and newer CPUs).

The two charts below show the performance of wolfSSL and OpenSSL relative to the C implementation on Ivy Bridge and Skylake CPUs. On the Ivy Bridge CPU, the new assembly code is between 20% and 60% faster than the C code and is faster than OpenSSL in the one operation that can be measured. On the Skylake CPU, the assembly code is between 60% and 86% faster. The OpenSSL code has not been optimized for this CPU and is significantly slower.

Contact us at support@wolfssl.com with questions about the performance of the wolfSSL embedded TLS library.

Curve25519/Ed25519 - Intel x86_64

Curve25519/Ed25519 - Intel BMI2 and ADX

References:

Curve25519: high-speed elliptic-curve cryptography
Ed25519

wolfSSL Performance on Intel x86_64 (Part 3)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made, and they are being discussed in a series of six blog posts, of which this is part 3. In this post, we will talk about the performance of SHA-256 and SHA-512.

The most commonly used digest algorithms are SHA-256 and SHA-384. With the introduction of AES-GCM in TLS, SHA-256 and SHA-384 are less commonly used for application data authentication. However, they are still used for handshake message authentication, as a one-way function (as required in a pseudo-random number generator), and in digital signatures.

The assembly code has been rewritten to take best advantage of the AVX1 and AVX2 instructions. The performance of SHA-256 and SHA-512 is now as good as or better than OpenSSL. The four charts below show that the performance of wolfSSL has improved significantly from small to large block sizes. On AVX1, performance has increased by between 19% and 60% for SHA-256 and by between 25% and 53% for SHA-512. Similarly, on AVX2, the improvement is between 22% and 40% for SHA-256 and between 23% and 37% for SHA-512. The new wolfSSL assembly code is also significantly faster than OpenSSL for small blocks and is about the same at the largest block size. SHA-384 uses the same algorithm as SHA-512, so it shares the same underlying implementation and the same performance improvements.

Please contact us at support@wolfssl.com with any questions about the performance of the wolfSSL embedded TLS library.

SHA-256 - AVX1

SHA-256 - AVX2 (BMI2)

SHA-512 - AVX1

SHA-512 - AVX2 (BMI2)

References:

Introduction to Intel® Advanced Vector Extensions
Advanced Vector Extensions (Wikipedia)

wolfSSL Contributor Stickers!


Are you a code contributor to the wolfSSL embedded SSL/TLS library, or one of wolfSSL’s other projects?  If so, and you are interested in receiving some wolfSSL Contributor stickers, email us at facts@wolfssl.com with a quick mention of your contribution and we will mail you some free stickers!

wolfSSL products are Open Source and dual licensed under both GPLv2 and commercial licenses. We are big fans of Open Source software and enjoy seeing the fun things people have built with wolfSSL projects. We give free support to Open Source projects, and maintain a Community page with links to some of the projects that currently use wolfSSL.

If you have done something cool with wolfSSL, or ported it into your favorite project, we’ll be happy to add you to our Community page. Just let us know about your project and send us a link! Using wolfSSL, you can easily add secure, well-tested, and progressive SSL/TLS and crypto to your project. If you are currently using OpenSSL, we even have an OpenSSL compatibility layer to make the transition easier!

wolfSSL Performance on Intel x86_64 (Part 2)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made, and they are being discussed in a series of six blog posts, of which this is part 2. In this post, we will talk about the performance of ChaCha20-Poly1305.

ChaCha20-Poly1305 is a relatively new authenticated encryption algorithm. It was designed as an alternative to AES-GCM. The algorithm is simple and fast on CPUs that do not have hardware acceleration for AES and GCM.
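ChaCha20's speed without special hardware comes from its structure. The sketch below shows the ChaCha20 quarter round as defined in RFC 7539 (section 2.1): it uses only 32-bit additions, XORs, and fixed rotations, which map directly onto ordinary integer or SIMD instructions. It is a portable illustration, not wolfSSL's AVX1/AVX2 assembly, and the function names rotl32 and chacha_quarter_round are hypothetical.

```c
#include <stdint.h>

/* Hypothetical portable sketch; not wolfSSL's assembly implementation. */

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t x, int n) {
    return (x << n) | (x >> (32 - n));
}

/* ChaCha20 quarter round on four state words (RFC 7539, section 2.1).
 * Only add, XOR and rotate ("ARX") are used: there are no table lookups,
 * so it runs in constant time and vectorizes well. */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
                                 uint32_t *c, uint32_t *d) {
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}
```

A full ChaCha20 block applies this quarter round 80 times (10 double rounds over a 4x4 state of 32-bit words). Within each half of a double round the four quarter rounds are independent, so vector code can run them side by side.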

Older releases of wolfSSL did not have assembly code implementations of ChaCha20 or Poly1305, so adding assembly code that uses AVX1 and AVX2 instructions has made a significant difference. The two charts below show the performance of wolfSSL relative to OpenSSL on AVX1 and AVX2 chipsets. In both charts, the new assembly code is a clear improvement over the C code. Compared to OpenSSL, wolfSSL is between 2.5% and 23% faster on AVX1, and on AVX2 it ranges from the same speed to 16% faster!

If you have questions about the performance of the wolfSSL embedded TLS library, please contact us at support@wolfssl.com!

ChaCha-Poly1305 - AVX1

ChaCha-Poly1305 - AVX2

References:

ChaCha Stream Cipher
Poly1305 (Wikipedia)

wolfSSL Performance on Intel x86_64 (Part 1)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made, and they are being discussed in a series of six blog posts. In this first post, we will talk about the performance of AES-GCM.

The assembly code for AES-GCM has been rewritten to take best advantage of the AVX1 and AVX2 instructions. The performance of AES-GCM is now as good as or better than OpenSSL.

The two charts below show the relative performance of AES-128-GCM encryption on Intel AVX1 and AVX2 chipsets. They compare the performance of wolfSSL and OpenSSL against an older version of wolfSSL (before the assembly code changes).

Small block size performance is important when dealing with locally stored data like keys or data in a database. Meanwhile, large block size performance is important for large data transfers in TLS.

The performance of wolfSSL has improved significantly from small to large block sizes. On AVX1, performance at the smallest block size has increased by over 130%, and at the largest block size there is a 42% improvement. Similarly, on AVX2, the improvement ranges from over 150% for small block sizes to 11% for large block sizes. The new wolfSSL assembly code is also significantly faster than OpenSSL for small blocks and is about the same at the largest block size. Similar performance improvements have been achieved for AES-256-GCM as well.

AES-128-GCM Enc - AVX1

AES-128-GCM Enc - AVX2 (with RORX)

If you have questions about using the wolfSSL embedded TLS library on your platform, or about performance optimization of the library, contact us at support@wolfssl.com.

References:

Introduction to Intel® Advanced Vector Extensions
Advanced Vector Extensions (Wikipedia)

wolfSSL 3.15.3 Now Available

wolfSSL is proud to announce release version 3.15.3 of the wolfSSL embedded TLS library.  This release contains bug fixes and new features, which include:

  • ECDSA blinding added for hardening against side channel attacks
  • Fix for OpenSSL compatibility layer build with no server (NO_WOLFSSL_SERVER) and no client (NO_WOLFSSL_CLIENT) defined
  • Intel assembly instructions support for compatible AMD processors
  • wolfCrypt port for Mentor Graphics Nucleus RTOS
  • Fix for MatchDomainName(), with additional tests
  • Fixes for building with ‘WOLFSSL_ATECC508A’ defined
  • Fix for verifying PKCS7 files in BER format with indefinite size

This release of wolfSSL includes fixes for 2 security vulnerabilities:

Medium level fix for a PRIME + PROBE attack combined with a variant of Lucky 13. Constant-time hardening was done to avoid potential cache-based side channel attacks when verifying the MAC on a TLS packet. CBC cipher suites are susceptible on systems where an attacker could gain access and run a parallel program to inspect the cache. Only wolfSSL users who are using TLS/DTLS CBC cipher suites need to update. Users who have only AEAD and stream cipher suites set, or who have built with WOLFSSL_MAX_STRENGTH (--enable-maxstrength), are not vulnerable. Thanks to Eyal Ronen, Kenny Paterson, and Adi Shamir for the report.

Medium level fix for an ECDSA side channel attack. wolfSSL is one of over a dozen vendors mentioned in the recent Technical Advisory “ROHNP” by Keegan Ryan. Only wolfSSL users with long-term ECDSA private keys who use our fastmath or normal math libraries, on systems where attackers can get access to the machine using the ECDSA key, need to update. An attacker gaining access to the system could mount a memory cache side channel attack that could recover the key within a few thousand signatures. wolfSSL users who are not using ECDSA private keys, who are using the single precision math library, or who are using ECDSA offloading do not need to update. (blog with more information: https://www.wolfssl.com/wolfssl-and-rohnp/)

For more information, please contact facts@wolfssl.com. You can see the full change log in the source archive from our website at www.wolfssl.com or at our GitHub repository.

wolfSSL in stunnel TLS Proxy

Since version 3.6.6, wolfSSL has had continually improving support for stunnel, a lightweight TLS proxy designed to add SSL/TLS encryption to unsecured applications without changes to their source code. Licensed under GNU GPLv2 with an alternative commercial option, stunnel can be used to secure a host of different applications, including mail exchange (SMTP, IMAP, POP3), web hosting (HTTP), remote shell, and virtually any other unprotected protocol.

Porting stunnel to use wolfSSL's embedded SSL/TLS library means taking advantage of wolfSSL's minimal footprint and high-speed crypto implementation to increase performance and decrease required resources compared to the previous SSL library. Not only that, but using wolfSSL with stunnel combines these benefits with the peace of mind that your application is secured by a progressive, transparent and stable SSL/TLS library, known for its quality, integrity and efficiency.

To build wolfSSL for use with stunnel, simply configure wolfSSL with:

$ ./configure --enable-stunnel

from wolfSSL's main directory, then run make and make install.

For a version of stunnel that links to the wolfSSL library, or for more information, contact us at facts@wolfssl.com.

wolfMQTT connects with IBM’s Watson IoT Platform

With the latest wolfMQTT v1.1 release, you can easily connect your devices running wolfMQTT to IBM’s Watson IoT Platform. Trying out wolfMQTT is simple using the provided MQTT client example and your IBM Cloud account. The default example provides a link to the IBM Quickstart broker, where you can view a graph of the data without needing an account.

As a side note, wolfMQTT uses the wolfSSL embedded SSL/TLS library for SSL/TLS support.  Since wolfSSL supports TLS 1.3, your wolfMQTT-based projects can now use MQTT with TLS 1.3 with a supported broker!

You can download the latest release from our website or clone it from GitHub. For more information, please email us at facts@wolfssl.com.

wolfSSL now has lwIP support

The wolfSSL (formerly CyaSSL) embedded SSL library supports lwIP, the lightweight Internet Protocol implementation, out of the box. To use wolfSSL with lwIP, the user merely needs to define WOLFSSL_LWIP or uncomment the line /* #define WOLFSSL_LWIP */ in os_settings.h.

The focus of lwIP is to reduce RAM usage while still providing a full TCP stack.  That focus makes lwIP great for use in embedded systems, the same area where wolfSSL is an ideal match for SSL/TLS needs.  An active community exists with contributor ports for many systems.  Give it a try and let us know if you have any suggestions or questions.

For the latest news and releases of lwIP, you can visit the project homepage, here: http://savannah.nongnu.org/projects/lwip/

Intro to PKCS #5: Password-Based Cryptography Specification

In this third post in our PKCS series, we will be looking at PKCS #5. PKCS #5 is the Password-Based Cryptography Specification and is currently defined by version 2.0 of the specification, published as RFC 2898 (http://tools.ietf.org/html/rfc2898). It applies a pseudorandom function, such as a cryptographic hash, cipher, or HMAC, to the input password or passphrase along with a salt value, and repeats the process many times to produce a derived key, which can then be used as a cryptographic key in subsequent operations. The added computational work, known as key stretching, makes password cracking much more difficult.

A. Key Derivation Functions

A key derivation function produces a derived key from a base key and other parameters. In a password-based key derivation function, the base key is a password and the other parameters are a salt value and an iteration count.

Two functions are specified below: PBKDF1 and PBKDF2. PBKDF2 is recommended for new applications; PBKDF1 is included only for compatibility with existing applications, and is not recommended for new applications.

B. PBKDF1

PBKDF1 applies a hash function, which shall be MD2, MD5 or SHA-1, to derive keys. The length of the derived key is bounded by the length of the hash function output, which is 16 octets for MD2 and MD5 and 20 octets for SHA-1.

Steps:

1. If dkLen > 16 for MD2 and MD5, or dkLen > 20 for SHA-1, output “derived key too long” and stop.

2. Apply the underlying hash function Hash for c iterations to the concatenation of the password P and the salt S, then extract the first dkLen octets to produce a derived key DK:

T_1 = Hash (P || S) ,

T_2 = Hash (T_1) ,

…

T_c = Hash (T_{c-1}) ,

DK = T_c<0..dkLen-1>

3. Output the derived key DK.
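The PBKDF1 steps above can be sketched in C as follows. This is a minimal, self-contained illustration under stated assumptions: it fixes Hash to SHA-1 (so dkLen is at most 20 octets), bundles a compact SHA-1 so it compiles on its own, and caps |P| + |S| at 256 octets. The function names sha1 and pbkdf1_sha1 are hypothetical and are not the CyaSSL PBKDF1() function.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch with Hash fixed to SHA-1; not the CyaSSL PBKDF1() API. */
#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* SHA-1 compression function: fold one 64-byte block into state h. */
static void sha1_block(uint32_t h[5], const uint8_t blk[64]) {
    uint32_t w[80], a = h[0], b = h[1], c = h[2], d = h[3], e = h[4], f, k, t;
    int i;
    for (i = 0; i < 16; i++)
        w[i] = (uint32_t)blk[4 * i] << 24 | (uint32_t)blk[4 * i + 1] << 16 |
               (uint32_t)blk[4 * i + 2] << 8 | (uint32_t)blk[4 * i + 3];
    for (i = 16; i < 80; i++)
        w[i] = ROTL32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
    for (i = 0; i < 80; i++) {
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        t = ROTL32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = ROTL32(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* One-shot SHA-1 of msg[0..len-1] into out[20]; out may alias msg. */
static void sha1(const uint8_t *msg, size_t len, uint8_t out[20]) {
    uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
    uint64_t bits = (uint64_t)len * 8;
    uint8_t blk[64];
    size_t i, rem;
    for (i = 0; i + 64 <= len; i += 64)
        sha1_block(h, msg + i);
    rem = len - i;
    memset(blk, 0, sizeof blk);
    memcpy(blk, msg + i, rem);
    blk[rem] = 0x80;                      /* padding: a single 1 bit */
    if (rem >= 56) {                      /* no room left for the length field */
        sha1_block(h, blk);
        memset(blk, 0, sizeof blk);
    }
    for (i = 0; i < 8; i++)               /* message length in bits, big-endian */
        blk[63 - i] = (uint8_t)(bits >> (8 * i));
    sha1_block(h, blk);
    for (i = 0; i < 20; i++)
        out[i] = (uint8_t)(h[i / 4] >> (24 - 8 * (i % 4)));
}

/* PBKDF1 (RFC 2898, section 5.1) with Hash = SHA-1. Returns 0 on success,
 * -1 on error, including step 1's "derived key too long" (dkLen > 20). */
static int pbkdf1_sha1(const uint8_t *pw, size_t pwLen,
                       const uint8_t *salt, size_t saltLen,
                       unsigned long c, size_t dkLen, uint8_t *dk) {
    uint8_t buf[256];                     /* illustrative bound on |P| + |S| */
    uint8_t t[20];
    unsigned long i;
    if (dkLen > 20) return -1;            /* step 1 */
    if (c < 1 || pwLen + saltLen > sizeof buf) return -1;
    memcpy(buf, pw, pwLen);
    memcpy(buf + pwLen, salt, saltLen);
    sha1(buf, pwLen + saltLen, t);        /* T_1 = Hash(P || S) */
    for (i = 2; i <= c; i++)
        sha1(t, sizeof t, t);             /* T_i = Hash(T_{i-1}) */
    memcpy(dk, t, dkLen);                 /* DK = T_c<0..dkLen-1> */
    return 0;
}
```

With c = 1, the derived key is just the first dkLen octets of SHA-1(P || S); each further iteration rehashes the previous 20-octet output. Because the result can never exceed one hash output, PBKDF1 with SHA-1 cannot produce keys longer than 20 octets, which is one practical reason PBKDF2 is preferred for new applications.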

C. PBKDF2

PBKDF2 applies a pseudorandom function to derive keys. The length of the derived key is essentially unbounded. However, the maximum effective search space for the derived key may be limited by the structure of the underlying pseudorandom function.

Steps:

1. If dkLen > (2^32 – 1) * hLen, output “derived key too long” and stop.

2. Let l be the number of hLen-octet blocks in the derived key, rounding up, and let r be the number of octets in the last block:

l = CEIL (dkLen / hLen) ,

r = dkLen - (l - 1) * hLen .

Here, CEIL (x) is the “ceiling” function, i.e. the smallest integer greater than, or equal to, x.

3. For each block of the derived key, apply the function F defined below to the password P, the salt S, the iteration count c, and the block index to compute the block:

T_1 = F (P, S, c, 1) ,

T_2 = F (P, S, c, 2) ,

…

T_l = F (P, S, c, l) ,

where the function F is defined as the exclusive-or sum of the first c iterates of the underlying pseudorandom function PRF applied to the password P and the concatenation of the salt S and the block index i:

F (P, S, c, i) = U_1 \xor U_2 \xor … \xor U_c

where

U_1 = PRF (P, S || INT (i)) ,

U_2 = PRF (P, U_1) ,

…

U_c = PRF (P, U_{c-1}) .

Here, INT (i) is a four-octet encoding of the integer i, most significant octet first.

4. Concatenate the blocks and extract the first dkLen octets to produce a derived key DK:

DK = T_1 || T_2 || … || T_l<0..r-1>

5. Output the derived key DK.
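The five PBKDF2 steps above can be combined into one C sketch. It assumes HMAC-SHA1 as the pseudorandom function PRF (so hLen = 20 octets), bundles a compact SHA-1 and HMAC (RFC 2104) so it compiles on its own, and caps the salt at 184 octets so fixed-size buffers suffice. The names hmac_sha1 and pbkdf2_hmac_sha1 are hypothetical and are not the CyaSSL PBKDF2() function; this illustrates the algorithm and is not a hardened implementation (for example, it does not zero intermediate buffers).

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch with PRF = HMAC-SHA1; not the CyaSSL PBKDF2() API. */
#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define MAX_MSG 188  /* illustrative bound on an HMAC message (salt + 4 octets) */

static void sha1_block(uint32_t h[5], const uint8_t blk[64]) {
    uint32_t w[80], a = h[0], b = h[1], c = h[2], d = h[3], e = h[4], f, k, t;
    int i;
    for (i = 0; i < 16; i++)
        w[i] = (uint32_t)blk[4 * i] << 24 | (uint32_t)blk[4 * i + 1] << 16 |
               (uint32_t)blk[4 * i + 2] << 8 | (uint32_t)blk[4 * i + 3];
    for (i = 16; i < 80; i++)
        w[i] = ROTL32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
    for (i = 0; i < 80; i++) {
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        t = ROTL32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = ROTL32(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}
/* One-shot SHA-1; out may alias msg. */
static void sha1(const uint8_t *msg, size_t len, uint8_t out[20]) {
    uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
    uint64_t bits = (uint64_t)len * 8;
    uint8_t blk[64];
    size_t i, rem;
    for (i = 0; i + 64 <= len; i += 64) sha1_block(h, msg + i);
    rem = len - i;
    memset(blk, 0, sizeof blk);
    memcpy(blk, msg + i, rem);
    blk[rem] = 0x80;
    if (rem >= 56) { sha1_block(h, blk); memset(blk, 0, sizeof blk); }
    for (i = 0; i < 8; i++) blk[63 - i] = (uint8_t)(bits >> (8 * i));
    sha1_block(h, blk);
    for (i = 0; i < 20; i++) out[i] = (uint8_t)(h[i / 4] >> (24 - 8 * (i % 4)));
}
/* HMAC-SHA1 (RFC 2104); msgLen must be <= MAX_MSG; out may alias msg. */
static void hmac_sha1(const uint8_t *key, size_t keyLen,
                      const uint8_t *msg, size_t msgLen, uint8_t out[20]) {
    uint8_t k0[64] = {0}, buf[64 + MAX_MSG], inner[20];
    size_t i;
    if (keyLen > 64) sha1(key, keyLen, k0);          /* long keys are hashed first */
    else memcpy(k0, key, keyLen);
    for (i = 0; i < 64; i++) buf[i] = k0[i] ^ 0x36;  /* K xor ipad */
    memcpy(buf + 64, msg, msgLen);
    sha1(buf, 64 + msgLen, inner);
    for (i = 0; i < 64; i++) buf[i] = k0[i] ^ 0x5c;  /* K xor opad */
    memcpy(buf + 64, inner, 20);
    sha1(buf, 64 + 20, out);
}
/* PBKDF2 (RFC 2898, section 5.2), hLen = 20. Returns 0 on success, -1 on error. */
static int pbkdf2_hmac_sha1(const uint8_t *pw, size_t pwLen,
                            const uint8_t *salt, size_t saltLen,
                            unsigned long c, size_t dkLen, uint8_t *dk) {
    const size_t hLen = 20;
    uint8_t msg[MAX_MSG], u[20], t[20];
    uint64_t i, l, r;
    unsigned long j;
    size_t k;
    /* Step 1: reject dkLen > (2^32 - 1) * hLen, plus empty or oversized input. */
    if (c < 1 || dkLen == 0 || (uint64_t)dkLen > 0xFFFFFFFFull * hLen ||
        saltLen > MAX_MSG - 4) return -1;
    l = ((uint64_t)dkLen + hLen - 1) / hLen;    /* Step 2: l = CEIL(dkLen / hLen) */
    r = dkLen - (l - 1) * hLen;                 /* r = octets taken from T_l */
    memcpy(msg, salt, saltLen);
    for (i = 1; i <= l; i++) {                  /* Step 3: T_i = F(P, S, c, i) */
        msg[saltLen]     = (uint8_t)(i >> 24);  /* INT(i), most significant first */
        msg[saltLen + 1] = (uint8_t)(i >> 16);
        msg[saltLen + 2] = (uint8_t)(i >> 8);
        msg[saltLen + 3] = (uint8_t)i;
        hmac_sha1(pw, pwLen, msg, saltLen + 4, u);    /* U_1 = PRF(P, S || INT(i)) */
        memcpy(t, u, hLen);
        for (j = 2; j <= c; j++) {
            hmac_sha1(pw, pwLen, u, hLen, u);         /* U_j = PRF(P, U_{j-1}) */
            for (k = 0; k < hLen; k++) t[k] ^= u[k];  /* XOR-sum of the U_j */
        }
        /* Step 4: DK = T_1 || T_2 || ... || T_l<0..r-1> */
        memcpy(dk + (i - 1) * hLen, t, i < l ? hLen : (size_t)r);
    }
    return 0;
}
```

The c PRF calls per output block are where the key stretching described at the start of this post comes from. The RFC 6070 test vectors for PBKDF2-HMAC-SHA1 (for example, P = "password", S = "salt", c = 1, dkLen = 20 gives 0c60c80f961f0e71f3a9b524af6012062fe037a6) are a convenient way to check an implementation like this one.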

To learn more about PKCS #5, you can look through the specification, here:

http://tools.ietf.org/html/rfc2898

D. CyaSSL Support

CyaSSL supports both PBKDF1 and PBKDF2. The header file can be found in <cyassl_root>/cyassl/ctaocrypt/pwdbased.h and the source file in <cyassl_root>/ctaocrypt/src/pwdbased.c of the CyaSSL library. To use these functions, they must be enabled when CyaSSL is configured:

./configure --enable-pwdbased

The functions:

int PBKDF1(byte* output, const byte* passwd, int pLen,
           const byte* salt, int sLen, int iterations, int kLen,
           int hashType);
int PBKDF2(byte* output, const byte* passwd, int pLen,
           const byte* salt, int sLen, int iterations, int kLen,
           int hashType);

CyaSSL also supports the PKCS #12 key derivation function:

int PKCS12_PBKDF(byte* output, const byte* passwd, int pLen,
                 const byte* salt, int sLen, int iterations,
                 int kLen, int hashType, int purpose);

To learn more about the CyaSSL embedded SSL library, you can download a free GPLv2-licensed copy from the wolfSSL download page, http://wolfssl.com/yaSSL/download/downloadForm.php, or look through the CyaSSL Manual, https://www.wolfssl.com/docs/wolfssl-manual/.  If you have any additional questions, please contact us at facts@wolfssl.com.
