Apple TV 2

(www.apple.com)

Apple A4 (ARM Cortex-A8)

1 GHz

8 GB FLASH

256 MB RAM

Crypto Benchmarks:

    AES         5 megs took 0.500 seconds,   9.99 MB/s

    ARC4       5 megs took 0.174 seconds,  28.66 MB/s

    RABBIT    5 megs took 0.126 seconds,  39.56 MB/s

    3DES       5 megs took 2.196 seconds,   2.28 MB/s


    MD5        5 megs took 0.163 seconds,  30.73 MB/s

    SHA         5 megs took 0.137 seconds,  36.61 MB/s

    SHA-256  5 megs took 0.309 seconds,  16.20 MB/s


    RSA 1024 encryption took   1.12 milliseconds, avg over 100 iterations

    RSA 1024 decryption took  17.81 milliseconds, avg over 100 iterations

    DH  1024 key generation   11.90 milliseconds, avg over 100 iterations

    DH  1024 key agreement    11.22 milliseconds, avg over 100 iterations



Build Details
    - Complete build, compiled with fastmath (--enable-fastmath)


Reference

    Blog Post: Running CyaSSL on the Apple TV 2

wolfSSL / wolfCrypt Benchmarks

wolfSSL is dual licensed under both the GPLv2 and commercial licensing.

Description

The wolfSSL embedded SSL library (formerly CyaSSL) was written from the ground-up with portability, performance, and memory usage in mind.  Here you will find a collection of existing benchmark information for wolfSSL and the wolfCrypt cryptography library.  If you would like additional benchmark data or have any questions about your specific platform, please contact us at [email protected].

Copyright 2016 wolfSSL Inc.  All rights reserved.

Publications / Flyers

Publications in relation to benchmarking our SSL/TLS and crypto libraries:

wolfSSL+NTRU: High-Performance SSL


This flyer details the performance gains that can be seen when using the wolfSSL embedded SSL library with Security Innovation’s NTRU cipher.  NTRU is similar to the RSA public key algorithm but can offer anywhere from a 20-200X speed improvement.


wolfSSL Secure memcached Benchmarks


Because wolfSSL offers fast encryption and low memory usage, it can be deployed on high-volume servers supporting many thousands of connections.  This flyer demonstrates memcached benchmarks using wolfSSL.


wolfCrypt Benchmark Application

Many users are curious about how the wolfSSL embedded SSL library will perform on a specific hardware device or in a specific environment.  Because of the wide variety of different platforms and compilers used today in embedded, enterprise, and cloud-based environments, it is hard to give generic performance calculations across the board.


To help wolfSSL users and customers in determining SSL/TLS performance for wolfSSL / wolfCrypt, a benchmark application is bundled with wolfSSL.  Because the underlying cryptography is a very performance-critical aspect of SSL/TLS, our benchmark application runs performance tests on wolfCrypt’s algorithms.


The benchmark utility is located in the ./wolfcrypt/benchmark directory of the wolfSSL download.  After building wolfSSL and the associated examples and apps, the benchmark application can be run by issuing the following command from the package directory root:


./wolfcrypt/benchmark/benchmark


Typical output may look similar to the output below (showing throughput in MB/s as well as cycles per byte):


AES      50 megs took 0.262 seconds,  190.808 MB/s Cycles per byte =  11.47

AES-GCM  50 megs took 0.797 seconds,   62.773 MB/s Cycles per byte =  34.86

HC128    50 megs took 0.029 seconds, 1695.093 MB/s Cycles per byte =   1.29

RABBIT   50 megs took 0.103 seconds,  486.585 MB/s Cycles per byte =   4.50

CHACHA   50 megs took 0.133 seconds,  375.004 MB/s Cycles per byte =   5.84

CHA-POLY 50 megs took 0.170 seconds,  293.567 MB/s Cycles per byte =   7.45

3DES     50 megs took 1.782 seconds,   28.051 MB/s Cycles per byte =  78.01


MD5      50 megs took 0.108 seconds,  461.106 MB/s Cycles per byte =   4.75

POLY1305 50 megs took 0.038 seconds, 1325.843 MB/s Cycles per byte =   1.65

SHA      50 megs took 0.101 seconds,  497.007 MB/s Cycles per byte =   4.40

SHA-256  50 megs took 0.235 seconds,  212.993 MB/s Cycles per byte =  10.27

SHA-384  50 megs took 0.153 seconds,  327.161 MB/s Cycles per byte =   6.69

SHA-512  50 megs took 0.203 seconds,  245.780 MB/s Cycles per byte =   8.90

BLAKE2b  50 megs took 0.088 seconds,  570.073 MB/s Cycles per byte =   3.84


RSA 2048 encryption took  0.077 milliseconds, avg over 100 iterations

RSA 2048 decryption took  1.980 milliseconds, avg over 100 iterations

DH  2048 key generation   0.799 milliseconds, avg over 100 iterations

DH  2048 key agreement    0.737 milliseconds, avg over 100 iterations


ECC  256 key generation   0.424 milliseconds, avg over 100 iterations

EC-DHE   key agreement    0.412 milliseconds, avg over 100 iterations

EC-DSA   sign   time      0.442 milliseconds, avg over 100 iterations

EC-DSA   verify time      0.598 milliseconds, avg over 100 iterations
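
The two columns in the sample output are tied together by the CPU clock rate: cycles per byte is roughly the clock frequency divided by the throughput in bytes per second.  The Python sketch below is only an illustration of that relationship.  It assumes that the benchmark's "MB" means 1,048,576 bytes (an assumption, not something this page states), and the function names are hypothetical.  Applied to the AES and HC128 lines above, both pairs of numbers point to a clock of roughly 2.3 GHz.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: the benchmark's "MB" is 2**20 bytes


def implied_clock_hz(mb_per_s: float, cycles_per_byte: float) -> float:
    """Clock rate implied by a (throughput, cycles-per-byte) pair."""
    return mb_per_s * BYTES_PER_MB * cycles_per_byte


def cycles_per_byte(clock_hz: float, mb_per_s: float) -> float:
    """Cycles spent on each byte at a given clock rate and throughput."""
    return clock_hz / (mb_per_s * BYTES_PER_MB)


# Values copied from the AES and HC128 lines of the sample output.
aes_clock = implied_clock_hz(190.808, 11.47)
hc128_clock = implied_clock_hz(1695.093, 1.29)

print(f"AES implies   {aes_clock / 1e9:.2f} GHz")
print(f"HC128 implies {hc128_clock / 1e9:.2f} GHz")
```

Because the two lines imply nearly the same clock, cycles per byte carries the same information as MB/s, only normalized by clock speed.  That makes it the more convenient figure when comparing processors that run at different frequencies.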


This application is especially useful for comparing the public key speed before and after changing the math library. You can test the results using the normal math library (./configure), the fastmath library (./configure --enable-fastmath), and the fasthugemath library (./configure --enable-fasthugemath).
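
When comparing math libraries, it can help to pull the public key timings out of the benchmark output programmatically.  The following Python sketch is one possible way to do that.  It assumes the output format shown in the sample above, and the helper names (`parse_public_key_times`, `speedups`) are hypothetical and not part of wolfSSL.

```python
from __future__ import annotations

import re

# Matches public key lines such as
#   "RSA 2048 decryption took  1.980 milliseconds, avg over 100 iterations"
#   "DH  2048 key generation   0.799 milliseconds, avg over 100 iterations"
# Bulk cipher and hash lines report seconds, so they do not match.
PUBKEY_LINE = re.compile(
    r"^\s*(?P<name>.+?)\s+(?:took\s+)?(?P<ms>\d+(?:\.\d+)?)\s+milliseconds"
)


def parse_public_key_times(output: str) -> dict[str, float]:
    """Map each public key operation name to its average time in ms."""
    results = {}
    for line in output.splitlines():
        match = PUBKEY_LINE.match(line)
        if match:
            # Collapse the column padding inside the operation name.
            name = " ".join(match.group("name").split())
            results[name] = float(match.group("ms"))
    return results


def speedups(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Ratio before/after per operation; values above 1.0 mean 'after' is faster."""
    return {
        op: before[op] / after[op]
        for op in before
        if op in after and after[op] > 0
    }


# Lines copied from the sample output above.
sample = """\
AES      50 megs took 0.262 seconds,  190.808 MB/s Cycles per byte =  11.47
RSA 2048 encryption took  0.077 milliseconds, avg over 100 iterations
RSA 2048 decryption took  1.980 milliseconds, avg over 100 iterations
DH  2048 key generation   0.799 milliseconds, avg over 100 iterations
"""
print(parse_public_key_times(sample))
```

To compare builds, one could save the benchmark output from each configuration to a file, parse both, and pass the two dictionaries to `speedups`.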

Memory Usage

Footprint sizes (compiled binary size) for wolfSSL range between 20-100kB depending on build options and the compiler being used.  Typically on an embedded system with an embedded and optimized compiler, build sizes will be around 60kB.  This will include a full-featured TLS 1.2 client and server.  For details on build options and ways to further customize wolfSSL, please see Chapter 2 of the CyaSSL Manual, or the wolfSSL Tuning Guide.


Regarding runtime memory usage, wolfSSL generally consumes between 1 and 36 kB per SSL/TLS session.  RAM usage per connection varies depending on the size of the input/output buffers, the public key algorithm, and the key size.  The I/O buffers in wolfSSL default to 128 bytes and are controlled by the RECORD_SIZE define in ./wolfssl/internal.h.  The maximum size is 16 kB per buffer (as specified by the SSL/TLS RFC).  As an example, with maximum-size 16 kB buffers, the total runtime memory usage of wolfSSL with a single connection would be 3 kB (the library) + 16 kB (input buffer) + 16 kB (output buffer) = around 35 kB.
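
The arithmetic in the example above can be written as a small estimator.  This Python sketch is a rough model only: it takes the 3 kB library figure and the 128-byte default from the paragraph above, ignores the extra working memory that public key operations and certificates need, and uses a hypothetical function name.

```python
LIBRARY_OVERHEAD_KB = 3.0        # "3 kB (the library)" from the text above
DEFAULT_BUFFER_BYTES = 128       # default I/O buffer size from the text above
MAX_BUFFER_BYTES = 16 * 1024     # 16 kB maximum per buffer


def session_memory_kb(in_buf_bytes: int = DEFAULT_BUFFER_BYTES,
                      out_buf_bytes: int = DEFAULT_BUFFER_BYTES) -> float:
    """Rough per-connection RAM estimate in kB: library + input + output buffers."""
    for size in (in_buf_bytes, out_buf_bytes):
        if not 0 < size <= MAX_BUFFER_BYTES:
            raise ValueError(f"buffer size must be 1..{MAX_BUFFER_BYTES} bytes")
    return LIBRARY_OVERHEAD_KB + (in_buf_bytes + out_buf_bytes) / 1024


print(session_memory_kb())                                    # default buffers
print(session_memory_kb(MAX_BUFFER_BYTES, MAX_BUFFER_BYTES))  # 16 kB buffers
```

Under this model, the default buffers add only a quarter of a kilobyte to the library's own state, whereas the 16 kB buffers account for nearly all of the 35 kB total.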


The TLS context (WOLFSSL_CTX) is shared between all TLS connections of either a client or server.  Its runtime memory usage varies with the number and size of the certificates loaded, with the session cache, and with whether storing session certificates is turned on (--enable-session-certs).  To reduce the session cache size, define SMALL_SESSION_CACHE, which shrinks the default session cache from 33 sessions to 6 sessions and saves almost 2.5 kB.  To disable the session cache entirely, define NO_SESSION_CACHE, which reduces memory by nearly 3 kB.

Reference Benchmarks

As we port wolfSSL to various platforms, we oftentimes conduct benchmarks on these platforms.  Below you will find a collection of some of those benchmarks for reference.  If you have benchmarked wolfSSL on a specific platform, please send us your benchmark numbers (with specific platform and library configuration) and we’ll add them to the list!

BENCHMARK:



Memory Usage:

    RAM Usage:  2.0 kB

    Flash Usage*: 64 kB

    * This included our test driver code, about 3kB.


Crypto Benchmarks:

    public RSA:  10 milliseconds

    private RSA: 165 milliseconds


Build Details
    - Complete build, everything but SHA-512, DH, DSA, and HC-128

    - Compiled using mbed cloud compiler


Reference

    http://mbed.org/users/toddouska/libraries/CyaSSL/lm43pv

    http://mbed.org/users/toddouska/programs/cyassl-client/lm394s

PLATFORM:





(www.mbed.org)

ARM Cortex-M3

96 MHz

512 kB FLASH

32 kB RAM

Relative Cipher Performance

Although the performance of individual ciphers and algorithms will depend on the host platform, the following graph shows relative performance between some of wolfCrypt’s algorithms.  These tests were conducted on a MacBook Pro (OS X 10.6.8) running a 2.2 GHz Intel Core i7.

If you want to use only a subset of ciphers, you can customize which specific cipher suites and/or ciphers wolfSSL uses when making an SSL/TLS connection.  For example, to force 128-bit AES, add the following line after the call to wolfSSL_CTX_new (SSL_CTX_new):


wolfSSL_CTX_set_cipher_list(ctx, "AES128-SHA");

Benchmarking Notes

  1. The processor’s native register size (32-bit vs. 64-bit) can make a big difference when doing 1000+ bit public key operations.

  2. fastmath (--enable-fastmath) reduces dynamic memory usage and speeds up public key operations.  If you are having trouble building on a 32-bit platform with fastmath, disable shared libraries so that PIC isn’t hogging a register (also see notes in the README):

       ./configure --enable-fastmath --disable-shared

       make clean

       make

     NOTE: doing a “make clean” is good practice with wolfSSL when switching configure options.

  3. By default, fastmath tries to use assembly optimizations if possible.  If assembly optimizations don’t work, you can still use fastmath without them by adding TFM_NO_ASM to CFLAGS when building wolfSSL:

       ./configure --enable-fastmath CFLAGS=-DTFM_NO_ASM

  4. fasthugemath tries to push fastmath even further for users who are not running on embedded platforms:

       ./configure --enable-fasthugemath

  5. With the default wolfSSL build, we have tried to find a good balance between memory usage and performance.  If you are more concerned about one of the two, please see Chapter 2 of the wolfSSL manual for additional wolfSSL configuration options.

  6. Bulk Transfers:  By default, wolfSSL uses 128-byte I/O buffers, both because about 80% of SSL traffic falls within this size and to limit dynamic memory use.  It can be configured to use 16 kB buffers (the maximum SSL record size) if bulk transfers are required.
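
To give some intuition for the first note (register size), the Python sketch below counts how many machine words a 2048-bit integer occupies and how many word-by-word multiplications a plain schoolbook multiply would need.  It is a simplified model under stated assumptions: real big-integer libraries use more refined algorithms, and the function names are hypothetical.

```python
def limbs(bits: int, word_bits: int) -> int:
    """Number of machine words needed to hold a `bits`-bit integer."""
    return -(-bits // word_bits)  # ceiling division


def schoolbook_word_multiplies(bits: int, word_bits: int) -> int:
    """Word multiplications in a schoolbook multiply of two `bits`-bit integers."""
    n = limbs(bits, word_bits)
    return n * n


for word_bits in (32, 64):
    print(
        f"{word_bits}-bit words: "
        f"{limbs(2048, word_bits)} limbs, "
        f"{schoolbook_word_multiplies(2048, word_bits)} word multiplies"
    )
```

Under this model, doubling the word size halves the limb count and cuts the number of word multiplications to a quarter, which is one reason 64-bit platforms tend to do much better on RSA and DH.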

Freescale TWR-K70F120M

(www.freescale.com)

Freescale Kinetis K70

120 MHz

2 GB FLASH

1 GB RAM

Crypto Benchmarks:

    AES        5120 kB took 9.059 seconds,   0.55 MB/s                               

    ARC4      5120 kB took 2.190 seconds,   2.28 MB/s                               

    DES        5120 kB took 18.453 seconds,   0.27 MB/s                               

                                                                               

    MD5         5120 kB took 1.396 seconds,   3.58 MB/s                               

    SHA         5120 kB took 3.635 seconds,   1.38 MB/s                               

    SHA-256  5120 kB took 9.145 seconds,   0.55 MB/s                               

                                                                               

    RSA 2048 encryption took  73.99 milliseconds, avg over 100 iterations          

    RSA 2048 decryption took 1359.09 milliseconds, avg over 100 iterations         

    DH  2048 key generation  536.75 milliseconds, avg over 100 iterations          

    DH  2048 key agreement   540.99 milliseconds, avg over 100 iterations



Build Details
    - MQX RTOS, using the fastmath library with TFM_TIMING_RESISTANT

    - FREESCALE_MQX define set in <cyassl_root>/cyassl/ctaocrypt/settings.h

    - CodeWarrior 10.2 IDE and compiler, optimizing for speed


Reference

    Freescale TWR-K70F120M: http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=TWR-K70F120M

Texas Instruments

Tiva C Series TM4C1294XL Connected Launchpad

(www.ti.com)

ARM Cortex-M4

120 MHz

1 MB FLASH

256 KB SRAM

6 KB EEPROM

Crypto Benchmarks:

    AES      25 kB took 0.038 seconds,   0.642 MB/s

    Camellia 25 kB took 0.032 seconds,   0.763 MB/s

    ARC4     25 kB took 0.006 seconds,   4.069 MB/s

    RABBIT   25 kB took 0.005 seconds,   4.883 MB/s

    CHACHA   25 kB took 0.007 seconds,   3.488 MB/s

    3DES     25 kB took 0.164 seconds,   0.149 MB/s


    MD5      25 kB took 0.003 seconds,   8.138 MB/s

    POLY1305 25 kB took 0.004 seconds,   6.104 MB/s

    SHA      25 kB took 0.006 seconds,   4.069 MB/s

    SHA-256  25 kB took 0.014 seconds,   1.744 MB/s

    SHA-512  25 kB took 0.042 seconds,   0.581 MB/s


    RSA 2048 encryption took 88.000 milliseconds, avg over 1 iterations

    RSA 2048 decryption took 1456.000 milliseconds, avg over 1 iterations

    DH  2048 key generation  661.000 milliseconds, avg over 1 iterations

    DH  2048 key agreement   665.000 milliseconds, avg over 1 iterations


    ECC  256 key generation  130.400 milliseconds, avg over 5 iterations

    EC-DHE   key agreement   118.000 milliseconds, avg over 5 iterations

    EC-DSA   sign   time     136.800 milliseconds, avg over 5 iterations

    EC-DSA   verify time     253.800 milliseconds, avg over 5 iterations


Reference

    CyaSSL and TI-RTOS

Crypto Benchmarks:

    AVX2:    SHA-256  50 megs took 0.320 seconds, 156.118 MB/s

                     Cycles per byte =  9.75  = 47%

    AVX1:   SHA-256  50 megs took 0.272 seconds, 184.068 MB/s

                    Cycles per byte = 11.89  = 39%

    Normal: SHA-256  50 megs took 0.376 seconds, 132.985 MB/s

                    Cycles per byte = 16.46


    AVX2:    SHA-384  50 megs took 0.226 seconds, 221.318 MB/s

                    Cycles per byte =  6.88  = 42%

    AVX1:    SHA-384  50 megs took 0.192 seconds, 260.975 MB/s

                    Cycles per byte =  8.39  = 9%

    Normal: SHA-384  50 megs took 0.209 seconds, 239.743 MB/s

                    Cycles per byte =  9.13


    AVX2:    SHA-512  50 megs took 0.224 seconds, 223.120 MB/s

                    Cycles per byte =  6.82  = 75%

    AVX1:    SHA-512  50 megs took 0.188 seconds, 266.126 MB/s

                    Cycles per byte =  8.22  = 50%

    Normal: SHA-512  50 megs took 0.281 seconds, 177.997 MB/s

                    Cycles per byte = 12.29


Reference

    wolfSSL Blog Post

AVX1: 1.8 GHz Intel Core i5

AVX2: Intel Broadwell

Crypto Benchmarks:

Software Crypto: wolfCrypt Benchmark, Normal Big Integer Math Library


AES        1024 KB took 0.822 seconds,   1.22 MB/s

ARC4      1024 KB took 0.219 seconds,   4.57 MB/s

DES        1024 KB took 1.513 seconds,   0.66 MB/s

3DES      1024 KB took 3.986 seconds,   0.25 MB/s


MD5          1024 KB took 0.119 seconds,   8.40 MB/s

SHA          1024 KB took 0.279 seconds,   3.58 MB/s

SHA-256    1024 KB took 0.690 seconds,   1.45 MB/s


RSA 2048 encryption took 111.17 milliseconds, avg over 100 iterations

RSA 2048 decryption took 1204.77 milliseconds, avg over 100 iterations

DH  2048 key generation   467.90 milliseconds, avg over 100 iterations

DH  2048 key agreement   538.94 milliseconds, avg over 100 iterations



STM32F2 Hardware Crypto: wolfCrypt Benchmark, Normal Big Integer Math Library


AES        1024 KB took 0.105 seconds,   9.52 MB/s

ARC4      1024 KB took 0.219 seconds,   4.57 MB/s

DES        1024 KB took 0.125 seconds,   8.00 MB/s

3DES      1024 KB took 0.141 seconds,   7.09 MB/s


MD5           1024 KB took 0.045 seconds,  22.22 MB/s

SHA           1024 KB took 0.047 seconds,  21.28 MB/s

SHA-256    1024 KB took 0.690 seconds,   1.45 MB/s


RSA 2048 encryption took 111.09 milliseconds, avg over 100 iterations

RSA 2048 decryption took 1204.88 milliseconds, avg over 100 iterations

DH  2048 key generation  467.56 milliseconds, avg over 100 iterations

DH  2048 key agreement   542.11 milliseconds, avg over 100 iterations


Reference

    wolfSSL and STM32

STM32F221G-EVAL

ARM Cortex M3

120MHz

1 MB FLASH

128 KB SRAM