Intel’s Extended Instructions Accelerate Hash Algorithms

Curious about how new machine instructions can accelerate crypto algorithms? We recently added support for Intel’s Advanced Vector Extensions (AVX1 and AVX2) to wolfSSL’s secure hash algorithms. Benchmarks show performance improvements of up to 75% for SHA-256, SHA-384 and SHA-512 (see the benchmark results below).

Intel’s AVX1 and AVX2 provide 128-bit and 256-bit registers that operate on multiple words in parallel with a single instruction.
The hash implementations take advantage of this register parallelism, and they also use function stitching, which interleaves work in the AVX registers with work in the conventional registers.
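
To illustrate what operating on multiple words with a single instruction looks like, here is a minimal C sketch that applies the SHA-256 small sigma0 function (part of the message schedule) to four 32-bit words at once in one 128-bit register. This is not wolfSSL's implementation. The names sigma0_scalar and sigma0_x4 are made up for this example, and it assumes the SSE2 integer intrinsics, which a compiler emits in AVX-encoded form when AVX is enabled (for example with GCC's -mavx). A scalar fallback is included for other targets.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* 128-bit integer intrinsics */
#endif

/* Scalar SHA-256 small sigma0: ROTR7(x) ^ ROTR18(x) ^ SHR3(x). */
static uint32_t sigma0_scalar(uint32_t x)
{
    return ((x >> 7) | (x << 25)) ^ ((x >> 18) | (x << 14)) ^ (x >> 3);
}

/* Hypothetical helper: apply sigma0 to four message-schedule words at once. */
static void sigma0_x4(const uint32_t in[4], uint32_t out[4])
{
#if defined(__SSE2__)
    /* Load four 32-bit words into one 128-bit register. */
    __m128i x = _mm_loadu_si128((const __m128i *)in);

    /* A rotate is built from two shifts and an OR, applied to all lanes. */
    __m128i r7  = _mm_or_si128(_mm_srli_epi32(x, 7),  _mm_slli_epi32(x, 25));
    __m128i r18 = _mm_or_si128(_mm_srli_epi32(x, 18), _mm_slli_epi32(x, 14));
    __m128i s3  = _mm_srli_epi32(x, 3);

    /* XOR the three terms and store the four results. */
    _mm_storeu_si128((__m128i *)out,
                     _mm_xor_si128(_mm_xor_si128(r7, r18), s3));
#else
    /* Portable fallback: one word at a time. */
    for (int i = 0; i < 4; i++) {
        out[i] = sigma0_scalar(in[i]);
    }
#endif
}
```

Each shift, OR and XOR in the vector path processes all four words together, so the same work takes roughly a quarter of the instructions of the scalar loop.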

How can you get it? Simply specify --enable-intelasm when running ./configure with our latest version. The library checks instruction availability at run time, so you get the best performance improvement available on your machine.
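
A run-time availability check of this kind can be sketched in C as follows. This is only an illustration of the idea, not wolfSSL's actual detection code: it assumes GCC or Clang on x86 and uses their __builtin_cpu_supports built-in, and the names hash_impl, pick_hash_impl and hash_impl_name are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical labels for the code paths a hash could choose between. */
typedef enum {
    HASH_IMPL_GENERIC = 0,
    HASH_IMPL_AVX1    = 1,
    HASH_IMPL_AVX2    = 2
} hash_impl;

/* Pick the fastest path the running CPU supports. */
static hash_impl pick_hash_impl(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Assumption: compiler built-ins stand in for a direct CPUID query. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        return HASH_IMPL_AVX2;
    }
    if (__builtin_cpu_supports("avx")) {
        return HASH_IMPL_AVX1;
    }
#endif
    /* Other compilers or architectures: fall back to plain C. */
    return HASH_IMPL_GENERIC;
}

/* Human-readable name, e.g. for a benchmark label. */
static const char *hash_impl_name(hash_impl impl)
{
    switch (impl) {
    case HASH_IMPL_AVX2: return "AVX2";
    case HASH_IMPL_AVX1: return "AVX1";
    default:             return "Normal";
    }
}
```

Doing the check once at start-up and caching the result means the same binary can run on machines with and without AVX.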

For further details, visit our “wolfSSL / wolfCrypt Benchmarks” page (http://wolfssl.com/yaSSL/benchmarks-cyassl.html).


AVX1: 1.8 GHz Intel Core i5
AVX2: Intel Broadwell

AVX2:   SHA-256  50 megs took 0.320 seconds, 156.118 MB/s Cycles per byte =  9.75  = 47%
AVX1:   SHA-256  50 megs took 0.272 seconds, 184.068 MB/s Cycles per byte = 11.89  = 39%
Normal: SHA-256  50 megs took 0.376 seconds, 132.985 MB/s Cycles per byte = 16.46

AVX2:   SHA-384  50 megs took 0.226 seconds, 221.318 MB/s Cycles per byte =  6.88  = 42%
AVX1:   SHA-384  50 megs took 0.192 seconds, 260.975 MB/s Cycles per byte =  8.39  = 9%
Normal: SHA-384  50 megs took 0.209 seconds, 239.743 MB/s Cycles per byte =  9.13

AVX2:   SHA-512  50 megs took 0.224 seconds, 223.120 MB/s Cycles per byte =  6.82  = 75%
AVX1:   SHA-512  50 megs took 0.188 seconds, 266.126 MB/s Cycles per byte =  8.22  = 50%
Normal: SHA-512  50 megs took 0.281 seconds, 177.997 MB/s Cycles per byte = 12.29
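
The percentages on the AVX1 lines appear consistent with a speedup computed from cycles per byte relative to the Normal line, that is (normal / avx - 1) * 100. This is an inference from the numbers above, not a documented formula, and the function name speedup_percent is made up for this sketch. The AVX2 lines were measured on a different machine, presumably against a baseline that is not listed here, so they cannot be checked the same way.

```c
#include <assert.h>

/* Hypothetical helper: percentage improvement of an accelerated run over a
 * baseline run, both given in cycles per byte (lower is better). */
static double speedup_percent(double baseline_cpb, double accel_cpb)
{
    assert(baseline_cpb > 0.0 && accel_cpb > 0.0);
    return (baseline_cpb / accel_cpb - 1.0) * 100.0;
}
```

For example, speedup_percent(12.29, 8.22) gives about 49.5 for SHA-512 and speedup_percent(9.13, 8.39) gives about 8.8 for SHA-384, in line with the 50% and 9% shown on the AVX1 lines.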
===