@Jacob I am able to compile and benchmark with the AES optimizations and I see great improvements. However, I don't see all of the Intel optimizations applied, specifically AVX/AVX2.
Can you please share how to enable AVX/AVX2 for the SHA-256 and ECC sign optimizations?
It seems that --enable-sp and --enable-sp-asm are required to see the performance gain in non-SGX mode, and per your comment below, SP assembly is not working with PIE:
#SP assembly needs investigated for use with PIE
ifeq ($(HAVE_WOLFSSL_SP), 1)
Wolfssl_C_Extra_Flags += -DWOLFSSL_SP_X86_64_ASM\
-DWOLFSSL_SP_X86_64\
-DWOLFSSL_SP_ASM
endif
Benchmarks without Intel optimizations:
------------------------------------------------------------------------------
wolfSSL version 5.7.6
------------------------------------------------------------------------------
Math: Multi-Precision: Wolf(SP) word-size=64 bits=4096 sp_int.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG 90 MiB took 1.017 seconds, 88.508 MiB/s Cycles per byte = 26.94
AES-128-CBC-enc 250 MiB took 1.009 seconds, 247.804 MiB/s Cycles per byte = 9.62
AES-128-CBC-dec 265 MiB took 1.015 seconds, 261.081 MiB/s Cycles per byte = 9.13
AES-192-CBC-enc 215 MiB took 1.004 seconds, 214.131 MiB/s Cycles per byte = 11.13
AES-192-CBC-dec 225 MiB took 1.011 seconds, 222.462 MiB/s Cycles per byte = 10.72
AES-256-CBC-enc 190 MiB took 1.017 seconds, 186.830 MiB/s Cycles per byte = 12.76
AES-256-CBC-dec 145 MiB took 1.015 seconds, 142.812 MiB/s Cycles per byte = 16.69
AES-128-GCM-enc 125 MiB took 1.038 seconds, 120.413 MiB/s Cycles per byte = 19.80
AES-128-GCM-dec 130 MiB took 1.026 seconds, 126.718 MiB/s Cycles per byte = 18.81
AES-192-GCM-enc 120 MiB took 1.035 seconds, 115.896 MiB/s Cycles per byte = 20.57
AES-192-GCM-dec 120 MiB took 1.021 seconds, 117.481 MiB/s Cycles per byte = 20.29
AES-256-GCM-enc 105 MiB took 1.062 seconds, 98.914 MiB/s Cycles per byte = 24.10
AES-256-GCM-dec 110 MiB took 1.041 seconds, 105.655 MiB/s Cycles per byte = 22.57
GMAC Table 4-bit 266 MiB took 1.003 seconds, 265.202 MiB/s Cycles per byte = 8.99
CHACHA 400 MiB took 1.002 seconds, 399.055 MiB/s Cycles per byte = 5.97
CHA-POLY 310 MiB took 1.012 seconds, 306.208 MiB/s Cycles per byte = 7.79
MD5 555 MiB took 1.002 seconds, 553.812 MiB/s Cycles per byte = 4.31
POLY1305 1320 MiB took 1.001 seconds, 1318.270 MiB/s Cycles per byte = 1.81
SHA 455 MiB took 1.008 seconds, 451.191 MiB/s Cycles per byte = 5.28
SHA-224 200 MiB took 1.005 seconds, 199.090 MiB/s Cycles per byte = 11.98
SHA-256 200 MiB took 1.009 seconds, 198.129 MiB/s Cycles per byte = 12.03
SHA-384 295 MiB took 1.010 seconds, 292.141 MiB/s Cycles per byte = 8.16
SHA-512 290 MiB took 1.014 seconds, 286.068 MiB/s Cycles per byte = 8.33
SHA-512/224 290 MiB took 1.005 seconds, 288.523 MiB/s Cycles per byte = 8.26
SHA-512/256 295 MiB took 1.013 seconds, 291.224 MiB/s Cycles per byte = 8.19
SHA3-224 270 MiB took 1.015 seconds, 266.066 MiB/s Cycles per byte = 8.96
SHA3-256 250 MiB took 1.029 seconds, 242.942 MiB/s Cycles per byte = 9.81
SHA3-384 170 MiB took 1.023 seconds, 166.142 MiB/s Cycles per byte = 14.35
SHA3-512 135 MiB took 1.018 seconds, 132.631 MiB/s Cycles per byte = 17.98
HMAC-MD5 555 MiB took 1.000 seconds, 554.846 MiB/s Cycles per byte = 4.30
HMAC-SHA 455 MiB took 1.005 seconds, 452.664 MiB/s Cycles per byte = 5.27
HMAC-SHA224 200 MiB took 1.003 seconds, 199.370 MiB/s Cycles per byte = 11.96
HMAC-SHA256 200 MiB took 1.010 seconds, 197.956 MiB/s Cycles per byte = 12.04
HMAC-SHA384 295 MiB took 1.007 seconds, 292.876 MiB/s Cycles per byte = 8.14
HMAC-SHA512 295 MiB took 1.003 seconds, 293.995 MiB/s Cycles per byte = 8.11
PBKDF2 24 KiB took 1.001 seconds, 24.265 KiB/s Cycles per byte = 100614.98
RSA 2048 public 21700 ops took 1.005 sec, avg 0.046 ms, 21595.531 ops/sec
RSA 2048 private 400 ops took 1.156 sec, avg 2.891 ms, 345.949 ops/sec
DH 2048 key gen 1808 ops took 1.000 sec, avg 0.553 ms, 1807.474 ops/sec
DH 2048 agree 900 ops took 1.048 sec, avg 1.164 ms, 859.112 ops/sec
ECC [ SECP256R1] 256 key gen 1900 ops took 1.031 sec, avg 0.543 ms, 1842.949 ops/sec
ECDHE [ SECP256R1] 256 agree 1900 ops took 1.026 sec, avg 0.540 ms, 1851.708 ops/sec
ECDSA [ SECP256R1] 256 sign 1900 ops took 1.054 sec, avg 0.555 ms, 1802.688 ops/sec
ECDSA [ SECP256R1] 256 verify 2700 ops took 1.031 sec, avg 0.382 ms, 2618.782 ops/sec
Benchmarks with Intel optimizations:
------------------------------------------------------------------------------
wolfSSL version 5.7.6
------------------------------------------------------------------------------
Math: Multi-Precision: Disabled
Single Precision: ecc 256 384 521 rsa/dh 2048 3072 4096 asm sp_x86_64.c
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG 145 MiB took 1.018 seconds, 142.467 MiB/s Cycles per byte = 16.73
AES-128-CBC-enc 1085 MiB took 1.004 seconds, 1080.199 MiB/s Cycles per byte = 2.21
AES-128-CBC-dec 4680 MiB took 1.000 seconds, 4678.302 MiB/s Cycles per byte = 0.51
AES-192-CBC-enc 920 MiB took 1.002 seconds, 918.098 MiB/s Cycles per byte = 2.60
AES-192-CBC-dec 3900 MiB took 1.001 seconds, 3897.837 MiB/s Cycles per byte = 0.61
AES-256-CBC-enc 690 MiB took 1.000 seconds, 689.769 MiB/s Cycles per byte = 3.46
AES-256-CBC-dec 3335 MiB took 1.000 seconds, 3333.659 MiB/s Cycles per byte = 0.72
AES-128-GCM-enc 4445 MiB took 1.001 seconds, 4442.137 MiB/s Cycles per byte = 0.54
AES-128-GCM-dec 4365 MiB took 1.000 seconds, 4363.181 MiB/s Cycles per byte = 0.55
AES-192-GCM-enc 3720 MiB took 1.001 seconds, 3715.518 MiB/s Cycles per byte = 0.64
AES-192-GCM-dec 3790 MiB took 1.001 seconds, 3785.582 MiB/s Cycles per byte = 0.63
AES-256-GCM-enc 3275 MiB took 1.001 seconds, 3273.023 MiB/s Cycles per byte = 0.73
AES-256-GCM-dec 3270 MiB took 1.000 seconds, 3269.829 MiB/s Cycles per byte = 0.73
GMAC Table 4-bit 1357 MiB took 1.001 seconds, 1356.186 MiB/s Cycles per byte = 1.76
CHACHA 2300 MiB took 1.001 seconds, 2297.314 MiB/s Cycles per byte = 1.04
CHA-POLY 1465 MiB took 1.001 seconds, 1463.308 MiB/s Cycles per byte = 1.63
MD5 555 MiB took 1.001 seconds, 554.676 MiB/s Cycles per byte = 4.30
POLY1305 4020 MiB took 1.001 seconds, 4017.260 MiB/s Cycles per byte = 0.59
SHA 445 MiB took 1.004 seconds, 443.391 MiB/s Cycles per byte = 5.38
SHA-224 370 MiB took 1.007 seconds, 367.583 MiB/s Cycles per byte = 6.49
SHA-256 370 MiB took 1.013 seconds, 365.324 MiB/s Cycles per byte = 6.53
SHA-384 555 MiB took 1.008 seconds, 550.396 MiB/s Cycles per byte = 4.33
SHA-512 545 MiB took 1.003 seconds, 543.423 MiB/s Cycles per byte = 4.39
SHA-512/224 555 MiB took 1.007 seconds, 551.325 MiB/s Cycles per byte = 4.32
SHA-512/256 555 MiB took 1.005 seconds, 552.241 MiB/s Cycles per byte = 4.32
SHA3-224 365 MiB took 1.010 seconds, 361.357 MiB/s Cycles per byte = 6.60
SHA3-256 345 MiB took 1.006 seconds, 342.997 MiB/s Cycles per byte = 6.95
SHA3-384 265 MiB took 1.005 seconds, 263.624 MiB/s Cycles per byte = 9.04
SHA3-512 185 MiB took 1.021 seconds, 181.240 MiB/s Cycles per byte = 13.15
HMAC-MD5 555 MiB took 1.000 seconds, 554.745 MiB/s Cycles per byte = 4.30
HMAC-SHA 455 MiB took 1.013 seconds, 449.068 MiB/s Cycles per byte = 5.31
HMAC-SHA224 365 MiB took 1.001 seconds, 364.591 MiB/s Cycles per byte = 6.54
HMAC-SHA256 370 MiB took 1.006 seconds, 367.959 MiB/s Cycles per byte = 6.48
HMAC-SHA384 560 MiB took 1.002 seconds, 558.961 MiB/s Cycles per byte = 4.27
HMAC-SHA512 560 MiB took 1.011 seconds, 553.892 MiB/s Cycles per byte = 4.30
PBKDF2 39 KiB took 1.001 seconds, 39.166 KiB/s Cycles per byte = 62334.62
RSA 2048 public 49500 ops took 1.001 sec, avg 0.020 ms, 49429.632 ops/sec
RSA 2048 private 1600 ops took 1.034 sec, avg 0.646 ms, 1547.866 ops/sec
DH 2048 key gen 3124 ops took 1.000 sec, avg 0.320 ms, 3123.815 ops/sec
DH 2048 agree 3100 ops took 1.012 sec, avg 0.326 ms, 3064.695 ops/sec
ECC [ SECP256R1] 256 key gen 65100 ops took 1.001 sec, avg 0.015 ms, 65039.664 ops/sec
ECDHE [ SECP256R1] 256 agree 17200 ops took 1.005 sec, avg 0.058 ms, 17107.267 ops/sec
ECDSA [ SECP256R1] 256 sign 42500 ops took 1.001 sec, avg 0.024 ms, 42477.882 ops/sec
ECDSA [ SECP256R1] 256 verify 16000 ops took 1.005 sec, avg 0.063 ms, 15922.289 ops/sec
Benchmarks with SGX and without HAVE_WOLFSSL_ASSEMBLY=1:
Benchmark Test:
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG_HEALTH_TEST_CHECK_SIZE = 128
sizeof(seedB_data) = 128
RNG_HEALTH_TEST_CHECK_SIZE = 128
sizeof(seedB_data) = 128
RNG 80 MiB took 1.015 seconds, 78.788 MiB/s
AES-128-CBC-enc 250 MiB took 1.005 seconds, 248.653 MiB/s
AES-128-CBC-dec 260 MiB took 1.006 seconds, 258.542 MiB/s
AES-192-CBC-enc 185 MiB took 1.036 seconds, 178.652 MiB/s
AES-192-CBC-dec 180 MiB took 1.006 seconds, 178.903 MiB/s
AES-256-CBC-enc 190 MiB took 1.008 seconds, 188.399 MiB/s
AES-256-CBC-dec 195 MiB took 1.018 seconds, 191.475 MiB/s
AES-128-GCM-enc 55 MiB took 1.035 seconds, 53.141 MiB/s
AES-128-GCM-dec 55 MiB took 1.035 seconds, 53.119 MiB/s
AES-192-GCM-enc 55 MiB took 1.063 seconds, 51.730 MiB/s
AES-192-GCM-dec 55 MiB took 1.063 seconds, 51.716 MiB/s
AES-256-GCM-enc 55 MiB took 1.099 seconds, 50.053 MiB/s
AES-256-GCM-dec 50 MiB took 1.005 seconds, 49.752 MiB/s
AES-128-GCM-enc-no_AAD 55 MiB took 1.039 seconds, 52.954 MiB/s
AES-128-GCM-dec-no_AAD 55 MiB took 1.027 seconds, 53.528 MiB/s
AES-192-GCM-enc-no_AAD 55 MiB took 1.062 seconds, 51.776 MiB/s
AES-192-GCM-dec-no_AAD 55 MiB took 1.062 seconds, 51.789 MiB/s
AES-256-GCM-enc-no_AAD 50 MiB took 1.021 seconds, 48.959 MiB/s
AES-256-GCM-dec-no_AAD 50 MiB took 1.007 seconds, 49.642 MiB/s
GMAC Default 70 MiB took 1.008 seconds, 69.425 MiB/s
3DES 30 MiB took 1.117 seconds, 26.865 MiB/s
MD5 555 MiB took 1.004 seconds, 552.952 MiB/s
SHA 465 MiB took 1.011 seconds, 459.931 MiB/s
SHA-256 200 MiB took 1.003 seconds, 199.344 MiB/s
HMAC-MD5 555 MiB took 1.000 seconds, 554.892 MiB/s
HMAC-SHA 460 MiB took 1.008 seconds, 456.516 MiB/s
HMAC-SHA256 205 MiB took 1.016 seconds, 201.676 MiB/s
PBKDF2 24 KiB took 1.000 seconds, 24.363 KiB/s
RSA 2048 public 6300 ops took 1.004 sec, avg 0.159 ms, 6274.439 ops/sec
RSA 2048 private 200 ops took 1.741 sec, avg 8.703 ms, 114.901 ops/sec
DH 2048 key gen 276 ops took 1.002 sec, avg 3.630 ms, 275.450 ops/sec
DH 2048 agree 300 ops took 1.083 sec, avg 3.611 ms, 276.969 ops/sec
ECC [ SECP256R1] 256 key gen 8300 ops took 1.008 sec, avg 0.121 ms, 8231.554 ops/sec
ECDHE [ SECP256R1] 256 agree 3300 ops took 1.020 sec, avg 0.309 ms, 3236.528 ops/sec
ECDSA [ SECP256R1] 256 sign 5800 ops took 1.002 sec, avg 0.173 ms, 5790.810 ops/sec
ECDSA [ SECP256R1] 256 verify 3100 ops took 1.029 sec, avg 0.332 ms, 3011.562 ops/sec
Benchmarks with SGX and with HAVE_WOLFSSL_ASSEMBLY=1:
Benchmark Test:
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG_HEALTH_TEST_CHECK_SIZE = 128
sizeof(seedB_data) = 128
RNG_HEALTH_TEST_CHECK_SIZE = 128
sizeof(seedB_data) = 128
RNG 90 MiB took 1.055 seconds, 85.307 MiB/s
AES-128-CBC-enc 1085 MiB took 1.005 seconds, 1080.136 MiB/s
AES-128-CBC-dec 4675 MiB took 1.000 seconds, 4672.958 MiB/s
AES-192-CBC-enc 925 MiB took 1.005 seconds, 920.766 MiB/s
AES-192-CBC-dec 3910 MiB took 1.001 seconds, 3906.660 MiB/s
AES-256-CBC-enc 800 MiB took 1.006 seconds, 795.614 MiB/s
AES-256-CBC-dec 3350 MiB took 1.000 seconds, 3348.383 MiB/s
AES-128-GCM-enc 3570 MiB took 1.000 seconds, 3568.704 MiB/s
AES-128-GCM-dec 3645 MiB took 1.001 seconds, 3641.420 MiB/s
AES-192-GCM-enc 3320 MiB took 1.001 seconds, 3316.531 MiB/s
AES-192-GCM-dec 2520 MiB took 1.002 seconds, 2516.130 MiB/s
AES-256-GCM-enc 2610 MiB took 1.001 seconds, 2607.546 MiB/s
AES-256-GCM-dec 2935 MiB took 1.001 seconds, 2930.619 MiB/s
AES-128-GCM-enc-no_AAD 3545 MiB took 1.001 seconds, 3540.200 MiB/s
AES-128-GCM-dec-no_AAD 3660 MiB took 1.000 seconds, 3659.309 MiB/s
AES-192-GCM-enc-no_AAD 3285 MiB took 1.001 seconds, 3280.230 MiB/s
AES-192-GCM-dec-no_AAD 3300 MiB took 1.000 seconds, 3299.703 MiB/s
AES-256-GCM-enc-no_AAD 2920 MiB took 1.001 seconds, 2917.791 MiB/s
AES-256-GCM-dec-no_AAD 2895 MiB took 1.001 seconds, 2893.033 MiB/s
GMAC Default 1384 MiB took 1.000 seconds, 1383.696 MiB/s
3DES 30 MiB took 1.115 seconds, 26.913 MiB/s
MD5 560 MiB took 1.009 seconds, 555.151 MiB/s
SHA 460 MiB took 1.010 seconds, 455.671 MiB/s
SHA-256 205 MiB took 1.021 seconds, 200.842 MiB/s
HMAC-MD5 555 MiB took 1.002 seconds, 554.146 MiB/s
HMAC-SHA 450 MiB took 1.001 seconds, 449.578 MiB/s
HMAC-SHA256 180 MiB took 1.077 seconds, 167.062 MiB/s
PBKDF2 23 KiB took 1.000 seconds, 23.186 KiB/s
RSA 2048 public 6000 ops took 1.007 sec, avg 0.168 ms, 5960.138 ops/sec
RSA 2048 private 200 ops took 1.819 sec, avg 9.097 ms, 109.927 ops/sec
DH 2048 key gen 261 ops took 1.001 sec, avg 3.836 ms, 260.721 ops/sec
DH 2048 agree 300 ops took 1.154 sec, avg 3.847 ms, 259.965 ops/sec
ECC [ SECP256R1] 256 key gen 8200 ops took 1.012 sec, avg 0.123 ms, 8106.580 ops/sec
ECDHE [ SECP256R1] 256 agree 3200 ops took 1.008 sec, avg 0.315 ms, 3174.178 ops/sec
ECDSA [ SECP256R1] 256 sign 5800 ops took 1.004 sec, avg 0.173 ms, 5774.885 ops/sec
ECDSA [ SECP256R1] 256 verify 2800 ops took 1.002 sec, avg 0.358 ms, 2794.868 ops/sec
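To make the gap easier to see, here is a small sketch that computes speedup ratios for a few rows of the four runs above. The numbers are copied from the logs in this post; the choice of rows and the names RESULTS, speedup and report are only mine for illustration and are not part of wolfSSL.

```python
# Throughput figures copied from the benchmark logs in this post.
# SHA-256 and AES-GCM are in MiB/s, ECDSA sign is in ops/sec.
# Each tuple is (baseline run, optimized run) for the same mode.
RESULTS = {
    "non-SGX": {
        "AES-128-GCM-enc": (120.413, 4442.137),
        "SHA-256": (198.129, 365.324),
        "ECDSA P-256 sign": (1802.688, 42477.882),
    },
    "SGX": {
        "AES-128-GCM-enc": (53.141, 3568.704),
        "SHA-256": (199.344, 200.842),
        "ECDSA P-256 sign": (5790.810, 5774.885),
    },
}


def speedup(baseline, optimized):
    """Return how many times faster the optimized run is than the baseline."""
    return optimized / baseline


def report(results):
    """Build one formatted line per (mode, algorithm) pair."""
    lines = []
    for mode, algos in results.items():
        for name, (base, opt) in algos.items():
            lines.append(f"{mode:8s} {name:18s} {speedup(base, opt):7.2f}x")
    return lines


if __name__ == "__main__":
    print("\n".join(report(RESULTS)))
```

With these figures, AES-128-GCM improves a lot in both modes, while SHA-256 and ECDSA sign improve by roughly 1.8x and 23.6x outside SGX but stay at about 1.0x inside SGX with HAVE_WOLFSSL_ASSEMBLY=1. That is the part I am trying to enable.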
Thank you