During a recent industry expo, the wolfSSL team demonstrated the wolfCrypt benchmark and received frequent questions about memory usage, particularly for post-quantum algorithms.
We happened to be working on a feature that provides exactly what was being asked for, and I’m happy to report that it is now in wolfSSL’s GitHub repository and will be included in the next wolfSSL release.
How it works
When building wolfSSL, memory and stack tracking can be enabled with the ./configure options --enable-memory --enable-trackmemory=verbose --enable-stacksize=verbose, or by adding the following macro definitions:
#define WOLFSSL_TRACK_MEMORY
#define WOLFSSL_TRACK_MEMORY_VERBOSE
#define HAVE_STACK_SIZE
#define HAVE_STACK_SIZE_VERBOSE
These options activate both heap allocation tracking and detailed stack usage reporting within the wolfCrypt benchmark.
With these options enabled, the benchmark records the peak and total memory usage for each algorithm, as well as totals for the entire application at the end.
Unlike the regular performance benchmark recording, this mode also tracks memory during the setup phase of each algorithm so that you can see how much RAM is used for the entire initialization and execution of an algorithm.
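To give a rough idea of what per-algorithm heap tracking involves, here is a minimal C sketch of a counting allocator. It is not wolfSSL's implementation; the names MemStats, track_malloc, track_free and track_reset are hypothetical. It records the bytes currently allocated, the peak, the running total and the allocation count. The reset is meant to be called before an algorithm's setup phase rather than just before its timed loop, so that initialization allocations are counted too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical counters; names are illustrative, not wolfSSL's. */
typedef struct {
    size_t current;  /* bytes currently allocated */
    size_t peak;     /* highest value of current since the last reset */
    size_t total;    /* sum of all allocation sizes since the last reset */
    size_t allocs;   /* number of allocations since the last reset */
} MemStats;

static MemStats g_stats;

/* Each block is prefixed with its size so track_free() knows how much to
 * subtract. The union pads the header to max_align_t, so the pointer handed
 * to the caller stays suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} AllocHeader;

void *track_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(AllocHeader))
        return NULL;  /* header + size would overflow */
    AllocHeader *hdr = malloc(sizeof(AllocHeader) + size);
    if (hdr == NULL)
        return NULL;
    hdr->size = size;
    g_stats.current += size;
    g_stats.total += size;
    g_stats.allocs++;
    if (g_stats.current > g_stats.peak)
        g_stats.peak = g_stats.current;
    return hdr + 1;  /* user memory starts just after the header */
}

void track_free(void *ptr)
{
    if (ptr == NULL)
        return;
    AllocHeader *hdr = (AllocHeader *)ptr - 1;
    assert(g_stats.current >= hdr->size);
    g_stats.current -= hdr->size;
    free(hdr);
}

/* Call before an algorithm's setup phase (context and key init), not only
 * before the timed loop. current is kept because earlier allocations may
 * still be live, so the peak restarts from that baseline. */
void track_reset(void)
{
    g_stats.peak = g_stats.current;
    g_stats.total = 0;
    g_stats.allocs = 0;
}
```

A complete version would also need a realloc wrapper, and the library's own allocations would have to be routed through these functions for the counts to mean anything.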
Example
This example was run on an STM32U585 built with hardware acceleration for RNG, AES and SHA-256, and shows how memory tracking behaves on a resource-constrained embedded system. The post-quantum algorithms use the small memory model:
wolfCrypt Benchmark (block bytes 1024, min 1.0 sec each)
RNG 425 KiB took 1.039 seconds, 409.047 KiB/s [heap 494 bytes (8 allocs), stack 1448 bytes]
AES-128-CBC-enc 9 MiB took 1.000 seconds, 8.521 MiB/s [heap 312 bytes (1 allocs), stack 736 bytes]
AES-128-CBC-dec 8 MiB took 1.000 seconds, 8.472 MiB/s [heap 312 bytes (1 allocs), stack 736 bytes]
AES-256-CBC-enc 8 MiB took 1.000 seconds, 7.910 MiB/s [heap 312 bytes (1 allocs), stack 736 bytes]
AES-256-CBC-dec 8 MiB took 1.000 seconds, 7.861 MiB/s [heap 312 bytes (1 allocs), stack 736 bytes]
AES-128-GCM-enc 8 MiB took 1.000 seconds, 7.935 MiB/s [heap 344 bytes (3 allocs), stack 992 bytes]
AES-128-GCM-dec 8 MiB took 1.000 seconds, 7.886 MiB/s [heap 312 bytes (1 allocs), stack 984 bytes]
AES-256-GCM-enc 7 MiB took 1.000 seconds, 7.397 MiB/s [heap 344 bytes (3 allocs), stack 976 bytes]
AES-256-GCM-dec 7 MiB took 1.000 seconds, 7.349 MiB/s [heap 312 bytes (1 allocs), stack 984 bytes]
AES-128-GCM-enc-no_AAD 8 MiB took 1.000 seconds, 7.983 MiB/s [heap 344 bytes (3 allocs), stack 976 bytes]
AES-128-GCM-dec-no_AAD 8 MiB took 1.000 seconds, 7.935 MiB/s [heap 312 bytes (1 allocs), stack 944 bytes]
AES-256-GCM-enc-no_AAD 7 MiB took 1.000 seconds, 7.422 MiB/s [heap 344 bytes (3 allocs), stack 976 bytes]
AES-256-GCM-dec-no_AAD 7 MiB took 1.000 seconds, 7.397 MiB/s [heap 312 bytes (1 allocs), stack 944 bytes]
GMAC Small 14 MiB took 1.000 seconds, 14.154 MiB/s [heap 0 bytes (0allocs), stack 1536 bytes]
CHACHA 6 MiB took 1.000 seconds, 5.688 MiB/s [heap 68 bytes (1 allocs), stack 624 bytes]
CHA-POLY 4 MiB took 1.004 seconds, 3.623 MiB/s [heap 232 bytes (4 allocs), stack 672 bytes]
POLY1305 16 MiB took 1.000 seconds, 15.918 MiB/s [heap 40 bytes (1 allocs), stack 800 bytes]
SHA-256 14 MiB took 1.000 seconds, 14.429 MiB/s [heap 344 bytes (2 allocs), stack 624 bytes]
SHA-384 1 MiB took 1.012 seconds, 1.158 MiB/s [heap 400 bytes (3 allocs), stack 624 bytes]
SHA-512 1 MiB took 1.016 seconds, 1.153 MiB/s [heap 416 bytes (3 allocs), stack 658 bytes]
SHA-512/224 1 MiB took 1.012 seconds, 1.158 MiB/s [heap 380 bytes (3 allocs), stack 624 bytes]
SHA-512/256 1 MiB took 1.012 seconds, 1.158 MiB/s [heap 384 bytes (3 allocs), stack 624 bytes]
SHA3-224 1 MiB took 1.003 seconds, 1.290 MiB/s [heap 436 bytes (2 allocs), stack 656 bytes]
SHA3-256 1 MiB took 1.000 seconds, 1.221 MiB/s [heap 440 bytes (2 allocs), stack 656 bytes]
SHA3-384 975 KiB took 1.016 seconds, 959.646 KiB/s [heap 456 bytes (2 allocs), stack 656 bytes]
SHA3-512 675 KiB took 1.008 seconds, 669.643 KiB/s [heap 472 bytes (2 allocs), stack 656 bytes]
SHAKE128 2 MiB took 1.012 seconds, 1.496 MiB/s [heap 576 bytes (2 allocs), stack 672 bytes]
SHAKE256 1 MiB took 1.003 seconds, 1.217 MiB/s [heap 544 bytes (2 allocs), stack 656 bytes]
HMAC-SHA256 14 MiB took 1.000 seconds, 14.014 MiB/s [heap 768 bytes (1 allocs), stack 784 bytes]
HMAC-SHA384 1 MiB took 1.008 seconds, 1.138 MiB/s [heap 896 bytes (2 allocs), stack 840 bytes]
HMAC-SHA512 1 MiB took 1.008 seconds, 1.138 MiB/s [heap 896 bytes (2 allocs), stack 784 bytes]
RSA 2048 public 58 ops took 1.000 sec, avg 17.241 ms, 58.000 ops/sec [heap 6725 bytes (6 allocs), stack 1040 bytes]
RSA 2048 private 2 ops took 2.047 sec, avg 1023.500 ms, 0.977 ops/sec [heap 2860 bytes (4 allocs), stack 1096 bytes]
DH 2048 key gen 3 ops took 1.278 sec, avg 426.000 ms, 2.347 ops/sec [heap 7752 bytes (10 allocs), stack 1072 bytes]
DH 2048 agree 4 ops took 1.706 sec, avg 426.500 ms, 2.345 ops/sec [heap 10428 bytes (9 allocs), stack 1376 bytes]
ML-KEM 512 128 key gen 290 ops took 1.004 sec, avg 3.462 ms, 288.845 ops/sec[heap 1530 bytes (2 allocs), stack 1096 bytes]
ML-KEM 512 128 encap 278 ops took 1.004 sec, avg 3.612 ms, 276.892 ops/sec[heap 3578 bytes (2 allocs), stack 1088 bytes]
ML-KEM 512 128 decap 206 ops took 1.000 sec, avg 4.854 ms, 206.000 ops/sec[heap 4346 bytes (3 allocs), stack 1088 bytes]
ML-KEM 768 192 key gen 176 ops took 1.000 sec, avg 5.682 ms, 176.000 ops/sec[heap 2042 bytes (2 allocs), stack 1096 bytes]
ML-KEM 768 192 encap 164 ops took 1.008 sec, avg 6.146 ms, 162.698 ops/sec[heap 5114 bytes (2 allocs), stack 1792 bytes]
ML-KEM 768 192 decap 128 ops took 1.012 sec, avg 7.906 ms, 126.482 ops/sec[heap 6202 bytes (3 allocs), stack 1792 bytes]
ML-KEM 1024 256 key gen 108 ops took 1.004 sec, avg 9.296 ms, 107.570 ops/sec[heap 2554 bytes (2 allocs), stack 1096 bytes]
ML-KEM 1024 256 encap 102 ops took 1.008 sec, avg 9.882 ms, 101.190 ops/sec[heap 6650 bytes (2 allocs), stack 1792 bytes]
ML-KEM 1024 256 decap 84 ops took 1.019 sec, avg 12.131 ms, 82.434 ops/sec[heap 8218 bytes (3 allocs), stack 1792 bytes]
ECC [ SECP256R1] 256 key gen 12 ops took 1.008 sec, avg 84.000 ms, 11.905 ops/sec [heap 4628 bytes (6 allocs), stack 1080 bytes]
ECDHE [ SECP256R1] 256 agree 12 ops took 1.004 sec, avg 83.667 ms, 11.952 ops/sec [heap 9393 bytes (15 allocs), stack 1416 bytes]
ECDSA [ SECP256R1] 256 sign 58 ops took 1.023 sec, avg 17.638 ms, 56.696 ops/sec [heap 308 bytes (5 allocs), stack 1112 bytes]
ECDSA [ SECP256R1] 256 verify 54 ops took 1.000 sec, avg 18.519 ms, 54.000 ops/sec [heap 152 bytes (2 allocs), stack 1432 bytes]
CURVE 25519 key gen 3 ops took 1.086 sec, avg 362.000 ms, 2.762 ops/sec [heap 119 bytes (3 allocs), stack 1000 bytes]
CURVE 25519 agree 4 ops took 1.447 sec, avg 361.750 ms, 2.764 ops/sec [heap 119 bytes (3 allocs), stack 1768 bytes]
ED 25519 key gen 3 ops took 1.102 sec, avg 367.333 ms, 2.722 ops/sec [heap 128 bytes (3 allocs), stack 1136 bytes]
ED 25519 sign 4 ops took 1.494 sec, avg 373.500 ms, 2.677 ops/sec [heap 256 bytes (4 allocs), stack 1792 bytes]
ED 25519 verify 2 ops took 1.538 sec, avg 769.000 ms, 1.300 ops/sec [heap 128 bytes (1 allocs), stack 1792 bytes]
ML-DSA 44 key gen 66 ops took 1.000 sec, avg 15.152 ms, 66.000 ops/sec [heap 26531 bytes (6 allocs), stack 1072 bytes]
ML-DSA 44 sign 16 ops took 1.051 sec, avg 65.688 ms, 15.224 ops/sec [heap 15528 bytes (3 allocs), stack 1416 bytes]
ML-DSA 44 verify 62 ops took 1.027 sec, avg 16.565 ms, 60.370 ops/sec [heap 8104 bytes (1 allocs), stack 1416 bytes]
ML-DSA 65 key gen 38 ops took 1.016 sec, avg 26.737 ms, 37.402 ops/sec [heap 31651 bytes (6 allocs), stack 1040 bytes]
ML-DSA 65 sign 8 ops took 1.008 sec, avg 126.000 ms, 7.937 ops/sec [heap 20648 bytes (3 allocs), stack 1416 bytes]
ML-DSA 65 verify 38 ops took 1.039 sec, avg 27.342 ms, 36.574 ops/sec [heap 9128 bytes (1 allocs), stack 1416 bytes]
ML-DSA 87 key gen 24 ops took 1.079 sec, avg 44.958 ms, 22.243 ops/sec [heap 37795 bytes (6 allocs), stack 1072 bytes]
ML-DSA 87 sign 6 ops took 1.146 sec, avg 191.000 ms, 5.236 ops/sec [heap 26792 bytes (3 allocs), stack 1416 bytes]
ML-DSA 87 verify 22 ops took 1.024 sec, avg 46.545 ms, 21.484 ops/sec [heap 11432 bytes (1 allocs), stack 1416 bytes]
Benchmark complete
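Stack usage figures like the ones above are normally high-water marks. One common way to obtain such a number on a POSIX host is stack painting: fill a dedicated thread stack with a known byte, run the code, then count how much of the pattern was overwritten. The sketch below shows that general technique in C. The names measure_stack and example_workload are hypothetical, and this is not necessarily how wolfSSL's HAVE_STACK_SIZE support is implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAINT_BYTE 0xA5

/* Hypothetical helper: runs fn(arg) on a thread whose stack is a buffer
 * painted with PAINT_BYTE, then returns how many bytes of that buffer were
 * overwritten (the high-water mark), or -1 on error. Assumes the stack grows
 * downward, as it does on x86 and Arm. */
long measure_stack(void *(*fn)(void *), void *arg, size_t stack_size)
{
    void *stack = NULL;
    pthread_attr_t attr;
    pthread_t tid;
    long used = -1;

    assert(fn != NULL && stack_size > 0);
    if (posix_memalign(&stack, 4096, stack_size) != 0)
        return -1;
    memset(stack, PAINT_BYTE, stack_size);  /* paint the whole stack */

    if (pthread_attr_init(&attr) == 0) {
        if (pthread_attr_setstack(&attr, stack, stack_size) == 0 &&
            pthread_create(&tid, &attr, fn, arg) == 0) {
            pthread_join(tid, NULL);  /* wait before inspecting the stack */
            /* The unused part of a downward-growing stack sits at the lowest
             * addresses, so count untouched bytes from the bottom up. A byte
             * written as PAINT_BYTE right at the boundary can make the
             * result a few bytes low. */
            const uint8_t *p = stack;
            size_t untouched = 0;
            while (untouched < stack_size && p[untouched] == PAINT_BYTE)
                untouched++;
            used = (long)(stack_size - untouched);
        }
        pthread_attr_destroy(&attr);
    }
    free(stack);
    return used;
}

/* Example workload: touches an 8 KiB local array so it shows up clearly. */
void *example_workload(void *arg)
{
    volatile unsigned char buf[8192];
    for (size_t i = 0; i < sizeof buf; i++)
        buf[i] = (unsigned char)i;
    return arg;
}
```

For example, measure_stack(example_workload, NULL, 256 * 1024) should report a little over 8 KiB. The figure can also include some C library and thread start-up overhead, so small differences between algorithms should be read with care. On a bare-metal target the same idea is applied by painting the stack region itself at startup.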
Better rendering
If you find this data hard to read at a glance, two things can help. First, the wolfCrypt benchmark can be configured to output CSV data, and the memory tracking results are included in the CSV output. Second, we have a Python script that reinterprets the benchmark output as a table. The script currently targets the STM32U585 demo, but it can easily be adapted for other platforms on request.
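If you would rather post-process the plain-text output yourself, the bracketed memory suffix on each line is regular enough to parse. Here is a minimal C sketch; MemFields, parse_mem_fields and mem_fields_to_csv are hypothetical names, and the format string assumes the suffix looks exactly like the example output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Memory fields printed at the end of each benchmark line, e.g.
 * "... [heap 494 bytes (8 allocs), stack 1448 bytes]". */
typedef struct {
    unsigned long heap_bytes;
    unsigned long heap_allocs;
    unsigned long stack_bytes;
} MemFields;

/* Hypothetical helper: extracts the memory fields from one output line.
 * Returns 0 on success, -1 if the line has no memory suffix. strrchr()
 * picks the last '[' so names like "ECC [ SECP256R1]" are skipped. The
 * space before "allocs" in the format matches zero or more blanks, so
 * both "(8 allocs)" and "(0allocs)" parse. */
int parse_mem_fields(const char *line, MemFields *out)
{
    assert(line != NULL && out != NULL);
    const char *bracket = strrchr(line, '[');
    if (bracket == NULL)
        return -1;
    if (sscanf(bracket, "[heap %lu bytes (%lu allocs), stack %lu bytes]",
               &out->heap_bytes, &out->heap_allocs, &out->stack_bytes) != 3)
        return -1;
    return 0;
}

/* Renders the fields as one CSV row: heap_bytes,heap_allocs,stack_bytes.
 * Returns the snprintf() result (characters needed, excluding the NUL). */
int mem_fields_to_csv(const MemFields *f, char *buf, size_t len)
{
    return snprintf(buf, len, "%lu,%lu,%lu",
                    f->heap_bytes, f->heap_allocs, f->stack_bytes);
}
```

For most uses the benchmark's built-in CSV output is the simpler route; a parser like this is mainly useful when only a captured console log is available.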

Conclusion
With this new enhancement, wolfCrypt users can now quantify both performance and memory footprint under different compile-time configurations. This is especially valuable for post-quantum cryptography and embedded use cases where memory efficiency is critical.
For now this works with bare-metal and POSIX systems, but we intend to adapt it to work with other operating systems, including RTOSs, in the future.
If you have questions about any of the above, please contact us at facts@wolfssl.com or call us at +1 425 245 8247.