1 (edited by yechenz37 2024-12-05 06:34:16)

Topic: Best practice to avoid dynamic memory allocations.

Hi there! I'm a new user exploring whether wolfSSL can provide better performance for my use case compared to OpenSSL.

I'm aiming to use it in a low-latency service (as a TLS client), so I need to avoid calls to malloc or free during encryption and decryption, as they would impact latency. This does not mean malloc and free are forbidden, but I want to avoid them as much as I can.

To get started with the library, I wrote a simple echo client using the BIO model (which I had used with OpenSSL). I just replaced all OpenSSL symbols with their wolfSSL equivalents (wolfSSL_BIO_write for BIO_write, etc.), and it works well.

However, when I tried setting a custom malloc function to print each time it was called, I discovered that every time I perform a write or feed data to the rbio, it triggers a malloc.

My question is: Is there a way to avoid these malloc/free calls? I have a few ideas but am unsure which approach is best or supported by the library:

1. Is there a configuration or setting that would prevent the library from freeing an already allocated buffer and instead reuse it later? The buffer could grow as needed (I don't care how much memory it costs, within reason), and once it's large enough, it should trigger no further malloc calls (or at least very few).

2. From reading the documentation, I know I can use the static buffer allocation option to avoid system-level malloc, but I’ve been unable to get my TLS client working with the details from this manual. Are there any full TLS client examples that use static buffers?

I think approach 1 might be the easiest solution, but I haven’t found relevant documentation. Did I miss something?

Thank you for your help!

Below is the output of my print_malloc. The TLS client sent 100000 messages and received 100000 echoes, and the malloc count is 200251:

malloc count 200247, size 145
malloc count 200248, size 181
malloc count 200249, size 145
malloc count 200250, size 181
malloc count 200251, size 145
nround: 100000
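
For readers who want to reproduce this kind of measurement, here is a minimal sketch of a counting allocator hook. The helper names (counting_malloc, g_malloc_count, install_counting_allocators) and the DEMO_WITH_WOLFSSL macro are made up for illustration. The wolfSSL part assumes a default build, where wolfSSL_SetAllocators takes plain malloc/free/realloc-style callbacks; builds with memory debugging or static memory use different callback signatures.

```c
#include <stdio.h>
#include <stdlib.h>

/* Number of allocations seen so far (single-threaded sketch, no locking). */
static unsigned long g_malloc_count = 0;

/* Count and log every allocation, then defer to the system malloc. */
static void *counting_malloc(size_t size)
{
    g_malloc_count++;
    fprintf(stderr, "malloc count %lu, size %zu\n", g_malloc_count, size);
    return malloc(size);
}

/* Pass-through free; add a counter here if frees matter too. */
static void counting_free(void *ptr)
{
    free(ptr);
}

/* Pass-through realloc; not counted as a malloc in this sketch. */
static void *counting_realloc(void *ptr, size_t size)
{
    return realloc(ptr, size);
}

/* Hypothetical macro: define it only when wolfSSL headers are available. */
#ifdef DEMO_WITH_WOLFSSL
#include <wolfssl/options.h>
#include <wolfssl/wolfcrypt/settings.h>
#include <wolfssl/wolfcrypt/memory.h>

/* Register the hooks; intended to be called once before wolfSSL_Init(). */
static int install_counting_allocators(void)
{
    return wolfSSL_SetAllocators(counting_malloc, counting_free,
                                 counting_realloc);
}
#endif
```

Reading g_malloc_count once right after the handshake and again after the echo loop separates handshake allocations from per-message ones.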

Re: Best practice to avoid dynamic memory allocations.

Hi yechenz37,

Please share your wolfSSL version and build settings (user_settings.h or ./configure line), in particular if you are using small stack or SP no malloc.  These settings, especially small stack, will lead to an increase in malloc/frees.
We do support fully disabling malloc when using static memory, as you've found, but if you only want to minimize mallocs rather than disable them completely, we can accommodate that as well.

Are you able to share any information on your project and whether it is personal or commercial?  Feel free to email us at support [AT] wolfssl [DOT] com if this information is sensitive.

Thanks,
Kareem

3 (edited by yechenz37 2024-12-08 02:19:10)

Re: Best practice to avoid dynamic memory allocations.

Hi Kareem, thank you for replying.

I’m currently using wolfSSL 5.7.4, which I downloaded from wolfSSL’s website.

I built it using ./configure --enable-all. Initially, when I configured it without enabling anything, I encountered some linking errors, so I switched to --enable-all, and that resolved the issue. Does this mean that I’ve also enabled the small stack feature? If so, I’d like to adjust my enable flags accordingly.

It would be very helpful if you could share any insights on how to minimize calls to malloc!

I think that avoiding allocations altogether might be more effective for achieving low latency than using static memory. Since static memory still requires some form of "allocation" from a pool, it won't beat simply reusing a growable buffer, which needs no allocations once the buffer is big enough. It would also be easier to integrate wolfSSL into our system without static memory.
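
The "reuse a growable buffer" idea can be sketched in plain C as below. This is only an illustration of the pattern, not wolfSSL code; GrowBuf, growbuf_reserve and growbuf_free are hypothetical names, and the 256-byte starting capacity is an arbitrary choice.

```c
#include <stdint.h>
#include <stdlib.h>

/* A buffer that only ever grows, so steady-state use hits no allocator. */
typedef struct {
    unsigned char *data;   /* heap storage, NULL until first reserve */
    size_t         cap;    /* current capacity in bytes */
    unsigned long  allocs; /* how many times realloc was actually called */
} GrowBuf;

/* Ensure capacity >= need. Returns 0 on success, -1 on allocation failure. */
static int growbuf_reserve(GrowBuf *b, size_t need)
{
    size_t new_cap;
    unsigned char *p;

    if (need <= b->cap)
        return 0; /* fast path: no allocator call at all */

    /* Grow geometrically so repeated small increases stay cheap. */
    new_cap = b->cap ? b->cap : 256;
    while (new_cap < need) {
        if (new_cap > SIZE_MAX / 2) { /* avoid overflow when doubling */
            new_cap = need;
            break;
        }
        new_cap *= 2;
    }

    p = realloc(b->data, new_cap);
    if (p == NULL)
        return -1; /* old buffer is still valid and unchanged */

    b->data = p;
    b->cap = new_cap;
    b->allocs++;
    return 0;
}

/* Release the storage once the connection is closed. */
static void growbuf_free(GrowBuf *b)
{
    free(b->data);
    b->data = NULL;
    b->cap = 0;
}
```

Start from a zero-initialized GrowBuf (GrowBuf b = {0};). After the first few messages the capacity settles, and every later growbuf_reserve call takes the fast path.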

I’m using wolfSSL as the TLS layer in my WebSocket secure client stack. My TCP layer may not use system sockets directly, as all input and output are handled via memory buffers. I assume that using the BIO model is the right approach for this setup? This client is intended for low-latency crypto trading, which is why I’m particularly concerned about the latency introduced by malloc.

Update:
I did a quick test with only enable-tls13, opensslall and intelasm set, and the malloc count was still about 200000. But after adding CFLAGS="-DLARGE_STATIC_BUFFERS" to ./configure, the malloc count dropped.

I suppose that this is the proper way?

Re: Best practice to avoid dynamic memory allocations.

Hi yechenz37,

We don't generally recommend --enable-all, as it enables every single feature, leading to an inefficient library build. Instead, we recommend tuning your build for your use case. If you are getting linking errors, it means you are missing a build option for an API you are using.
--enable-all does not enable small stack; they are separate defines.

We do support reallocating a buffer as long as your system provides realloc; this is used in a few places in the code.

Great find on LARGE_STATIC_BUFFERS, that will help avoid allocations for connections as your TLS buffer is now larger.  How many mallocs are you seeing with this option enabled?

Thanks,
Kareem

5 (edited by yechenz37 2024-12-11 03:36:37)

Re: Best practice to avoid dynamic memory allocations.

Hi Kareem,

After enabling LARGE_STATIC_BUFFERS, only 226 malloc calls are reported, and all of them seem to occur during the TLS handshake. Once the TLS connection is established, no additional malloc calls are reported, no matter how much data (in fixed-size 256-byte messages) I send or receive.

I think it's enough for my usage now.

BTW, I switched from using BIO to IOCallback, and the performance has improved even further.
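
For anyone trying the same switch, here is a rough sketch of I/O callbacks backed by caller-owned memory buffers, with no allocation on the data path. MemPipe, mem_read, mem_write, attach_mem_io and the DEMO_WITH_WOLFSSL macro are hypothetical names for illustration, not the code used in this thread. The wolfSSL part assumes the usual callback shape int cb(WOLFSSL*, char*, int, void*) and that -2 is the want-read/want-write return value; please check wolfio.h for your version.

```c
#include <string.h>

/* Stand-ins for WOLFSSL_CBIO_ERR_WANT_READ / WOLFSSL_CBIO_ERR_WANT_WRITE
 * (assumed to be -2 in wolfSSL; verify against your headers). */
#define IO_WANT_READ  (-2)
#define IO_WANT_WRITE (-2)

/* Linear byte queue over caller-owned storage. */
typedef struct {
    unsigned char *buf; /* storage provided by the caller */
    size_t cap;         /* size of buf in bytes */
    size_t len;         /* number of bytes written so far */
    size_t pos;         /* read offset, always <= len */
} MemPipe;

/* Copy up to sz queued bytes into out; sz is assumed to be > 0. */
static int mem_read(MemPipe *p, char *out, int sz)
{
    size_t avail = p->len - p->pos;
    size_t n = (size_t)sz < avail ? (size_t)sz : avail;

    if (avail == 0)
        return IO_WANT_READ; /* nothing buffered yet, try again later */

    memcpy(out, p->buf + p->pos, n);
    p->pos += n;
    if (p->pos == p->len) /* fully drained: rewind to reuse the storage */
        p->pos = p->len = 0;
    return (int)n;
}

/* Append up to sz bytes from in; may accept fewer when nearly full. */
static int mem_write(MemPipe *p, const char *in, int sz)
{
    size_t space = p->cap - p->len;
    size_t n = (size_t)sz < space ? (size_t)sz : space;

    if (space == 0)
        return IO_WANT_WRITE; /* the consumer must drain the pipe first */

    memcpy(p->buf + p->len, in, n);
    p->len += n;
    return (int)n;
}

/* Hypothetical macro: define it only when wolfSSL headers are available. */
#ifdef DEMO_WITH_WOLFSSL
#include <wolfssl/options.h>
#include <wolfssl/ssl.h>

static int recv_cb(WOLFSSL *ssl, char *buf, int sz, void *ctx)
{
    (void)ssl;
    return mem_read((MemPipe *)ctx, buf, sz);
}

static int send_cb(WOLFSSL *ssl, char *buf, int sz, void *ctx)
{
    (void)ssl;
    return mem_write((MemPipe *)ctx, buf, sz);
}

/* rx holds ciphertext from the network; tx collects ciphertext to send. */
static void attach_mem_io(WOLFSSL_CTX *ctx, WOLFSSL *ssl,
                          MemPipe *rx, MemPipe *tx)
{
    wolfSSL_CTX_SetIORecv(ctx, recv_cb);
    wolfSSL_CTX_SetIOSend(ctx, send_cb);
    wolfSSL_SetIOReadCtx(ssl, rx);
    wolfSSL_SetIOWriteCtx(ssl, tx);
}
#endif
```

The pipe only reclaims space once it is fully drained, which keeps the sketch short; a ring buffer would avoid that limitation.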

Thank you for your outstanding work!
