Topic: Possible WolfMQTT/WolfSSL Memory Leak
I have successfully gotten WolfMQTT and WolfSSL to communicate with Azure IoT Hub through a cell modem (the platform is an NRF52). For some time I have been working on making the software more robust: the cellular modem frequently loses its connection, so I have to terminate gracefully and restart.
My system transmits once a minute, and I have been leaving it running to see where the potential hang-ups are.
1.) The TCP connection goes down roughly every few hours. I modified the MQTT state machine to go through the WMQ_DISCONNECT and WMQ_NET_DISCONNECT states and then restart at WMQ_BEGIN.
2.) It looks like MqttClient_NetDisconnect() calls functions to free the resources used by WolfSSL.
3.) After about 24 hours I always get an error in the TLS setup. Today it was:
MqttSocket_TlsConnect Error -1: Num -112, mp_exptmod error state
4.) Other times I never get through the TLS connection as it always returns a "CONTINUE". It takes about 24 hours (roughly 50 or 60 restarts) for this error to pop up.
5.) Since it takes so long to occur, it is hard to capture statistics, but today I was able to attach a debugger. What was interesting is that my IRQ routines were still running (diagnostic messages from the modem were still reaching a serial terminal).
In every case where there was an issue, it was locked up in the standard C library free() function. Unfortunately I did not get any more data from the call stack.
6.) I believe the only place where my code does malloc/free is in WolfSSL. WolfMQTT did have a malloc for its context struct, but I changed it to a static allocation.
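Since the hang is always inside free(), one way to gather data without waiting for a call stack is to count allocations per reconnect cycle. Below is a minimal sketch of such a counter. All of the trk_* names are hypothetical and not part of WolfSSL or WolfMQTT. As far as I know, WolfSSL lets you install custom allocators with wolfSSL_SetAllocators(), but the exact callback signatures depend on build options (for example WOLFSSL_DEBUG_MEMORY or WOLFSSL_STATIC_MEMORY), so thin adapter functions may be needed; check memory.h in your WolfSSL tree before wiring this in.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tracking allocator; names are illustrative only. */
#define TRK_MAGIC_LIVE 0xA110CA7Eu
#define TRK_MAGIC_DEAD 0xDEADBEEFu

/* Header stored in front of every block; the union keeps the
 * pointer handed to the caller suitably aligned. */
typedef union {
    struct { uint32_t magic; size_t size; } info;
    max_align_t align;
} trk_hdr_t;

/* Counters are not interrupt safe; call only from task context. */
static size_t trk_live_blocks; /* blocks allocated and not yet freed */
static size_t trk_live_bytes;  /* payload bytes currently allocated  */
static size_t trk_peak_bytes;  /* high-water mark of trk_live_bytes  */
static size_t trk_bad_frees;   /* frees of pointers without a live header */

void *trk_malloc(size_t size)
{
    trk_hdr_t *h = malloc(sizeof(*h) + size);
    if (h == NULL) {
        return NULL;
    }
    h->info.magic = TRK_MAGIC_LIVE;
    h->info.size = size;
    trk_live_blocks++;
    trk_live_bytes += size;
    if (trk_live_bytes > trk_peak_bytes) {
        trk_peak_bytes = trk_live_bytes;
    }
    return h + 1; /* caller sees the memory just past the header */
}

void trk_free(void *ptr)
{
    if (ptr == NULL) {
        return;
    }
    trk_hdr_t *h = (trk_hdr_t *)ptr - 1;
    if (h->info.magic != TRK_MAGIC_LIVE) {
        /* Not a live block from trk_malloc: count it and leave the
         * heap alone. Repeated frees are caught only on a best-effort
         * basis, because the header memory may already be reused. */
        trk_bad_frees++;
        return;
    }
    h->info.magic = TRK_MAGIC_DEAD;
    trk_live_blocks--;
    trk_live_bytes -= h->info.size;
    free(h);
}

void *trk_realloc(void *ptr, size_t size)
{
    if (ptr == NULL) {
        return trk_malloc(size);
    }
    void *n = trk_malloc(size);
    if (n == NULL) {
        return NULL; /* original block is left untouched */
    }
    trk_hdr_t *h = (trk_hdr_t *)ptr - 1;
    memcpy(n, ptr, h->info.size < size ? h->info.size : size);
    trk_free(ptr);
    return n;
}
```

Logging trk_live_blocks and trk_live_bytes after each WMQ_NET_DISCONNECT should show whether every TLS session returns to the same baseline. A count that creeps upward over the 50 or 60 restarts would point at a leak, while a nonzero trk_bad_frees would point at a bad pointer being handed to free().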
Right now I am trying to make this problem happen faster, but I think something is going wrong when resources are freed. I am going to try changing the state machine so that it keeps disconnecting and reconnecting, to see if the issue still shows up.
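A rough sketch of what such a reconnect soak loop could look like is below. The soak_* names and the hook structure are hypothetical. On the target, the connect hook would wrap the existing TLS/MQTT connect path (including any retry on "CONTINUE" and a timeout) and the disconnect hook would wrap the WMQ_DISCONNECT / WMQ_NET_DISCONNECT teardown. The stub hooks are only there so the loop can be dry-run on a PC.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks: wrap your own connect and teardown paths. */
typedef struct {
    int (*connect)(void *ctx);    /* 0 on success, error code otherwise */
    int (*disconnect)(void *ctx); /* 0 on success, error code otherwise */
    void *ctx;                    /* passed through to both hooks */
} soak_ops_t;

typedef struct {
    uint32_t cycles_ok;         /* complete connect + disconnect cycles */
    int      last_error;        /* first nonzero code seen, 0 if none   */
    int      failed_in_connect; /* 1 if connect failed, 0 otherwise     */
} soak_result_t;

/* Cycle the link until a hook fails or max_cycles is reached. */
soak_result_t soak_run(const soak_ops_t *ops, uint32_t max_cycles)
{
    soak_result_t r = { 0, 0, 0 };
    for (uint32_t i = 0; i < max_cycles; i++) {
        int rc = ops->connect(ops->ctx);
        if (rc != 0) {
            r.last_error = rc;
            r.failed_in_connect = 1;
            break;
        }
        rc = ops->disconnect(ops->ctx);
        if (rc != 0) {
            r.last_error = rc;
            break;
        }
        r.cycles_ok++;
    }
    return r;
}

/* Host-side stub: connect fails with fail_code on call number
 * fail_on_call, which simulates the error appearing after N restarts. */
typedef struct {
    uint32_t calls;
    uint32_t fail_on_call;
    int      fail_code;
} soak_stub_t;

int soak_stub_connect(void *ctx)
{
    soak_stub_t *s = ctx;
    s->calls++;
    return (s->calls == s->fail_on_call) ? s->fail_code : 0;
}

int soak_stub_disconnect(void *ctx)
{
    (void)ctx;
    return 0;
}
```

Running the cycle back to back instead of once every few hours should reach 50 or 60 restarts in minutes rather than a day, and cycles_ok records how many restarts it took before the first failure.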
I am also going to look at using static allocation for WolfSSL (currently my stack is 65k and the heap is 72k, and the WolfSSL test functions pass OK).
I am posting this to see if there is any info on "mp_exptmod error state". I'll post updates as this might be useful to others.