Topic: Connection losses due to segmentation errors
Hi there,
(and sorry for the wall of text, but I have put quite some time into this issue and want to share my discoveries)
Problem: I am regularly experiencing connection losses to the MQTT broker.
I found out that the reason for that is a payload mix-up in the TCP segments (or malformed packets?).
As you can see in the attached Wireshark trace, the last segment before the reset (packet starts at No. 191) is incorrect.
First, some data within the payload is sent before the start of the payload (No. 194).
Second, packet No. 195 is marked "malformed" in Wireshark. It is shown as a "Publish release" packet, but should only be a normal "publish message" packet. That is not always the case! Sometimes it's a normal TCP packet, sometimes something different.
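For context on why misplaced bytes can show up under a different packet name: in MQTT 3.1.1 the packet type is the upper four bits of the first byte of the fixed header. If payload bytes end up where a dissector expects a header, whatever byte sits there is interpreted as a packet type. A minimal Python sketch of that decoding (the helper name packet_type and the sample byte 'b' are only illustrative, they are not taken from the trace):

```python
# MQTT 3.1.1 control packet types (upper nibble of the first fixed-header byte).
MQTT_PACKET_TYPES = {
    1: "CONNECT", 2: "CONNACK", 3: "PUBLISH", 4: "PUBACK",
    5: "PUBREC", 6: "PUBREL", 7: "PUBCOMP", 8: "SUBSCRIBE",
    9: "SUBACK", 10: "UNSUBSCRIBE", 11: "UNSUBACK",
    12: "PINGREQ", 13: "PINGRESP", 14: "DISCONNECT",
}


def packet_type(first_byte: int) -> str:
    """Return the packet type a parser would derive from this byte."""
    return MQTT_PACKET_TYPES.get(first_byte >> 4, "RESERVED")


print(packet_type(0x30))      # PUBLISH (a real PUBLISH header, QoS 0)
print(packet_type(ord("b")))  # PUBREL  (payload byte 'b' = 0x62 read as a header)
```

So an ordinary ASCII character from the payload is enough to make a dissector report "Publish release", which would also fit the observation that the misdetected type changes from case to case.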
I am using: MQTT v1.4, Microchip Harmony 2.6, Processor PIC32MZ2048EFM144, FreeRTOS
I discovered that the issue is not present when I put all the web-tasks into the same FreeRTOS task (see example below).
Also, the issue does not happen when I only send packets with a payload smaller than 1460 bytes (the maximum segment size).
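To estimate when a publish no longer fits into a single segment, here is a rough Python sketch based on the MQTT 3.1.1 PUBLISH layout (fixed header, topic, optional packet identifier, payload). The topic name "test/topic" is a made-up placeholder, and the sketch assumes no TLS overhead and a stack that fills each segment up to the MSS:

```python
import math

MSS = 1460  # maximum segment size mentioned above, in bytes


def remaining_length_size(n: int) -> int:
    """Bytes needed for MQTT's variable-length 'Remaining Length' field."""
    if n < 0 or n > 268_435_455:
        raise ValueError("remaining length out of range")
    size = 1
    while n > 127:
        n >>= 7
        size += 1
    return size


def publish_packet_size(topic: str, payload_len: int, qos: int = 0) -> int:
    """Total on-wire size of an MQTT 3.1.1 PUBLISH packet."""
    # 2-byte topic length + topic + 2-byte packet identifier (QoS 1/2 only)
    variable_header = 2 + len(topic.encode("utf-8")) + (2 if qos > 0 else 0)
    remaining = variable_header + payload_len
    return 1 + remaining_length_size(remaining) + remaining


def tcp_segments(packet_len: int, mss: int = MSS) -> int:
    """Number of full-size segments needed to carry packet_len bytes."""
    return math.ceil(packet_len / mss)


for payload_len in (1400, 1450, 3000):
    size = publish_packet_size("test/topic", payload_len)  # hypothetical topic
    print(payload_len, size, tcp_segments(size))
```

Note that the MQTT header and topic are added on top of the payload, so the packet can cross the segment boundary slightly before the payload itself reaches 1460 bytes.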
So this seems to be a multitasking issue. (Using WOLFMQTT_MULTITHREAD does not fix the problem but introduces others on top.)
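To illustrate the kind of problem I suspect (this is only a generic sketch, not a claim about what wolfMQTT or Harmony actually do internally): if several tasks write multi-chunk messages to the same connection, the chunks can only stay in order if each message is written from start to finish under one lock. In Python, with a plain list standing in for the socket and hypothetical names throughout:

```python
import threading


class SerializedWriter:
    """Writes each message completely before another task may start writing."""

    def __init__(self, send_chunk, chunk_size=1460):
        self._send_chunk = send_chunk  # stand-in for the real socket write
        self._chunk_size = chunk_size
        self._lock = threading.Lock()

    def send_message(self, data: bytes) -> None:
        # Hold the lock for the whole message, not just for one chunk.
        with self._lock:
            for i in range(0, len(data), self._chunk_size):
                self._send_chunk(data[i:i + self._chunk_size])


wire = []  # records chunks in the order they would reach the wire
writer = SerializedWriter(wire.append)

# Four "tasks", each sending a 4000-byte message (three chunks each).
messages = [bytes([65 + n]) * 4000 for n in range(4)]
threads = [threading.Thread(target=writer.send_message, args=(m,))
           for m in messages]
for t in threads:
    t.start()
for t in threads:
    t.join()

stream = b"".join(wire)
print(all(m in stream for m in messages))  # every message arrived contiguous
```

On FreeRTOS the same idea would be a mutex around the complete write of one packet. Whether a missing serialization like this is really the cause here is exactly what I cannot tell.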
I am writing this to you, since I made a test project with plain BSD-Sockets that does not show any signs of this issue.
Also, in my real application we use TCP and HTTP quite often with the same payloads as for MQTT and have never experienced those issues.
I am still unsure whether the problem is in the wolfMQTT library or in Microchip Harmony. Perhaps it is only an issue with my specific configuration?
I did dive into the wolfMQTT implementation and could not find anything suspicious, but maybe something is happening in the background (callbacks, interrupts)?
I would be glad if you could look into it.
I have modified a harmony-example with the mqtt-example for azure. I got it running with a local mosquitto broker.
To reproduce:
* build the attached project (the project is using the "PIC32MZ EF Starter Kit".) DHCP is enabled.
* install mosquitto (https://mosquitto.org/) (the standard configuration is ok) and run it.
* Set the correct IP ("local_broker_ip") in the attached Python script and run it
The python script will continuously trigger a publish on the PIC and will stop if the answer is not correct. This could take a few minutes - the error happens typically within 15 minutes - sometimes much faster.
If errors happen, the PIC needs to be restarted so it can connect to the broker again.
system_tasks.c contains a define (#define COMBINE_TASKS) that enables a configuration that does not have those issues.