Topic: TLSv1.3 bad_record_mac alert
We're using WolfSSL v4.4.0 with the Atmel port, and the WolfSSL peer is sending a bad_record_mac alert when attempting TLSv1.3 with a HAProxy 2.0.12 server. Our WolfSSL-based product is an embedded device that uses a Microchip (formerly Atmel) cryptographic hardware chip.
We see the alert when the WolfSSL peer downloads ~500KiB over HTTPS. The alert manifests itself in different ways:
- When it occurs during connection establishment, we get a "bad certificate" error.
- When it occurs during data transfer, we get a TCP connection reset as the peer shuts down its end of the connection.
I have not confirmed this, but I believe the WolfSSL peer is always the one that fails to validate a message's MAC and responds by sending an alert to the HAProxy peer.
The anomaly occurs pretty frequently when the WolfSSL peer is communicating from $job's network (nearly 50% of transfers fail to complete), but much less frequently when communicating via my home AT&T fiber ISP (<5% of transfers fail). The failure may occur at any point during the transfer.
Googling suggests several possible culprits:
- One peer is using an old OpenSSL library with a known bug that causes this issue (disproven; HAProxy was built with OpenSSL 1.1.1).
- The peer sending the bad message is using OpenSSL incorrectly in a multithreaded application (unlikely; HAProxy was designed as a multithreaded application).
- TCP packet corruption (unlikely, as the TCP checksum should rarely let a corrupted packet through).
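To show why I say "rarely" rather than "never" for the TCP case, here is a minimal sketch of the 16-bit one's-complement checksum in the style of RFC 1071. It is only an illustration: it sums the payload bytes alone (real TCP also covers the header and a pseudo-header), and the sample payloads are made up. It shows two kinds of corruption the checksum cannot see. If such a packet did reach the TLS layer, record authentication would reject it, which would look like a bad_record_mac.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Made-up payload, used only for illustration.
original = b"\x12\x34\xab\xcd\x00\x01\xff\xfe"

# Swapping two 16-bit words changes the payload but not the sum.
swapped = original[2:4] + original[0:2] + original[4:]

# Adding 1 to one word and subtracting 1 from another also cancels out.
offsetting = b"\x12\x35\xab\xcc\x00\x01\xff\xfe"

# A change confined to a single word is detected.
single_change = b"\x13\x34\xab\xcd\x00\x01\xff\xfe"

if __name__ == "__main__":
    base = internet_checksum(original)
    for name, payload in [("swapped", swapped),
                          ("offsetting", offsetting),
                          ("single_change", single_change)]:
        detected = internet_checksum(payload) != base
        print(f"{name}: corruption detected = {detected}")
```

Errors like these need a specific pattern of damage, so they are uncommon on a healthy link, but a device on the path that corrupts data in a regular way could produce them more often than random noise would.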
Questions:
- Could a sporadic crypto operation failure cause this behavior?
- If the WolfSSL peer is only receiving corrupt messages, would that exonerate it?
- Any tips on pinpointing the culprit?
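One check I am considering for the first question is a soak test that runs the same operation many times and compares each result against a known-good software reference. The sketch below only illustrates the idea under stated assumptions: HMAC-SHA256 stands in for whatever the hardware actually computes (TLSv1.3 records use AEAD ciphers, not HMAC), and `make_flaky_mac` is a hypothetical simulation of a sporadically failing device, not a real WolfSSL or Microchip API.

```python
import hashlib
import hmac
import random


def reference_mac(key: bytes, msg: bytes) -> bytes:
    """Known-good software result (HMAC-SHA256 used as a stand-in)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def make_flaky_mac(failure_rate: float, rng: random.Random):
    """Hypothetical stand-in for a hardware operation that sometimes fails."""
    def flaky_mac(key: bytes, msg: bytes) -> bytes:
        tag = bytearray(reference_mac(key, msg))
        if rng.random() < failure_rate:
            tag[0] ^= 0x01  # simulate a single corrupted bit in the output
        return bytes(tag)
    return flaky_mac


def random_bytes(rng: random.Random, n: int) -> bytes:
    return bytes(rng.getrandbits(8) for _ in range(n))


def count_mismatches(candidate, trials: int, rng: random.Random) -> int:
    """Run `candidate` on random inputs and count disagreements with the reference."""
    mismatches = 0
    for _ in range(trials):
        key = random_bytes(rng, 32)
        msg = random_bytes(rng, rng.randint(1, 1024))
        if not hmac.compare_digest(candidate(key, msg), reference_mac(key, msg)):
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    rng = random.Random(1234)  # fixed seed so runs are repeatable
    flaky = make_flaky_mac(failure_rate=0.01, rng=rng)
    print("mismatches:", count_mismatches(flaky, trials=2000, rng=rng))
```

On the real device, the candidate would be the hardware-backed operation and the reference a pure-software build of the same primitive. Any nonzero mismatch count would point at the crypto path rather than the network.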
Thanks in advance!