Fragmented UDP packets are dropped

henkpls · 21 May 2020

Hi all,

I have an RS2000 G8 server with 4 CPUs, and I'm having problems with large (>1472 bytes) SIP (UDP) packets, which show a loss of almost 100%. Of course, switching to TCP fixes the problem, but not all users can switch that easily. I have servers with two other providers where there is 0% packet loss. Support was helpful and moved me to another rack, but the problem remains.

To test, I'm using socat on an open port like this:

socat -v UDP-LISTEN:35000,fork PIPE

and send packets from a remote server with udpping.py (https://github.com/wangyu-/UDPping/blob/master/udpping.py). This works without any loss against one of my other servers, so I think the test is correct.
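For readers who want to reproduce the same kind of test without the external script, here is a minimal Python sketch of the idea: an echo loop that does what the socat PIPE listener does, plus a client that sends fixed-length datagrams and counts how many never come back. It is not udpping.py itself; the function names run_echo_server and udp_loss are hypothetical, and the payload layout (a 4-byte sequence number followed by filler) is an assumption chosen so late replies are not miscounted.

```python
import socket
import threading


def run_echo_server(sock: socket.socket, stop: threading.Event) -> None:
    """Echo every datagram back to its sender, similar to `socat UDP-LISTEN:...,fork PIPE`."""
    sock.settimeout(0.2)  # wake up regularly so the stop flag is noticed
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(65535)
        except socket.timeout:
            continue
        sock.sendto(data, addr)


def udp_loss(host: str, port: int, length: int, count: int = 10, timeout: float = 1.0) -> float:
    """Send `count` datagrams of `length` payload bytes; return the fraction never echoed back."""
    if length < 4:
        raise ValueError("length must be at least 4 (room for the sequence number)")
    lost = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        for seq in range(count):
            # Prefix a sequence number so a late reply to an earlier probe is not counted.
            payload = seq.to_bytes(4, "big") + b"x" * (length - 4)
            s.sendto(payload, (host, port))
            try:
                while s.recvfrom(65535)[0] != payload:
                    pass  # stale reply from an earlier probe; keep waiting for this one
            except socket.timeout:
                lost += 1
    return lost / count
```

Run against a remote socat listener, a result close to 1.0 for lengths above 1472 combined with 0.0 for smaller lengths would reproduce the symptom described here; over loopback both should be 0.0.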

Today I did a final test in recovery mode, with the same results. So now I'm out of ideas and hope that someone can help me.

Henk

ThomasChr · 21 May 2020

Could you give me some more details on exactly which commands you're using? Then I can try to confirm the problem with a few of my netcup servers.

At first glance it looks to me like a problem with netcup's router infrastructure... but first I want to run the same tests and confirm the problem.

Thomas

ThomasChr · 21 May 2020

Yes, I can confirm it. The mentioned programs start getting timeouts at a length of about 1500. This happens with my netcup servers, but not with a server from another hoster, which only gets timeouts at a length of about 9000 bytes.

This does not happen over the loopback interface, but it also happens when communicating from one netcup server to another.

henkpls · 21 May 2020

Good to hear that this is repeatable. My commands:

server 1:

socat -v UDP-LISTEN:35000,fork PIPE

server 2 (script in my first post):

./udpping.py <ip> 35000 "LEN=1800"

ThomasChr · 21 May 2020

Did you open a ticket with support?

henkpls · 21 May 2020

Yes, but I hit a dead end. This is the last answer I got:


I've discussed the issue with one of our network engineers, and he is also convinced that this is in fact an MTU issue and therefore not an issue in our infrastructure.

So I expect to get this reopened now. Thanks for the help!

m_ueberall · 21 May 2020

Quote from henkpls
Yes, but I hit a dead end. This is the last answer I got:
I've discussed the issue with one of our network engineers, and he is also convinced that this is in fact an MTU issue and therefore not an issue in our infrastructure.
So I expect to get this reopened now. Thanks for the help!

Sorry, do you expect that or don't you expect it? (The latter would make more sense given what's stated above, but just to be sure.)

ThomasChr · 21 May 2020

I think he expects to get the ticket reopened. I can provide pcap Files if needed.

Datacenter He***er: no problem

Datacenter netcup: timeout at a length of ~1500

henkpls · 21 May 2020

Quote from m_ueberall

Sorry, do you expect that or don't you expect it? (The latter would make more sense given what's stated above, but just to be sure.)

I don't see the problem with the sentence above, but to be clear: I asked by email to reopen this, support request [NC#2020050710009379].

eripek · 21 May 2020

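The default MTU at netcup (and at most other hosters I know of) is 1500 bytes. The UDP header is 8 bytes and the IPv4 header is 20 bytes, so the largest UDP payload that fits into a single unfragmented packet is 1500 - 20 - 8 = 1472 bytes. A limit of 1472 bytes of payload per unfragmented packet is therefore quite normal.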
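The header arithmetic above can be written as a tiny Python helper. This is only a sketch: it assumes IPv4 without header options, and the function name is hypothetical.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes (an ICMP echo header is also 8 bytes)


def max_unfragmented_udp_payload(mtu: int = 1500) -> int:
    """Largest UDP payload that fits into one IPv4 packet of at most `mtu` bytes."""
    return mtu - IPV4_HEADER - UDP_HEADER
```

With the default MTU this gives 1472; an MTU of 9000 would allow 8972 bytes of payload.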

If your SIP packets (really SIP, or rather RTP?) exceed a length of 1500 bytes, you are doing something wrong, in my opinion.

Try not to block the following ICMP types: echo request, echo reply and "message too big". That way, path MTU discovery will work, which should prevent you from receiving such large unfragmented packets in the first place.
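As a rough illustration of which ICMPv4 messages this means, here is a hypothetical firewall-style allowlist predicate in Python. In IPv4, the "message too big" notification is Destination Unreachable (type 3) with code 4, "fragmentation needed and DF set"; the ICMPv6 equivalents (e.g. Packet Too Big, type 2) are not covered by this sketch, and the names are illustrative only.

```python
# (ICMPv4 type, code) pairs to let through; code None means "any code".
ICMPV4_ALLOW = {
    (8, None): "echo request",
    (0, None): "echo reply",
    (3, 4): "destination unreachable: fragmentation needed and DF set",
}


def should_allow_icmpv4(icmp_type: int, icmp_code: int) -> bool:
    """Return True if this ICMPv4 message is one that ping and path MTU discovery rely on."""
    return (icmp_type, icmp_code) in ICMPV4_ALLOW or (icmp_type, None) in ICMPV4_ALLOW
```

Other destination-unreachable codes (for example code 3, port unreachable) are not matched here, because only code 4 carries the next-hop MTU that path MTU discovery needs.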

ThomasChr · 21 May 2020

It could be that the mentioned tool forces the whole length into one packet without any fragmentation. And that doesn't work...

eripek · 22 May 2020

Quote from ThomasChr

It could be that the mentioned tool forces the whole length into one packet without any fragmentation. And that doesn't work...

That is to be expected. Check your maximum MTU: ping -M do -c 2 -s 1472 targetip sends an unfragmented ICMP ping to the target's IP address (the ICMP echo header is 8 bytes, exactly the same size as the UDP header, so this is a 1500-byte packet). With an MTU of 1500 this will most probably receive an ICMP echo reply, while a payload of 1473 bytes or more would already fail locally if the (Ethernet) interface is limited to an MTU of 1500 bytes.

socat, which to my knowledge is a highly evolved program similar to netcat, relies on the kernel to decide whether or not to use fragmentation. Therefore, make sure that pMTUd (path MTU discovery) works in any case.

http://www.dest-unreach.org/socat/
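The UDP counterpart of ping -M do can be sketched in Python on Linux: set the socket to always send with the Don't Fragment bit and see whether the kernel refuses the datagram with EMSGSIZE ("Message too long", the same error ping reports further down in this thread). This is a Linux-only sketch; the fallback values 10 and 2 for IP_MTU_DISCOVER and IP_PMTUDISC_DO are taken from <linux/in.h> for Python versions that do not expose them, and the function name is hypothetical. A True result only means the local interface (or the path MTU the kernel already knows) accepts the packet; a router further along may still drop it.

```python
import errno
import socket

# Linux socket option values from <linux/in.h>; used only if Python does not define them.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)


def fits_without_fragmentation(host: str, port: int, payload_len: int) -> bool:
    """Send one UDP datagram with DF set; return False if the kernel rejects it with EMSGSIZE."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # Always set DF and never fragment locally: oversized sends fail instead of fragmenting.
        s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
        try:
            s.sendto(b"\0" * payload_len, (host, port))
        except OSError as exc:
            if exc.errno == errno.EMSGSIZE:
                return False
            raise
    return True
```

On an interface with an MTU of 1500, a payload of 1472 should be accepted and 1473 rejected locally; over loopback (usually a much larger MTU) only payloads beyond the IPv4 maximum of 65507 bytes are rejected.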

henkpls · 22 May 2020

RTP packets are not a problem. SIP packets, however, have grown over the years as functionality was extended, and it's a known problem that SIP packets over UDP get fragmented. But across the five servers I have, I can send large UDP packets from one to another, including routes over the internet. Only on the netcup network are these packets dropped, for reasons unknown to me.

My eth0 device has an MTU of 1500, so in my opinion there is no need for MTU discovery (and I also tested in recovery mode without a firewall). Also, the large test packet is fragmented at the source; otherwise, I think (I'm not an expert on this), it could not travel across the internet and arrive at servers of other providers.

I did some more testing and found that this could be caused by a network firewall: in the captured packets I can see that netcup is using a Juniper device. This led me to the following article, which might explain the dropped packets:

https://kb.juniper.net/InfoCen…x?page=content&id=KB31437
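To illustrate what "fragmented at the source" means for the LEN=1800 test, here is a Python sketch of how one IPv4 UDP datagram is split for a 1500-byte MTU. It assumes that udpping.py's LEN is the UDP payload size and that the IPv4 header has no options; the names are hypothetical. Every fragment except the last carries a multiple of 8 bytes, the fragment offset field counts in 8-byte units, and only the first fragment starts with the UDP header. Since non-first fragments carry no UDP ports, port-based filtering cannot inspect them on their own, which is one plausible way a filter could end up mishandling fragmented UDP.

```python
from __future__ import annotations

from dataclasses import dataclass

IPV4_HEADER = 20  # bytes, no options
UDP_HEADER = 8    # bytes


@dataclass
class Fragment:
    offset_units: int     # IPv4 Fragment Offset field, in 8-byte units
    data_len: int         # bytes of the original IP payload carried by this fragment
    more_fragments: bool  # MF flag
    has_udp_header: bool  # only the first fragment contains the UDP header (ports)

    @property
    def packet_len(self) -> int:
        return IPV4_HEADER + self.data_len


def fragment_udp(payload_len: int, mtu: int = 1500) -> list[Fragment]:
    """Split one UDP datagram with `payload_len` payload bytes into IPv4 fragments."""
    ip_payload = UDP_HEADER + payload_len
    # Non-final fragments must carry a multiple of 8 bytes.
    max_chunk = (mtu - IPV4_HEADER) // 8 * 8
    frags = []
    offset = 0
    while offset < ip_payload:
        chunk = min(max_chunk, ip_payload - offset)
        last = offset + chunk == ip_payload
        frags.append(Fragment(offset // 8, chunk, not last, offset == 0))
        offset += chunk
    return frags
```

For a 1800-byte payload this yields two packets of 1500 and 348 bytes, the second with a fragment offset of 185 (1480 / 8); a 1472-byte payload fits into a single unfragmented packet.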

henkpls · 22 May 2020

All my systems give a response like this to a large ping, so the MTU is 1500 on all of them:

ping: local error: Message too long, mtu=1500

eripek · 22 May 2020

Well, I suppose you should report your findings to netcup support once again. You can also open a new ticket and reference the recently closed ticket number.

henkpls · 22 May 2020

It has been confirmed by Netcup support, so thank you all for helping!

ThomasChr · 22 May 2020

What exactly did they confirm?

I'm curious.

henkpls · 23 May 2020

I understand that you are curious, so here's the reply:


"we are currently investigating this problem and we involved our NOC. Thank you very much for the nice script you have written for testing purposes.
We can definitely reproduce the issue."

Does anyone have experience with how long it usually takes the NOC to fix something like this? Just curious.

[Anexia] Theo V. · 27 May 2020

Hi all,

the issue was caused by complex DDoS protection filtering, which was affecting fragmented UDP packets; it should now be resolved. Can you please check and provide feedback? Please also let the official support team know. We apologize for the inconvenience caused; DDoS filter tuning is always a tightrope walk.

Best regards,

Theo

henkpls · 27 May 2020

Tested and fully working now!

Thanks!

Regards,

Henk
