MTU and TCP MSS

UPDATE on 20 Aug 2018:
Johannes (@webernetz, from blog.webernetz.net) pointed out an inconsistency in the numbers I used for the Ethernet frame size in this article.

The article stated in one place that the default Ethernet MTU is 1522 bytes and further on that the Ethernet MTU is 1500 bytes. It now states that an Ethernet frame is usually 1522 bytes long if VLAN tagging is used, or 1518 bytes long without the 802.1Q tag. That leaves a maximum of 1500 bytes for the payload, which is effectively the Ethernet MTU.

Thanks, Johannes, and sorry for the confusion.
————————————————-

Intro

The largest chunk of data that a protocol can carry across a specific medium in one unit is called the MTU (Maximum Transmission Unit). Ethernet, the most common medium today, has by default a maximum frame size of 1518 bytes (1522 bytes with an 802.1Q VLAN tag) and a payload of 1500 bytes, which is the amount of useful data it can carry in a single frame.

The MTU of a specific protocol defines how much payload (higher-layer protocol headers plus their payload) it can carry in its biggest packet, not counting its own headers. Putting more payload into a single packet than the MTU allows results in fragmentation: the packet is sliced into several smaller packets so that they can get through the path.

Ethernet frame

So if we look at the wire and capture a standard Ethernet frame, we will see that it can carry 1500 bytes of data (the default Ethernet MTU of 1500 bytes) plus 18 bytes of Ethernet overhead (14 bytes of header and 4 bytes of trailer), or 22 bytes when an 802.1Q VLAN tag is added.
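As a quick sanity check of those numbers, here is a minimal Python sketch of the frame size arithmetic. The constant and function names are made up for this illustration; the sizes are the ones discussed above, and the preamble and inter-frame gap are deliberately left out.

```python
# Field sizes in bytes for a standard Ethernet frame.
ETH_HEADER = 14   # destination MAC (6) + source MAC (6) + EtherType (2)
ETH_TRAILER = 4   # frame check sequence
DOT1Q_TAG = 4     # optional 802.1Q VLAN tag
ETH_MTU = 1500    # default maximum payload


def max_frame_size(mtu=ETH_MTU, vlan_tagged=False):
    """Return the largest frame size for a given payload MTU."""
    overhead = ETH_HEADER + ETH_TRAILER
    if vlan_tagged:
        overhead += DOT1Q_TAG
    return mtu + overhead


print(max_frame_size())                  # 1518
print(max_frame_size(vlan_tagged=True))  # 1522
```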

Different MTUs: Interface MTU, IP MTU and TCP MSS

Interface MTU (Ethernet MTU)

The image above shows the different MTU types. It is important to note that the interface MTU (Ethernet MTU) is the largest payload that a specific hardware port on our network device can forward. In the case of Ethernet (IEEE 802.3) the maximum frame size is 1522 bytes, which was raised from 1518 bytes to allow VLAN tagging using 802.1Q. The minimum Ethernet frame size is 64 bytes. Later, when Gigabit and faster Ethernet ports started to appear, jumbo frames were introduced, which allow the interface MTU to be increased to 9000 bytes for performance reasons (slightly less header overhead).

You can change the interface MTU with this command on a Cisco device:

R2(config)#interface GigabitEthernet0/1
R2(config-if)#mtu 1400

IP MTU

The IP MTU should always be aligned with the interface MTU. On some devices you cannot even configure an IP MTU of more than 1500 bytes, which prevents the IP MTU from becoming larger than the interface MTU. If the IP MTU ends up bigger than the interface MTU (let’s say we configure an interface MTU of 1400 bytes as in the example above and leave the IP MTU at its default of 1500 bytes), packets will probably be dropped or, in the best case, fragmented so that they become small enough to fit the interface MTU.

You can modify the IP MTU with this command on a Cisco device interface:

R2(config)#interface GigabitEthernet0/1 
R2(config-if)#ip mtu 1400

TCP MSS

The TCP MSS is signalled to the sending device inside the SYN packet of the three-way handshake, so that the sender only sends packets whose TCP payload fits into the configured IP MTU.

If we shrink the IP MTU to 1400 bytes, we should also configure “ip tcp adjust-mss 1360”, so that the TCP payload (1360 bytes) together with the IP and TCP headers (40 bytes) does not exceed 1400 bytes, which would cause a drop or fragmentation.

Here’s the command to do that:

R2(config)#interface GigabitEthernet0/1 
R2(config-if)#ip tcp adjust-mss 1360

In our topology, this tells R1 that the TCP payload can be at most 1360 bytes.

 

IN MORE DETAIL

If we continue with Ethernet, its header is 14 bytes. If you send something across an Ethernet-based network using IP, you need to put 20 bytes of IP header inside that chunk of data. If you use TCP as the transport protocol, to be sure that the packet gets to its destination, you need an additional TCP header of 20 bytes too. So the headers of IP and TCP, which are higher-layer protocols than Ethernet, and the real user payload all need to fit into the Ethernet MTU in order to avoid fragmentation.

Fragmentation

The thing with fragmentation is that sometimes higher-layer protocols create packets that are bigger than the maximum size supported by a link. IP handles this with fragmentation at Layer 3, slicing the datagram into several pieces based on the MTU configuration. That way, every piece is able to get through that single link. IP marks each piece with its offset in the original datagram so that the receiver is able to recreate it. Tunnelling protocols like GRE, IPsec etc. use all of the IP and TCP headers mentioned before, but they also add headers of their own. With one or more of those additional headers, which the original sender's IP stack is not aware of, we get a packet that is bigger than the MTU of the medium.
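To make the slicing more concrete, here is a rough Python sketch of how an IPv4 payload could be split for a given MTU. It assumes a 20-byte IPv4 header without options, and the function name and return format are invented for this example. Real routers do the same bookkeeping in the IP header fields: the fragment offset, counted in 8-byte units, and the More Fragments flag.

```python
def fragment(payload_len, mtu, ip_header=20):
    """Split an IPv4 payload into fragments that fit the given MTU.

    Returns a list of (offset_in_8_byte_units, data_len, more_fragments).
    """
    room = mtu - ip_header              # data bytes that fit in one packet
    if payload_len <= room:
        return [(0, payload_len, False)]  # fits as is, no fragmentation

    # Every fragment except the last must carry a multiple of 8 bytes,
    # because the offset field counts 8-byte units.
    max_data = room // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")

    fragments = []
    offset = 0
    while payload_len - offset > room:
        fragments.append((offset // 8, max_data, True))
        offset += max_data
    fragments.append((offset // 8, payload_len - offset, False))
    return fragments


# A 1500-byte packet (1480 bytes of IP payload) crossing a 1400-byte MTU link
print(fragment(1480, 1400))  # [(0, 1376, True), (172, 104, False)]
```

In this example the first fragment carries 1376 bytes rather than the full 1380 that would fit, because 1380 is not a multiple of 8.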

TCP MSS

When speaking of fragmentation and how to avoid it, we get to the TCP MSS (Maximum Segment Size). The TCP MSS is the maximum payload size that fits into a TCP segment without fragmentation having to occur. It is basically the space remaining in the Ethernet payload after the IP and TCP headers have been put in.

In the image above, you can see that in a normal TCP data transfer, with the interface MTU and IP MTU at the default 1500 bytes, 1460 bytes remain for payload. This is the TCP MSS that the receiver signals to the sender by default, so that the sender knows how much payload it can put in each segment without causing fragmentation.

In the case of TCP/IP communication across a GRE tunnel, GRE uses an additional 24 bytes in the datagram for its own headers (a 20-byte outer IP header plus a 4-byte GRE header). If we do not signal to the sender that the TCP MSS needs to be 24 bytes smaller (1460 - 24 = 1436), fragmentation will occur in order for communication to be possible.
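The MSS arithmetic used in this article can be written down as a small Python sketch. It assumes IPv4 and TCP headers of 20 bytes each (no options); the function and parameter names are hypothetical and only serve the illustration.

```python
IP_HEADER = 20   # IPv4 header without options
TCP_HEADER = 20  # TCP header without options
GRE_OVERHEAD = 24  # outer IP header (20) + GRE header (4)


def tcp_mss(ip_mtu, tunnel_overhead=0):
    """Largest TCP payload that fits into ip_mtu without fragmentation."""
    return ip_mtu - tunnel_overhead - IP_HEADER - TCP_HEADER


print(tcp_mss(1500))                # 1460, the Ethernet default
print(tcp_mss(1400))                # 1360, for an IP MTU of 1400
print(tcp_mss(1500, GRE_OVERHEAD))  # 1436, across a GRE tunnel
```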

The TCP MSS is sent by the receiver inside its SYN packet during the three-way handshake that establishes the TCP connection, so that the host on the other side knows how much payload it may put into each segment sent towards the receiver in order to avoid fragmentation.

Fragmentation should be avoided wherever possible because it increases the overhead and breaks the communication completely when applications are unable to work with fragmented packets.

If the application exchanging the data does not support fragmentation, we need to configure the intermediary network devices in the data path with a lower maximum segment size (TCP MSS), so that the segments do not get too large before being passed from Layer 3 down to the Data Link layer and further out onto the medium at Layer 1.

The interface MTU and the IP MTU need to have the same value for fragmentation to work optimally. If the IP MTU is bigger than the interface MTU, the transfer of IP packets bigger than the interface MTU will fail. The MTU is used to control and initiate the fragmentation process at Layer 3. The TCP MSS is used to avoid fragmentation where possible, and where it is needed by applications that do not work with fragmented packets.

HOW TO CHECK MTU SIZE

We take the small network from the image above and configure the devices' IP addresses and routes like this:

Server:

interface GigabitEthernet0/0
ip address 10.10.10.1 255.255.255.0
Default GW: 10.10.10.2

R1:

interface GigabitEthernet0/0
ip address 10.10.10.2 255.255.255.0
!
interface GigabitEthernet0/1
ip address 10.10.20.1 255.255.255.0
!
ip route 10.10.30.0 255.255.255.0 10.10.20.2

 

R2:

interface GigabitEthernet0/1
ip address 10.10.20.2 255.255.255.0
!
interface GigabitEthernet0/2
ip address 10.10.30.1 255.255.255.0
!
ip route 10.10.10.0 255.255.255.0 10.10.20.1

 

PC:

interface GigabitEthernet0/2
ip address 10.10.30.2 255.255.255.0
Default GW: 10.10.30.1

 

After that we check the default MTU configuration of the R2 interface towards R1:

R2#show interface gi 0/1 | include MTU
  MTU 1500 bytes, BW 1000000 Kbit/sec, DLY 10 usec,

R2#show ip interface gi 0/1 | include MTU
  MTU is 1500 bytes

R2#

Above, we checked the interface MTU and then the IP MTU; both are 1500 bytes by default. We should now try to send packets of different sizes through that router with the DF bit set at Layer 3. The DF (Don't Fragment) bit tells the router: “please don’t fragment this packet, and if it is bigger than the interface MTU and would need to be fragmented, just drop it instead”.

This exercise will show that with interface MTU at 1500, only packets with size up to 1500 bytes will get through and bigger ones will be dropped by the router if DF bit is set at Layer 3.

We connect to the PC (10.10.30.2) and try to do an extended PING test with DF bit set and variable packet size towards the Server (10.10.10.1):

PC#ping 
Protocol [ip]: 
Target IP address: 10.10.10.1
Repeat count [5]: 1
Datagram size [100]: 
Timeout in seconds [2]: 
Extended commands [n]: y
Source address or interface: 
Type of service [0]: 
Set DF bit in IP header? [no]: y
Validate reply data? [no]: 
Data pattern [0xABCD]: 
Loose, Strict, Record, Timestamp, Verbose[none]: v
Loose, Strict, Record, Timestamp, Verbose[V]: 
Sweep range of sizes [n]: y
Sweep min size [36]: 1490
Sweep max size [18024]: 1510
Sweep interval [1]: 
Type escape sequence to abort.
Sending 21, [1490..1510]-byte ICMP Echos to 10.10.10.1, timeout is 2 seconds:
Packet sent with the DF bit set
Reply to request 0 (6 ms) (size 1490)
Reply to request 1 (7 ms) (size 1491)
Reply to request 2 (8 ms) (size 1492)
Reply to request 3 (7 ms) (size 1493)
Reply to request 4 (14 ms) (size 1494)
Reply to request 5 (12 ms) (size 1495)
Reply to request 6 (13 ms) (size 1496)
Reply to request 7 (9 ms) (size 1497)
Reply to request 8 (16 ms) (size 1498)
Reply to request 9 (9 ms) (size 1499)
Reply to request 10 (3 ms) (size 1500)
Request 11 timed out (size 1501)
Request 12 timed out (size 1502)
Request 13 timed out (size 1503)
Request 14 timed out (size 1504)
Request 15 timed out (size 1505)
Request 16 timed out (size 1506)
Request 17 timed out (size 1507)
Request 18 timed out (size 1508)
Request 19 timed out (size 1509)
Request 20 timed out (size 1510)
Success rate is 52 percent (11/21), round-trip min/avg/max = 3/9/16 ms
PC#
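
The pattern in the output can be reproduced with a few lines of Python. This is only a model of the test, not something the router runs. It assumes that the datagram size in the extended ping is the full IP packet size (which matches the output above, where 1500 passes and 1501 does not) and that a packet with the DF bit set is dropped as soon as it exceeds the path MTU. The function name is hypothetical.

```python
def sweep_with_df(min_size, max_size, path_mtu):
    """Map each packet size to True if it would pass with the DF bit set."""
    return {size: size <= path_mtu for size in range(min_size, max_size + 1)}


results = sweep_with_df(1490, 1510, path_mtu=1500)
passed = sum(results.values())
total = len(results)
print(f"Success rate is {100 * passed // total} percent ({passed}/{total})")
# Success rate is 52 percent (11/21)
```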

MTU misconfiguration

If I change the IP MTU between the two routers to 1400 bytes and leave the TCP MSS at the default 1460 bytes, we get an MTU misconfiguration. The IP MTU is what the IP protocol uses to initiate fragmentation once a packet gets bigger than the IP MTU. If we do not configure the TCP MSS to the right size of “IP MTU - 40 bytes” (in the case of plain IP forwarding), then IP will not initiate fragmentation, and the packet will be sent from Layer 3 of the sending device to the Data Link layer with more bytes than the other side supports, and it will be dropped on arrival. If we configure both the IP MTU and the TCP MSS to the right values, the packets are either small enough already or get fragmented into pieces small enough to cross the medium in question.

Smaller TCP MSS size needs to be signalled to sender

The TCP MSS is signalled by the receiver to the sender while the TCP connection is being established. It is sent inside the SYN packet of the three-way handshake, telling the sender: “Hey, when you start to send me something, I can receive TCP segments not bigger than X”. The default for X (TCP MSS) on Ethernet is 1460 bytes (1460 + IP 20 + TCP 20 = 1500), and it can be smaller or bigger than this default. The TCP MSS is signalled in each direction separately and can differ between the two directions; it depends on what each receiver is able to receive.

Interface MTU, IP MTU and TCP MSS are not automatically synchronized. In most cases, there is no mechanism that lets the IP protocol know that some additional tunnelling protocol is being carried inside its payload (which effectively requires the TCP MSS to shrink by the size of that additional header).

Since there is no such automatic synchronization, most of the time segments carried with an additional tunnelling header will have a payload bigger than the supported TCP MSS and will get fragmented. Everything will work fine if the application in question supports fragmentation.

If the communication does not allow fragmentation, and we know that some tunnelling is used across that link, we have the option to change the TCP MSS setting on the network device so that it signals to the sender that it needs to send smaller segments.

On a Cisco device this is done on the interface connecting the sender, with the command:

interface GigabitEthernet0/2
ip tcp adjust-mss 1360

The Cisco device will then rewrite the TCP MSS of any TCP SYN packet passing through that interface, so the packets received from the sender will have a smaller payload and will not need to get fragmented.
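
Conceptually, that rewrite is a clamp on the MSS option carried in the SYN. The Python sketch below walks the TCP options bytes and lowers the MSS value if it exceeds the configured limit. It is an illustration of the idea and not Cisco's implementation; the function name is hypothetical, and a real device would also have to recompute the TCP checksum after changing the option.

```python
import struct

OPT_END = 0  # End of Option List
OPT_NOP = 1  # No-Operation (single byte, no length field)
OPT_MSS = 2  # Maximum Segment Size: kind, length 4, 16-bit value


def clamp_mss(options: bytes, max_mss: int) -> bytes:
    """Return a copy of the TCP options with the MSS lowered to max_mss."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == OPT_END:
            break
        if kind == OPT_NOP:
            i += 1
            continue
        if i + 1 >= len(out):
            break  # truncated option, leave the rest untouched
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed option, leave the rest untouched
        if kind == OPT_MSS and length == 4:
            (mss,) = struct.unpack_from("!H", out, i + 2)
            if mss > max_mss:
                struct.pack_into("!H", out, i + 2, max_mss)
        i += length
    return bytes(out)


# Options from a SYN: MSS 1460, NOP, window scale 7
syn_options = bytes([2, 4, 0x05, 0xB4, 1, 3, 3, 7])
print(clamp_mss(syn_options, 1360).hex())  # 0204055001030307
```

An MSS that is already at or below the limit is left alone, so hosts that announce a smaller value on their own are not affected.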

 
