TCP provides a reliable byte-stream transfer service to the application layer, yet TCP itself is built on top of the IP layer. An IP device forwards packets on a best-effort basis: it simply passes each IP packet along and makes no guarantee of delivery. If a router or the receiver discards an IP packet in transit, the user data it carries is lost for good, and the IP layer does nothing to recover it. In addition, the maximum size of an IP packet depends on what the data link layer can carry. In other words, when the application's byte stream exceeds a certain size, it has to be split across multiple IP packets, and those packets may arrive out of order or be duplicated. The IP layer cannot be trusted. So how does TCP, running over an unreliable IP network, guarantee a reliable byte stream for the application layer?
Unreliability of the IP layer
The IP layer exists to locate hosts on the Internet and to give them a uniform addressing scheme across different data link layers, so that hosts on different kinds of links can communicate with one another by IP address. The IP layer is not responsible for caching data, and once an IP packet has been sent into the network, the IP layer does not track it. Data handed to the IP layer may therefore be lost or discarded by routing devices in the network. Network equipment also changes state: routers restart or lose power, and links share load, so routing tables change and IP packets may arrive at the destination host in a different order from the one in which they were sent. IP packets may also be duplicated inside the network, so the same packet can reach the destination host more than once, producing redundant data.
On a network, packet loss, discarded packets, out-of-order delivery, and duplicate data are all perfectly ordinary at the IP layer. The IP layer is therefore unreliable for data transmission. Achieving reliable transmission on top of it requires the more complex control mechanisms of the transport layer.
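The three failure modes above (loss, duplication, reordering) can be imitated with a toy channel. This is only an illustrative sketch: the function name unreliable_channel and its loss, dup, and seed parameters are made up for this example and do not correspond to any real networking API.

```python
import random

def unreliable_channel(packets, loss=0.2, dup=0.1, seed=0):
    """Return the packets as a best-effort network might deliver them.

    loss: probability that a packet is dropped (hypothetical parameter)
    dup:  probability that a delivered packet is delivered twice
    """
    rng = random.Random(seed)
    delivered = []
    for packet in packets:
        if rng.random() < loss:
            continue                  # dropped: nobody is told about it
        delivered.append(packet)
        if rng.random() < dup:
            delivered.append(packet)  # duplicated inside the network
    rng.shuffle(delivered)            # arrival order differs from send order
    return delivered

if __name__ == "__main__":
    print(unreliable_channel(list(range(10))))
```

Everything TCP adds on top of IP can be read as a way of turning the output of such a channel back into the original sequence.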
Unreliability of the receiver
Reliable data transmission depends not only on the network but also on the working state of the receiver. For example, when the receiving end is under heavy load and its receive buffer is full, it simply ignores data that has been delivered to it. The state of the receiving end is therefore another important factor that must be considered during data transmission.
How the TCP layer ensures a reliable byte stream
The TCP layer achieves reliable data transmission over an unreliable IP network through a set of cooperating mechanisms: byte ordering, duplicate removal, retransmission of lost segments, data verification, flow control that takes the state of the receiver into account, and congestion control that takes the state of intermediate routers into account.
Byte sorting
When one end sends a stream of bytes to the other, how does the other end know whether data in the middle is missing or duplicated? The answer is simple: every byte in the byte stream is numbered. When a TCP segment is lost in transit, the receiver can tell from the sequence numbers which segment is missing, and it can use the same numbers to discard duplicate data. In the figure below, the byte numbering starts from 1 to make the ordering easy to follow. In practice the first byte is not numbered 1; its number is a value that the system generates according to some algorithm, called the Initial Sequence Number (ISN). The ISN lies in the range [0, 2^32 - 1]; sequence numbers that pass 2^32 - 1 wrap around and continue from 0. When the TCP layer splits the application's byte stream into segments, it does not always put 3 bytes in each segment as the figure does; the segment size depends on the situation. The largest amount of byte-stream data that one segment may carry is called the MSS (Maximum Segment Size).
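The numbering scheme can be sketched in a few lines. The helper name segment_stream is hypothetical, and the sketch makes one simplifying assumption: the first data byte is numbered with the ISN itself, whereas real TCP spends one sequence number on the SYN, so the first data byte is actually ISN + 1.

```python
SEQ_MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around to 0

def segment_stream(data: bytes, isn: int, mss: int):
    """Split a byte stream into (sequence_number, payload) pairs.

    Each segment carries at most `mss` bytes, and its sequence number is
    the number of its first byte. Simplification: the first data byte is
    numbered `isn` (real TCP uses isn + 1 because the SYN consumes one).
    """
    segments = []
    for offset in range(0, len(data), mss):
        seq = (isn + offset) % SEQ_MOD
        segments.append((seq, data[offset:offset + mss]))
    return segments

if __name__ == "__main__":
    # Numbering from 1 with 3 bytes per segment, as in the figure.
    for seq, payload in segment_stream(b"abcdefgh", isn=1, mss=3):
        print(seq, payload)
```

Because each segment carries the number of its first byte plus its length, the receiver can compute exactly which byte it expects next.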
Data verification
While the bytes of the stream travel to the receiving end, unpredictable faults on the transmission path may corrupt them, so that the data at the receiver no longer matches what the sender sent. TCP therefore verifies the data: the sender computes a checksum before sending, and the receiver checks it on arrival. Before sending a segment, the TCP module builds a temporary pseudo-header containing the source IP address, the destination IP address, the TCP protocol type, and the length of the TCP segment. It attaches this pseudo-header to the segment to be sent, forming a temporary data structure, computes the checksum over that structure, and stores the result in the checksum field of the TCP segment.
When the segment arrives, the receiving end performs the same calculation and compares the result with the checksum carried in the TCP segment. If the two match, the data is accepted as intact; otherwise the segment is discarded.
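The calculation can be sketched for IPv4 as follows. The function names are chosen for this example; the algorithm is the standard 16-bit one's-complement Internet checksum, and the sketch assumes the caller has set the segment's checksum field to zero before computing it.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of the data, complemented."""
    if len(data) % 2:
        data += b"\x00"                     # pad to a whole 16-bit word
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header followed by the TCP segment.

    `segment` is the TCP header plus payload. When computing the value to
    send, its checksum field (bytes 16-17) must be zero.
    """
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, 6, len(segment))  # zero, protocol 6 = TCP, length
    )
    return internet_checksum(pseudo_header + segment)
```

A convenient property of this checksum is that the receiver can run the same function over the segment with the checksum field filled in: an undamaged segment yields 0.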
Acknowledgment mechanism
Numbering the bytes of the stream lets the receiver know whether data was lost in transit, but how is the loss repaired? In TCP, the receiver tells the sender which data it has received; that is, it acknowledges the received data. If the sender does not receive an acknowledgment for the segment starting at sequence number N within a certain period of time, it considers that segment lost and retransmits the TCP segment starting at sequence number N.
As shown in the figure below, the client sends the segment [DATA: 5,6,7] to the server. After receiving [DATA: 5,6,7], the server replies [ACK: 8], which means that it has received all data up to and including the seventh byte and expects the data starting from the eighth byte next. The client then sends [DATA: 8,9,10]. This segment never reaches the server; it is lost in the network. The client started a timer when it sent the segment, and when no acknowledgment arrives before the timer expires, it sends the segment again. Next, the client sends the segment starting at byte 11. The server receives it and replies [ACK: 14], but this time the reply is lost in the network. As before, the client started a timer when it sent the segment; after a certain period of time without an acknowledgment from the server, the client retransmits the segment.
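The exchange described above can be replayed with a small simulation. It is a sketch under several assumptions: the names Receiver and run_stop_and_wait are invented here, the sender waits for each acknowledgment before sending the next segment (real TCP keeps several segments in flight), a timeout is modeled simply as "no ACK came back", retransmissions carry exactly the same bytes, and sequence-number wraparound is ignored.

```python
class Receiver:
    """Reassembles the byte stream and answers with cumulative ACKs."""

    def __init__(self, next_expected):
        self.next_expected = next_expected  # number of the next byte wanted
        self.buffer = {}                    # out-of-order segments, by seq
        self.delivered = bytearray()

    def on_segment(self, seq, payload):
        if seq + len(payload) <= self.next_expected:
            return self.next_expected       # duplicate: discard, ACK again
        self.buffer.setdefault(seq, payload)
        while self.next_expected in self.buffer:
            chunk = self.buffer.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)
        return self.next_expected

def run_stop_and_wait(segments, receiver, drop_data=(), drop_ack=()):
    """Send segments one by one, retransmitting until each is acknowledged.

    drop_data and drop_ack hold (seq, attempt) pairs that the simulated
    network loses.
    """
    log = []
    for seq, payload in segments:
        attempt = 0
        while True:
            attempt += 1
            if (seq, attempt) in drop_data:
                log.append(f"DATA {seq} lost, timeout, retransmit")
                continue
            ack = receiver.on_segment(seq, payload)
            if (seq, attempt) in drop_ack:
                log.append(f"ACK {ack} lost, timeout, retransmit")
                continue
            log.append(f"DATA {seq} acknowledged by ACK {ack}")
            break
    return log

if __name__ == "__main__":
    rx = Receiver(next_expected=5)
    stream = [(5, b"efg"), (8, b"hij"), (11, b"klm")]
    for line in run_stop_and_wait(stream, rx, {(8, 1)}, {(11, 1)}):
        print(line)
```

When the ACK is lost, the retransmitted segment reaches the receiver a second time. The sequence number lets the receiver recognize it as a duplicate, discard it, and simply repeat the acknowledgment.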
Flow control
When the sender transmits data faster than the receiver can process it, the receiver tells the sender to slow down. This is explained in detail in the discussion of the TCP sliding window mechanism.
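As a rough preview of the idea, the sketch below models a receiver with a fixed-size buffer that reports how much room it has left, and a sender that never sends more than that. The names ReceiveBuffer, window, accept, read, and sendable are hypothetical, and the model ignores sequence numbers and timing entirely.

```python
class ReceiveBuffer:
    """Fixed-capacity receive buffer that advertises its free space."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()

    def window(self):
        """Free space in bytes: the amount the sender may still send."""
        return self.capacity - len(self.data)

    def accept(self, payload):
        """Store the payload if it fits, otherwise drop it.

        Returns the window to advertise back to the sender.
        """
        if len(payload) <= self.window():
            self.data += payload
        return self.window()

    def read(self, n):
        """The application consumes up to n bytes, freeing buffer space."""
        chunk = bytes(self.data[:n])
        del self.data[:n]
        return chunk

def sendable(pending: bytes, advertised_window: int) -> bytes:
    """The part of the pending data the sender is allowed to send now."""
    return pending[:advertised_window]

if __name__ == "__main__":
    buf = ReceiveBuffer(capacity=4)
    win = buf.accept(b"abc")        # one byte of room is left
    print(sendable(b"defg", win))   # the sender limits itself to that room
```

A sender that respects the advertised window never triggers the drop branch in accept, which is the point of flow control.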
Congestion control
Congestion control is explained in detail in the discussion of TCP congestion control.