This article has been published in the peer-reviewed journal Open Medicine (). Click to view the published version.

Internet Protocol version 4

From HandWiki
Short description: Fourth version of the Internet Protocol
Internet Protocol version 4
Protocol stack
IPv4 Packet-en.svg
IPv4 packet
Purposeinternetworking protocol
Developer(s)DARPA
Introduced1981; 43 years ago (1981)
InfluencedIPv6
OSI layerNetwork layer
RFC(s)791

Internet Protocol version 4 (IPv4) is the fourth version of the Internet Protocol (IP). It is one of the core protocols of standards-based internetworking methods in the Internet and other packet-switched networks. IPv4 was the first version deployed for production on SATNET in 1982 and on the ARPANET in January 1983. It is still used to route most Internet traffic today,[1] even with the ongoing deployment of Internet Protocol version 6 (IPv6),[2] its successor.

IPv4 uses a 32-bit address space which provides 4,294,967,296 (232) unique addresses, but large blocks are reserved for special networking purposes.[3]REFERENCE FOR RFC6890 IS NOT DEFINED YET. You are invited to add it here.

History

Internet Protocol version 4 is described in IETF publication RFC 791 (September 1981), replacing an earlier definition of January 1980 (RFC 760). In March 1982, the US Department of Defense decided on the Internet Protocol Suite (TCP/IP) as the standard for all military computer networking.[4]

Purpose

The Internet Protocol is the protocol that defines and enables internetworking at the internet layer of the Internet Protocol Suite. In essence it forms the Internet. It uses a logical addressing system and performs routing, which is the forwarding of packets from a source host to the next router that is one hop closer to the intended destination host on another network.

IPv4 is a connectionless protocol, and operates on a best-effort delivery model, in that it does not guarantee delivery, nor does it assure proper sequencing or avoidance of duplicate delivery. These aspects, including data integrity, are addressed by an upper layer transport protocol, such as the Transmission Control Protocol (TCP).

Addressing

Decomposition of the quad-dotted IPv4 address representation to its binary value

IPv4 uses 32-bit addresses which limits the address space to 4294967296 (232) addresses.

IPv4 reserves special address blocks for private networks (~18 million addresses) and multicast addresses (~270 million addresses).

Address representations

IPv4 addresses may be represented in any notation expressing a 32-bit integer value. They are most often written in dot-decimal notation, which consists of four octets of the address expressed individually in decimal numbers and separated by periods.

For example, the quad-dotted IP address in the illustration (172.16.254.1) represents the 32-bit decimal number 2886794753, which in hexadecimal format is 0xAC10FE01.

CIDR notation combines the address with its routing prefix in a compact format, in which the address is followed by a slash character (/) and the count of leading consecutive 1 bits in the routing prefix (subnet mask).

Other address representations were in common use when classful networking was practiced. For example, the loopback address 127.0.0.1 was commonly written as 127.1, given that it belongs to a class-A network with eight bits for the network mask and 24 bits for the host number. When fewer than four numbers were specified in the address in dotted notation, the last value was treated as an integer of as many bytes as are required to fill out the address to four octets. Thus, the address 127.65530 is equivalent to 127.0.255.250.

Allocation

In the original design of IPv4, an IP address was divided into two parts: the network identifier was the most significant octet of the address, and the host identifier was the rest of the address. The latter was also called the rest field. This structure permitted a maximum of 256 network identifiers, which was quickly found to be inadequate.

To overcome this limit, the most-significant address octet was redefined in 1981 to create network classes, in a system which later became known as classful networking. The revised system defined five classes. Classes A, B, and C had different bit lengths for network identification. The rest of the address was used as previously to identify a host within a network. Because of the different sizes of fields in different classes, each network class had a different capacity for addressing hosts. In addition to the three classes for addressing hosts, Class D was defined for multicast addressing and Class E was reserved for future applications.

Dividing existing classful networks into subnets began in 1985 with the publication of RFC 950. This division was made more flexible with the introduction of variable-length subnet masks (VLSM) in RFC 1109 in 1987. In 1993, based on this work, RFC 1517 introduced Classless Inter-Domain Routing (CIDR),[5] which expressed the number of bits (from the most significant) as, for instance, /24, and the class-based scheme was dubbed classful, by contrast. CIDR was designed to permit repartitioning of any address space so that smaller or larger blocks of addresses could be allocated to users. The hierarchical structure created by CIDR is managed by the Internet Assigned Numbers Authority (IANA) and the regional Internet registries (RIRs). Each RIR maintains a publicly searchable WHOIS database that provides information about IP address assignments.

Special-use addresses

The Internet Engineering Task Force (IETF) and IANA have restricted from general use various reserved IP addresses for special purposes.REFERENCE FOR RFC6890 IS NOT DEFINED YET. You are invited to add it here. Notably these addresses are used for multicast traffic and to provide addressing space for unrestricted uses on private networks. <section begin=IPv4-special-address-blocks/>

Special address blocks
Address block Address range Number of addresses Scope Description
0.0.0.0/8 0.0.0.0–0.255.255.255 16777216 Software Current (local, "this") networkREFERENCE FOR RFC6890 IS NOT DEFINED YET. You are invited to add it here.
10.0.0.0/8 10.0.0.0–10.255.255.255 16777216 Private network Used for local communications within a private networkREFERENCE FOR RFC1918 IS NOT DEFINED YET. You are invited to add it here.
100.64.0.0/10 100.64.0.0–100.127.255.255 4194304 Private network Shared address spaceREFERENCE FOR RFC6598 IS NOT DEFINED YET. You are invited to add it here. for communications between a service provider and its subscribers when using a carrier-grade NAT
127.0.0.0/8 127.0.0.0–127.255.255.255 16777216 Host Used for loopback addresses to the local host[6]
169.254.0.0/16 169.254.0.0–169.254.255.255 65536 Subnet Used for link-local addressesREFERENCE FOR RFC3927 IS NOT DEFINED YET. You are invited to add it here. between two hosts on a single link when no IP address is otherwise specified, such as would have normally been retrieved from a DHCP server
172.16.0.0/12 172.16.0.0–172.31.255.255 1048576 Private network Used for local communications within a private network[7]
192.0.0.0/24 192.0.0.0–192.0.0.255 256 Private network IETF Protocol Assignments, DS-Lite (/29)[6]
192.0.2.0/24 192.0.2.0–192.0.2.255 256 Documentation Assigned as TEST-NET-1, documentation and examplesREFERENCE FOR RFC5737 IS NOT DEFINED YET. You are invited to add it here.
192.88.99.0/24 192.88.99.0–192.88.99.255 256 Internet Reserved.REFERENCE FOR RFC7526 IS NOT DEFINED YET. You are invited to add it here. Formerly used for IPv6 to IPv4 relayREFERENCE FOR RFC3068 IS NOT DEFINED YET. You are invited to add it here. (included IPv6 address block 2002::/16).
192.168.0.0/16 192.168.0.0–192.168.255.255 65536 Private network Used for local communications within a private network[7]
198.18.0.0/15 198.18.0.0–198.19.255.255 131072 Private network Used for benchmark testing of inter-network communications between two separate subnetsREFERENCE FOR RFC2544 IS NOT DEFINED YET. You are invited to add it here.
198.51.100.0/24 198.51.100.0–198.51.100.255 256 Documentation Assigned as TEST-NET-2, documentation and examples[8]
203.0.113.0/24 203.0.113.0–203.0.113.255 256 Documentation Assigned as TEST-NET-3, documentation and examples[8]
224.0.0.0/4 224.0.0.0–239.255.255.255 268435456 Internet In use for multicastREFERENCE FOR RFC5771 IS NOT DEFINED YET. You are invited to add it here. (former Class D network)
233.252.0.0/24 233.252.0.0-233.252.0.255 256 Documentation Assigned as MCAST-TEST-NET, documentation and examples (Note that this is part of the above multicast space.)[9]REFERENCE FOR RFC6676 IS NOT DEFINED YET. You are invited to add it here.
240.0.0.0/4 240.0.0.0–255.255.255.254 268435455 Internet Reserved for future useREFERENCE FOR RFC3232 IS NOT DEFINED YET. You are invited to add it here. (former Class E network)
255.255.255.255/32 255.255.255.255 1 Subnet Reserved for the "limited broadcast" destination addressREFERENCE FOR RFC6890 IS NOT DEFINED YET. You are invited to add it here.
<section end=IPv4-special-address-blocks/>

Private networks

Of the approximately four billion addresses defined in IPv4, about 18 million addresses in three ranges are reserved for use in private networks. Packets addresses in these ranges are not routable in the public Internet; they are ignored by all public routers. Therefore, private hosts cannot directly communicate with public networks, but require network address translation at a routing gateway for this purpose.

<section begin=IPv4-private-networks/>

Reserved private IPv4 network rangesREFERENCE FOR RFC1918 IS NOT DEFINED YET. You are invited to add it here.
Name CIDR block Address range Number of addresses Classful description
24-bit block 10.0.0.0/8 10.0.0.0 – 10.255.255.255 16777216 Single Class A
20-bit block 172.16.0.0/12 172.16.0.0 – 172.31.255.255 1048576 Contiguous range of 16 Class B blocks
16-bit block 192.168.0.0/16 192.168.0.0 – 192.168.255.255 65536 Contiguous range of 256 Class C blocks
<section end=IPv4-private-networks/>

Since two private networks, e.g., two branch offices, cannot directly interoperate via the public Internet, the two networks must be bridged across the Internet via a virtual private network (VPN) or an IP tunnel, which encapsulates packets, including their headers containing the private addresses, in a protocol layer during transmission across the public network. Additionally, encapsulated packets may be encrypted for transmission across public networks to secure the data.

Link-local addressing

RFC 3927 defines the special address block 169.254.0.0/16 for link-local addressing. These addresses are only valid on the link (such as a local network segment or point-to-point connection) directly connected to a host that uses them. These addresses are not routable. Like private addresses, these addresses cannot be the source or destination of packets traversing the internet. These addresses are primarily used for address autoconfiguration (Zeroconf) when a host cannot obtain an IP address from a DHCP server or other internal configuration methods.

When the address block was reserved, no standards existed for address autoconfiguration. Microsoft created an implementation called Automatic Private IP Addressing (APIPA), which was deployed on millions of machines and became a de facto standard. Many years later, in May 2005, the IETF defined a formal standard in RFC 3927, entitled Dynamic Configuration of IPv4 Link-Local Addresses.

Loopback

Main page: Localhost

The class A network 127.0.0.0 (classless network 127.0.0.0/8) is reserved for loopback. IP packets whose source addresses belong to this network should never appear outside a host. Packets received on a non-loopback interface with a loopback source or destination address must be dropped.

First and last subnet addresses

The first address in a subnet is used to identify the subnet itself. In this address all host bits are 0. To avoid ambiguity in representation, this address is reserved.REFERENCE FOR RFC923 IS NOT DEFINED YET. You are invited to add it here. The last address has all host bits set to 1. It is used as a local broadcast address for sending messages to all devices on the subnet simultaneously. For networks of size /24 or larger, the broadcast address always ends in 255.

For example, in the subnet 192.168.5.0/24 (subnet mask 255.255.255.0) the identifier 192.168.5.0 is used to refer to the entire subnet. The broadcast address of the network is 192.168.5.255.

Type Binary form Dot-decimal notation
Network space 11000000.10101000.00000101.00000000 192.168.5.0
Broadcast address 11000000.10101000.00000101.11111111 192.168.5.255
In red, is shown the host part of the IP address; the other part is the network prefix. The host gets inverted (logical NOT), but the network prefix remains intact.

However, this does not mean that every address ending in 0 or 255 cannot be used as a host address. For example, in the /16 subnet 192.168.0.0/255.255.0.0, which is equivalent to the address range 192.168.0.0192.168.255.255, the broadcast address is 192.168.255.255. One can use the following addresses for hosts, even though they end with 255: 192.168.1.255, 192.168.2.255, etc. Also, 192.168.0.0 is the network identifier and must not be assigned to an interface.REFERENCE FOR RFC1122 IS NOT DEFINED YET. You are invited to add it here. The addresses 192.168.1.0, 192.168.2.0, etc., may be assigned, despite ending with 0.

In the past, conflict between network addresses and broadcast addresses arose because some software used non-standard broadcast addresses with zeros instead of ones.REFERENCE FOR RFC1122 IS NOT DEFINED YET. You are invited to add it here.

In networks smaller than /24, broadcast addresses do not necessarily end with 255. For example, a CIDR subnet 203.0.113.16/28 has the broadcast address 203.0.113.31.

Type Binary form Dot-decimal notation
Network space 11001011.00000000.01110001.00010000 203.0.113.16
Broadcast address 11001011.00000000.01110001.00011111 203.0.113.31
In red, is shown the host part of the IP address; the other part is the network prefix. The host gets inverted (logical NOT), but the network prefix remains intact.

As a special case, a /31 network has capacity for just two hosts. These networks are typically used for point-to-point connections. There is no network identifier or broadcast address for these networks.REFERENCE FOR RFC3021 IS NOT DEFINED YET. You are invited to add it here.

Address resolution

Main page: Domain Name System

Hosts on the Internet are usually known by names, e.g., www.example.com, not primarily by their IP address, which is used for routing and network interface identification. The use of domain names requires translating, called resolving, them to addresses and vice versa. This is analogous to looking up a phone number in a phone book using the recipient's name.

The translation between addresses and domain names is performed by the Domain Name System (DNS), a hierarchical, distributed naming system that allows for the subdelegation of namespaces to other DNS servers.

Unnumbered interface

A unnumbered point-to-point (PtP) link, also called a transit link, is a link that does not have an IP network or subnet number associated with it, but still has an IP address. First introduced in 1993,[10]REFERENCE FOR RFC1716 IS NOT DEFINED YET. You are invited to add it here.'REFERENCE FOR RFC1812 IS NOT DEFINED YET. You are invited to add it here.[11] Phil Karn from Qualcomm is credited as the original designer.

The purpose of a transit link is to route datagrams. They are used to free IP addresses from a scarce IP address space or to reduce the management of assigning IP and configuration of interfaces. Previously, every link needed to dedicate a /31 or /30 subnet using 2 or 4 IP addresses per point-to-point link. When a link is unnumbered, a router-id is used, a single IP address borrowed from a defined (normally a loopback) interface. The same router-id can be used on multiple interfaces.

One of the disadvantages of unnumbered interfaces is that it is harder to do remote testing and management.

Address space exhaustion

Main page: IPv4 address exhaustion
IPv4 address exhaustion timeline

In the 1980s, it became apparent that the pool of available IPv4 addresses was depleting at a rate that was not initially anticipated in the original design of the network.[12] The main market forces that accelerated address depletion included the rapidly growing number of Internet users, who increasingly used mobile computing devices, such as laptop computers, personal digital assistants (PDAs), and smart phones with IP data services. In addition, high-speed Internet access was based on always-on devices. The threat of exhaustion motivated the introduction of a number of remedial technologies, such as:

By the mid-1990s, NAT was used pervasively in network access provider systems, along with strict usage-based allocation policies at the regional and local Internet registries.


The primary address pool of the Internet, maintained by IANA, was exhausted on 3 February 2011, when the last five blocks were allocated to the five RIRs.[13][14] APNIC was the first RIR to exhaust its regional pool on 15 April 2011, except for a small amount of address space reserved for the transition technologies to IPv6, which is to be allocated under a restricted policy.[15]

The long-term solution to address exhaustion was the 1998 specification of a new version of the Internet Protocol, IPv6.REFERENCE FOR RFC2460 IS NOT DEFINED YET. You are invited to add it here. It provides a vastly increased address space, but also allows improved route aggregation across the Internet, and offers large subnetwork allocations of a minimum of 264 host addresses to end users. However, IPv4 is not directly interoperable with IPv6, so that IPv4-only hosts cannot directly communicate with IPv6-only hosts. With the phase-out of the 6bone experimental network starting in 2004, permanent formal deployment of IPv6 commenced in 2006.REFERENCE FOR RFC3701 IS NOT DEFINED YET. You are invited to add it here. Completion of IPv6 deployment is expected to take considerable time,[16] so that intermediate transition technologies are necessary to permit hosts to participate in the Internet using both versions of the protocol.

Packet structure

An IP packet consists of a header section and a data section. An IP packet has no data checksum or any other footer after the data section. Typically the link layer encapsulates IP packets in frames with a CRC footer that detects most errors, many transport-layer protocols carried by IP also have their own error checking.[17]

Header

The IPv4 packet header consists of 14 fields, of which 13 are required. The 14th field is optional and aptly named: options. The fields in the header are packed with the most significant byte first (network byte order), and for the diagram and discussion, the most significant bits are considered to come first (MSB 0 bit numbering). The most significant bit is numbered 0, so the version field is actually found in the four most significant bits of the first byte, for example.

IPv4 header format
Offsets Octet 0 1 2 3
Octet Bit 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
0 0 Version IHL DSCP ECN Total length
4 32 Identification Flags Fragment offset
8 64 Time to Live Protocol Header checksum
12 96 Source address
16 128 Destination address
20 160 Options (if IHL > 5)
56 448
Version
The first header field in an IP packet is the four-bit version field. For IPv4, this is always equal to 4.
Internet Header Length (IHL)
The IPv4 header is variable in size due to the optional 14th field (options). The IHL field contains the size of the IPv4 header; it has 4 bits that specify the number of 32-bit words in the header. The minimum value for this field is 5,REFERENCE FOR RFC791 IS NOT DEFINED YET. You are invited to add it here. which indicates a length of 5 × 32 bits = 160 bits = 20 bytes. As a 4-bit field, the maximum value is 15; this means that the maximum size of the IPv4 header is 15 × 32 bits = 480 bits = 60 bytes.
Differentiated Services Code Point (DSCP)
Originally defined as the type of service (ToS), this field specifies differentiated services (DiffServ).REFERENCE FOR RFC2474 IS NOT DEFINED YET. You are invited to add it here. Real-time data streaming makes use of the DSCP field. An example is Voice over IP (VoIP), which is used for interactive voice services.
Explicit Congestion Notification (ECN)
This field allows end-to-end notification of network congestion without dropping packets.REFERENCE FOR RFC3168 IS NOT DEFINED YET. You are invited to add it here. ECN is an optional feature available when both endpoints support it and effective when also supported by the underlying network.
Total Length
This 16-bit field defines the entire packet size in bytes, including header and data. The minimum size is 20 bytes (header without data) and the maximum is 65,535 bytes. All hosts are required to be able to reassemble datagrams of size up to 576 bytes, but most modern hosts handle much larger packets. Links may impose further restrictions on the packet size, in which case datagrams must be fragmented. Fragmentation in IPv4 is performed in either the sending host or in routers. Reassembly is performed at the receiving host.
Identification
This field is an identification field and is primarily used for uniquely identifying the group of fragments of a single IP datagram. Some experimental work has suggested using the ID field for other purposes, such as for adding packet-tracing information to help trace datagrams with spoofed source addresses,[18] but any such use is now prohibited.REFERENCE FOR RFC6864 IS NOT DEFINED YET. You are invited to add it here.
Flags
A three-bit field follows and is used to control or identify fragments. They are (in order, from most significant to least significant):
  • bit 0: Reserved; must be zero.[lower-alpha 1]
  • bit 1: Don't Fragment (DF)
  • bit 2: More Fragments (MF)
If the DF flag is set, and fragmentation is required to route the packet, then the packet is dropped. This can be used when sending packets to a host that does not have resources to perform reassembly of fragments. It can also be used for path MTU discovery, either automatically by the host IP software, or manually using diagnostic tools such as ping or traceroute.
For unfragmented packets, the MF flag is cleared. For fragmented packets, all fragments except the last have the MF flag set. The last fragment has a non-zero Fragment Offset field, differentiating it from an unfragmented packet.
Fragment offset
This field specifies the offset of a particular fragment relative to the beginning of the original unfragmented IP datagram. The fragmentation offset value for the first fragment is always 0. The field is 13 bits wide, so that the offset can be from 0 to 8191 (from (20  – 1) to (213 – 1)). Fragments are specified in units of 8 bytes, which is why fragment length must be a multiple of 8.[19] Therefore, the 13-bit field allows a maximum offset of (213 – 1) × 8 = 65,528 bytes, with the header length included (65,528 + 20 = 65,548 bytes), supporting fragmentation of packets exceeding the maximum IP length of 65,535 bytes.
Time to live (TTL)
An eight-bit time to live field limits a datagram's lifetime to prevent network failure in the event of a routing loop. It is specified in seconds, but time intervals less than 1 second are rounded up to 1. In practice, the field is used as a hop count—when the datagram arrives at a router, the router decrements the TTL field by one. When the TTL field hits zero, the router discards the packet and typically sends an ICMP time exceeded message to the sender.
The program traceroute sends messages with adjusted TTL values and uses these ICMP time exceeded messages to identify the routers traversed by packets from the source to the destination.
Protocol
This field defines the protocol used in the data portion of the IP datagram. IANA maintains a list of IP protocol numbers.REFERENCE FOR RFC790 IS NOT DEFINED YET. You are invited to add it here.
Header checksum
The 16-bit IPv4 header checksum field is used for error checking of the header. When a packet arrives at a router or its destination, the network device calculates the checksum of the header including the checksum field. A value of 0xFFFF is expected. If a different result is obtained, the device discards the packet. Errors in the data portion of the packet are handled separately by the encapsulated protocol. Both UDP and TCP have separate checksums that apply to their data.
When a packet arrives at a router, the router decreases the TTL field in the header. Consequently, the router must calculate a new header checksum.
The checksum field is the 16-bit ones' complement of the ones' complement sum of all 16-bit words in the header. For purposes of computing the checksum, the value of the checksum field is zero.
Source address
This 32-bit field is the IPv4 address of the sender of the packet. It may be changed in transit by network address translation (NAT).
Destination address
This 32-bit field is the IPv4 address of the receiver of the packet. It may be affected by NAT.
Options
The options field is not often used. Packets containing some options may be considered as dangerous by some routers and be blocked.[20] The value in the IHL field must include sufficient extra 32-bit words to hold all options and any padding needed to ensure that the header contains an integral number of 32-bit words. If IHL is greater than 5 (i.e., it is from 6 to 15) it means that the options field is present and must be considered. The list of options may be terminated with the option EOOL (End of Options List, 0x00); this is only necessary if the end of the options would not otherwise coincide with the end of the header. The possible options that can be put in the header are as follows:
Field Size (bits) Description
Copied 1 Set to 1 if the options need to be copied into all fragments of a fragmented packet.
Option Class 2 A general options category. 0 is for control options, and 2 is for debugging and measurement. 1 and 3 are reserved.
Option Number 5 Specifies an option.
Option Length 8 Indicates the size of the entire option (including this field). This field may not exist for simple options.
Option Data Variable Option-specific data. This field may not exist for simple options.
The table below shows the defined options for IPv4. The Option Type column is derived from the Copied, Option Class, and Option Number bits as defined above.[21]
Option Type (decimal/hexadecimal) Option Name Description
0/0x00 EOOL End of Option List
1/0x01 NOP No Operation
2/0x02 SEC Security (defunct)
7/0x07 RR Record Route
10/0x0A ZSU Experimental Measurement
11/0x0B MTUP MTU Probe
12/0x0C MTUR MTU Reply
15/0x0F ENCODE ENCODE
25/0x19 QS Quick-Start
30/0x1E EXP RFC3692-style Experiment
68/0x44 TS Time Stamp
82/0x52 TR Traceroute
94/0x5E EXP RFC3692-style Experiment
130/0x82 SEC Security (RIPSO)
131/0x83 LSR Loose Source Route
133/0x85 E-SEC Extended Security (RIPSO)
134/0x86 CIPSO Commercial IP Security Option
136/0x88 SID Stream ID
137/0x89 SSR Strict Source Route
142/0x8E VISA Experimental Access Control
144/0x90 IMITD IMI Traffic Descriptor
145/0x91 EIP Extended Internet Protocol
147/0x93 ADDEXT Address Extension
148/0x94 RTRALT Router Alert
149/0x95 SDB Selective Directed Broadcast
151/0x97 DPS Dynamic Packet State
152/0x98 UMP Upstream Multicast Packet
158/0x9E EXP RFC3692-style Experiment
205/0xCD FINN Experimental Flow Control
222/0xDE EXP RFC3692-style Experiment

Data

The packet payload is not included in the checksum. Its contents are interpreted based on the value of the Protocol header field.

List of IP protocol numbers contains a complete list of payload protocol types. Some of the common payload protocols include:

Protocol Number Protocol Name Abbreviation
1 Internet Control Message Protocol ICMP
2 Internet Group Management Protocol IGMP
6 Transmission Control Protocol TCP
17 User Datagram Protocol UDP
41 IPv6 encapsulation ENCAP
89 Open Shortest Path First OSPF
132 Stream Control Transmission Protocol SCTP

Fragmentation and reassembly

Main page: IP fragmentation

The Internet Protocol enables traffic between networks. The design accommodates networks of diverse physical nature; it is independent of the underlying transmission technology used in the link layer. Networks with different hardware usually vary not only in transmission speed, but also in the maximum transmission unit (MTU). When one network wants to transmit datagrams to a network with a smaller MTU, it may fragment its datagrams. In IPv4, this function was placed at the Internet Layer and is performed in IPv4 routers limiting exposure to these issues by hosts.

In contrast, IPv6, the next generation of the Internet Protocol, does not allow routers to perform fragmentation; hosts must perform Path MTU Discovery before sending datagrams.

Fragmentation

When a router receives a packet, it examines the destination address and determines the outgoing interface to use and that interface's MTU. If the packet size is bigger than the MTU, and the Do not Fragment (DF) bit in the packet's header is set to 0, then the router may fragment the packet.

The router divides the packet into fragments. The maximum size of each fragment is the outgoing MTU minus the IP header size (20 bytes minimum; 60 bytes maximum). The router puts each fragment into its own packet, each fragment packet having the following changes:

  • The total length field is the fragment size.
  • The more fragments (MF) flag is set for all fragments except the last one, which is set to 0.
  • The fragment offset field is set, based on the offset of the fragment in the original data payload. This is measured in units of 8-byte blocks.
  • The header checksum field is recomputed.

For example, for an MTU of 1,500 bytes and a header size of 20 bytes, the fragment offsets would be multiples of [math]\displaystyle{ \frac{1{,}500-20}{8}=185 }[/math] (0, 185, 370, 555, 740, etc.).

It is possible that a packet is fragmented at one router, and that the fragments are further fragmented at another router. For example, a packet of 4,520 bytes, including a 20 bytes IP header is fragmented to two packets on a link with an MTU of 2,500 bytes:

Fragment Size
(bytes)
Header size
(bytes)
Data size
(bytes)
Flag
More fragments
Fragment offset
(8-byte blocks)
1 2,500 20 2,480 1 0
2 2,040 20 2,020 0 310

The total data size is preserved: 2,480 bytes + 2,020 bytes = 4,500 bytes. The offsets are [math]\displaystyle{ 0 }[/math] and [math]\displaystyle{ \frac{0+2{,}480}{8}=310 }[/math].

When forwarded to a link with an MTU of 1,500 bytes, each fragment is fragmented into two fragments:

Fragment Size
(bytes)
Header size
(bytes)
Data size
(bytes)
Flag
More fragments
Fragment offset
(8-byte blocks)
1 1,500 20 1,480 1 0
2 1,020 20 1,000 1 185
3 1,500 20 1,480 1 310
4 560 20 540 0 495

Again, the data size is preserved: 1,480 + 1,000 = 2,480, and 1,480 + 540 = 2,020.

Also in this case, the More Fragments bit remains 1 for all the fragments that came with 1 in them and for the last fragment that arrives, it works as usual, that is the MF bit is set to 0 only in the last one. And of course, the Identification field continues to have the same value in all re-fragmented fragments. This way, even if fragments are re-fragmented, the receiver knows they have initially all started from the same packet.

The last offset and last data size are used to calculate the total data size: [math]\displaystyle{ 495 \times 8+540=3{,}960+540=4{,}500 }[/math].

Reassembly

A receiver knows that a packet is a fragment, if at least one of the following conditions is true:

  • The flag more fragments is set, which is true for all fragments except the last.
  • The field fragment offset is nonzero, which is true for all fragments except the first.

The receiver identifies matching fragments using the source and destination addresses, the protocol ID, and the identification field. The receiver reassembles the data from fragments with the same ID using both the fragment offset and the more fragments flag. When the receiver receives the last fragment, which has the more fragments flag set to 0, it can calculate the size of the original data payload, by multiplying the last fragment's offset by eight and adding the last fragment's data size. In the given example, this calculation was [math]\displaystyle{ 495 \times 8+540=4{,}500 }[/math] bytes. When the receiver has all fragments, they can be reassembled in the correct sequence according to the offsets to form the original datagram.

Assistive protocols

IP addresses are not tied in any permanent manner to networking hardware and, indeed, in modern operating systems, a network interface can have multiple IP addresses. In order to properly deliver an IP packet to the destination host on a link, hosts and routers need additional mechanisms to make an association between the hardware address[lower-alpha 2] of network interfaces and IP addresses. The Address Resolution Protocol (ARP) performs this IP-address-to-hardware-address translation for IPv4. In addition, the reverse correlation is often necessary. For example, unless an address is preconfigured by an administrator, when an IP host is booted or connected to a network it needs to determine its IP address. Protocols for such reverse correlations include Dynamic Host Configuration Protocol (DHCP), Bootstrap Protocol (BOOTP) and, infrequently, reverse ARP.

See also

Notes

  1. As an April Fools' joke, proposed for use in RFC 3514 as the "Evil bit"
  2. For IEEE 802 networking technologies, including Ethernet, the hardware address is a MAC address.

References

  1. "BGP Analysis Reports" (in en-US). http://bgp.potaroo.net/index-bgp.html. 
  2. "IPv6 – Google". https://www.google.com/intl/en/ipv6/statistics.html. 
  3. "IANA IPv4 Special-Purpose Address Registry". https://www.iana.org/assignments/iana-ipv4-special-registry/iana-ipv4-special-registry.xhtml. 
  4. "A Brief History of IPv4" (in en). https://ipv4marketgroup.com/a-brief-history-of-ipv4/. 
  5. "Understanding IP Addressing: Everything You Ever Wanted To Know". 3Com. http://www.3com.com/other/pdfs/infra/corpinfo/en_US/501302.pdf. 
  6. 6.0 6.1 Cite error: Invalid <ref> tag; no text was provided for refs named rfc6890
  7. 7.0 7.1 Cite error: Invalid <ref> tag; no text was provided for refs named rfc1918
  8. 8.0 8.1 Cite error: Invalid <ref> tag; no text was provided for refs named rfc5737
  9. Cite error: Invalid <ref> tag; no text was provided for refs named rfc5771
  10. Almquist, Philip; Kastenholz, Frank (December 1993). "Towards Requirements for IP Routers". Internet Engineering Task Force. https://datatracker.ietf.org/doc/draft-ietf-rreq-iprouters-require/00. 
  11. "Understanding and Configuring the ip unnumbered Command" (in en). https://www.cisco.com/c/en/us/support/docs/ip/hot-standby-router-protocol-hsrp/13786-20.html. 
  12. "World 'running out of Internet addresses'". http://technology.inquirer.net/infotech/infotech/view/20110121-315808/World-running-out-of-Internet-addresses. 
  13. Smith, Lucie; Lipner, Ian (3 February 2011). "Free Pool of IPv4 Address Space Depleted". Number Resource Organization. http://www.nro.net/news/ipv4-free-pool-depleted. 
  14. ICANN,nanog mailing list. "Five /8s allocated to RIRs – no unallocated IPv4 unicast /8s remain". http://mailman.nanog.org/pipermail/nanog/2011-February/032107.html. 
  15. Asia-Pacific Network Information Centre (15 April 2011). "APNIC IPv4 Address Pool Reaches Final /8". http://www.apnic.net/publications/news/2011/final-8. 
  16. 2016 IEEE International Conference on Emerging Technologies and Innovative Business Practices for the Transformation of Societies (EmergiTech). Piscataway, NJ: University of Technology, Mauritius, Institute of Electrical and Electronics Engineers. August 2016. ISBN 9781509007066. OCLC 972636788. 
  17. Partridge, C.; Kastenholz, F. (December 1994), Technical Criteria for Choosing IP The Next Generation (IPng), p. 26. sec. 6.2, doi:10.17487/RFC1726, RFC 1726, https://tools.ietf.org/html/rfc1726 
  18. Savage, Stefan (2000). "Practical network support for IP traceback". ACM SIGCOMM Computer Communication Review 30 (4): 295–306. doi:10.1145/347057.347560. 
  19. Bhardwaj, Rashmi (2020-06-04). "Fragment Offset - IP With Ease" (in en-US). https://ipwithease.com/fragmentation-offset/. 
  20. "Cisco unofficial FAQ". http://www.faqs.org/faqs/cisco-networking-faq/section-23.html. 
  21. "Internet Protocol Version 4 (IPv4) Parameters". https://www.iana.org/assignments/ip-parameters/ip-parameters.xhtml#ip-parameters-1. 

External links