TDTS02 - 3 - IPv4 - ida.liu.se TDTS02/TDTS02-M3.pdf

1

David [email protected]/ADIT/IISLAB

IPv4
PROTOCOL – ADDRESSING – DEPLOYMENT – TRICKS

2

IPv4 (Review)

§ Connectionless
§ Best effort delivery
§ Transport independent

VERSION | HEADER LENGTH | TOS/DS | TOTAL LENGTH
IDENTIFICATION | FLAGS | FRAGMENTATION OFFSET
TIME TO LIVE | PROTOCOL | CHECKSUM
SOURCE ADDRESS
DESTINATION ADDRESS

§ Fixed header
§ Options (optional)
§ Data payload

3

IPv4 Header

§ Version
§ Header length
§ TOS/DS
§ Total length
§ Identification
§ Flags
§ Fragmentation offset
§ Time to live
§ Protocol
§ Header checksum
§ Source address
§ Destination address

The IPv4 header contains a number of header fields, and we’ll look at some of the more important ones. The version and length fields are pretty obvious, so we’ll disregard these for now.

Beginning from the top, we have the type of service field. Early on this field was considered very important, but in actual fact it was not supported by TCP/IP implementations for many years, and by today’s standards, the definition of the field is less than optimal. The field consists of three bits of precedence which are usually ignored, followed by four bits that define the type of service and one bit that must be set to zero. The four TOS bits are Minimize delay, Maximize throughput, Maximize reliability and Minimize monetary cost. One of these should be set as appropriate by the application. For example, an interactive application should set the minimize delay bit, while a bulk data transfer application should set the maximize throughput bit. Today we find that operating systems and routers can make decisions based on the TOS field. For example, a Linux-based workgroup router can use the TOS field to classify packets, thereby implementing a crude form of QoS.

The meaning of the TOS field has changed considerably from its original definition. Today, the TOS field is on the way to becoming the DSCP and ECN fields.

DSCP stands for Differentiated Services Code Point, and identifies the per-hop behavior requested for the packet in a diffserv-enabled network. The differentiated services proposal is a standards-track RFC published in 1998. It defines a six-bit DSCP field in IPv4 and IPv6. Each value of the field indicates a particular per-hop behavior requested for the packet. This behavior might be to maximize throughput or minimize delay. Unlike the original TOS field, which requested a particular end-to-end treatment of the packet, the DSCP field only requests behavior on a hop-by-hop basis, which is far more compatible with the thinking behind IP. The DSCP field is further subdivided into classes in a similar manner to how IP addresses were once divided into classes. Code points ending with 0 are reserved for standard PHBs. Code points ending with 11 are allocated for experimental or local use. Code points ending with 01 are also available for experimental or local use, but may be taken for standardized PHBs in the future.
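To make the bit layout concrete, here is a minimal Python sketch of how the old TOS byte splits into a six-bit DSCP and a two-bit ECN field. The function names are made up for this illustration, and the sample byte 0xB8 is the value commonly seen for Expedited Forwarding (DSCP 46).

```python
def split_tos(tos):
    """Split the 8-bit former TOS byte into (dscp, ecn)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS/DS byte must fit in 8 bits")
    dscp = tos >> 2       # upper six bits
    ecn = tos & 0b11      # lower two bits
    return dscp, ecn


def is_standard_pool(dscp):
    """True if the code point ends with 0, the pool used for standard PHBs."""
    return dscp & 0b1 == 0


dscp, ecn = split_tos(0xB8)
print(dscp, ecn, is_standard_pool(dscp))
```

The shift and mask are all there is to it: the two fields share one byte, so a router that only understands DSCP simply ignores the low two bits.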

The ECN field is used to provide explicit congestion notification. It has been observed that higher-level protocols could benefit from explicit congestion notification instead of having to rely on second-hand data such as missing acknowledgements or timeouts. The field is designed in such a way that it should never be confused with a valid TOS field. We’ll look at ECN a little later.

The following 32 bits are mostly concerned with fragmentation, and we’ll talk about fragmentation and its consequences a little later.

The identification field is assigned by the sending host, and is used to identify a particular IP packet. Initially one might assume that identification is a very important feature, used to ensure that packets arrive in sequence or at least that they’re not duplicated, but on second thought you should realize that as IP is a best-effort service, this type of functionality belongs in higher level protocols. So what is the identification field for? We’ll return to the identification field when we look at ICMP and at IP fragmentation.

Moving on we have the flags field. Although three bits are reserved, only two are used. One is the DF bit, or don’t fragment bit, and the other is the more fragments bit. We’ll look at these later. The third bit (which is the high order bit) must be zero. There is, however, an RFC that proposes to call this the Evil bit, and require that it be set on all packets that have evil intents. The RFC was published on April 1 2003. . .

The fragmentation offset field is also used for fragmentation. We’ll get to that later.

The next field is the TTL field. This eight-bit field is used to limit the lifetime of a packet. The most important use of this field is to ensure that packets are not forwarded forever around the network, but the field can actually be used in certain applications.
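The fields discussed so far can be pulled out of a raw header with a few lines of Python. This is only a sketch: the function name and the sample bytes are invented for illustration (the checksum in the sample is left at zero and is not valid), and options are ignored.

```python
import ipaddress
import struct


def parse_ipv4_header(raw):
    """Decode the 20-byte fixed part of an IPv4 header into a dict."""
    if len(raw) < 20:
        raise ValueError("need at least 20 bytes")
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,          # field counts 32-bit words
        "tos": tos,
        "total_length": total_length,
        "identification": ident,
        "reserved_flag": bool(flags_frag & 0x8000),     # high-order bit, must be zero
        "dont_fragment": bool(flags_frag & 0x4000),     # DF bit
        "more_fragments": bool(flags_frag & 0x2000),    # MF bit
        "fragment_offset": (flags_frag & 0x1FFF) * 8,   # field counts 8-byte units
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
    }


# Made-up sample: version 4, 20-byte header, DF set, TTL 64, protocol 1 (ICMP).
SAMPLE_HEADER = bytes.fromhex("45000054abcd40004001000082ecb20c82ecb201")
header = parse_ipv4_header(SAMPLE_HEADER)
print(header["ttl"], header["dont_fragment"], header["source"])
```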

4

IP Addressing

5

IPv4 Addresses

§ Five address classes:
  • A – 16 million addresses
  • B – 65000 addresses
  • C – 256 addresses
  • D – Multicast groups
  • E – Reserved

§ Prefix determines class
  • 0 – Class A
  • 10 – Class B
  • 110 – Class C
  • 1110 – Class D
  • 11110 – Class E

§ Addresses are all the same
  • No address classes
  • No fixed network boundaries

§ Explicit netmask
  • Determines network size
  • Determines address prefix

§ Addresses are attached to interfaces

IPv4 addresses are 32-bit unsigned numbers. They are usually written out in dotted-quad notation (A.B.C.D).

Once upon a time, IPv4 addresses were divided into a number of address classes. Class A addresses belonged to networks with up to 16 million addresses. Class B addresses belonged to networks with up to 65000 addresses. Class C addresses belonged to networks with up to 256 addresses. There were also classes D, for multicast group addresses, and E, reserved for future use.

The whole sordid business with address classes is something to be remembered with horror. At a time when 32 bits of address space seemed plentiful, and processing power and memory were expensive, it may have been a good idea, but it was not an idea that scaled. The main problem was that the scheme resulted in so much unused space. Class C networks, with 256 addresses, were far too small for many organizations, who would then allocate a class B network. Few of these organizations actually used all 65536 addresses available to them. Even worse, large organizations would allocate class A networks, possibly because a class B was too small and possibly because it was more convenient that way. Few organizations actually used all the address space allocated to them.

Furthermore, the network sizes were not always convenient, and to deal with that problem the concepts of subnetting and supernetting were born. These were confusing days where classful addressing was mixed with something else. We had to deal with ”subnet masks”, ”supernet masks” and VLSM on equipment and with protocols that only understood classful addresses.

After some time, address space exhaustion became a real issue, and in 1993 the concept of Classless Inter-Domain Routing (CIDR) was introduced in RFC 1518. In 1996, RFC 1917 was published, asking the Internet community to return unused address space to IANA, so that it could be reallocated.

The fundamental difference between CIDR and classful addressing is that in CIDR the address space is viewed as a single contiguous space. There are no network boundaries inherent in the address space. Instead, they are created as needed. In this way, CIDR is more powerful, yet simpler and more intuitive than the old system. Addresses are always addresses. When we need to talk about networks, we define their boundaries explicitly. We can easily talk about parts of a network or aggregates of networks. Everything fits into the same scheme.

Understanding CIDR and network numbering is very important when looking at network design and routing. In network design, dividing an available address space in a suitable way can be key to long-term maintainability. In routing, all modern protocols depend on CIDR addressing.

Despite the fact that CIDR has been in active use for over ten years, classful addressing is still covered in detail in many textbooks, sometimes to the exclusion of CIDR. I actually use that as a test of quality – a textbook on computer networks published in the last three years should mention classful addressing only as an anachronism.

6

CIDR Notation

§ A.B.C.D/L
  • A.B.C.D – IPv4 Address
  • L – Prefix length

§ The prefix
  • Long prefix: more networks
  • Short prefix: larger networks
  • Corresponds to the netmask

Examples
§ 130.236.178.12/32
  • A single address
§ 130.236.178.0/24
  • Network with 256 addresses
§ 130.236.0.0/16
  • Network with 65k addresses
  • An old class B network
§ 130.128.0.0/12
  • Network with 1M addresses
  • Aggregate of 16 old class B networks

CIDR notation is pretty simple.

The departure point is that when we talk about networks and not just individual addresses, we have to draw a line in the address between the part that denotes the network and the part that denotes an address within that network. The classful system did this by hard-coding a couple of different variants. In CIDR we explicitly state where that line is drawn.

The part of the address that denotes the network itself is called a prefix. The prefix length in CIDR notation defines how many bits long the prefix is. For example, when we write 130.236.178.0/27, the prefix length is 27 bits. The first 27 bits of the address denote the network; addresses within that network are created by setting the last five bits of the address.
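The /27 example can be checked with Python's standard ipaddress module. The helper name in_network is invented for this sketch; it compares only the prefix bits, which is exactly what "being on the same network" means in CIDR.

```python
import ipaddress

net = ipaddress.ip_network("130.236.178.0/27")
host_bits = net.max_prefixlen - net.prefixlen    # 32 - 27 = 5 bits left for hosts


def in_network(address, network):
    """True if the address shares the network's prefix bits."""
    shift = network.max_prefixlen - network.prefixlen
    addr_prefix = int(ipaddress.ip_address(address)) >> shift
    net_prefix = int(network.network_address) >> shift
    return addr_prefix == net_prefix


print(host_bits)
print(in_network("130.236.178.17", net))   # last five bits differ, prefix matches
print(in_network("130.236.178.40", net))   # bit 27 differs, so another network
```

In everyday code the same test is simply `ipaddress.ip_address("130.236.178.17") in net`; the shifts are spelled out here only to show where the line is drawn.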

There are some semi-special cases here. For example a /32 is a network with a single address. In practice, however, a /32 prefix is used to indicate a single host or interface address, not a network with no addresses in it. The other semi-special case is the empty prefix, /0. You’ll rarely ever see it, but it plays a very important part in routing. Essentially a /0 prefix indicates that we’re talking about the entire address space as a single network. These cases aren’t really exceptions, just edge cases that tend to be used in specialized ways.

In the literature, and in practice, you’ll often see something called a netmask. The netmask and prefix length are more or less equivalent. At one point in time they weren’t, but now they are. The netmask is simply a 32-bit word that indicates which bits of an IPv4 address belong to the network part of the address. If bit number n in the netmask is 1, then the corresponding bit in the address is part of the network number. Using this definition it would appear that the network part of the address could consist of non-consecutive bits (consider, for example, the netmask 255.255.255.1). In the early days, this was possible, but today netmasks must consist of zero or more ones followed by enough zeros to fill out 32 bits.

The correspondence between prefix length and netmask is simple. The prefix length is the number of bits that are one in the netmask. For example, the prefix length 16 gives a netmask of 255.255.0.0 and the prefix length 28 gives the netmask 255.255.255.240.
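The conversion in both directions can be sketched in a few lines of Python. The function names are invented for this example; the second one also rejects netmasks with non-consecutive ones, such as 255.255.255.1.

```python
import ipaddress


def prefix_to_netmask(prefix_len):
    """Return the dotted-quad netmask for a prefix length."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(mask))


def netmask_to_prefix(netmask):
    """Return the prefix length for a netmask made of contiguous ones."""
    mask = int(ipaddress.IPv4Address(netmask))
    prefix_len = bin(mask).count("1")
    expected = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    if mask != expected:
        raise ValueError("netmask has non-consecutive ones")
    return prefix_len


print(prefix_to_netmask(16), prefix_to_netmask(28))
print(netmask_to_prefix("255.255.255.240"))
```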

In some cases, particularly if you start looking at Cisco equipment, you’ll see inverted netmasks, where the ones indicate the host part of the address and the zeros indicate the network part of the address.
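An inverted netmask is just the bitwise complement of the ordinary one. The sketch below uses an invented helper name; Python's ipaddress module exposes the same value as the hostmask attribute of a network.

```python
import ipaddress


def inverted_netmask(netmask):
    """Flip every bit of a dotted-quad netmask."""
    mask = int(ipaddress.IPv4Address(netmask))
    return str(ipaddress.IPv4Address(mask ^ 0xFFFFFFFF))


print(inverted_netmask("255.255.255.0"))
print(ipaddress.ip_network("130.236.0.0/16").hostmask)
```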

One final point about addressing. With a prefix length of n, the number of addresses on the network is 2^(32-n), but the number of usable addresses is two fewer than that. One address is used for broadcast (host part is all ones) and one (host part all zeros) is used to indicate the network itself (and was at one point used for broadcasts).
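As a quick check of the arithmetic: a /24 holds 2^(32-24) = 256 addresses, of which 254 are usable. The sketch below uses invented function names and deliberately refuses /31 and /32, which do not follow the two-reserved-addresses rule.

```python
def address_count(prefix_len):
    """Total number of addresses covered by a prefix of this length."""
    if not 0 <= prefix_len <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    return 2 ** (32 - prefix_len)


def usable_count(prefix_len):
    """Addresses left after removing the network and broadcast addresses."""
    if prefix_len > 30:
        raise ValueError("/31 and /32 are special cases")
    return address_count(prefix_len) - 2


for n in (16, 24, 28, 30):
    print(n, address_count(n), usable_count(n))
```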

7

Try it out!

§ What is the netmask of
  • 130.236.0.0/16
  • 112.54.67.0/28
  • 54.128.0.0/9

§ How many hosts on
  • 212.112.0.64/28
  • 64.128.0.0/9
  • 122.14.68.12/30
  • 130.236.189.0/31

§ What prefix length
  • 255.255.255.0
  • 255.255.192.0
  • 255.252.0.0

§ What the…
  • 0.0.0.255
  • 0.3.255.255

8

Aggregation and subnetting

Subnetting:
§ Creating more but smaller networks by extending the prefix.
§ Exposes internal details of the network

Aggregation:
§ Creating fewer but larger networks by shortening the prefix.
§ Hides internal details of the network.

9

Subnetting example (basic)

192.0.2.0/24 split into four /26s (the two added prefix bits are shown on the right):

192.0.2.0/26      0 0
192.0.2.64/26     0 1
192.0.2.128/26    1 0
192.0.2.192/26    1 1

A concept important to IPv4 networking is network aggregation. Several networks can be combined to a single network with a shorter prefix. When combining networks, this is called aggregation. When splitting them, it is called subnetting.

Here’s an example of subnetting a /24 into four /26s. By extending the prefix by two bits, I get four networks where previously I had one. Subnetting is used extensively when deploying IPv4 networks.
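The same split can be reproduced with Python's ipaddress module. This is only a sketch; the variable names are made up, and the loop prints the two bits that were added to the prefix for each subnet.

```python
import ipaddress

parent = ipaddress.ip_network("192.0.2.0/24")

# Extending the prefix by two bits yields 2**2 = 4 subnets.
quarters = list(parent.subnets(prefixlen_diff=2))

for subnet in quarters:
    # The two lowest bits of the 26-bit prefix are the ones we added.
    added_bits = (int(subnet.network_address) >> (32 - subnet.prefixlen)) & 0b11
    print(subnet, format(added_bits, "02b"))
```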

10

Subnetting example (VLSM)

192.0.2.0/24 split into four networks of differing sizes (the added prefix bits are shown on the right):

192.0.2.128/25    1
192.0.2.64/26     0 1
192.0.2.0/27      0 0 0
192.0.2.32/27     0 0 1

Here’s a more complicated example. We take the same network, but this time we extend the prefix by several different amounts to create four networks of differing sizes.

Extending the prefix by one bit gives us two /25s. We’ll allocate the one where the additional bit is one to the purple network. We extend the prefix another bit, noting that the first high bit of these two bits now has to be zero. We allocate the prefix where the second bit is also one to the blue network. Next we extend the prefix by another bit, noting that the top two bits have to be zero, and allocate the two new networks to green and red.

This is another example that demonstrates that you don’t need to create equal-sized subnets. This technique used to be called variable-length subnetting (VLSM). That’s a useful term to know, but with CIDR, variable-sized subnets aren’t all that special. They used to be.
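The step-by-step split described above can be followed in Python. The variable names are invented for this sketch; each call to subnets() extends the prefix by one bit and returns the two halves in address order.

```python
import ipaddress

parent = ipaddress.ip_network("192.0.2.0/24")

# One extra bit: two /25s. Keep the upper one, split the lower one again.
low_half, high_half = parent.subnets(prefixlen_diff=1)

# Second extra bit: two /26s inside the lower /25.
low_quarter, high_quarter = low_half.subnets(prefixlen_diff=1)

# Third extra bit: two /27s inside the lowest /26.
eighth_a, eighth_b = low_quarter.subnets(prefixlen_diff=1)

plan = [high_half, high_quarter, eighth_a, eighth_b]
for network in plan:
    print(network, network.num_addresses)
```

Together the four networks cover the whole /24 without overlapping, which is the property that makes unequal subnets safe to use.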

11

Aggregation example

§ Combine networks
  • A shorter prefix covers more networks than a longer one

§ What about holes?
  • If you own the space – OK!
  • If you don’t – be careful

192.0.2.64/26

192.0.2.0/27

192.0.2.128/25

192.0.2.0/24
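To see what a "hole" is, take the three networks on this slide and the /24 that covers them. The sketch below (find_holes is an invented helper, and it needs Python 3.7 or later for subnet_of) lists the parts of the aggregate that none of the member networks account for.

```python
import ipaddress


def find_holes(aggregate, members):
    """Return the blocks inside aggregate that no member network covers."""
    remaining = [aggregate]
    for member in members:
        next_remaining = []
        for block in remaining:
            if block.subnet_of(member):
                continue                      # block is fully covered
            if member.subnet_of(block):
                # Keep everything in block except the member itself.
                next_remaining.extend(block.address_exclude(member))
            else:
                next_remaining.append(block)  # member lies elsewhere
        remaining = next_remaining
    return sorted(remaining)


aggregate = ipaddress.ip_network("192.0.2.0/24")
members = [
    ipaddress.ip_network("192.0.2.64/26"),
    ipaddress.ip_network("192.0.2.0/27"),
    ipaddress.ip_network("192.0.2.128/25"),
]
print(find_holes(aggregate, members))
```

Announcing the /24 therefore also claims the uncovered /27, which is fine if you own that space and something to be careful about if you don't.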

12

Subnet planning

Good planning
§ Minimal renumbering
§ Support for expansion
§ Minimal waste

Bad planning
§ Frequent renumbering
§ Expansion is difficult
§ Need more address space

I was bound to use one of these eventually…

When configuring a network, proper address planning is important. If you get it right, network management can focus on the real issues. If you get it wrong, you’re in trouble.

The biggest problem with getting it wrong is that you end up renumbering things a lot, and renumbering is not fun at all. It may seem simple, but it’s not. A real-world example might be in order:

A network had to be renumbered because it was necessary to aggregate it with a neighboring network that had run out of space. For each host on the network (about 30 of them), the following things had to be changed:

• The static configuration of the network address and netmask.

• The DNS address and pointer records.

• Access control lists on three file servers (the ACLs limited what machines could connect).

• The address in monitoring software.

• The address in remote management software.

• The address in the asset database.

Furthermore it was necessary to update software licenses on several machines because these were tied to the host address. All in all, quite a lot of work.

13

Subnet it!

Problem 1: You have 130.236.0.0 to play with. You have subdivided this network at two levels. You’ve chopped the network into as many /24s as possible, and each /24 is chopped up into as many /28s as possible.

How many useable addresses does your network have?

Problem 2: You have 130.236.189.0 to play with. You need nine subnets with at least 12 useable addresses, eight with at least three useable addresses and 12 addresses left over that are not part of any subnet.

How should you divide the network?

14

Addressing PtP Links

Link endpoints: 112.212.6.34/31 and 112.212.6.35/31
Broadcast: 255.255.255.255

§ Use /30s
  • Safe but wastes space

§ Use unnumbered interfaces
  • Special-case management
  • Not always available

§ Use /32s
  • Tricky to manage correctly
  • May introduce new problems (e.g. no broadcast)

§ Use /31s
  • Requires some configuration
  • Standard and portable

Point-to-point links are something of a special case. These frequently show up in larger networks as links between routers. Such a link is typically implemented as a two-address network. The problem is that we have to use a /30 to get a two-address network, using a total of four addresses – a 100% overhead! In other words, implemented this way, a point-to-point link wastes as many addresses as it uses. Although using a /30 is the standard interoperable solution, there are other options.

One solution to the problem is to allocate endpoints as /32s, i.e. not create a network out of the link. Although this can be made to work, it can cause problems, since most routing protocols assume that directly connected endpoints constitute a network.

Some vendors support unnumbered interfaces. Because of the nature of a point-to-point link, the endpoints don’t really need to be addressable. Each endpoint knows that if it sees traffic it did not initiate, the traffic is destined for that endpoint. There are no other possible targets. Although using unnumbered interfaces will work, managing the endpoints becomes less straightforward.

Another approach, defined in RFC 3021, is to use /31s. This technique illustrates some interesting points with IP addressing.

Remember that an IP address is essentially a tuple <NET,HOST>. HOST set to zero and HOST set to -1 (all ones) are reserved. The former is the network address, used by some obsolete equipment as the broadcast address, and the latter is the broadcast address. This kind of broadcast address is called a directed broadcast. The key point to making /31s work seamlessly is to realize that there are actually two broadcast addresses. Besides the directed broadcast address, there’s an undirected broadcast address, <-1,-1> (255.255.255.255). If it were possible to convince the endpoints to use undirected broadcast over the link rather than directed broadcast, the directed broadcast address could be used for something else. The network address can also be used for something else, provided the endpoints aren’t too old.

The end result is that we can use a /31 for the endpoints and 255.255.255.255 for broadcasts. Configuring this on a router or host is usually not a problem at all.

In a Cisco box, all you do is say

    interface if0
    ip address 10.1.2.3 255.255.255.254
    no ip directed-broadcast

On a Linux host you could say

    ip addr add 10.1.2.3/31 broadcast 255.255.255.255 dev if0
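The difference in address consumption between a /30 and a /31 link can be illustrated with Python's ipaddress module. This sketch uses the /31 from the slide; the /30 that contains the same addresses is included only for comparison, and the variable names are made up.

```python
import ipaddress

slash31 = ipaddress.ip_network("112.212.6.34/31")
slash30 = ipaddress.ip_network("112.212.6.32/30")

# With a /31 both addresses are handed to the two endpoints.
slash31_endpoints = [str(a) for a in slash31]

# With a /30 the first (network) and last (directed broadcast) are lost.
slash30_all = [str(a) for a in slash30]
slash30_endpoints = slash30_all[1:-1]

print(slash31.num_addresses, slash31_endpoints)
print(slash30.num_addresses, slash30_endpoints)
```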

15

Special addresses

Multicast groups                          224.0.0.0/4
Network interconnect benchmark testing    198.18.0.0/15
6to4 Relay Anycast                        192.88.99.0/24
Test-Net                                  192.0.2.0/24
RFC1918                                   172.16.0.0/12
Link local addresses                      169.254.0.0/16
Loopback                                  127.0.0.0/8
Public-data networks                      14.0.0.0/8
RFC1918                                   10.0.0.0/8
”This network”                            0.0.0.0/8

Even though address classes are a thing of the past, there are a number of special-use address blocks, and they’re pretty important to know about when doing serious practical networking.

The block that most people know about is the RFC1918 block of private addresses. These are 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16. These addresses should never appear on the Internet as they are non-routable between administrative domains (within a single domain, routing may work). The availability of RFC1918 space, and the use of network address translation to kluge mostly one-way connectivity for hosts with these addresses has helped slow the depletion of IPv4 addresses and delay deployment of IPv6. I don’t know if IPv6 will have a corresponding block of addresses. For a while it appeared that we might be spared the horror, but I think they’re back in again.

Another block that shows up a lot is the 169.254 block, which is used for link-local addresses. The idea here is that hosts on the same network link (same broadcast domain, presumably) can use these addresses safely. If you turn on a Windows machine, set it to get its address through DHCP and then don’t have a DHCP server, you’ll get an address from this block.

The 127 block is used for something called loopback. When a host wants to talk to itself, it’s inconvenient and slow to put the data on the actual wire. Instead, hosts have a loopback interface, which generally has the address 127.0.0.1 (there’s nothing stopping you from having other addresses too).

The 0 block is for ”this net”. The address 0.0.0.0 can be used to refer to ”this host”, whereas addresses with the network portion of the address set to zero (and hence in the 0.0.0.0 block) can be used to refer to hosts on this network. The 0.0.0.0 address is used quite a bit, but I’ve never encountered any of the others in practice.

The last important block is the 224.0.0.0/4 block. This block is used for multicast group addresses. Each application (e.g. the RIPv2 routing protocol) is assigned an address from this block, and multicast packets for that application are then sent using that address as the destination.

Finally, 255.255.255.255 is reserved for non-directed broadcast.

In the past, more blocks were reserved. The highest and lowest network in each address class was reserved, and there were reservations for a few specific applications. These blocks are still unused, but will probably be allocated in the future.
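A small lookup table is enough to recognise the blocks discussed above. The sketch below covers only the blocks named in these notes; the labels and the classify function are invented for illustration and this is not a complete registry.

```python
import ipaddress

# (label, block) pairs taken from the notes above.
SPECIAL_BLOCKS = [
    ("this network", "0.0.0.0/8"),
    ("RFC1918 private", "10.0.0.0/8"),
    ("loopback", "127.0.0.0/8"),
    ("link local", "169.254.0.0/16"),
    ("RFC1918 private", "172.16.0.0/12"),
    ("RFC1918 private", "192.168.0.0/16"),
    ("multicast", "224.0.0.0/4"),
    ("non-directed broadcast", "255.255.255.255/32"),
]


def classify(address):
    """Return the label of the special block an address falls in, if any."""
    ip = ipaddress.ip_address(address)
    for label, block in SPECIAL_BLOCKS:
        if ip in ipaddress.ip_network(block):
            return label
    return "ordinary"


for sample in ("10.1.2.3", "172.32.0.1", "169.254.7.9", "224.0.0.9"):
    print(sample, classify(sample))
```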

16

PI and PA addresses

§ Multihoming
  • Better reliability
  • Better performance

§ Two kinds of addresses
  • Provider independent
  • Provider aggregable

[Figure: two panels, ”Using provider independent addresses” and ”Using provider aggregable addresses”. Labels: Qwest, TeliaNet, Other networks, ”I can reach 8.6.3.0/24”.]

Among the regular addresses, there are two kinds. We talk about provider independent addresses and provider aggregable addresses. The difference is important when attempting to deploy multi-homing (connecting to more than one network provider).

Multihoming means connecting to more than one network provider. The most important reason to multihome is that it gives redundancy. If the link to one provider stops working (or that provider loses connectivity with others), there is an alternate path for packets to take. Most organizations don’t multihome, because it’s difficult and most people don’t need that level of reliability. Another reason to multihome is for performance. Although it’s overly optimistic to expect to do load balancing over the two connections, performance can be increased for traffic going to the directly connected providers and for networks to which the two providers have different performance.

When you start thinking about multihoming, you notice that not all IP addresses are created equal.

PI addresses belong to the organization to which they are assigned. They are not part of a block assigned to a service provider. This means that the organization can connect to any (willing) network provider and get connectivity without changing any addresses. They will have to set up a proper routing session, but it will work.

PA addresses belong to a block of addresses allocated to a service provider. They are non-portable, so if an organization using PA addresses moves to a new provider, they will have to renumber their network. Multi-homing using PA addresses is tricky, because for it to work you have to convince a provider to announce address space belonging to another provider through whatever routing protocol (i.e. BGP) they’re using. It can be done, but I think multi-homing is easier with PI addresses.

Multihoming isn’t easy to do regardless of what kind of addresses you have. At the Internet level, announcements of address blocks smaller than /24 are generally filtered out, and some providers filter at /22 or even /20. This means that in order to get reliable multihoming using PI addresses, you need to get a /20 or larger, and in order to do so, you need to demonstrate a need for that many addresses.
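The filtering rule is easy to express in code. The sketch below is a simplification with invented names: it models a provider's filter as nothing more than a longest accepted prefix length, whereas real filters are part of routing policy and can be far more elaborate.

```python
import ipaddress


def passes_filter(prefix, longest_accepted=24):
    """True if an announced block is at least as large as the filter allows."""
    return ipaddress.ip_network(prefix).prefixlen <= longest_accepted


print(passes_filter("8.6.3.0/24"))                        # accepted at /24
print(passes_filter("8.6.3.0/24", longest_accepted=20))   # dropped at /20
print(passes_filter("8.6.0.0/20", longest_accepted=20))   # accepted at /20
```

Note that "smaller block" means "longer prefix", which is why the comparison uses less-than-or-equal on the prefix length.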

Interestingly enough, multihoming using PA addresses does not suffer from the same problem. Anyone know why?

17

Fun with RFC1918

§ ”Private addresses”
  • 192.168.0.0/16
  • 172.16.0.0/12
  • 10.0.0.0/8

§ Eliminates peer-to-peer properties of the Internet

From 10.1.2.1 To 44.1.3.2

From 64.6.1.1 To 44.1.3.2

From 44.1.3.2 To 64.6.6.1

RFC1918 defines a class of ”private” addresses.

Creation of this address space was motivated by several factors. One was the fact that at the time IPv4 space was running out. Another was that for hosts that do not need connectivity to the Internet, using a private address space would be convenient since those addresses would not need to be changed if the enterprise switched network providers. The final motivation is that some hosts don’t need, or shouldn’t have, network-level connectivity.

Today, RFC1918 addresses are becoming the norm. The factor that has allowed this disaster to happen is something called NAT – Network Address Translation. This is a technology that makes it possible for hosts with private addresses to make outgoing connections to hosts that have public addresses.

Question – does everyone know how NAT works? If not, improvise an explanation.

A lot of people claim that NAT is a good thing, and to a point they’re right. NAT is great for hosts that shouldn’t communicate outside the borders of the enterprise. NAT is convenient when your network provider won’t give you more addresses and you’re too lazy to make them. But those aren’t the reasons that are being bandied about.

I hear two arguments for NAT, and I don’t buy either one.

The first argument is that IPv4 address space will run out without NAT. This is true, but not necessarily a bad thing. If we use NAT for things that NAT is good for, we’ll avoid wasting a lot of addresses. If corporations who have multiple /8s allocated give one back, that’ll help. And the fact is that there still are a lot of /8s available. So yes, without NAT we’d run out of addresses faster, but all that means is that IPv6 would happen sooner.

The other argument, which is completely and utterly bogus, is that NAT gives you great security. After all, if the hacker can’t connect, they can’t crack you. Right? Wrong. Firstly, there are attacks that can go through a NAT gateway, especially if there are non-RFC1918 addresses behind it (and there often are). Secondly, not all attacks come through the firewall. The Slammer worm should have made that clear to anyone who doubted it. Bank of America lost their cash machines because they were running MS-SQL, and that network wasn’t even connected to the outside world. A couple of airlines were grounded for a day for the same reason. What NAT gives you is a false sense of security, and that’s a bad thing.

The big problem with NAT is that it makes the Internet a one-way thing. You’ve probably all heard about this new-fangled thing the kids call ”peer-to-peer”? Well, all of the Internet used to be like that. Peer-to-peer basically means that two hosts can connect to each other and be equals. With NAT, that’s not possible. One will always be the client (the one making the connection) and the other always the server. I can talk to you, but you can’t talk to me.

This property of NAT stands in the way of many applications. Now that people in general have discovered peer-to-peer technology, we have an opportunity to really start doing innovative stuff based on those old ideas. But as long as those people sit behind NAT gateways, we can’t. IP telephony is another example of an application that doesn’t like NAT. As long as your computer is behind a NAT gateway, doing IP telephony will require assistance and resources from whoever put you there.

So what are RFC1918 addresses really good for? I’ll give you some examples.

18

Network Address Translation

My port    Outside port    Outside host    Inside port    Inside host
5235       80              92.11.2.10      12401          10.1.2.4
5234       80              66.20.12.1      6239           10.1.2.6
5233       25              66.20.12.1      44521          10.1.2.1

SRC: 66.20.12.1 port 80
DST: 130.236.178.16 port 5235
Network Address Translation, or NAT, is a technology designed to make it possible to have limited Internet connectivity even when using RFC1918 address space. In general, I advise against using NAT – it makes life difficult and gives little or no benefit.

NAT is basically a transport-protocol-aware router.

When a host inside a NAT gateway initiates a connection to the outside – let’s assume it’s a TCP connection – the NAT router makes a note of which private address initiated the connection and on what port. This goes into a table together with the destination address. The router then changes the source address to its own address and changes the source port to a source port of its own choosing (that port also goes into the table). Finally it forwards the packet.

When the destination host gets the packet, it appears to come from the NAT router, so that’s where it’s returned. The NAT router, upon receiving a packet, looks up the source host, source port and destination port in the table to find which host the packet should go to. It then rewrites the destination address to the private address of the real destination host and the destination port to the real destination port.
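The table handling described above can be sketched as a small Python class. Everything here is illustrative: the class and method names are invented, packets are reduced to address and port tuples, the addresses and ports are sample values, and the sketch ignores timeouts, ICMP and anything other than a single transport protocol.

```python
class NatGateway:
    """Toy model of the translation table in a NAT router."""

    def __init__(self, public_address, first_port=5233):
        self.public_address = public_address
        self.next_port = first_port
        # (my_port, outside_host, outside_port) -> (inside_host, inside_port)
        self.table = {}
        # Lets repeated packets of one connection reuse the same port.
        self.assigned = {}

    def outbound(self, src_host, src_port, dst_host, dst_port):
        """Rewrite the source of a packet leaving the private network."""
        key = (src_host, src_port, dst_host, dst_port)
        if key not in self.assigned:
            my_port = self.next_port
            self.next_port += 1
            self.assigned[key] = my_port
            self.table[(my_port, dst_host, dst_port)] = (src_host, src_port)
        return (self.public_address, self.assigned[key], dst_host, dst_port)

    def inbound(self, src_host, src_port, dst_port):
        """Rewrite the destination of a returning packet, or return None."""
        inside = self.table.get((dst_port, src_host, src_port))
        if inside is None:
            return None   # no connection was initiated from the inside
        inside_host, inside_port = inside
        return (src_host, src_port, inside_host, inside_port)


gateway = NatGateway("130.236.178.16", first_port=5234)
print(gateway.outbound("10.1.2.6", 6239, "66.20.12.1", 80))
print(gateway.inbound("66.20.12.1", 80, 5234))
print(gateway.inbound("92.11.2.10", 80, 5234))
```

The last call returns None, which is the one-way property in miniature: a packet that does not match a connection started from the inside has nowhere to go.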

So far so good.

We get into trouble with ICMP packets – the NAT router needs to be able to rewrite these too. That can be done. We get into trouble with connectionless protocols such as UDP. That’s not easy to solve. We get into trouble with unknown IP protocols. That’s also hard to solve.

Finally we get into trouble when someone wants to connect from the outside. This actually happens in some protocols. For example, in FTP, the server initiates connections to the client when it wants to send the client data. Dealing with this sort of thing in a NAT router (or for that matter any packet-filtering firewall) is hairy, to say the least.

Not only does NAT break the Internet, but putting a large network behind NAT means that the NAT box is a bottleneck. Its reliability and its performance determine the reliability and performance of all outside connections.

19

Loopback

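[Figure: packet flow through the network code of a host. Labels: Send IP packet; Receive IP packet; IP INPUT QUEUE (ENQUEUE, DEQUEUE); decisions ”Sent to Loopback?”, ”Broadcast or Multicast?” and ”Sent to IF address?”, each with YES and NO branches; ETHERNET INTERFACE (ETHERNET SEND, ETHERNET RECEIVE).]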

Loopback is not really an addressing concept, but something you see on equipment connected to the network. I bring it up only because loopback addresses are important when deploying networks of routers, and to understand what a loopback address is all about, it’s pretty important to understand what the loopback interface is.

This slide shows a simplified diagram of how a packet flows through the network code of a host. There are plenty of details I’ve ignored, but you get the picture.

Hosts often have to talk to themselves. You know how people sometimes do the same thing? Well, when people talk out loud to themselves, we think they’re a little strange. It’s the same with computers. Instead of putting data out on the wire, the computer places data to itself directly on its input queue, so it doesn’t pointlessly waste resources. You get pretty wicked performance that way too. Putting data from the host to itself directly on its input queue is called loopback.

The loopback interface is a special interface, like the ethernet interfaces or serial interfaces or VPN endpoints, that is directly connected to the input queue. Any data sent on the loopback interface will be available to the computer itself as received data.
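The send-side decision can be sketched as follows. This is a toy model with invented names: a list stands in for the Ethernet driver, a deque stands in for the IP input queue, and broadcast and multicast handling is left out entirely.

```python
import ipaddress
from collections import deque

LOOPBACK_BLOCK = ipaddress.ip_network("127.0.0.0/8")


class Host:
    """Toy host with one interface address, an input queue and a wire."""

    def __init__(self, interface_address):
        self.interface_address = ipaddress.ip_address(interface_address)
        self.input_queue = deque()   # packets the host itself will receive
        self.wire = []               # packets handed to the Ethernet driver

    def send(self, destination, payload):
        dst = ipaddress.ip_address(destination)
        if dst in LOOPBACK_BLOCK or dst == self.interface_address:
            # Data to ourselves never touches the wire.
            self.input_queue.append((destination, payload))
            return "looped back"
        self.wire.append((destination, payload))
        return "sent on wire"


host = Host("130.236.189.4")
print(host.send("127.0.0.1", b"hello me"))
print(host.send("130.236.189.4", b"hello me again"))
print(host.send("130.236.178.12", b"hello you"))
```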

The point of this is that the loopback interface can have a well-known address that never changes, regardless of whetherthe external addresses change, and regardless of how many addresses the host has (if the host has ten addresses, which one should be used to send data to itself). Although the 0.0.0.0 address always refers to ”this computer”, it’s not always a good idea to use it. Instead, we use a well-known non-zero IP address. That address is almost always127.0.0.1. In fact, the entire 127/8 address space is reserved for the loopback interface. A bit of a waste if you ask me, but it’s traditional.

Since 127.0.0.1 exists on every piece of equipment connected to the network, it’s not permissible to make that address visible to the outside. That means a router can never announce to the world ”I know how to reach 127.0.0.1” (it would be ignored if it did). It means that packets for a particular box will never arrive with the destination address 127.0.0.1 (and if they did, the box might still ignore them).

A trick that’s commonly used in the network world is to add a real IP address to the loopback interface. For example, I have a router that connects five networks in the lab. The loopback interface on this machine has the address 130.236.189.4 and the netmask 255.255.255.255. That means that this particular address is not part of a network. It’s an address all on its own. The router announces this address to the world, so it’s actually included in all the routing tables.

Any idea why I did this? Think about it for a couple of minutes. We’ll get to that when I talk about routing.

20

IANA allocates blocks (usually /8s) to regional registries (RIRs)
RIRs allocate blocks to national and local registries (NIRs and LIRs)
NIRs and LIRs allocate (mostly PA) addresses to end users

Address Allocation

21

Explicit Congestion Notification

22

Dealing With Congestion

End-to-end (e.g. IP)
§ No support for congestion control in the network layer
§ Congestion is inferred from other events

Network assisted (e.g. ATM)
§ Explicit network layer support for congestion control
§ Congestion can be managed

ACTIVE QUEUE MANAGEMENT

As you may recall from the basic network course, there are two approaches to congestion control. One is sometimes called end-to-end congestion control, and the other network-assisted congestion control. The key difference is that in end-to-end congestion control, the endpoints have to infer congestion from secondary events, such as packet loss, whereas in network-assisted congestion control the network layer itself can explicitly indicate congestion.

TCP/IP is the usual example of end-to-end congestion control. Network assisted congestion control appears in ATM, DECnet and SNA, to mention a few.

End-to-end congestion control is nice because it’s simple. But in the face of congestion, it’s not terribly efficient. There is no way for the network to tell communicating parties to back off before congestion occurs. When congestion finally does happen, network performance drops rapidly.

Recall why congestion happens. Full queues.

When a router has a full queue, it has to start dropping packets. There are several strategies for this. One, perhaps the most common strategy, is to drop incoming packets (tail drop). Another is to drop packets at the head of the queue (head drop). A third is to randomly drop packets from the queue (random drop). Other strategies, such as dropping packets with a specific TOS, TC or DSCP, are also possible.

Regardless of which approach is taken, there is a problem. The queues are still full, and full queues are a bad thing.

End-to-end congestion control in the IP world is not built in such a way that a congested network is moved from a congested state to a steady state where the queues are not full. In fact, a congested network will have a tendency to stay congested. Coupled with bad queue management, the end result can be pretty horrible. For example, one cable modem type I know of has a queue management strategy that guarantees practically zero TCP throughput if the incoming packet rate is higher than the outgoing packet rate.

What we want is something that will encourage the network to find a steady state where the queues are not full.

One strategy being used on the Internet is called Random Early Drop (RED, also known as Random Early Detection). This is an example of an active queue management (AQM) algorithm. With RED, routers will start dropping packets randomly when it seems that congestion is about to occur. Since endpoints tend to use packet drops as an indication of congestion, this will cause the endpoints to cut back on their sending rate. If you recall TCP congestion control, a dropped packet would cause TCP to cut the congestion window in half. By dropping random packets, there is no bias to any particular type of connection.
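To make the idea concrete, here is a simplified sketch of the classic RED drop decision. It is not taken from any particular router: the thresholds, `max_p` and the averaging weight are illustrative assumptions, and real implementations add refinements (such as spacing drops out evenly) that are left out here.

```python
import random

def update_average(avg_queue: float, current_queue: float, weight: float = 0.002) -> float:
    """Exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * avg_queue + weight * current_queue

def red_drop_probability(avg_queue: float, min_th: float, max_th: float, max_p: float) -> float:
    """Drop probability as a function of the average queue length."""
    if avg_queue < min_th:
        return 0.0            # queue is short: never drop
    if avg_queue >= max_th:
        return 1.0            # queue is long: always drop
    # In between, the probability rises linearly from 0 to max_p.
    return max_p * (avg_queue - min_th) / (max_th - min_th)

def should_drop(avg_queue: float, rng=random.random,
                min_th: float = 5, max_th: float = 15, max_p: float = 0.1) -> bool:
    """Randomly decide whether an arriving packet is dropped."""
    return rng() < red_drop_probability(avg_queue, min_th, max_th, max_p)
```

The point of the linear ramp is that drops start early and gently, so senders back off before the queue is actually full.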

So AQM is one approach that helps prevent congestion, which is good, but it still drops packets, which isn’t.

A proposal for network-assisted congestion control in IP was published in 1994. After testing, a proposed standard (RFC 3168) for Explicit Congestion Notification in IP was published. This is an attempt to introduce network-assisted congestion control in IP.

23

ECN

§ New field in IP header
  • Low two bits of TOS
  • Four ECN codepoints

§ New flags in TCP header
  • Explicit Congestion Echo
  • Congestion Window Reduced

IP header (old TOS octet): DSCP | ECN

Codepoint   ECN bits
Not-ECT     0 0
ECT(1)      0 1
ECT(0)      1 0
CE          1 1

TCP header flags: Reserved | CWR | ECE | URG | ACK | PSH | RST | SYN | FIN

Explicit congestion notification is a non-backwards compatible change to IP. As it happens, the significant location of the change and the details of the implementation minimize the potential compatibility problems.

The ECN codepoint is a two-bit field at the bottom of the old TOS field. There are a number of reasons for using two bits rather than one. Those are all in the RFC, but I’m not going to go over them here (and I don’t remember them all). When the two bits are both zero, the packet does not use ECN. Either the endpoints don’t know ECN or there are other reasons for not using it. The values 10 and 01 both indicate that the endpoints are ECN-aware and the network should use ECN. The value 11 indicates Congestion Experienced (CE), which tells the receiver that the network is experiencing congestion.
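The layout described above can be sketched as a little bit-twiddling. The function name `split_tos` is invented for this example; the codepoint values are the ones on the slide.

```python
# The four ECN codepoints, keyed by the low two bits of the old TOS octet.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}

def split_tos(tos: int):
    """Split the old TOS octet into (DSCP value, ECN codepoint name)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("the TOS field is a single octet")
    dscp = tos >> 2       # upper six bits
    ecn = tos & 0b11      # lower two bits
    return dscp, ECN_NAMES[ecn]

print(split_tos(0x00))   # a packet that does not use ECN
print(split_tos(0x02))   # an ECN-capable packet
print(split_tos(0x03))   # a packet marked by a congested router
```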

Note that ECN is a form of forward congestion notification. The network tells the receiver that congestion has occurred, not the sender. The receiver is then responsible for notifying the sender. You should compare this to how EFCI works in ATM networks.

The ECN proposal also defines additions to the TCP header. Although the transport layer is outside the scope of this course, we have to look at how it reacts to ECN, and hence how it has been extended.

ECN is pretty simple in theory. A host wanting to use ECN will set ECT(0) or ECT(1) in the IP header. This signals to any routers on the way that the host wants to use ECN. Setting Not-ECT indicates that the host does not want to use (or does not know about) ECN. Routers that are experiencing congestion may set the CE (Congestion Experienced) codepoint. This should be set on any packets that would otherwise have been dropped due to AQM. A host receiving a packet with the CE codepoint set should react to it exactly as it would react to a congestion event detected through packet loss. For TCP, that means cutting the congestion window in half. The host may take other action.
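The router's side of this can be sketched in a few lines. This is only an illustration of the rule in the paragraph above (mark instead of drop when the packet is ECN-capable); the function name and the return convention are assumptions made for the example.

```python
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11

def congested_router_action(ecn_bits: int):
    """What a congested router does with a packet its AQM has selected.

    Returns (action, ecn_bits_on_output).
    """
    if ecn_bits == NOT_ECT:
        # The endpoints do not understand ECN, so dropping is the only signal.
        return ("drop", ecn_bits)
    # ECT(0), ECT(1) or already CE: forward the packet, marked as CE.
    return ("forward", CE)
```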

The reason the CE codepoint is treated almost exactly like a regular packet loss is that this makes gradual deployment possible without introducing bias towards hosts that do (or don’t) use ECN. Once ECN is widely deployed, it will be possible to look at other ways of handling congestion, but until then it’s important not to introduce any serious bias in the system.

For ECN to be effective, the transport layer has to handle it. For the most part that means TCP. The TCP protocol has been extended by adding two one-bit flags to the TCP header and introducing a number of rules for ECN. The two flags in the header are ECE (Explicit Congestion Echo, which RFC 3168 calls ECN-Echo) and CWR (Congestion Window Reduced).

Since ECN is a forward notification, the receiver needs to tell the sender when it’s time to throttle back. In TCP this is done by having the receiver set the ECE flag of any ACKs (perhaps any packets – I’m not sure) it returns. The ECE flag is kept set until the sender responds with a packet that has the CWR flag set. To avoid multiple reactions to ECN, the endpoints do not need to react to more than one ECN or ECE event per window, where a window is roughly the round trip time. Otherwise, the endpoints might throttle back several times in response to one congestion event, even when throttling back once would be sufficient. There are a lot of other details, but you’ll have to get them from the RFC.
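The echo rule in the paragraph above amounts to a one-bit state machine at the receiver. The following is a minimal sketch of just that rule; the class and method names are invented, and the negotiation and per-window details from the RFC are deliberately left out.

```python
class EcnReceiver:
    """Tracks whether outgoing ACKs should carry the ECE flag."""

    def __init__(self):
        self.echoing = False

    def on_data_segment(self, ip_ce: bool, tcp_cwr: bool) -> bool:
        """Process one arriving segment and return the ECE flag for its ACK.

        ip_ce   -- the IP header carried the CE codepoint
        tcp_cwr -- the TCP header had the CWR flag set
        """
        if tcp_cwr:
            self.echoing = False   # the sender has reacted, stop echoing
        if ip_ce:
            self.echoing = True    # new congestion mark, start echoing
        return self.echoing

rx = EcnReceiver()
print(rx.on_data_segment(ip_ce=True, tcp_cwr=False))   # mark seen, ECE goes out
print(rx.on_data_segment(ip_ce=False, tcp_cwr=False))  # still echoing
print(rx.on_data_segment(ip_ce=False, tcp_cwr=True))   # sender answered with CWR
```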

There are also rules for negotiating ECN between endpoints during the TCP handshake. I won’t go through them here. They’re in the RFC.

The introduction of ECN has brought some interesting behavior to light. The biggest problem is the two new flags in the TCP header. Since those were previously reserved (which means they had to be set to zero), there are implementations

24

IP Fragmentation

25

Fragmentation

§ Identification
  • Which IP datagram is this?

§ Flags
  • Are there fragments?
  • May I fragment you?

§ Fragmentation offset
  • Which fragment is this?

IDENTIFICATION FLAGS FRAGMENTATION OFFSET

Fragmentation is needed when the datagram is larger than the path MTU.

Fragmentation is an integral part of IP.

Fragmentation becomes necessary when a datagram exceeds the MTU of the path between sender and receiver. In the IP header, fragmentation is supported by the identification, flags and fragmentation offset fields.

The identification field is set by the sender or, when set to zero by the sender, is set to a different value by the entity that fragments the datagram. The flags control fragmentation. The fragmentation offset indicates where in the datagram the fragment belongs.

To reassemble a datagram, the entity doing the job has to be able to recognize whether or not a datagram has been fragmented, which fragments belong together and in what order. It also has to be able to determine when all fragments have been reassembled.

The identification field together with the source address, destination address, and protocol fields identifies each datagram. All fragments in a datagram share the same identification value. It’s important not to re-use the same identification value for more than one ”live” datagram at a time, or you may end up piecing together fragments from different datagrams at reassembly.

The flags field contains two flags (the high bit is reserved). The middle bit is called ”DF” and means ”don’t fragment”. If a datagram with this flag set exceeds the MTU of a link, the network has to signal a failure. It may not fragment the datagram and send it on. It has to drop the datagram and respond with an ICMP fragmentation needed message. The low bit is the ”more fragments” bit. When set, it indicates that the packet is not the last fragment of the datagram.

The fragmentation offset indicates where in the datagram the fragment belongs. It is an offset (counted in units of 8 octets) from the start of the datagram. Here’s an important question: why use an offset and not just number the fragments? Consider the following facts: packets can be sent down multiple paths; packets can be duplicated; the smallest MTU may not be on the first link.
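The flags and the offset can be sketched as follows. This is a simplified model rather than real router code: it assumes a fixed 20-byte header (no options), and the function names are invented for the example.

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20):
    """Split a payload into fragments that fit the MTU.

    Returns a list of (offset_in_8_octet_units, data_len, more_fragments).
    """
    # Every fragment except the last must carry a multiple of 8 octets,
    # because the offset field counts in units of 8 octets.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    frags = []
    pos = 0
    while pos < payload_len:
        size = min(max_data, payload_len - pos)
        more = pos + size < payload_len      # the MF bit
        frags.append((pos // 8, size, more))
        pos += size
    return frags

def unpack_flags_offset(word: int):
    """Split the 16-bit flags/offset word into (DF, MF, offset_units)."""
    df = bool(word & 0x4000)   # middle flag bit: don't fragment
    mf = bool(word & 0x2000)   # low flag bit: more fragments
    return df, mf, word & 0x1FFF

for frag in fragment(4000, 1500):
    print(frag)
```

Because each fragment carries an absolute offset, a fragment can be split again on a later link with a smaller MTU without renumbering anything that came before it.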

Questions: how many fragments can a datagram be split into? How small can the fragments be? How long can an IP packet be in total (with and without fragmentation)? Why are the addresses and protocol fields part of the datagram identification? Why not just the identification field?

26

Reassembly of fragments

[Figure: REASSEMBLY BUFFER. Fragments shown, each with its own IP header: (ID: 131, Flags: MF, Offset: 0, Length: 100) and (ID: 131, Flags: none, Offset: 300, Length: 45).]

This is a simplified example of reassembly. I’ve assumed that the ID field contains all the identification necessary, and I’ve ignored the details of how the header of the datagram is handled.

Reassembly is pretty straightforward. The receiver has a buffer in which the datagram is reassembled. When a packet arrives, it is placed in the buffer. If it is not fragmented, the datagram is ready and is delivered to the next processing step. If the packet is a fragment, however, it is placed at the appropriate position in the buffer. The header is taken from the first fragment.

Fragments can arrive in any order. That poses no problem for reassembly.
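Here is a minimal sketch of such a reassembly buffer. Like the slide, it ignores headers and assumes that all fragments passed in belong to the same datagram; the function name and the tuple layout are assumptions made for the example.

```python
def reassemble(fragments):
    """Reassemble one datagram from its fragments.

    fragments -- iterable of (offset_in_8_octet_units, more_fragments, data),
                 in whatever order they arrived.
    Returns the payload as bytes, or None while pieces are still missing.
    """
    buffer = bytearray()
    filled = bytearray()     # 1 for every byte position that has been received
    total_len = None         # known once the fragment without MF shows up
    for offset_units, more, data in fragments:
        start = offset_units * 8
        end = start + len(data)
        if end > len(buffer):
            grow = end - len(buffer)
            buffer.extend(bytes(grow))
            filled.extend(bytes(grow))
        buffer[start:end] = data
        filled[start:end] = b"\x01" * len(data)
        if not more:
            total_len = end  # the last fragment tells us where the datagram ends
    if total_len is None or not all(filled[:total_len]):
        return None
    return bytes(buffer[:total_len])

# The last fragment arrives first, and it still works.
arrived = [(2, False, b"CCCC"), (0, True, b"AAAAAAAA"), (1, True, b"BBBBBBBB")]
print(reassemble(arrived))
```

Note that the datagram is only known to be complete once the fragment without the MF bit has arrived and every byte before its end has been filled in.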

27

Reassembly of fragments


[Figure: REASSEMBLY BUFFER. Fragments shown, each with its own IP header: (ID: 131, Flags: none, Offset: 300, Length: 45) and (ID: 131, Flags: MF, Offset: 50, Length: 200).]

This caused some implementations to crash and burn. The best-known exploit built on overlapping fragments was called Teardrop.

Here’s a second example of reassembly. Because we’re using offsets and not numbers, it’s possible for fragments to overlap. Overlapping shouldn’t happen, but could since IP is a best-effort service that allows duplication of packets.

The RFC describing IP indicates that the fragment last received has priority. So if the second fragment partially overlaps the first, the data in the overlapping part should be taken from the second, not the first, fragment.

Question: why is this important?

Question: can this be exploited in the security domain? How?

28

Frag Attack!

§ ”Ping of Death”
  • Create IP datagram larger than 65535 bytes
  • Some IP implementations would crash during reassembly

§ Fragment 1
  • Size: 65500 bytes
  • Offset: Zero

§ Fragment 2
  • Size: 2048 bytes
  • Offset: 65500 bytes

How could this happen? Implementors didn’t fully grasp all the details of the protocols they were implementing, or at the very least didn’t consider all the borderline cases. The moral of the story? Be careful.
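The borderline case is easy to state in code. This sketch uses the numbers from the slide and ignores headers; the function names are invented, and the check is the kind of test a careful implementation would make before copying a fragment into its buffer.

```python
MAX_DATAGRAM = 65535   # largest value the 16-bit total length field can hold

def reassembled_size(offset_bytes: int, fragment_len: int) -> int:
    """Where a fragment ends in the reassembly buffer."""
    return offset_bytes + fragment_len

def overflows(offset_bytes: int, fragment_len: int) -> bool:
    """True if the fragment would extend past the largest legal datagram."""
    return reassembled_size(offset_bytes, fragment_len) > MAX_DATAGRAM

print(reassembled_size(65500, 2048))   # fragment 2 from the slide
print(overflows(65500, 2048))
```

Each fragment on its own looks legal. Only the sum of offset and length reveals the problem, which is exactly the case the crashing implementations failed to check.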

29

Frag Attack!

§ IDS Rules:
  • Datagrams starting with ”HACKME” are rejected
  • Pass all others

Fragments on the wire:
ID: 131; MF; FO: 0   H E A R N O T H
ID: 131; MF; FO: 8   I N G S C A R Y
ID: 131; MF; FO: 1   A C K M E I
ID: 131; --; FO: 9   A M E V I L !

IDS, on the first fragment: ”Packet starts with HEARNO. Safe! I’ll pass it along.”
IDS, on the other fragments: ”That’s a fragment and it doesn’t look scary.”

REASSEMBLY BUFFER (ON VICTIM):
H E A R N O T H I N G S C A R Y
overwritten by the overlapping fragments A C K M E I and A M E V I L !

FOOLED YOU!

The specification of IP indicates that fragments are assumed to be reassembled by the endpoints, not by routers. After all, why should a router waste expensive resources reassembling datagrams?

Let’s look at something that could happen. It doesn’t happen much any more, at least not as easily as this, but at one point it did because developers didn’t consider all the aspects of IP fragmentation and what consequences they had.

Assume we’re running some kind of intrusion detection system. It scans network traffic for evil content. At the same time it wants to be as efficient as possible, so the programmer has optimized things a bit. When the software is looking for a pattern at the start of a packet, all it really needs is the head of the packet. Other fragments, if there are any, are irrelevant. Right?

In this example, we want to stop packets starting with the word HACKME. Offsets are in bytes, not words. We also know that it’s sufficient to stop one fragment of the packet to stop the entire datagram.
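The trick can be sketched with two small functions: a naive IDS that only looks at the fragment with offset zero, and a victim that lets later fragments overwrite earlier ones. The fragment contents are made up for this sketch (shorter than on the slide), offsets are in bytes as in the example, and both function names are hypothetical.

```python
def naive_ids_allows(fragments, banned=b"HACKME"):
    """Inspect only the fragment at offset 0, as the optimized IDS does."""
    for offset, data in fragments:
        if offset == 0 and data.startswith(banned):
            return False
    return True

def victim_reassemble(fragments):
    """Copy fragments into a buffer in arrival order; later data wins."""
    buf = bytearray()
    for offset, data in fragments:
        end = offset + len(data)
        if end > len(buf):
            buf.extend(bytes(end - len(buf)))
        buf[offset:end] = data
    return bytes(buf)

# The third fragment overlaps the first one, starting at byte 1.
attack = [(0, b"HEARNOTH"), (8, b"INGSCARY"), (1, b"ACKME!")]

print(naive_ids_allows(attack))     # the IDS sees nothing wrong
print(victim_reassemble(attack))    # the victim sees something else
```

The IDS and the victim end up with different views of the same datagram, which is the whole attack.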

This sort of attack doesn’t really work any more. It came to the attention of IDS designers a few years ago that it’s necessary to reassemble the entire packet before examining it. This is regrettable, because it means NIDS are less efficient.

This is just one example of how not understanding all the details of how the network layer works can get you into big trouble.

30

Interruption!

31

Subnet 130.236.0.0/20

1. 40 point-to-point links
2. 20 offices with 10-30 hosts
3. 50 routers or switches
4. 10 servers on one network
5. 14 servers on another
6. 1 network with 400 hosts

§ Plan out the addresses!

§ My solution
1. 130.236.2.0/24
2. 130.236.8.0/21
3. 130.236.3.0/24
4. 130.236.4.0/25
5. 130.236.4.128/25
6. 130.236.6.0/23

  • Optimal? No!
  • Correct? Maybe . . .

There’s plenty of space available to us.

Start with the 400-host network. 400 hosts can fit on a single /23, leaving some room for expansion.

Next, look at those offices. Each office has 10-30 hosts, so they would each fit in a /27. But that would leave no room for expansion at all, and from this description I’d guess that the number might change. So I want to allocate at least a /26 for each office, leaving room for expansion up to 62 hosts. I’m guessing that’ll be enough. There are 20 of them. I’ll leave room for expansion up to 32 offices. That means I need 32 /26s, which is equivalent to a single /21. That’s half the address space!

The point-to-point links and routers are easy. For the point-to-point links, I’ll allocate 64 /30s, which is the equivalent of a single /24. That leaves room for a little expansion. If I want to expand more, I can convert my /30s to /31s.

The routers and switches need single /32s. I’ll allocate a /24 to get those addresses from, leaving me lots of room for expansion. Note that I have similar amounts of room for both point-to-point links and routers. That’s not an accident.

The networks with 10 and 14 hosts could fit on /28s, but again, to have headroom, I’ll allocate two /25s. Servers pile up.

That gives me:

Offices: 32 /26 (or 1 /21)
Big network: 1 /23
Point-to-points: 1 /24
Routers: 1 /24
Servers: 2 /25 (or 1 /24)

Next, I allocate addresses, starting with the large networks and moving down. I can split the space into two /21s: 130.236.8.0/21 and 130.236.0.0/21. I’ll allocate 130.236.8.0/21 to the offices and split that further later. I can then split 130.236.0.0/21 into four /23s: 130.236.0.0/23, 130.236.2.0/23, 130.236.4.0/23 and 130.236.6.0/23. I’ll allocate 130.236.6.0/23 to the big network.

Now all I need are two /24s. I’ll grab 130.236.2.0/24 and 130.236.3.0/24 for the network stuff.

Finally, 130.236.4.0/25 and 130.236.4.128/25 go to the two server networks.

This division is probably not the best. I haven’t looked at details that are important, such as aggregating free space as much as possible. This is just one solution that I came up with in about 15 minutes’ worth of work.
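A plan like this is easy to sanity-check mechanically. The sketch below uses Python's standard `ipaddress` module and the allocations from the walkthrough above. The dictionary labels and the helper `check_plan` are made up for the example, and `subnet_of` needs Python 3.7 or later.

```python
import ipaddress
from itertools import combinations

parent = ipaddress.ip_network("130.236.0.0/20")

# The allocations from the walkthrough above.
plan = {
    "offices": "130.236.8.0/21",
    "big network": "130.236.6.0/23",
    "point-to-point links": "130.236.2.0/24",
    "routers and switches": "130.236.3.0/24",
    "servers (10)": "130.236.4.0/25",
    "servers (14)": "130.236.4.128/25",
}
nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}

def check_plan(parent, nets):
    """Return a list of problems: blocks outside the parent, or overlapping."""
    problems = []
    for name, net in nets.items():
        if not net.subnet_of(parent):
            problems.append(f"{name} ({net}) is outside {parent}")
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} ({net_a}) overlaps {b} ({net_b})")
    return problems

# Addresses in the /20 that the plan leaves unallocated.
unused = parent.num_addresses - sum(n.num_addresses for n in nets.values())

print(check_plan(parent, nets))
print(len(list(nets["offices"].subnets(new_prefix=26))), "office /26s")
print(len(list(nets["point-to-point links"].subnets(new_prefix=30))), "link /30s")
print(unused, "addresses left over")
```

A check like this only confirms that the blocks fit and don't collide. It says nothing about whether the free space is well aggregated, which is the part I skipped.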

32

One more . . .

Aggregate
§ 10.7.13.96/27
§ 10.7.13.128/25
§ 10.6.13.0/24
§ 10.6.12.0/23

33

One more . . .

Address        Binary
10.7.13.96     00001010 00000111 00001101 01100000
10.7.13.128    00001010 00000111 00001101 10000000
10.6.13.0      00001010 00000110 00001101 00000000
10.6.12.0      00001010 00000110 00001100 00000000

Common prefix, one bit at a time: 0.0.0.0/1, 0.0.0.0/2, 0.0.0.0/3, 0.0.0.0/4, 8.0.0.0/5, 8.0.0.0/6, 10.0.0.0/7, 10.0.0.0/8, 10.0.0.0/9, 10.0.0.0/10, 10.0.0.0/11, 10.0.0.0/12, 10.0.0.0/13, 10.4.0.0/14, 10.6.0.0/15

The addresses differ in bit 16, so the aggregate is 10.6.0.0/15.
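The bit-by-bit comparison on this slide can be sketched as a small function that shortens the prefix until all the networks agree. The name `common_aggregate` is invented for the example.

```python
import ipaddress

def common_aggregate(cidrs):
    """Return the longest prefix that covers every network in cidrs."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    values = [int(n.network_address) for n in nets]
    # The aggregate can never be longer than the shortest input prefix.
    prefix_len = min(n.prefixlen for n in nets)
    # Shorten the prefix until all networks agree on the leading bits.
    while prefix_len > 0:
        shift = 32 - prefix_len
        if len({v >> shift for v in values}) == 1:
            break
        prefix_len -= 1
    shift = 32 - prefix_len
    base = (values[0] >> shift) << shift
    return ipaddress.ip_network((base, prefix_len))

print(common_aggregate([
    "10.7.13.96/27",
    "10.7.13.128/25",
    "10.6.13.0/24",
    "10.6.12.0/23",
]))
```

Keep in mind that the aggregate covers far more address space than the four networks listed, so announcing it claims addresses that were never in the original list.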

34

End of interruption

35

TTL

§ TTL – Time To Live
  • Maximum hop limit
  • Each router decreases TTL
  • When TTL reaches 0 – poof

§ Maximum TTL – 255
§ Normal starting TTL – 64

Example application:
§ traceroute

TIME TO LIVE PROTOCOL CHECKSUM
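The hop limit, and the way traceroute uses it, can be shown with a small simulation. Nothing is sent on a real network here: the path is just a list of made-up router names, and both functions are hypothetical helpers for this sketch.

```python
def forward(ttl: int, routers):
    """Send a packet with the given TTL through a list of routers.

    Returns ("expired", router_name) if a router decrements the TTL to zero,
    or ("delivered", remaining_ttl) if the packet reaches the destination.
    """
    for router in routers:
        ttl -= 1                 # each router decreases the TTL
        if ttl <= 0:
            return ("expired", router)
    return ("delivered", ttl)

def traceroute(routers):
    """Discover the path by sending probes with TTL 1, 2, 3, ..."""
    hops = []
    for ttl in range(1, 256):    # 255 is the maximum TTL
        outcome, detail = forward(ttl, routers)
        if outcome == "delivered":
            break
        hops.append(detail)      # the router that reported the expiry
    return hops

path = ["r1", "r2", "r3"]
print(forward(64, path))         # a normal packet gets through
print(traceroute(path))          # each probe dies one hop further away
```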

36

Options

§ Some available options
  • Strict source routing
  • Loose source routing
  • Record route
  • Time stamp

§ Options are rarely used
§ Options are often blocked

§ Strict source routing
  • Source specifies exact path through network
  • Great for troubleshooting
  • A security risk

§ Loose source routing
  • Source specifies nodes the datagram must pass through
  • Great for troubleshooting
  • A security risk

37

LSR Uses

§ Check return path
  • traceroute -g DEST ME

§ Detect packet sniffers
  • ping -g HOST SNIFFER

[Figure: hosts 192.0.1.4 and 192.0.1.5. Packet: ICMP ECHO, TO: 192.0.1.4, LSR: 192.0.1.5.]

Loose source routing has some uses. First of all, it can be useful to trace the return path of a connection. Loose source routing in traceroute will let you do exactly that. This is mainly useful for troubleshooting.

Another use is to detect hosts that are sniffing network traffic. This is done by sending a ping packet to a suspected host, but source routing it through a host that does not route. This host should drop the packet or respond with an error. If the suspect happens to be sniffing, then the suspect will also see the packet and respond to it.

If there is a response to the packet, then the suspect may be sniffing traffic. A quick way to check is to examine the TTL field (included in the response) to see if the ping was received due to routing or sniffing. If the TTL field of the echo response for the source-routed packet is the same as the TTL field of the echo response of a non-source-routed ping, then the suspect is probably sniffing.

38

LSR Abuses

§ Implementations that reverse the source route make spoofing easy

§ Gain access to private networks

[Figure: Attacker (192.6.1.8), Trusted server (147.12.1.6), Victim. Packet from the attacker: Spoof 147.12.1.6, LSR: 192.6.1.8.]

[Figure: hosts 10.0.1.5 and 147.12.1.1. Packet: To: 10.0.1.5, LSR: 147.12.1.1.]

39

And that’s it for raw IP
(but we’re not quite done yet)

40

Internet Control Message Protocol

Some common ICMP types
0 Echo reply
3 Destination unreachable
4 Source quench
5 Redirect
6 Alternate host address
8 Echo (request)
9 Router advertisement
10 Router solicitation
11 Time exceeded
12 Parameter problem
13 Timestamp request
14 Timestamp reply
15 Information request
16 Information reply
17 Address mask request
18 Address mask reply

Destination unreachable codes [partial]
0 Net unreachable
1 Host unreachable
2 Protocol unreachable
3 Port unreachable
4 Fragmentation needed
5 Source route failed

Time exceeded codes
0 Time to live exceeded in transit
1 Fragment reassembly time exceeded

TYPE CODE CHECKSUM

DATA
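The TYPE, CODE, CHECKSUM, DATA layout can be sketched with the `struct` module. This only builds the bytes and does not send anything. The helper names are invented, and for simplicity everything after the checksum is treated as opaque data.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole number of words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_icmp(icmp_type: int, code: int, data: bytes = b"") -> bytes:
    """Build an ICMP message: type, code, checksum, then the data."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBH", icmp_type, code, 0)
    checksum = internet_checksum(header + data)
    return struct.pack("!BBH", icmp_type, code, checksum) + data

# Type 8, code 0 is an echo request. The data here is arbitrary.
message = build_icmp(8, 0, b"\x00\x01\x00\x01ping")
print(message.hex())
print(internet_checksum(message))   # a valid message checksums to zero
```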

41

ICMP Uses

Traceroute
§ Examine a packet’s path through the network

§ Operation:
  • Send UDP packets with increasing TTL
  • Match returning time exceeded ICMP messages with UDP packets

Path MTU discovery (RFC 1191)
§ Discover the smallest MTU between communicating hosts

§ Operation
  • Set DF bit in all packets
  • Routers set next-hop MTU when sending fragmentation needed messages
  • Decrease MTU when fragmentation needed ICMP messages are seen
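The path MTU discovery loop on this slide can be sketched as a simulation. The path is a made-up list of link MTUs, nothing is sent on a real network, and the function name is an assumption for this example.

```python
def path_mtu_discovery(link_mtus, initial_mtu: int):
    """Simulate discovery over a path described by its link MTUs.

    Returns (discovered_mtu, number_of_packets_sent).
    """
    mtu = initial_mtu
    sent = 0
    while True:
        sent += 1                      # send a packet of size mtu with DF set
        for link_mtu in link_mtus:
            if mtu > link_mtu:
                # The router in front of this link drops the packet and
                # reports the next-hop MTU in a fragmentation needed message.
                mtu = link_mtu
                break
        else:
            return mtu, sent           # the packet made it all the way

print(path_mtu_discovery([1500, 1400, 1500, 576], initial_mtu=1500))
```

Each too-small link costs one dropped packet, so the sender converges on the smallest MTU after at most one retry per bottleneck.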

42

ICMP Abuses

§ Various messages
  • Map out and get information about hosts and networks

§ Redirect messages
  • Fool a host or router to re-route traffic (a bad thing)

§ Source quench messages
  • Cause denial of service

[Figure: network 10.6.0.0/16 and host 192.0.2.26. Packet: Echo request, SRC: 192.0.2.26, DST: 10.6.255.255.]

ICMP messages have been used to launch various attacks on remote networks, and as a result, the usefulness of some ICMP messages has been decreased. The risk they represent is not worth the gains.

The most common use of ICMP as an attack tool is to map out information about a remote network. The various information request types can be used to get information about network topology, routers and timestamps.

More active attacks are possible with the redirect messages. A host can be fooled into routing packets to an attacker if the host blindly listens to redirect messages. The source quench message could probably be used to launch a denial of service attack by quenching a host.

Over the last few years we’ve seen some interesting applications of source address spoofing and ICMP. A classic attack is the smurf attack. This attack involves an attacker, an intermediary and a victim, and is a type of amplification attack. Here the attacker sends an ICMP echo request message to a network, using the network broadcast address as the destination and the victim’s address as the source. If the intermediary doesn’t filter broadcast ICMP, this will cause all hosts on the intermediary network to send ICMP echo reply messages to the victim, overloading and killing it.
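The broadcast address in the slide's example, and the kind of check a filter on the intermediary could make, can be sketched with the `ipaddress` module. The function name is invented, and a real filter would of course live in the router rather than in Python.

```python
import ipaddress

def is_directed_broadcast(dst: str, network: str) -> bool:
    """True if dst is the broadcast address of the given network."""
    net = ipaddress.ip_network(network)
    return ipaddress.ip_address(dst) == net.broadcast_address

intermediary = ipaddress.ip_network("10.6.0.0/16")
print(intermediary.broadcast_address)                      # the DST on the slide
print(is_directed_broadcast("10.6.255.255", "10.6.0.0/16"))
print(is_directed_broadcast("10.6.0.1", "10.6.0.0/16"))
```

One spoofed request to that address can trigger a reply from every host that answers, which is where the amplification comes from.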

Discussion of ICMP filtering.

43

Summary

Topics covered:
§ IP protocol format
§ IP addressing
§ Subnetting/supernetting
§ Address planning
§ Address allocation
§ Fragmentation/reassembly
§ Congestion control
§ ICMP
§ Applications and attacks

Missed topics:
§ Differentiated services
§ Most IP options
§ Mobile IP
§ IP routing (later)
§ IPv6 (later)
