UNIT 4 TRANSPORT LAYER
Overview of Transport layer - UDP - Reliable byte stream (TCP) - Connection management
- Flow control - Retransmission – TCP Congestion control - Congestion avoidance
(DECbit, RED) – QoS – Application requirements
SIMPLE DEMULTIPLEXER (UDP)
The simplest possible transport protocol is one that extends the host-to host delivery service
of the underlying network into a process-to-process communication service. There are likely
to be many processes running on any given host, so the protocol needs to add a level of
demultiplexing, thereby allowing multiple application processes on each host to share the
network. Aside from this requirement, the transport protocol adds no other functionality to
the best-effort service provided by the underlying network. The Internet’s User Datagram
Protocol is an example of such a transport protocol.
The only interesting issue in such a protocol is the form of the address used to identify the
target process. Although it is possible for processes to directly identify each other with an
OS-assigned process id (pid), such an approach is only practical in a closed distributed
system in which a single OS runs on all hosts and assigns each process a unique id.
A more common approach, and the one used by UDP, is for processes to indirectly identify
each other using an abstract locater, usually called a port. The basic idea is for a source
process to send a message to a port and for the destination process to receive the message
from a port.
The header for an end-to-end protocol that implements this demultiplexing function typically
contains an identifier (port) for both the sender (source) and the receiver (destination) of the
message. For example, the UDP header is given in the figure. Notice that the UDP port field is
only 16 bits long.
This means that there are up to 64K possible ports, clearly not enough to identify all the
processes on all the hosts in the Internet. Fortunately, ports are not interpreted across the
entire Internet, but only on a single host. That is, a process is really identified by a port on
some particular host—a ⟨port, host⟩ pair. In fact, this pair constitutes the demultiplexing key
for the UDP protocol.
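As a rough illustration, the fixed 8-byte UDP header can be unpacked with Python's struct module; the port numbers used below are made up for the example.

```python
import struct

def parse_udp_header(segment: bytes):
    """Parse the fixed 8-byte UDP header: SrcPort, DstPort, Length, Checksum."""
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", segment[:8])
    return {"SrcPort": src_port, "DstPort": dst_port,
            "Length": length, "Checksum": checksum}

def demux_key(dst_port: int, dst_host: str):
    # The ⟨port, host⟩ pair is what actually identifies the receiving process.
    return (dst_port, dst_host)

# A hypothetical query from ephemeral port 4711 to DNS port 53.
header = struct.pack("!HHHH", 4711, 53, 8, 0)
fields = parse_udp_header(header)
key = demux_key(fields["DstPort"], "192.0.2.10")
```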
CS 6551 CN S.GNANAVEL AP (SS)/ CSE REC PAGE 1
The next issue is how a process learns the port for the process to which it wants to send a
message. Typically, a client process initiates a message exchange with a server process. Once
a client has contacted a server, the server knows the client’s port (from the SrcPrt field
contained in the message header) and can reply to it.
The real problem, therefore, is how the client learns the server’s port in the first place. A
common approach is for the server to accept messages at a well-known port. That is, each
server receives its messages at some fixed port that is widely published, much like the
emergency telephone service available in the United States at the well-known phone number
911. In the Internet, for example, the Domain Name System (DNS) receives messages at well-known
port 53 on each host, the mail service listens for messages at port 25, and the Unix
talk program accepts messages at well-known port 517, and so on.
This mapping is published periodically in an RFC and is available on most Unix systems in
file /etc/services. Sometimes a well-known port is just the starting point for communication:
The client and server use the well-known port to agree on some other port that they will use
for subsequent communication, leaving the well-known port free for other clients.
An alternative strategy is to generalize this idea, so that there is only a single well-known port
—the one at which the port mapper service accepts messages.
A client would send a message to the port mapper’s well-known port asking for the port it
should use to talk to the “whatever” service, and the port mapper returns the appropriate port.
This strategy makes it easy to change the port associated with different services over time and
for each host to use a different port for the same service.
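A minimal in-memory sketch of such a port-mapper service; the service name and port below are hypothetical, while 111 is the port classically used by the Sun RPC port mapper.

```python
# A toy port mapper: clients contact one well-known port and ask where a
# named service lives; the mapper replies with that service's current port.
WELL_KNOWN_PORTMAPPER_PORT = 111  # classic Sun RPC portmapper port

class PortMapper:
    def __init__(self):
        self._registry = {}

    def register(self, service: str, port: int):
        # The host can move a service to a different port at any time.
        self._registry[service] = port

    def lookup(self, service: str):
        # Returns None if the service is unknown.
        return self._registry.get(service)

mapper = PortMapper()
mapper.register("whatever", 6000)   # hypothetical service and port
port = mapper.lookup("whatever")
```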
As just mentioned, a port is purely an abstraction. Exactly how it is implemented differs from
system to system, or more precisely, from OS to OS. Typically, a
port is implemented by a message queue, as illustrated in the figure.
When a message arrives, the protocol (e.g., UDP) appends the message to the end of the
queue. Should the queue be full, the message is discarded. There is no flow-control
mechanism in UDP to tell the sender to slow down. When an application process wants to
receive a message, one is removed from the front of the queue. If the queue is empty, the
process blocks until a message becomes available.
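The queue behavior described above can be sketched as follows. This is a non-blocking stand-in: a real implementation would block the receiving process on an empty queue, whereas this sketch simply returns None.

```python
from collections import deque

class Port:
    """A port modeled as a bounded message queue. When the queue is full,
    arriving messages are silently dropped: UDP has no flow control to
    tell the sender to slow down."""
    def __init__(self, capacity: int):
        self.queue = deque()
        self.capacity = capacity

    def deliver(self, message: bytes) -> bool:
        if len(self.queue) >= self.capacity:
            return False            # queue full: message discarded
        self.queue.append(message)
        return True

    def receive(self):
        # A real implementation would block here until a message arrives.
        if not self.queue:
            return None
        return self.queue.popleft()

port = Port(capacity=2)
port.deliver(b"a")
port.deliver(b"b")
dropped = not port.deliver(b"c")    # third message is discarded
first = port.receive()              # messages come off the front, in order
```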
Finally, although UDP does not implement flow control or reliable/ ordered delivery, it does
provide one more function aside from demultiplexing messages to some application process
—it also ensures the correctness of the message by the use of a checksum.
(The UDP checksum is optional in IPv4 but is mandatory in IPv6.) The basic UDP checksum
algorithm is the same one used for IP; that is, it adds up a set of 16-bit words using ones'
complement arithmetic and takes the ones' complement of the result. But the input data that is
used for the checksum is a little counterintuitive.
The UDP checksum takes as input the UDP header, the contents of the message body, and
something called the pseudoheader. The pseudoheader consists of three fields from the IP
header—protocol number, source IP address, and destination IP address—plus the UDP
length field.
(Yes, the UDP length field is included twice in the checksum calculation.) The motivation
behind having the pseudo header is to verify that this message has been delivered between the
correct two endpoints. For example, if the destination IP address was modified while the
packet was in transit, causing the packet to be misdelivered, this fact would be detected by
the UDP checksum.
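A sketch of this computation, assuming IPv4 addresses supplied as 4-byte values (protocol number 17 is UDP). Changing the destination address changes the pseudoheader and therefore the checksum, which is exactly how misdelivery is detected.

```python
import struct

def ones_complement_sum16(data: bytes) -> int:
    """Sum 16-bit words with end-around carry (ones' complement addition)."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # wrap the carry back in
    return total

def udp_checksum(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> int:
    # Pseudoheader: source IP, destination IP, zero byte, protocol (17 = UDP),
    # and the UDP length (which thus enters the checksum a second time).
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    return (~ones_complement_sum16(pseudo + udp_segment)) & 0xFFFF

src = bytes([10, 0, 0, 1])
dst = bytes([10, 0, 0, 2])
segment = struct.pack("!HHHH", 4711, 53, 12, 0) + b"hiya"
c_ok = udp_checksum(src, dst, segment)
# If the destination address were modified in transit, the receiver's
# recomputed checksum would no longer match:
c_bad = udp_checksum(src, bytes([10, 0, 0, 3]), segment)
```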
Summary
User Datagram Protocol (UDP) is a connectionless, unreliable transport protocol. It adds process-to-process communication to the best-effort service provided by IP. Its simple demultiplexer allows multiple processes on each host to communicate. UDP does not provide flow control or reliable/ordered delivery; it is suitable for a process that requires simple request-response communication with little concern for flow control or error control.
UDP packets are known as user datagrams. A user datagram has an 8-byte header:
o SrcPort and DstPort—source and destination port numbers.
o Length—total length of the user datagram, i.e., header plus data.
o Checksum—computed over the UDP header, data, and pseudoheader. The pseudoheader consists of IP fields (Protocol, SourceAddr, DestinationAddr) and the UDP Length field. The checksum lets UDP verify that the message reached the correct recipient process.
Ports
Processes (server/client) are identified by an abstract locator known as a port. A server accepts messages at a well-known port. Some well-known UDP ports are 7–Echo, 53–DNS, 111–RPC, 161–SNMP, etc. The ⟨port, host⟩ pair is used as the key for demultiplexing. Ports are implemented as message queues. When a message arrives, UDP appends it to the end of the queue; when the queue is full, the message is discarded. When a message is read, it is removed from the queue.
Applications
Used for management processes such as SNMP.
Used for route updating protocols such as RIP.
It is a suitable transport protocol for multicasting.
Suitable for a process with internal flow and error control mechanisms, such as the
Trivial File Transfer Protocol (TFTP).
TCP - RELIABLE BYTE STREAM
TCP is a more sophisticated transport protocol, one that offers a reliable, connection-oriented
byte-stream service. Such a service has proven useful to a wide assortment of
applications because it frees the application from having to worry about missing or reordered
data.
TCP guarantees the reliable, in-order delivery of a stream of bytes. It is a full-duplex protocol,
meaning that each TCP connection supports a pair of byte streams, one flowing in each
direction. It also includes a flow-control mechanism for each of these byte streams that allows
the receiver to limit how much data the sender can transmit at a given time.
Finally, like UDP, TCP supports a demultiplexing mechanism that allows multiple
application programs on any given host to simultaneously carry on a conversation with their
peers. In addition to the above features, TCP also implements a highly tuned congestion
control mechanism.
END TO END ISSUES
At the heart of TCP is the sliding window algorithm. TCP supports logical connections between
processes that are running on any two computers in the Internet. This means that TCP needs
an explicit connection establishment phase during which the two sides of the connection
agree to exchange data with each other. In this respect, setting up a TCP connection is more
like placing a phone call than like having a dedicated phone line. TCP also has an explicit
connection teardown phase.
One of the things that happens during connection establishment is that the two parties establish
some shared state to enable the sliding window algorithm to begin. Connection teardown is
needed so each host knows it is OK to free this state.
Whereas a single physical link that always connects the same two computers has a fixed
RTT, TCP connections are likely to have widely different round-trip times. Variations in the
RTT are even possible during a single TCP connection.
Packets may be reordered as they cross the Internet, but this is not possible on a point-to-point
link, where the first packet put into one end of the link must be the first to appear at the other
end. Packets that are slightly out of order don’t cause a problem, since the sliding window
algorithm can reorder packets correctly using the sequence number.
TCP assumes that each packet has a maximum lifetime. The exact lifetime, known as the
maximum segment lifetime (MSL), is an engineering choice. The current recommended
setting is 120 seconds.
The computers connected to a point to point link are generally engineered to support the link.
For example, if a link’s delay × bandwidth product is computed to be 8 KB, meaning that a
window size is selected to allow up to 8 KB of data to be unacknowledged at a given time,
then it is likely that the computers at either end of the link have the ability to buffer up to 8 KB
of data. Because the transmitting side of a directly connected link cannot send any faster than
the bandwidth of the link allows, and only one host is pumping data into the link, it is not
possible to unknowingly congest the link. Said another way, the load on the link is visible in
the form of a queue of packets at the sender. In contrast, the sending side of a TCP
connection has no idea what links will be traversed to reach the destination.
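The 8-KB figure above follows directly from the delay × bandwidth product. As a worked example under assumed link parameters (a 1-Mbps link with a 64-ms RTT; these numbers are chosen to produce the 8-KB result):

```python
# Delay x bandwidth product: how much data can be "in flight" on the link,
# and hence how much each end must be able to buffer.
bandwidth_bps = 1_000_000          # assumed: 1 Mbps link
rtt_ms = 64                        # assumed: 64 ms round-trip time

bits_in_flight = bandwidth_bps * rtt_ms // 1000   # 64,000 bits
bytes_in_flight = bits_in_flight // 8             # 8,000 bytes = 8 KB
```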
TCP is a byte-oriented protocol, which means that the sender writes bytes into a TCP connection and the receiver reads bytes out of the TCP connection. Although “byte stream” describes the service TCP offers to application processes, TCP does not itself transmit individual bytes over the Internet. Instead, TCP on the source host buffers enough bytes from the sending process to fill a reasonably sized packet and then sends this packet to its peer on the destination host. TCP on the destination host then empties the contents of the packet into a receive buffer, and the receiving process reads from this buffer at its leisure.
The packets exchanged between TCP peers are called segments, since each one carries a segment of the byte stream. The SrcPort and DstPort fields identify the source and destination ports, respectively, just as in UDP. These two fields, plus the source and destination IP addresses, combine to uniquely identify each TCP connection. That is, TCP’s demux key is given by the 4-tuple
(SrcPort, SrcIPAddr, DstPort, DstIPAddr)
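A sketch of how a connection table keyed by this 4-tuple distinguishes connections; the addresses and ports below are illustrative only.

```python
# TCP demultiplexing sketch: per-connection state is looked up by the 4-tuple.
connections = {}

def demux_key(src_port: int, src_ip: str, dst_port: int, dst_ip: str):
    return (src_port, src_ip, dst_port, dst_ip)

# Two connections from the same client host to the same server port differ
# only in source port, yet map to distinct connection state.
k1 = demux_key(5000, "192.0.2.1", 80, "198.51.100.7")
k2 = demux_key(5001, "192.0.2.1", 80, "198.51.100.7")
connections[k1] = "state for connection 1"   # placeholder for real TCB state
connections[k2] = "state for connection 2"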
The Acknowledgement, SequenceNum, and AdvertisedWindow fields are all involved in TCP’s sliding window algorithm. Because TCP is a byte-oriented protocol, each byte of data has a sequence number; the SequenceNum field contains the sequence number for the first byte of data carried in that segment. The Acknowledgement and AdvertisedWindow fields carry information about the flow of data going in the opposite direction.
The 6-bit Flags field is used to relay control information between TCP peers. The possible flags include SYN, FIN, RESET, PUSH, URG, and ACK. The SYN and FIN flags are used when establishing and terminating a TCP connection, respectively. The ACK flag is set any time the Acknowledgement field is valid, implying that the receiver should pay attention to it. The URG flag signifies that this segment contains urgent data; when this flag is set, the UrgPtr field indicates where the nonurgent data contained in this segment begins. The PUSH flag signifies that the sender invoked the push operation, which indicates to the receiving side of TCP that it should notify the receiving process of this fact. The RESET flag signifies that the receiver has become confused, for example because it received a segment it did not expect to receive, and so wants to abort the connection.
CONNECTION ESTABLISHMENT AND TERMINATION (TCP CONNECTION MANAGEMENT (OR) TCP ARCHITECTURE (OR) STATE TRANSITION DIAGRAM)
A TCP connection begins with a client doing an active open to a server. Assuming that the server had earlier done a passive open, the two sides engage in an exchange of messages to establish the connection. Only after this connection establishment phase is over do the two sides begin sending data. Likewise, as soon as a participant is done sending data, it closes one direction of the connection, which causes TCP to initiate a round of connection termination messages.
Connection setup is an asymmetric activity (one side does a passive open and the other side does an active open), whereas connection teardown is symmetric (each side has to close the connection independently). Therefore, it is possible for one side to have done a close, meaning that it can no longer send data, but for the other side to keep the other half of the bidirectional connection open and to continue sending data.
THREE-WAY HANDSHAKE:
The algorithm used by TCP to establish and terminate a connection is called a three-way handshake. The client (the active participant) sends a segment to the server (the passive participant) stating the initial sequence number it plans to use (Flags = SYN, SequenceNum = x).
The server then responds with a single segment that both acknowledges the client’s sequence number (Flags = ACK, Ack = x + 1) and states its own beginning sequence number (Flags = SYN, SequenceNum = y).
That is, both the SYN and ACK bits are set in the Flags field of this second message. Finally, the client responds with a third segment that acknowledges the server’s sequence number (Flags = ACK, Ack = y + 1).
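The three segments can be sketched as follows, tracking only the fields the handshake uses (x and y are the chosen initial sequence numbers):

```python
# Sketch of the three segments in TCP's three-way handshake.
def three_way_handshake(x: int, y: int):
    seg1 = {"Flags": {"SYN"}, "SequenceNum": x}            # client -> server
    seg2 = {"Flags": {"SYN", "ACK"}, "SequenceNum": y,
            "Ack": x + 1}                                  # server -> client
    seg3 = {"Flags": {"ACK"}, "Ack": y + 1}                # client -> server
    return seg1, seg2, seg3

# Example initial sequence numbers (arbitrary for illustration).
s1, s2, s3 = three_way_handshake(x=100, y=300)
```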
STATE TRANSITION DIAGRAM:
TCP is complex enough that its specification includes a state transition diagram. This diagram shows only the states involved in opening a connection and in closing a connection. Everything that goes on while a connection is open (that is, the operation of the sliding window algorithm) is hidden in the ESTABLISHED state.
Each circle denotes a state that one end of a TCP connection can find itself in. All
connections start in the CLOSED state. As the connection progresses, the connection moves
from state to state according to the arcs. Each arc is labeled with a tag of the form
event/action.
Thus, if a connection is in the LISTEN state and a SYN segment arrives, the connection
makes a transition to the SYN_RCVD state and takes the action of replying with an ACK +
SYN segment.
Notice that two kinds of events trigger a state transition: (1) a segment arrives from the peer, and
(2) the local application process invokes an operation on TCP. When opening a connection, the
server first invokes a passive open operation on TCP, which causes TCP to move to the
LISTEN state. At some later time, the client does an active open, which causes its end of the
connection to send a SYN segment to the server and to move to the SYN_SENT state.
When the SYN segment arrives at the server, it moves to the SYN_RCVD state and responds
with a SYN + ACK segment. The arrival of this segment causes the client to move to the
ESTABLISHED state and to send an ACK back to the server. When this ACK arrives, the
server finally moves to the ESTABLISHED state.
There are three things to notice about the connection establishment half of the state-transition
diagram. First, if the client’s ACK to the server is lost, corresponding to the third leg of the
three-way handshake, then the connection still functions correctly.
This is because the client side is already in the ESTABLISHED state, so the local application
process can start sending data to the other end. Each of these data segments will have the
ACK flag set and the correct value in the Acknowledgment field, so the server will move to
the ESTABLISHED state when the first data segment arrives. This is actually an important
point about TCP: every segment reports what sequence number the sender is expecting to see
next, even if this repeats the same sequence number contained in one or more previous
segments. The second thing to notice about the state-transition diagram is that there
is a funny transition out of the LISTEN state whenever the local process invokes a send
operation on TCP. That is, it is possible for a passive participant to identify both ends of the
connection and then for it to change its mind about waiting for the other side and instead
actively establish the connection. To the best of our knowledge, this is a feature of TCP that
no application process actually takes advantage of. The final thing to notice about the
diagram is the arcs that are not shown. Specifically, most of the states that involve sending a
segment to the other side also schedule a timeout that eventually causes the segment to be
resent if the expected response does not happen. This retransmission is not depicted in the
state-transition diagram. If after several tries the expected response does not arrive, TCP
gives up and returns to the CLOSED state. There are three combinations of transitions that get
a connection from the ESTABLISHED state to the CLOSED state:
This side closes first: ESTABLISHED -> FIN_WAIT_1 -> FIN_WAIT_2 -> TIME_WAIT -> CLOSED.
The other side closes first: ESTABLISHED -> CLOSE_WAIT -> LAST_ACK -> CLOSED.
Both sides close at the same time: ESTABLISHED -> FIN_WAIT_1 -> CLOSING -> TIME_WAIT -> CLOSED.
There is actually a fourth, although rare, sequence of transitions that leads to the CLOSED
state; it follows the arc from FIN_WAIT_1 to TIME_WAIT. We leave it as an exercise for
you to figure out what combination of circumstances leads to this fourth possibility.
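The three teardown paths listed above can be encoded as a small event-driven state table. This sketch covers only those transitions (event names are informal labels for "local close", "ACK of FIN arrives", "FIN arrives", and "2×MSL timer expires"):

```python
# Close-side transitions of the TCP state diagram, as (state, event) -> state.
TRANSITIONS = {
    ("ESTABLISHED", "close"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "FIN"): "TIME_WAIT",
    ("FIN_WAIT_1", "FIN"): "CLOSING",          # simultaneous close
    ("CLOSING", "ACK"): "TIME_WAIT",
    ("TIME_WAIT", "timeout"): "CLOSED",
    ("ESTABLISHED", "FIN"): "CLOSE_WAIT",      # other side closes first
    ("CLOSE_WAIT", "close"): "LAST_ACK",
    ("LAST_ACK", "ACK"): "CLOSED",
}

def run(state, events):
    """Follow a sequence of events through the transition table."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state

# This side closes first: the first teardown combination in the text.
local_close = run("ESTABLISHED", ["close", "ACK", "FIN", "timeout"])
# The other side closes first: the second combination.
remote_close = run("ESTABLISHED", ["FIN", "close", "ACK"])
```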
TCP FLOW CONTROL (OR) ADAPTIVE FLOW CONTROL (OR) TCP SLIDING
WINDOW IN DETAIL
TCP uses a variant of sliding window known as adaptive flow control that:
guarantees reliable delivery of data
ensures ordered delivery of data
enforces flow control at the sender
Receiver advertises its window size to the sender using AdvertisedWindow field.
Sender thus cannot have unacknowledged data greater than AdvertisedWindow.
Send Buffer
Sending TCP maintains a send buffer, which contains three segments: acknowledged data, unacknowledged data, and data to be transmitted.
Send buffer maintains three pointers LastByteAcked, LastByteSent, and LastByteWritten such that:
LastByteAcked ≤ LastByteSent ≤ LastByteWritten
A byte can be sent only after being written, and only a sent byte can be acknowledged. Bytes to the left of LastByteAcked are not kept, as they have already been acknowledged.
Receive Buffer
Receiving TCP maintains a receive buffer to hold data even if it arrives out of order. The receive buffer maintains three pointers, namely LastByteRead, NextByteExpected, and LastByteRcvd, such that:
LastByteRead < NextByteExpected ≤ LastByteRcvd + 1
A byte cannot be read until that byte and all preceding bytes have been received. If data is received in order, then NextByteExpected = LastByteRcvd + 1. Bytes to the left of LastByteRead are not buffered, since they have been read by the application.
Flow Control
Size of the send and receive buffers is MaxSendBuffer and MaxRcvBuffer respectively.
o Sending TCP prevents overflowing of the send buffer by maintaining LastByteWritten − LastByteAcked ≤ MaxSendBuffer
o Receiving TCP avoids overflowing its receive buffer by maintaining LastByteRcvd −LastByteRead ≤ MaxRcvBuffer
o Receiver throttles the sender by having AdvertisedWindow based on free space available for buffering.
o Sending TCP adheres to AdvertisedWindow by computing EffectiveWindow that limits how much data it should send.
o When data arrives, LastByteRcvd moves to its right and AdvertisedWindow shrinks. Receiver acknowledges only, if preceding bytes have arrived.
o AdvertisedWindow expands when data is read by the application. If data is read as fast as it arrives, then AdvertisedWindow = MaxRcvBuffer. If data is read slowly, it eventually leads to an AdvertisedWindow of size 0.
o The AdvertisedWindow field is designed to allow the sender to keep the pipe full.
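The window arithmetic above can be checked with a small sketch; the buffer size and pointer values are arbitrary example figures.

```python
# Receive-side and send-side window bookkeeping, as described in the text.
MaxRcvBuffer = 4096   # assumed receive buffer size (bytes)

def advertised_window(last_byte_rcvd: int, last_byte_read: int) -> int:
    # Free buffer space the receiver can still absorb.
    return MaxRcvBuffer - (last_byte_rcvd - last_byte_read)

def effective_window(adv_window: int, last_byte_sent: int,
                     last_byte_acked: int) -> int:
    # How much *new* data the sender may still put into the pipe.
    return adv_window - (last_byte_sent - last_byte_acked)

aw = advertised_window(last_byte_rcvd=2048, last_byte_read=0)
ew = effective_window(aw, last_byte_sent=5000, last_byte_acked=4500)
```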
Fast Sender vs Slow Receiver
If the sender transmits at a higher rate, the receiver's buffer gets filled up. Hence, AdvertisedWindow
shrinks, eventually to 0.
Receiver advertises a window of size 0, thus sender cannot transmit as it gets blocked.
When receiving process reads some data, those bytes are acknowledged and
AdvertisedWindow expands.
When an acknowledgement arrives for x bytes, LastByteAcked is incremented by x and send buffer space is freed accordingly to send further data.
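A toy simulation of this interaction, under assumed buffer and segment sizes: the sender fills the receive buffer until the window shrinks to 0, then the window reopens once the application reads.

```python
# Fast sender vs. slow receiver: assumed 100-byte receive buffer and
# 25-byte segments; the receiving application reads nothing at first.
MaxRcvBuffer = 100
last_byte_rcvd = 0
last_byte_read = 0

def adv_window() -> int:
    return MaxRcvBuffer - (last_byte_rcvd - last_byte_read)

sent_before_block = 0
while adv_window() >= 25:          # sender keeps pushing 25-byte segments
    last_byte_rcvd += 25
    sent_before_block += 25

blocked = adv_window() == 0        # window has shrunk to zero: sender blocks

# The application finally reads 50 bytes: those bytes are acknowledged
# and the advertised window expands again.
last_byte_read += 50
reopened = adv_window()
```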
ADAPTIVE RETRANSMISSION ALGORITHMS (OR) HOW IS TIMEOUT
ESTIMATED IN TCP
TCP guarantees reliability through retransmission when an ACK does not arrive before the timeout.
Timeout is based on RTT, but it is highly variable for any two hosts on the internet.
Appropriate timeout is chosen using adaptive retransmission.
Original Algorithm
SampleRTT is the duration between sending a segment and the arrival of its ACK.
EstimatedRTT is a weighted average of the previous estimate and the current sample.
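A sketch of the original algorithm's update rule. The weighting constant (commonly written α and typically chosen between 0.8 and 0.9) and the sample values here are illustrative; timeout is conventionally set to twice EstimatedRTT.

```python
# Original adaptive retransmission: EstimatedRTT is an exponentially
# weighted moving average of RTT samples.
ALPHA = 0.875   # assumed weighting; textbook range is roughly 0.8-0.9

def update_estimated_rtt(estimated_rtt: float, sample_rtt: float) -> float:
    # EstimatedRTT = alpha * EstimatedRTT + (1 - alpha) * SampleRTT
    return ALPHA * estimated_rtt + (1 - ALPHA) * sample_rtt

def timeout(estimated_rtt: float) -> float:
    # TimeOut = 2 x EstimatedRTT
    return 2 * estimated_rtt

est = 100.0                                       # ms, previous estimate
est = update_estimated_rtt(est, sample_rtt=180.0) # a slow sample pulls it up
```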