
Transport Layer Description By Varun Tiwari

Jun 23, 2015

Page 1: Transport Layer Description By Varun Tiwari

The Transport Layer

• Provides a service to the application layer

• Obtains a service from the network layer

application

transport

network

link

physical

Page 2: Transport Layer Description By Varun Tiwari

The Transport Layer

• Principles behind transport layer services
  • multiplexing/demultiplexing
  • reliable data transfer
  • flow control
  • congestion control

• Transport layer protocols used in the Internet
  • UDP: connectionless
  • TCP: connection-oriented

• TCP congestion control

Page 3: Transport Layer Description By Varun Tiwari

Transport services and protocols

• Provide logical communication between application processes running on different hosts

• Transport protocols run on end systems
  • send side: break app messages into segments, pass to network layer
  • recv side: reassemble segments into messages, pass to app layer

• End-to-end transport between sockets
  • network layer provides end-to-end delivery between hosts

Page 4: Transport Layer Description By Varun Tiwari

Internet transport-layer protocols

• TCP
  • connection-oriented, reliable, in-order stream of bytes
  • congestion control, flow control, connection setup
  • users see a stream of bytes; TCP breaks it into segments

• UDP
  • unreliable, unordered
  • users supply chunks to UDP, which wraps each chunk into a segment / datagram

• Both TCP and UDP use IP, which is best-effort: no delay or bandwidth guarantees

Page 5: Transport Layer Description By Varun Tiwari

Multiplexing

• Goal: put several transport-layer ‘connections’ over one network-layer ‘connection’

• Multiplexing at send host: gathering data from multiple sockets, enveloping data with a header (used for demultiplexing)

• Demultiplexing at rcv host: delivering received segments to the correct socket

Page 6: Transport Layer Description By Varun Tiwari

Demultiplexing

• Host receives IP datagrams
  • each datagram has source IP address, destination IP address
  • each datagram carries one transport-layer segment
  • each segment has source, destination port numbers

• Host uses IP addresses and port numbers to direct segment to the appropriate socket

TCP/UDP segment format (each row 32 bits wide):

  source port # | dest port #
  other header fields
  application data (message)

Page 7: Transport Layer Description By Varun Tiwari

Connectionless (UDP) demultiplexing

• Create sockets with port numbers (valid ports are 0 to 65535)
  DatagramSocket mySocket1 = new DatagramSocket(9911);
  DatagramSocket mySocket2 = new DatagramSocket(9922);

• UDP socket identified by two-tuple: (destination IP address, destination port number)

• When host receives UDP segment
  • checks destination port number in segment
  • directs UDP segment to socket with that port number

• IP datagrams with different source IP addresses and/or source port numbers, but same dest address/port, are directed to the same socket

Page 8: Transport Layer Description By Varun Tiwari

Connection-oriented (TCP) demultiplexing

• TCP socket identified by four-tuple: (source IP address, source port number, destination IP address, destination port number)

• Receiving host uses all four values to direct segment to the correct socket

• Server host may support many simultaneous TCP sockets
  • each socket identified by its own 4-tuple

• Web servers have different sockets for each connecting client
  • non-persistent HTTP has a different socket for each request
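The difference between the two lookup keys can be sketched in a few lines of Java. This is only an illustration: `DemuxSketch`, its maps and the string-valued "sockets" are hypothetical stand-ins, not how an operating system stores its socket tables.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: socket names (strings) stand in for real socket objects.
final class DemuxSketch {
    // UDP: sockets are keyed by (destination IP, destination port) only.
    static final Map<String, String> udpSockets = new HashMap<>();
    // TCP: sockets are keyed by the full four-tuple.
    static final Map<String, String> tcpSockets = new HashMap<>();

    static String udpKey(String dstIp, int dstPort) {
        return dstIp + ":" + dstPort;
    }

    static String tcpKey(String srcIp, int srcPort, String dstIp, int dstPort) {
        return srcIp + ":" + srcPort + "->" + dstIp + ":" + dstPort;
    }

    // The source address and port are deliberately ignored for UDP.
    static String demuxUdp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return udpSockets.get(udpKey(dstIp, dstPort));
    }

    // All four values take part in the TCP lookup.
    static String demuxTcp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return tcpSockets.get(tcpKey(srcIp, srcPort, dstIp, dstPort));
    }

    public static void main(String[] args) {
        udpSockets.put(udpKey("10.0.0.1", 9911), "udpSocketA");
        tcpSockets.put(tcpKey("10.0.0.2", 5000, "10.0.0.1", 80), "tcpConn1");
        tcpSockets.put(tcpKey("10.0.0.3", 5000, "10.0.0.1", 80), "tcpConn2");
        // Two different senders reach the same UDP socket...
        System.out.println(demuxUdp("10.0.0.2", 5000, "10.0.0.1", 9911));
        System.out.println(demuxUdp("10.0.0.3", 6000, "10.0.0.1", 9911));
        // ...but land on different TCP sockets.
        System.out.println(demuxTcp("10.0.0.2", 5000, "10.0.0.1", 80));
        System.out.println(demuxTcp("10.0.0.3", 5000, "10.0.0.1", 80));
    }
}
```

The addresses and ports are made up for the example; the point is only that the UDP key drops the source fields while the TCP key keeps them.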

Page 9: Transport Layer Description By Varun Tiwari

TCP demultiplexing

Page 10: Transport Layer Description By Varun Tiwari

User Datagram Protocol (UDP)

• RFC 768

• The ‘no frills’ Internet transport protocol

• ‘best effort’: UDP segments may be:
  • lost
  • delivered out of order

• connectionless
  • no handshaking between sender and receiver
  • each segment handled independently of others

Why have UDP?
  • no connection establishment (means lower delay)
  • simple; no connection state at sender and receiver
  • small segment header
  • no congestion control: UDP can blast away as fast as desired
  • no retransmits: useful for some applications (lower delay)

Page 11: Transport Layer Description By Varun Tiwari

UDP

• Often used for streaming multimedia, games
  • loss-tolerant
  • rate- or delay-sensitive

• Also used for DNS, SNMP

• If you need reliable transfer over UDP, can add reliability at the application layer
  • application-specific error recovery
  • but think about what you are doing...

UDP segment format (each row 32 bits wide):

  source port # | dest port #
  length (in bytes of UDP segment, including header) | checksum
  application data (message)

Page 12: Transport Layer Description By Varun Tiwari

UDP checksum

• Purpose: to detect errors (e.g., flipped bits) in a transmitted segment

Sender
  • treat segment contents as a sequence of 16-bit integers
  • checksum: 1’s complement of the 1’s complement sum of segment contents
  • sender puts checksum value into UDP checksum field

Receiver
  • compute checksum of received segment
  • check if computed checksum equals checksum field value
    • NO = error detected
    • YES = no error detected (but maybe errors anyway?)

Page 13: Transport Layer Description By Varun Tiwari

UDP checksum example

• e.g., add two 16-bit integers
• NB, when adding numbers, carry from MSB is added to result

                1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
                1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
              ---------------------------------
  wraparound  1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
              ---------------------------------
  sum           1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
  checksum      0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
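The same arithmetic can be written as a short Java sketch. The class and method names are hypothetical; the two words in `main` are the ones from the example above (0xE666 and 0xD555), and a real UDP checksum would also cover a pseudo-header, which is left out here.

```java
// Minimal sketch of the 16-bit one's-complement checksum used in the example.
final class InternetChecksum {
    // One's-complement sum of 16-bit words, with end-around carry.
    static int onesComplementSum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            // Wrap any carry out of bit 15 back into the low bit.
            sum = (sum & 0xFFFF) + (sum >>> 16);
        }
        return sum;
    }

    // The checksum is the bitwise complement of that sum.
    static int checksum(int[] words) {
        return ~onesComplementSum(words) & 0xFFFF;
    }

    // Receiver side: recompute and compare with the received field.
    static boolean matches(int[] words, int receivedChecksum) {
        return checksum(words) == receivedChecksum;
    }

    public static void main(String[] args) {
        int[] words = {0xE666, 0xD555};
        System.out.println(Integer.toBinaryString(onesComplementSum(words)));
        System.out.println(Integer.toHexString(checksum(words)));
    }
}
```

Because addition is commutative, swapping the two words gives the same checksum, which is one concrete case of "no error detected (but maybe errors anyway?)".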

Page 14: Transport Layer Description By Varun Tiwari

Reliable data transfer

• Principles are important in app, transport, link layers
• Complexity of the reliable data transfer (rdt) protocol determined by characteristics of the unreliable channel

Page 15: Transport Layer Description By Varun Tiwari

SEND SIDE
  • rdt_send(): called from above (e.g., by app). Passed data to deliver to receiver’s upper layer
  • udt_send(): called by rdt to transfer packet over unreliable channel to receiver

RECV SIDE
  • rdt_recv(): called when packet arrives on rcv side of channel
  • deliver_data(): called by rdt to deliver data to upper layer

Page 16: Transport Layer Description By Varun Tiwari

Developing an rdt protocol

• Develop sender and receiver sides of rdt protocol
• Consider only unidirectional data transfer
  • although control information will flow in both directions
• Use finite state machines (FSM) to specify sender and receiver

[Figure: FSM notation with state1, state2 and the initial state. Each transition arrow is labelled with the event causing the state transition and the actions taken on that transition. State: when in this state, the next state is uniquely determined by the next event.]

Page 17: Transport Layer Description By Varun Tiwari

rdt1.0

SENDER

RECEIVER

• No bit errors, no packet loss, no packet reordering

Page 18: Transport Layer Description By Varun Tiwari

rdt2.0

• What if channel has bit errors (flipped bits)?
  • Use a checksum to detect bit errors

• How to recover?
  • acknowledgements (ACKs): receiver explicitly tells sender that packet was received (“OK”)
  • negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors (“Pardon?”)
  • sender retransmits packet on receipt of a NAK

• ARQ (Automatic Repeat reQuest)

• New mechanisms needed in rdt2.0 (vs. rdt1.0)
  • error detection
  • receiver feedback: control messages (ACK, NAK)
  • retransmission

Page 19: Transport Layer Description By Varun Tiwari

rdt2.0

SENDER

RECEIVER

Page 20: Transport Layer Description By Varun Tiwari

But rdt2.0 doesn’t always work...

• If ACK/NAK corrupted
  • sender doesn’t know what happened at receiver
  • shouldn’t just retransmit: possible duplicate

• Solution:
  • sender adds sequence number to each packet
  • sender retransmits current packet if ACK/NAK garbled
  • receiver discards duplicates

• Stop and wait
  • Sender sends one packet, then waits for receiver response

[Figure: SENDER/RECEIVER timeline with data1, data2 and data3. data2 draws a NAK and is resent; data3 is also sent twice, and without sequence numbers the receiver delivers data3 twice.]

Page 21: Transport Layer Description By Varun Tiwari

rdt2.1 sender

Page 22: Transport Layer Description By Varun Tiwari

rdt2.1 receiver

Page 23: Transport Layer Description By Varun Tiwari

rdt2.1

Sender
  • seq # added
  • two seq #’s (0,1) sufficient
  • must check if received ACK/NAK is corrupted
  • 2x state
    • state must “remember” whether “current” packet has 0 or 1 seq #

Receiver
  • must check if received packet is duplicate
    • state indicates whether 0 or 1 is expected seq #
  • receiver cannot know if its last ACK/NAK was received OK at sender

Page 24: Transport Layer Description By Varun Tiwari

rdt2.1 works!

[Figure: SENDER/RECEIVER timeline. Packets carry alternating sequence numbers (data1/0, data2/1, data3/0). data2/1 and data3/0 are each sent twice, but data1, data2 and data3 are each delivered exactly once.]

Page 25: Transport Layer Description By Varun Tiwari

Do we need NAKs? rdt2.2

• Instead of NAK, receiver sends ACK for last packet received OK
  • receiver explicitly includes seq # of packet being ACKed

Page 26: Transport Layer Description By Varun Tiwari

rdt2.2 sender

• duplicate ACK at sender results in the same action as a NAK: retransmit current packet
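A minimal sketch of the NAK-free receiver, assuming a one-bit sequence number as in rdt2.1. `Rdt22Receiver` and its method names are hypothetical, and the `corrupt` flag stands in for a failed checksum test.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an alternating-bit receiver that never sends a NAK.
final class Rdt22Receiver {
    private int expectedSeq = 0;                         // alternates between 0 and 1
    private final List<String> delivered = new ArrayList<>();

    // Handles one arriving packet and returns the seq # carried by the ACK sent back.
    int receive(int seq, String data, boolean corrupt) {
        if (!corrupt && seq == expectedSeq) {
            delivered.add(data);                         // new packet: deliver upward
            expectedSeq = 1 - expectedSeq;
            return seq;                                  // ACK the packet just received
        }
        // Corrupt or duplicate packet: re-ACK the last packet received OK.
        return 1 - expectedSeq;
    }

    List<String> delivered() {
        return delivered;
    }

    public static void main(String[] args) {
        Rdt22Receiver r = new Rdt22Receiver();
        System.out.println(r.receive(0, "data1", false)); // ACK for data1
        System.out.println(r.receive(1, "data2", true));  // corrupt: duplicate ACK
        System.out.println(r.receive(1, "data2", false)); // retransmission accepted
        System.out.println(r.delivered());
    }
}
```

The sender treats a repeated ACK number exactly as it would have treated a NAK and resends the current packet.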

Page 27: Transport Layer Description By Varun Tiwari

rdt2.2 works!

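[Figure: SENDER/RECEIVER timeline for rdt2.2. ACKs carry sequence numbers (ACK0, ACK1); a duplicate or garbled ACK makes the sender resend the current packet (data2/1 and data3/0 are each sent twice), and data1, data2 and data3 are each delivered once.]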

Page 28: Transport Layer Description By Varun Tiwari

What about loss? rdt3.0

Assume:
  • Underlying channel can also lose packets (both data and ACKs)
  • checksum, seq #, ACKs, retransmissions will help, but not enough

Approach:
  • sender waits “reasonable” amount of time for ACK
  • retransmits if no ACK received in this time
  • if pkt (or ACK) just delayed (not lost)
    • retransmission is duplicate, but seq # handles this
    • receiver must specify seq # of packet being ACKed
  • requires countdown timer

Page 29: Transport Layer Description By Varun Tiwari

rdt3.0 sender

Page 30: Transport Layer Description By Varun Tiwari

rdt3.0 in action

[Figure: SENDER/RECEIVER timeline for rdt3.0 with two sender timeouts. data2/1 and data3/0 are each resent after a timeout; data1, data2 and data3 are each delivered once.]

Page 31: Transport Layer Description By Varun Tiwari

rdt3.0 works, but not very well...

• e.g., 1 Gbps link, 15 ms propagation delay (so RTT = 30 ms), 1 KB packet:
  • L = packet length in bits, R = transmission rate in bps
  • Usender = utilisation: the fraction of time the sender is busy sending

Ttransmit = L / R = (8 kb/pkt) / (10^9 b/sec) = 8 microsec

Usender = (L / R) / (RTT + L / R) = 0.008 / 30.008 = 0.00027

  • 1 KB packet every 30 ms → 33 kB/s throughput over a 1 Gbps link
  • good value for money upgrading to Gigabit Ethernet!

• network protocol limits the use of the physical resources!
  • because rdt3.0 is stop-and-wait
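The utilisation figure is easy to reproduce. The sketch below plugs in the slide's numbers (L = 8000 bits, R = 10^9 bps, RTT = 30 ms); the `packetsInFlight` parameter is an added assumption that previews what happens when more than one packet may be outstanding.

```java
// Sketch: sender utilisation for stop-and-wait and for a small pipeline.
final class SenderUtilisation {
    // packetsInFlight = 1 models stop-and-wait (rdt3.0).
    static double utilisation(double packetBits, double rateBps,
                              double rttSeconds, int packetsInFlight) {
        double transmit = packetBits / rateBps;          // L / R, in seconds
        double u = packetsInFlight * transmit / (rttSeconds + transmit);
        return Math.min(1.0, u);                         // cannot exceed 100%
    }

    public static void main(String[] args) {
        double l = 8000.0;      // 1 KB packet, in bits
        double r = 1e9;         // 1 Gbps
        double rtt = 0.030;     // 2 x 15 ms propagation delay
        System.out.printf("stop-and-wait: %.5f%n", utilisation(l, r, rtt, 1));
        System.out.printf("3 in flight:   %.5f%n", utilisation(l, r, rtt, 3));
    }
}
```

Utilisation scales linearly with the number of packets allowed in flight until the link is full.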

Page 32: Transport Layer Description By Varun Tiwari

Pipelining

• Pipelined protocols
  • send multiple packets without waiting
  • number of outstanding packets > 1, but still limited
  • range of sequence numbers needs to be increased
  • buffering at sender and/or receiver

• Two generic forms
  • go-back-N and selective repeat

Page 33: Transport Layer Description By Varun Tiwari

Go-back-N

• Sender
  • k-bit sequence number in packet header
  • “window” of up to N consecutive unACKed packets allowed
  • ACK(n): ACKs all packets up to and including sequence number n = “cumulative ACK”
    • (sender may receive duplicate ACKs)
  • timer for each packet in flight
  • timeout(n): retransmit packet n and all higher sequence # packets in window
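The sender-side bookkeeping can be sketched as follows. `GoBackNSender` is a hypothetical class; for brevity it assumes unbounded sequence numbers (no k-bit wraparound) and leaves the timer itself out, exposing only what would be resent when it fires.

```java
// Sketch of go-back-N sender window bookkeeping (no real timers or network I/O).
final class GoBackNSender {
    private final int windowSize;   // N
    private int base = 0;           // oldest unACKed sequence number
    private int nextSeqNum = 0;     // next sequence number to use

    GoBackNSender(int windowSize) {
        this.windowSize = windowSize;
    }

    // Returns the sequence number sent, or -1 if the window is full.
    int send() {
        if (nextSeqNum >= base + windowSize) {
            return -1;
        }
        return nextSeqNum++;
    }

    // Cumulative ACK: everything up to and including n is acknowledged.
    void ack(int n) {
        if (n >= base) {
            base = n + 1;
        }
        // An ACK below base is a duplicate and is ignored here.
    }

    // On timeout, every packet from base to nextSeqNum - 1 is resent.
    int[] timeout() {
        int[] resend = new int[nextSeqNum - base];
        for (int i = 0; i < resend.length; i++) {
            resend[i] = base + i;
        }
        return resend;
    }

    public static void main(String[] args) {
        GoBackNSender s = new GoBackNSender(4);
        for (int i = 0; i < 5; i++) {
            System.out.println("send -> " + s.send());
        }
        s.ack(1);
        System.out.println("resend on timeout: " + java.util.Arrays.toString(s.timeout()));
    }
}
```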

Page 34: Transport Layer Description By Varun Tiwari

Go-back-N sender

Page 35: Transport Layer Description By Varun Tiwari

Go-back-N receiver

• ACK-only: always send ACK for correctly-received packet with highest in-order sequence number
  • may generate duplicate ACKs
  • only need to remember expectedseqnum

• out-of-order packet:
  • discard (don’t buffer), i.e., no receiver buffering
  • reACK packet with highest in-order sequence number

Page 36: Transport Layer Description By Varun Tiwari

go-back-N in action

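[Figure: SENDER/RECEIVER timeline for go-back-N. data1 and data2 are ACKed (ACK1, ACK2); data3 is lost, so data4 and data5 each draw a duplicate ACK2; after the data3 timeout the sender resends data3 and data4, which are ACKed (ACK3, ACK4).]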

Page 37: Transport Layer Description By Varun Tiwari

Selective Repeat

• If we lose one packet in go-back-N
  • must send all N packets again

• Selective Repeat (SR)
  • only retransmit packets that didn’t make it

• Receiver individually acknowledges all correctly-received packets
  • buffers packets as needed for eventual in-order delivery to upper layer

• Sender only resends pkts for which ACK not received
  • sender timer for each unACKed packet

• Sender window
  • N consecutive sequence numbers
  • as in go-back-N, limits seq numbers of sent, unACKed pkts

Page 38: Transport Layer Description By Varun Tiwari

Selective repeat windows

Page 39: Transport Layer Description By Varun Tiwari

Selective Repeat

Sender
  • if next available seq # is in window, send packet
  • timeout(n): resend pkt n, restart timer
  • ACK(n) in [sendbase, sendbase+N-1]:
    • mark packet n as received
    • if n is smallest unACKed packet, advance window base to next unACKed seq #
  • Need >= 2N sequence numbers
    • or reuse may confuse receiver

Receiver
  • pkt n in [rcvbase, rcvbase+N-1]:
    • send ACK(n)
    • if out of order, buffer
    • if in order: deliver (also deliver any buffered, in-order pkts), advance window to next not-yet-received pkt
  • pkt n in [rcvbase-N, rcvbase-1]:
    • send ACK(n)
    • even though already ACKed
  • otherwise
    • ignore

Page 40: Transport Layer Description By Varun Tiwari

SR in action

[Figure: SENDER/RECEIVER timeline for selective repeat over sequence numbers 1 to 10. data1 and data2 are ACKed (ACK1, ACK2); data3 is lost; data4 and data5 are individually ACKed (ACK4, ACK5) and data6 is sent, at which point the sender's window is full. After the data3 timeout the sender resends data3; once data3 is received, data3-6 are delivered up and ACK3 is sent.]

Page 41: Transport Layer Description By Varun Tiwari

TCP

• point-to-point
  • one sender, one receiver
• reliable, in-order byte stream
  • no “message boundaries”
• pipelined
  • TCP congestion control and flow control set window size
• send and receive buffers
• flow-controlled
  • sender will not overwhelm receiver
• full-duplex data
  • bi-directional data flow in same connection
  • MSS: maximum segment size
• connection-oriented
  • handshaking initialises sender, receiver state before data exchange

Page 42: Transport Layer Description By Varun Tiwari

TCP segment structure

TCP segment format (each row 32 bits wide):

  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum | urgent data pointer
  options (variable length)
  application data (message)

• sequence number, acknowledgement number: counted in bytes (not segments)
• receive window: # bytes receiver willing to accept
• checksum: Internet checksum (like UDP)
• U = urgent data (not often used)
• A = ACK # valid
• P = push data (not often used)
• R, S, F = RST, SYN, FIN = connection setup/teardown commands

Page 43: Transport Layer Description By Varun Tiwari

TCP sequence numbers & ACKs

• Sequence numbers
  • byte-stream # of first byte in segment’s data
• ACKs
  • seq # of next byte expected from other side
  • cumulative ACK
• How does receiver handle out-of-order segments?
  • Spec doesn’t say; up to implementor
  • Most buffer and wait for missing segments to be retransmitted

Page 44: Transport Layer Description By Varun Tiwari

TCP RTT & timeout

• How to set TCP timeout?
  • longer than RTT
  • but RTT can vary
• too short → premature timeout
  • unnecessary retransmissions
• too long → slow reaction to loss
• So estimate RTT

• How to estimate RTT?
  • sampleRTT: measured time from segment transmission until ACK receipt
    • ignore retransmissions
  • sampleRTT will vary, but we want a “smooth” estimated RTT
    • average several recent measurements, not just current

EstimatedRTT = (1 − α) ∗ EstimatedRTT + α ∗ SampleRTT

• Exponentially-weighted moving average
• Influence of past samples decreases exponentially fast
• typical α = 0.125

Page 45: Transport Layer Description By Varun Tiwari

TCP RTT estimation

Page 46: Transport Layer Description By Varun Tiwari

TCP timeout

• Timeout = EstimatedRTT + “safety margin”
  • if timeouts too short, too many retransmissions
  • if margin is too large, timeouts take too long
  • the larger the variation in EstimatedRTT, the larger the margin

• First estimate the deviation (typical β = 0.25)

DevRTT = (1 − β) ∗ DevRTT + β ∗ |SampleRTT − EstimatedRTT|

• Then set timeout interval

TimeoutInterval = EstimatedRTT + 4 ∗ DevRTT
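The three formulas combine into a small estimator. In this sketch the class name `RttEstimator` is hypothetical, and two choices are assumptions rather than something the slides specify: DevRTT starts at zero, and the deviation is computed against the EstimatedRTT value from before the current sample (the order used in RFC 6298).

```java
// Sketch of TCP-style RTT smoothing and timeout calculation (times in ms).
final class RttEstimator {
    static final double ALPHA = 0.125;  // weight of the newest sample in EstimatedRTT
    static final double BETA = 0.25;    // weight of the newest deviation in DevRTT

    private double estimatedRtt;
    private double devRtt;

    // Assumption: seed EstimatedRTT with an initial guess and DevRTT with zero.
    RttEstimator(double initialRttMs) {
        this.estimatedRtt = initialRttMs;
        this.devRtt = 0.0;
    }

    // Feed one SampleRTT taken from a segment that was not retransmitted.
    void addSample(double sampleRttMs) {
        // Assumption: deviation uses EstimatedRTT from before this sample.
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRttMs - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRttMs;
    }

    double estimatedRtt() {
        return estimatedRtt;
    }

    double devRtt() {
        return devRtt;
    }

    double timeoutInterval() {
        return estimatedRtt + 4 * devRtt;
    }

    public static void main(String[] args) {
        RttEstimator est = new RttEstimator(100.0);
        for (double sample : new double[] {200.0, 110.0, 105.0}) {
            est.addSample(sample);
            System.out.printf("est=%.2f dev=%.2f timeout=%.2f%n",
                    est.estimatedRtt(), est.devRtt(), est.timeoutInterval());
        }
    }
}
```

A single outlier sample moves EstimatedRTT only slightly but inflates the timeout through DevRTT, which is the intended safety margin.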

Page 47: Transport Layer Description By Varun Tiwari

RDT in TCP

• TCP provides RDT on top of unreliable IP
  • Pipelined segments
  • Cumulative ACKs
  • Single retransmission timer

• Retransmissions triggered by:
  • timeout events
  • duplicate ACKs

Page 48: Transport Layer Description By Varun Tiwari

TCP sender events

Data received from app:
  • Create segment with seq #
  • seq # is byte-stream number of first data byte in segment
  • start timer if not already running (timer for oldest unACKed segment)
  • expiration interval = TimeOutInterval

Timeout:
  • retransmit segment that caused timeout
  • restart timer

ACK received:
  • If ACK acknowledges previously-unACKed segments
    • update what is known to be ACKed
    • start timer if there are outstanding segments

Page 49: Transport Layer Description By Varun Tiwari

TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

• SendBase-1 = last cumulatively-ACKed byte
• e.g.,
  • SendBase-1 = 71; y = 73, so receiver wants 73+
  • y > SendBase, so the new data is ACKed

Page 50: Transport Layer Description By Varun Tiwari

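TCP retransmissions - lost ACK

[Figure: HOST A/HOST B timeline. Host A sends SEQ=92 with 8 bytes of data; Host B's ACK=100 is lost; Host A times out and resends SEQ=92, 8 bytes data; Host B replies with ACK=100 again, and SendBase = 100.]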

Page 51: Transport Layer Description By Varun Tiwari

TCP retransmissions - premature timeout

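[Figure: HOST A/HOST B timeline. Host A sends SEQ=92 (8 bytes data) and SEQ=100 (20 bytes data); the timer for SEQ=92 expires before ACK=100 arrives, so Host A resends SEQ=92, 8 bytes data. SendBase moves to 100 and then 120 as ACK=100 and ACK=120 arrive, and Host B answers the retransmission with ACK=120.]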

Page 52: Transport Layer Description By Varun Tiwari

TCP retransmissions - saving retransmits

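[Figure: HOST A/HOST B timeline. Host A sends SEQ=92 (8 bytes data) and SEQ=100 (20 bytes data); ACK=100 is lost, but the cumulative ACK=120 arrives before the timeout, so SendBase = 120 and nothing is retransmitted.]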

Page 53: Transport Layer Description By Varun Tiwari

TCP ACK generation

Event at receiver: Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed.
TCP receiver action: Delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK.

Event at receiver: Arrival of in-order segment with expected seq #. One other segment has ACK pending.
TCP receiver action: Immediately send single cumulative ACK, ACKing both in-order segments.

Event at receiver: Arrival of out-of-order segment higher than expected seq #. Gap detected.
TCP receiver action: Immediately send duplicate ACK, indicating seq # of next expected byte.

Event at receiver: Arrival of segment that partially or completely fills gap.
TCP receiver action: Immediately send ACK, provided that segment starts at lower end of gap.

Page 54: Transport Layer Description By Varun Tiwari

TCP Fast Retransmit

• Timeout is often quite long
  • so long delay before resending lost packet

• Lost segments are detected via DUP ACKs
  • sender often sends many segments back-to-back (pipeline)
  • if segment is lost, there will be many DUP ACKs

• If a sender receives 3 duplicate ACKs for the same data, it assumes that the segment after the ACKed data was lost
  • fast retransmit: resend segment before the timer expires
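Counting duplicate ACKs takes very little state. The sketch below reads the rule as "the third duplicate of an ACK number triggers the resend" (so four identical ACKs in total), which is an interpretation; `FastRetransmit` and `onAck` are hypothetical names.

```java
// Sketch of the duplicate-ACK counter behind fast retransmit.
final class FastRetransmit {
    private long lastAck = -1;   // most recent ACK number seen
    private int dupCount = 0;    // duplicates of lastAck seen so far

    // Returns true when this ACK should trigger a fast retransmit.
    boolean onAck(long ackNumber) {
        if (ackNumber == lastAck) {
            dupCount++;
            return dupCount == 3;    // fire once, on the third duplicate
        }
        lastAck = ackNumber;         // new data acknowledged: reset the counter
        dupCount = 0;
        return false;
    }

    public static void main(String[] args) {
        FastRetransmit fr = new FastRetransmit();
        long[] acks = {100, 100, 100, 100, 120};
        for (long a : acks) {
            System.out.println("ACK=" + a + " fastRetransmit=" + fr.onAck(a));
        }
    }
}
```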

Page 55: Transport Layer Description By Varun Tiwari

TCP flow control

Flow control: prevent sender from overwhelming receiver

• Receiver side of TCP connection has a receive buffer
• Application process may be slow at reading from buffer
• Need ‘speed-matching’ service: match send rate to receiving application’s ‘drain’ rate

Page 56: Transport Layer Description By Varun Tiwari

TCP flow control

• Receiver advertises spare room by including value of RcvWindow in segments
  • value is dynamic

• Sender limits unACKed data to RcvWindow
  • guarantees receive buffer will not overflow

• Spare room in buffer = RcvWindow = RcvBuffer − [LastByteRcvd − LastByteRead]
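Both sides of the rule are one-line calculations. The sketch uses the slide's variable names; the class `ReceiveWindow`, the `sendable` helper and the byte counts in `main` are made up for illustration.

```java
// Sketch of the flow-control arithmetic on the receiver and the sender.
final class ReceiveWindow {
    // Receiver: spare room advertised to the sender.
    static long rcvWindow(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        return rcvBuffer - (lastByteRcvd - lastByteRead);
    }

    // Sender: how many more bytes may be sent without exceeding RcvWindow.
    static long sendable(long lastByteSent, long lastByteAcked, long rcvWindow) {
        long unacked = lastByteSent - lastByteAcked;
        return Math.max(0, rcvWindow - unacked);
    }

    public static void main(String[] args) {
        long window = rcvWindow(4096, 3000, 1000);   // 2000 bytes still unread
        System.out.println("RcvWindow = " + window);
        System.out.println("sendable  = " + sendable(5000, 4000, window));
    }
}
```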

Page 57: Transport Layer Description By Varun Tiwari

TCP connection management

TCP sender and receiver establish ‘connection’ before exchanging data segments

• Initialise TCP variables:
  • sequence numbers
  • buffers, flow control info

• Client: initiates connection
  Socket clientSocket = new Socket("hostname", portNumber);

• Server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three-way handshake

1. Client host sends TCP SYN segment to server
  • specifies initial sequence #
  • no data

2. Server receives SYN, replies with SYNACK segment
  • server allocates buffers
  • specifies server initial seq #

3. Client receives SYNACK, replies with ACK
  • may contain data

Page 58: Transport Layer Description By Varun Tiwari

TCP connection management

Closing a connection

1. Client host sends TCP FIN segment to server

2. Server receives FIN, replies with ACK segment, closes connection, sends FIN

3. Client receives FIN, replies with ACK, enters ‘timed wait’
  • during timed wait, will respond with ACK to FINs

4. Server receives ACK, closes.

[Figure: CLIENT/SERVER timeline. Client closes and sends FIN; server replies with ACK, then closes and sends its own FIN; client replies with ACK and stays in timed wait before it is closed; server is closed on receiving the ACK.]

Page 59: Transport Layer Description By Varun Tiwari

TCP connection management

TCP client lifecycle

TCP server lifecycle

Page 60: Transport Layer Description By Varun Tiwari

Other TCP flags

• RST = reset the connection
  • used e.g., to reset non-synchronised handshakes
  • or if host tries to connect to server on non-listening port
• PSH = push
  • receiver should pass data to upper layer immediately
  • receiver pushes all data in window up
• URG = urgent
  • sender’s upper layer has marked data in segment as urgent
  • location of last byte of urgent data indicated by urgent data pointer
• URG and PSH are hardly ever used
  • except Blitzmail, which appears to use PSH for every segment
• See RFC 793 for more info (also 1122, 1323, 2018, 2581)

Page 61: Transport Layer Description By Varun Tiwari

Congestion control

• What is congestion?
  • Too many sources sending too much data too fast for the network to handle
• Not flow control!
  • network, not end systems
• Manifestations
  • lost packets (buffer overflow at routers)
  • long delays (queueing in router buffers)
• One of the most important problems in networking

Page 62: Transport Layer Description By Varun Tiwari

Congestion control: scenario 1

• 2 senders, 2 receivers
• link capacity R
• 1 router, infinite buffers
• no retransmissions

• large delays when congested
• maximum achievable throughput = R/2

Page 63: Transport Layer Description By Varun Tiwari

Congestion control: scenario 2

• 1 router, finite buffers
• sender retransmits lost packets
• λin = sending rate, λ′in = offered load (inc. retransmits)

Page 64: Transport Layer Description By Varun Tiwari

Congestion control: scenario 2

• λin = λout (goodput)
• ‘perfect’ retransmission, only when loss: λ′in > λout
• retransmissions of delayed (not lost) packets means λ′in greater than perfect case
• So congestion causes
  • more work (retransmits) for given goodput
  • unnecessary retransmissions; link carries multiple copies

Page 65: Transport Layer Description By Varun Tiwari

Congestion control: scenario 3

• 4 senders
  • A → C
  • B → D
• finite buffers
• multihop paths
• timeouts/retransmits

Page 66: Transport Layer Description By Varun Tiwari

Congestion control: scenario 3

• A → C limited by R1 → R2 link
• B → D traffic saturates R2
• A → C end-to-end throughput goes to zero
  • may as well have used R1 for something else
• So congestion causes
  • packet drops → any upstream transmission capacity used for that packet is wasted

Page 67: Transport Layer Description By Varun Tiwari

Approaches to congestion control

End-to-end congestion control

• no explicit feedback from network

• congestion inferred from end-system observed loss and delay

• this is what TCP does

Network-assisted congestion control

• routers provide feedback to end systems
  • direct feedback, e.g. choke packet
  • mark single bit indicating congestion
  • tell sender the explicit rate at which it should send
  • ATM, DECbit, TCP/IP ECN

Page 68: Transport Layer Description By Varun Tiwari

ATM ABR congestion control

ATM (Asynchronous Transfer Mode)
  • alternative network architecture
  • virtual circuits, fixed-size cells

ABR (Available Bit Rate)
  • elastic service
  • if sender’s path is underloaded, sender should use available bandwidth
  • if sender’s path is congested, sender is throttled to the minimum guaranteed rate

RM (Resource Management) cells
  • sent by sender, interspersed with data cells
  • bits in RM cell set by switches (i.e., network-assisted CC)
    • NI bit: no increase in rate (mild congestion)
    • CI bit: congestion indication
  • RM cells are returned to the sender by receiver, with bits intact

Page 69: Transport Layer Description By Varun Tiwari

ATM ABR congestion control

• 2-byte ER (Explicit Rate) field in RM cell
  • congested switch may lower ER value in cell
  • sender’s send rate is thus the minimum supportable rate on path

• EFCI bit in data cell is set to 1 in a congested switch
  • if data cell preceding RM cell has EFCI set, receiver sets CI bit in returned RM cell

Page 70: Transport Layer Description By Varun Tiwari

TCP congestion control

• end-to-end (no network assist)
• sender limits transmission:

LastByteSent − LastByteAcked ≤ min{CongWin, RcvWindow}

rate = CongWin / RTT bytes/sec

• CongWin is dynamic function of perceived congestion
• How does sender perceive congestion?
  • loss event: timeout or 3 DUP ACKs
  • TCP sender reduces rate (CongWin) after loss event
  • 3 mechanisms: AIMD, slow start, conservative after timeouts
• TCP congestion control is self-clocking

Page 71: Transport Layer Description By Varun Tiwari

TCP AIMD

Multiplicative decrease
  • halve CongWin after loss event

Additive increase
  • increase CongWin by 1 MSS every RTT in the absence of loss events (probing)

[Figure: TCP ‘sawtooth’]

Page 72: Transport Layer Description By Varun Tiwari

TCP Slow Start

• When connection begins, CongWin = 1 MSS
  • e.g., MSS = 500 bytes, RTT = 200 ms
  • initial rate = 20 kbps
• But available bandwidth may be ≫ MSS/RTT
  • want to quickly ramp up to respectable rate
• When connection begins, increase rate exponentially until the first loss event
  • double CongWin every RTT
  • increment CongWin for every ACK received
• Slow start: sender starts sending at slow rate, but quickly speeds up

Page 73: Transport Layer Description By Varun Tiwari

TCP slow start

[Figure: HOST A/HOST B timeline. Host A sends 1 segment, then 2 segments, then 4 segments, one batch per RTT.]

Page 74: Transport Layer Description By Varun Tiwari

TCP - reaction to timeout events

• After 3 DUP ACKs
  • CongWin halved
  • window then grows linearly
• But after timeout
  • CongWin set to 1 MSS
  • window then grows exponentially
  • to threshold, then grows linearly (AIMD: congestion avoidance)
• Why?
  • 3 DUP ACKs means network capable of delivering some segments, so do Fast Recovery (TCP Reno)
  • timeout before 3 DUP ACKs is more troubling

Page 75: Transport Layer Description By Varun Tiwari

TCP - reaction to timeout events

• When to switch from exponential to linear?
  • When CongWin gets to ½ of its value before timeout

• Implementation:
  • Threshold variable
  • At loss event, Threshold is set to ½ of CongWin just before loss event

Page 76: Transport Layer Description By Varun Tiwari

TCP congestion control - summary

• When CongWin is below Threshold, sender in slow start phase; window grows exponentially

• When CongWin is above Threshold, sender in congestion-avoidance phase; window grows linearly

• When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold

• When timeout occurs, Threshold set to CongWin/2 and CongWin set to 1 MSS
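The four summary rules map directly onto a small state holder. This is a sketch under stated assumptions: `CongestionWindow` is a hypothetical class, `onNewAck` is called once per newly acknowledged segment, congestion avoidance is approximated as MSS²/CongWin per ACK (about 1 MSS per RTT), and a window exactly equal to Threshold is treated as congestion avoidance.

```java
// Sketch of the slow start / congestion avoidance rules (window sizes in bytes).
final class CongestionWindow {
    final int mss;
    double congWin;
    double threshold;

    CongestionWindow(int mss, double initialThreshold) {
        this.mss = mss;
        this.congWin = mss;               // connection starts at 1 MSS
        this.threshold = initialThreshold;
    }

    // Called for each ACK that acknowledges new data.
    void onNewAck() {
        if (congWin < threshold) {
            congWin += mss;                           // slow start: doubles per RTT
        } else {
            congWin += (double) mss * mss / congWin;  // roughly +1 MSS per RTT
        }
    }

    // Triple duplicate ACK: halve the window and continue linearly.
    void onTripleDupAck() {
        threshold = congWin / 2;
        congWin = threshold;
    }

    // Timeout: remember half the window, restart from 1 MSS.
    void onTimeout() {
        threshold = congWin / 2;
        congWin = mss;
    }

    public static void main(String[] args) {
        CongestionWindow cw = new CongestionWindow(1000, 4000);
        for (int i = 0; i < 4; i++) {
            cw.onNewAck();
            System.out.println("CongWin = " + cw.congWin);
        }
        cw.onTripleDupAck();
        System.out.println("after 3 DUP ACKs: " + cw.congWin);
        cw.onTimeout();
        System.out.println("after timeout:    " + cw.congWin);
    }
}
```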

Page 77: Transport Layer Description By Varun Tiwari

TCP throughput

• W = window size when loss occurs
• When window = W, throughput = W/RTT
• After loss, window = W/2, throughput = W/(2 ∗ RTT)
• Average throughput = 0.75 ∗ W/RTT
  • (ignoring slow start, assume throughput increases linearly between W/2 and W)

Page 78: Transport Layer Description By Varun Tiwari

High-speed TCP

• assume: 1500 byte MSS (common for Ethernet), 100 ms RTT, 10 Gbps desired throughput
  • W = 83,333 segments
  • a big CongWin! What if loss?
• Throughput in terms of loss:

throughput = (1.22 ∗ MSS) / (RTT ∗ √L)

  • so L (loss rate) = 2 ∗ 10^-10 (1 loss every 5 billion segments)
  • is this realistic?
• Lots of people working on modifying TCP for high-speed networks
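Both numbers on this slide follow from the stated assumptions (1500-byte MSS, 100 ms RTT, 10 Gbps). The sketch below recomputes them; `TcpThroughput` and its method names are hypothetical.

```java
// Sketch: window size and loss rate needed for a target TCP throughput.
final class TcpThroughput {
    // W in segments such that W * MSS / RTT equals the target rate.
    static double windowSegments(double targetBps, double rttSeconds, int mssBytes) {
        return targetBps * rttSeconds / (mssBytes * 8.0);
    }

    // Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L.
    static double requiredLossRate(double targetBps, double rttSeconds, int mssBytes) {
        double root = 1.22 * mssBytes * 8.0 / (rttSeconds * targetBps);
        return root * root;
    }

    public static void main(String[] args) {
        double w = windowSegments(10e9, 0.1, 1500);
        double loss = requiredLossRate(10e9, 0.1, 1500);
        System.out.printf("W = %.0f segments%n", w);
        System.out.printf("L = %.2e (one loss per %.1e segments)%n", loss, 1.0 / loss);
    }
}
```

The loss rate comes out at roughly 2 ∗ 10^-10, i.e. about one lost segment in five billion.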

Page 79: Transport Layer Description By Varun Tiwari

Fairness

• If k TCP sessions share same bottleneck link of bandwidth R, each should have an average rate of R/k

Page 80: Transport Layer Description By Varun Tiwari

How is TCP fair?

Two competing sessions
  • additive increase gives slope of 1 as throughput increases
  • multiplicative decrease decreases throughput proportionally

• suppose we are at A
  • total < R, so both increase
• B, total > R, so loss
  • both decrease window by a factor of 2
• C
  • total < R, so both increase
• etc...

Page 81: Transport Layer Description By Varun Tiwari

Fairness

• multimedia apps often use UDP
  • do not want congestion/flow control to throttle rate
  • pump A/V at constant rate, tolerate packet loss
• How to enforce fairness in UDP?
  • application-layer congestion control
  • long-term throughput of a UDP flow is equivalent to a TCP flow on the same link
• Parallel TCP connections
  • nothing to stop application from opening parallel connections between two hosts (web browser, download ‘accelerator’)
  • e.g., link of rate R with 9 connections
    • new app asks for 1 TCP, gets rate R/(9+1) = R/10
    • new app asks for 11 TCP, gets rate 11R/(9+11) ≈ R/2 (!)
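The parallel-connection arithmetic assumes that every TCP connection on the bottleneck gets an equal share, so an application's share is simply its fraction of the connections. A hypothetical helper makes that explicit:

```java
// Sketch: share of link rate R won by an app that opens several TCP connections,
// assuming each connection on the link receives an equal share.
final class ParallelShare {
    static double shareOfLink(int myConnections, int existingConnections) {
        return (double) myConnections / (myConnections + existingConnections);
    }

    public static void main(String[] args) {
        System.out.println("1 connection:   " + shareOfLink(1, 9) + " of R");
        System.out.println("11 connections: " + shareOfLink(11, 9) + " of R");
    }
}
```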

Page 82: Transport Layer Description By Varun Tiwari

What is ‘fair’?

• Max-min fairness
  • Give the flow with the lowest rate the largest possible share
• Proportional fairness
  • TCP favours short flows
  • Proportional fairness: flows allocated bandwidth in proportion to number of links traversed
• Pareto-fairness
  • can’t give one flow any more bandwidth without taking bandwidth away from another flow
• Per-link fairness
  • each flow gets a fair share of each link traversed
• Utility functions/pricing
  • I pay/want more, I get more

Page 83: Transport Layer Description By Varun Tiwari

Quick history of the Internet

1957: Sputnik launched. ARPA created in response.
early ‘60s: Packet-switching independently invented by Paul Baran (RAND), Donald Davies (NPL, UK) and Leonard Kleinrock (MIT).
1967: Lawrence Roberts proposes the “ARPANET”.
1968: BBN starts work on the IMP (Interface Message Processor).
1969: First IMP installed at UCLA (first host on ARPANET). Second host installed at SRI. First host-to-host message crashes on “G” of “LOGIN”.
1971: Ray Tomlinson develops e-mail. By 1973 e-mail is 75% of ARPANET traffic.
1973: Bob Metcalfe invents Ethernet. UCL becomes first international ARPANET node.
1974: Vint Cerf and Bob Kahn publish “A Protocol for Packet Network Interconnection” (TCP).
1976: The Queen sends an e-mail.
1978: TCP becomes TCP/IP.
1979: First MUD developed at Essex.
1982: UCL and Norway connect to ARPANET/Internet using TCP/IP over SATNET.
1983: DNS developed.
1985: ISI manages DNS root server, SRI NIC manages registrations. symbolic.com is first registered domain.
1986: NSFNET created. IETF/IRTF created.
1988: DoD adopts OSI model. Internet worm infects most of the ARPANET, leads to formation of CERT.
1990: End of ARPANET. First commercial ISP (world.std.com).
1991: CERN releases WWW. First web server (nsoc01.cern.ch).
1993: Mosaic launched. WWW grows by 341,634%. Doom released.
1994: First e-commerce. First cyberbank. First online pizza-ordering. First spam. First banner ad. Yahoo! launches.
1995: RealAudio, AltaVista debut. Netscape IPO.
1996: Hotmail debuts. Internet2 launched. Quake II released.
1997: 802.11 standard released.
1998: Google launches.
1999: business.com domain sells for $7.5m. Napster launches.
2001: Lawsuits close Napster down. Code Red, Nimda worms. BitTorrent introduced. X-Box debuts with integrated Ethernet port.
2003: Slammer, Blaster worms. Flash mobs, blogs become popular. Verisign almost destroys DNS (Site Finder). RIAA starts suing P2P end-users.
2004: Network Solutions offers 100 year domain registrations.

Host counts marked along the timeline: 4, 1000, 10000, 100000, 1 million, 10 million, 100 million and 300 million hosts.